From felix.yang at huawei.com Mon Apr 1 01:19:16 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Mon, 1 Apr 2019 01:19:16 +0000
Subject: [aarch64-port-dev ] RFR: 8221658: aarch64: add necessary predicate for ubfx patterns
In-Reply-To: <130fbe62-4fac-5a8d-aade-74e340459e23@redhat.com>
References: <130fbe62-4fac-5a8d-aade-74e340459e23@redhat.com>
Message-ID:

The patch adds the following three constraints for 'rshift' and 'mask' operands:

1. 0 <= rshift <= 31/63
2. mask != 0
3. rshift + width <= 32/64 (width = exact_log2(mask+1))

Constraint 3 needs to be implemented by adding a predicate as we are checking both 'rshift' and 'mask' operands.

Do you want me to implement constraint 1 & 2 using a match operand?

Thanks,
Felix

>
> On 3/30/19 12:58 AM, Yangfei (Felix) wrote:
> > Please review this patch adding necessary predicate for ubfx patterns in aarch64.ad.
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8221658
> > Webrev: http://cr.openjdk.java.net/~fyang/8221658/webrev.00
> >
> > Currently, this issue is only reproduced with an aarch64 8u jdk.
> > Although it is not reproduced with aarch64 jdk11 or newer versions, it's better for them to have this fix.
> > Jtreg tested with aarch64 jdk8u & jdk13 fastdebug build. Also passed the private fuzz test.
>
> Can't this be done by using a match operand?
>

From aph at redhat.com Mon Apr 1 09:15:57 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 1 Apr 2019 10:15:57 +0100
Subject: [aarch64-port-dev ] RFR: 8221658: aarch64: add necessary predicate for ubfx patterns
In-Reply-To:
References: <130fbe62-4fac-5a8d-aade-74e340459e23@redhat.com>
Message-ID: <79987274-97d6-bde0-8577-e5046864bdde@redhat.com>

On 4/1/19 2:19 AM, Yangfei (Felix) wrote:
> The patch adds the following three constraints for 'rshift' and 'mask' operands:
>
> 1. 0 <= rshift <= 31/63
> 2. mask != 0
> 3. rshift + width <= 32/64 (width = exact_log2(mask+1))
>
> Constraint 3 needs to be implemented by adding a predicate as we are checking both 'rshift' and 'mask' operands.
>
> Do you want me to implement constraint 1 & 2 using a match operand?

Yes. Please do so wherever possible.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From vladimir.kozlov at oracle.com Mon Apr 1 21:44:54 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 1 Apr 2019 14:44:54 -0700
Subject: [13] RFR(S) 8221782: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.jdk.vm.ci.services
Message-ID:

https://bugs.openjdk.java.net/browse/JDK-8221782

Recent 'Update Graal' JDK-8221341 added import jdk.vm.ci.services.Services class to HotSpotGraalRuntimeMBean.java.
JVMCI module-info.java was updated accordingly [1] but src/java.base/share/lib/security/default.policy file was not.

Ran failed java/lang/SecurityManager/CheckAccessClassInPackagePermissions.java test.
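A side note on the two files involved: module-info.java decides which packages a module exports (and to whom), while default.policy decides which accessClassInPackage permissions a module's code is granted when a security manager is in use. A minimal sketch of inspecting the export side only, using the standard java.lang.Module API, could look like this. The class name ExportCheck and its helpers are made up for illustration, the sketch does not exercise the policy file at all, and the JVMCI and Graal modules named in main may be absent from a given JDK build, in which case the helper simply reports false.

```java
import java.util.Optional;

// Hypothetical helper, not part of the patch: looks only at module exports,
// never at default.policy.
class ExportCheck {
    // True if the named boot-layer module exports pkg to every module.
    static boolean exportsUnconditionally(String moduleName, String pkg) {
        Optional<Module> m = ModuleLayer.boot().findModule(moduleName);
        return m.isPresent() && m.get().isExported(pkg);
    }

    // True if sourceName exports pkg at least to targetName
    // (qualified or unqualified export). False if either module is missing.
    static boolean exportsTo(String sourceName, String pkg, String targetName) {
        Optional<Module> source = ModuleLayer.boot().findModule(sourceName);
        Optional<Module> target = ModuleLayer.boot().findModule(targetName);
        return source.isPresent() && target.isPresent()
                && source.get().isExported(pkg, target.get());
    }

    public static void main(String[] args) {
        // java.lang is exported to everyone; jdk.internal.misc is not.
        System.out.println(exportsUnconditionally("java.base", "java.lang"));
        System.out.println(exportsUnconditionally("java.base", "jdk.internal.misc"));
        // Result depends on the JDK build; the module names come from the bug report.
        System.out.println(exportsTo("jdk.internal.vm.ci", "jdk.vm.ci.services",
                "jdk.internal.vm.compiler.management"));
    }
}
```

Whatever the last line prints on a particular build, the export and the policy grant are checked independently, which is why updating module-info.java alone was not enough here.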
diff -r 18547cad9ec6 src/java.base/share/lib/security/default.policy
--- a/src/java.base/share/lib/security/default.policy
+++ b/src/java.base/share/lib/security/default.policy
@@ -160,6 +160,7 @@
 grant codeBase "jrt:/jdk.internal.vm.compiler.management" {
     permission java.lang.RuntimePermission "accessClassInPackage.jdk.internal.vm.compiler.collections";
     permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.runtime";
+    permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.services";
     permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.core.common";
     permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.debug";
     permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.hotspot";

--
Thanks,
Vladimir

[1] http://hg.openjdk.java.net/jdk/jdk/file/dd5c64326027/src/jdk.internal.vm.ci/share/classes/module-info.java#l27

From dean.long at oracle.com Mon Apr 1 21:51:41 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 1 Apr 2019 14:51:41 -0700
Subject: [13] RFR(S) 8221782: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.jdk.vm.ci.services
In-Reply-To:
References:
Message-ID:

Looks good to me.

dl

On 4/1/19 2:44 PM, Vladimir Kozlov wrote:
> https://bugs.openjdk.java.net/browse/JDK-8221782
>
> Recent 'Update Graal' JDK-8221341 added import
> jdk.vm.ci.services.Services class to HotSpotGraalRuntimeMBean.java.
> JVMCI module-info.java was updated accordingly [1] but
> src/java.base/share/lib/security/default.policy file was not.
>
> Ran failed
> java/lang/SecurityManager/CheckAccessClassInPackagePermissions.java test.
>
> diff -r 18547cad9ec6 src/java.base/share/lib/security/default.policy
> --- a/src/java.base/share/lib/security/default.policy
> +++ b/src/java.base/share/lib/security/default.policy
> @@ -160,6 +160,7 @@
>  grant codeBase "jrt:/jdk.internal.vm.compiler.management" {
>      permission java.lang.RuntimePermission "accessClassInPackage.jdk.internal.vm.compiler.collections";
>      permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.runtime";
> +    permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.services";
>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.core.common";
>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.debug";
>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.hotspot";
>

From vladimir.kozlov at oracle.com Mon Apr 1 22:00:00 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 1 Apr 2019 15:00:00 -0700
Subject: [13] RFR(S) 8221782: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.jdk.vm.ci.services
In-Reply-To:
References:
Message-ID: <3e399120-b5cd-d3b3-2814-75bd838e690f@oracle.com>

Thank you, Dean

Vladimir

On 4/1/19 2:51 PM, dean.long at oracle.com wrote:
> Looks good to me.
>
> dl
>
> On 4/1/19 2:44 PM, Vladimir Kozlov wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8221782
>>
>> Recent 'Update Graal' JDK-8221341 added import jdk.vm.ci.services.Services class to HotSpotGraalRuntimeMBean.java.
>> JVMCI module-info.java was updated accordingly [1] but src/java.base/share/lib/security/default.policy file was not.
>>
>> Ran failed java/lang/SecurityManager/CheckAccessClassInPackagePermissions.java test.
>>
>> diff -r 18547cad9ec6 src/java.base/share/lib/security/default.policy
>> --- a/src/java.base/share/lib/security/default.policy
>> +++ b/src/java.base/share/lib/security/default.policy
>> @@ -160,6 +160,7 @@
>>  grant codeBase "jrt:/jdk.internal.vm.compiler.management" {
>>      permission java.lang.RuntimePermission "accessClassInPackage.jdk.internal.vm.compiler.collections";
>>      permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.runtime";
>> +    permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.services";
>>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.core.common";
>>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.debug";
>>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.hotspot";
>>
>

From Alan.Bateman at oracle.com Tue Apr 2 06:56:27 2019
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Tue, 2 Apr 2019 07:56:27 +0100
Subject: [13] RFR(S) 8221782: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.jdk.vm.ci.services
In-Reply-To:
References:
Message-ID: <52d17eea-d7f8-45f1-6277-05b97f9a4af3@oracle.com>

On 01/04/2019 22:44, Vladimir Kozlov wrote:
> https://bugs.openjdk.java.net/browse/JDK-8221782
>
> Recent 'Update Graal' JDK-8221341 added import
> jdk.vm.ci.services.Services class to HotSpotGraalRuntimeMBean.java.
> JVMCI module-info.java was updated accordingly [1] but
> src/java.base/share/lib/security/default.policy file was not.
>
> Ran failed
> java/lang/SecurityManager/CheckAccessClassInPackagePermissions.java test.
>
> diff -r 18547cad9ec6 src/java.base/share/lib/security/default.policy
> --- a/src/java.base/share/lib/security/default.policy
> +++ b/src/java.base/share/lib/security/default.policy
> @@ -160,6 +160,7 @@
>  grant codeBase "jrt:/jdk.internal.vm.compiler.management" {
>      permission java.lang.RuntimePermission "accessClassInPackage.jdk.internal.vm.compiler.collections";
>      permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.runtime";
> +    permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.services";
>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.core.common";
>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.debug";
>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.hotspot";

This looks okay.

-Alan

From sean.mullan at oracle.com Tue Apr 2 12:54:09 2019
From: sean.mullan at oracle.com (Sean Mullan)
Date: Tue, 2 Apr 2019 08:54:09 -0400
Subject: [13] RFR(S) 8221782: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.jdk.vm.ci.services
In-Reply-To:
References:
Message-ID: <9c7d7965-fdb7-26b7-ff1c-8c0597592aea@oracle.com>

Looks good. For future reference, can you add a link to the bug (8221341) that introduced this regression?

--Sean

On 4/1/19 5:44 PM, Vladimir Kozlov wrote:
> https://bugs.openjdk.java.net/browse/JDK-8221782
>
> Recent 'Update Graal' JDK-8221341 added import
> jdk.vm.ci.services.Services class to HotSpotGraalRuntimeMBean.java.
> JVMCI module-info.java was updated accordingly [1] but
> src/java.base/share/lib/security/default.policy file was not.
>
> Ran failed
> java/lang/SecurityManager/CheckAccessClassInPackagePermissions.java test.
>
> diff -r 18547cad9ec6 src/java.base/share/lib/security/default.policy
> --- a/src/java.base/share/lib/security/default.policy
> +++ b/src/java.base/share/lib/security/default.policy
> @@ -160,6 +160,7 @@
>  grant codeBase "jrt:/jdk.internal.vm.compiler.management" {
>      permission java.lang.RuntimePermission "accessClassInPackage.jdk.internal.vm.compiler.collections";
>      permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.runtime";
> +    permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.services";
>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.core.common";
>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.debug";
>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.hotspot";
>

From vladimir.kozlov at oracle.com Tue Apr 2 16:04:50 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 2 Apr 2019 09:04:50 -0700
Subject: [13] RFR(S) 8221782: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.jdk.vm.ci.services
In-Reply-To: <9c7d7965-fdb7-26b7-ff1c-8c0597592aea@oracle.com>
References: <9c7d7965-fdb7-26b7-ff1c-8c0597592aea@oracle.com>
Message-ID: <34696eba-53ca-a25e-ad26-48dedbd754ef@oracle.com>

Thank you, Sean

I added link to bug report as you suggested.

Vladimir

On 4/2/19 5:54 AM, Sean Mullan wrote:
> Looks good. For future reference, can you add a link to the bug (8221341) that introduced this regression?
>
> --Sean
>
> On 4/1/19 5:44 PM, Vladimir Kozlov wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8221782
>>
>> Recent 'Update Graal' JDK-8221341 added import jdk.vm.ci.services.Services class to HotSpotGraalRuntimeMBean.java.
>> JVMCI module-info.java was updated accordingly [1] but src/java.base/share/lib/security/default.policy file was not.
>>
>> Ran failed java/lang/SecurityManager/CheckAccessClassInPackagePermissions.java test.
>>
>> diff -r 18547cad9ec6 src/java.base/share/lib/security/default.policy
>> --- a/src/java.base/share/lib/security/default.policy
>> +++ b/src/java.base/share/lib/security/default.policy
>> @@ -160,6 +160,7 @@
>>  grant codeBase "jrt:/jdk.internal.vm.compiler.management" {
>>      permission java.lang.RuntimePermission "accessClassInPackage.jdk.internal.vm.compiler.collections";
>>      permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.runtime";
>> +    permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.services";
>>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.core.common";
>>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.debug";
>>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.hotspot";
>>

From vladimir.kozlov at oracle.com Tue Apr 2 15:43:53 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 2 Apr 2019 08:43:53 -0700
Subject: [13] RFR(S) 8221782: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.jdk.vm.ci.services
In-Reply-To: <52d17eea-d7f8-45f1-6277-05b97f9a4af3@oracle.com>
References: <52d17eea-d7f8-45f1-6277-05b97f9a4af3@oracle.com>
Message-ID: <9442be6a-0444-9aba-15f3-9a7d074d35d5@oracle.com>

Thank you, Alan

Vladimir

On 4/1/19 11:56 PM, Alan Bateman wrote:
> On 01/04/2019 22:44, Vladimir Kozlov wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8221782
>>
>> Recent 'Update Graal' JDK-8221341 added import jdk.vm.ci.services.Services class to HotSpotGraalRuntimeMBean.java.
>> JVMCI module-info.java was updated accordingly [1] but src/java.base/share/lib/security/default.policy file was not.
>>
>> Ran failed java/lang/SecurityManager/CheckAccessClassInPackagePermissions.java test.
>>
>> diff -r 18547cad9ec6 src/java.base/share/lib/security/default.policy
>> --- a/src/java.base/share/lib/security/default.policy
>> +++ b/src/java.base/share/lib/security/default.policy
>> @@ -160,6 +160,7 @@
>>  grant codeBase "jrt:/jdk.internal.vm.compiler.management" {
>>      permission java.lang.RuntimePermission "accessClassInPackage.jdk.internal.vm.compiler.collections";
>>      permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.runtime";
>> +    permission java.lang.RuntimePermission "accessClassInPackage.jdk.vm.ci.services";
>>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.core.common";
>>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.debug";
>>      permission java.lang.RuntimePermission "accessClassInPackage.org.graalvm.compiler.hotspot";
> This looks okay.
>
> -Alan
>

From jcbeyler at google.com Tue Apr 2 16:52:50 2019
From: jcbeyler at google.com (Jean Christophe Beyler)
Date: Tue, 2 Apr 2019 09:52:50 -0700
Subject: RFR (S) 8221853: Data race in compile broker (set_last_compile)
Message-ID:

Hi all,

While working on enabling Java TSAN, one non-goal is that if we let it do its work, it does thread sanitizing on the JVM. Though this is a non-goal, I saw this one pop up and wanted to know if you would like it cleaned up?

Webrev: http://cr.openjdk.java.net/~jcbeyler/8221853/webrev.00/
Bug: https://bugs.openjdk.java.net/browse/JDK-8221853

I'm not sure the webrev is the way you'd like to go but from what I can see:
  - This is benign as no one was using the data being raced
  - No one calls print_last_compiled, which uses data only set in set_last_compiled
  - Because it is debug, the whole code could be wrapped into non product builds
  - I did add a compile lock for both the printout and the set_last but I could make a new lock just for this code instead of using the general compile lock.

Thanks and let me know,
Jc

From igor.ignatyev at oracle.com Tue Apr 2 20:09:09 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 2 Apr 2019 13:09:09 -0700
Subject: RFR(T/M) : 8221870 : use driver to run CtwRunner in applications/ctw tests
Message-ID: <9E7CCC1C-6DF7-49B3-AA67-FC9DB3D69047@oracle.com>

http://cr.openjdk.java.net/~iignatyev/8221870/webrev.00/
> 373 lines changed: 117 ins; 117 del; 139 mod

Hi all,

could you please review this trivial patch which updates all applications/ctw/modules tests to run CtwRunner in driver-mode? the patch also removes tests for modules which have been removed and adds tests for added modules.
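For readers less familiar with jtreg, here is a sketch of what driver mode means, as I understand the jtreg tag specification: a class started with `@run driver` is launched much like `@run main`, but it is not handed the VM options under test, so it is meant to orchestrate child JVMs that do receive them. The test below is hypothetical and not taken from the webrev; DriverSketch, Worker and the `modules:java.base` argument are invented names, and the sketch only builds the child command line rather than launching anything.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/*
 * @test
 * @summary hypothetical driver-mode test, not one of the real ctw tests
 * @run driver DriverSketch
 */
// Package-private to keep the sketch independent of the file name;
// real jtreg test classes are usually public.
class DriverSketch {
    // Builds the command line for the child JVM that would do the real work.
    // The VM options are passed explicitly to the child, not to the driver.
    static List<String> childCommand(List<String> vmOptions, String mainClass, List<String> args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(Paths.get(System.getProperty("java.home"), "bin", "java").toString());
        cmd.addAll(vmOptions);
        cmd.add(mainClass);
        cmd.addAll(args);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = childCommand(List.of("-Xcomp"), "Worker", List.of("modules:java.base"));
        System.out.println(String.join(" ", cmd));
        // A real driver would now start the child and check its exit code, e.g.
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

The design point is that the driver process stays cheap and flag-free, while the flags under test are applied only to the child JVM it spawns.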
webrev: http://cr.openjdk.java.net/~iignatyev/8221870/webrev.00/
JBS: https://bugs.openjdk.java.net/browse/JDK-8221870
testing: :ctw[1-3] on linux-x64, windows-x64, macos-x64 and solaris-sparcv9

Thanks,
-- Igor

From shade at redhat.com Tue Apr 2 20:21:28 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 2 Apr 2019 22:21:28 +0200
Subject: RFR(T/M) : 8221870 : use driver to run CtwRunner in applications/ctw tests
In-Reply-To: <9E7CCC1C-6DF7-49B3-AA67-FC9DB3D69047@oracle.com>
References: <9E7CCC1C-6DF7-49B3-AA67-FC9DB3D69047@oracle.com>
Message-ID: <9e7cdc06-1c68-96e5-612b-8e81ca983390@redhat.com>

On 4/2/19 10:09 PM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev/8221870/webrev.00/

Thank you, looks good to me. It does seem to eliminate one AgentServer per TEST_JOB for me.

-Aleksey

From igor.ignatyev at oracle.com Tue Apr 2 20:27:01 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 2 Apr 2019 13:27:01 -0700
Subject: RFR(T/M) : 8221870 : use driver to run CtwRunner in applications/ctw tests
In-Reply-To:
References: <9E7CCC1C-6DF7-49B3-AA67-FC9DB3D69047@oracle.com>
Message-ID: <3A05763D-58A1-4D80-BD3B-E130E8DEAEA6@oracle.com>

Katya,

the two tests which failed in my testing are applications/ctw/modules/jdk_pack.java and jdk_jdwp_agent.java b/c the corresponding modules have 0 classes. the final webrev doesn't have these tests. so yes, it was kinda expected.

-- Igor

> On Apr 2, 2019, at 1:24 PM, Ekaterina Pavlova wrote:
>
> Looks good.
> One question. ctw_1 tests failed in your testing. Is it expected?
>
> thanks,
> -katya
>
> On 4/2/19 1:09 PM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev/8221870/webrev.00/
>>> 373 lines changed: 117 ins; 117 del; 139 mod
>> Hi all,
>> could you please review this trivial patch which updates all applications/ctw/modules tests to run CtwRunner in driver-mode? the patch also removes tests for modules which have been removed and adds tests for added modules.
>> webrev: http://cr.openjdk.java.net/~iignatyev/8221870/webrev.00/
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8221870
>> testing: :ctw[1-3] on linux-x64, windows-x64, macos-x64 and solaris-sparcv9
>> Thanks,
>> -- Igor
>

From ekaterina.pavlova at oracle.com Tue Apr 2 20:24:09 2019
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Tue, 2 Apr 2019 13:24:09 -0700
Subject: RFR(T/M) : 8221870 : use driver to run CtwRunner in applications/ctw tests
In-Reply-To: <9E7CCC1C-6DF7-49B3-AA67-FC9DB3D69047@oracle.com>
References: <9E7CCC1C-6DF7-49B3-AA67-FC9DB3D69047@oracle.com>
Message-ID:

Looks good.
One question. ctw_1 tests failed in your testing. Is it expected?

thanks,
-katya

On 4/2/19 1:09 PM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev/8221870/webrev.00/
>> 373 lines changed: 117 ins; 117 del; 139 mod
>
> Hi all,
>
> could you please review this trivial patch which updates all applications/ctw/modules tests to run CtwRunner in driver-mode? the patch also removes tests for modules which have been removed and adds tests for added modules.
>
> webrev: http://cr.openjdk.java.net/~iignatyev/8221870/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8221870
> testing: :ctw[1-3] on linux-x64, windows-x64, macos-x64 and solaris-sparcv9
>
> Thanks,
> -- Igor
>

From ekaterina.pavlova at oracle.com Tue Apr 2 20:29:22 2019
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Tue, 2 Apr 2019 13:29:22 -0700
Subject: RFR(T/M) : 8221870 : use driver to run CtwRunner in applications/ctw tests
In-Reply-To: <3A05763D-58A1-4D80-BD3B-E130E8DEAEA6@oracle.com>
References: <9E7CCC1C-6DF7-49B3-AA67-FC9DB3D69047@oracle.com> <3A05763D-58A1-4D80-BD3B-E130E8DEAEA6@oracle.com>
Message-ID: <29337bfd-1a8c-c263-7159-437de9fb8121@oracle.com>

thanks for explanations Igor.

-katya

On 4/2/19 1:27 PM, Igor Ignatyev wrote:
> Katya,
>
> the two tests which failed in my testing are applications/ctw/modules/jdk_pack.java and jdk_jdwp_agent.java b/c the corresponding modules have 0 classes. the final webrev doesn't have these tests. so yes, it was kinda expected.
>
> -- Igor
>
>
>> On Apr 2, 2019, at 1:24 PM, Ekaterina Pavlova wrote:
>>
>> Looks good.
>> One question. ctw_1 tests failed in your testing. Is it expected?
>>
>> thanks,
>> -katya
>>
>> On 4/2/19 1:09 PM, Igor Ignatyev wrote:
>>> http://cr.openjdk.java.net/~iignatyev/8221870/webrev.00/
>>>> 373 lines changed: 117 ins; 117 del; 139 mod
>>> Hi all,
>>> could you please review this trivial patch which updates all applications/ctw/modules tests to run CtwRunner in driver-mode? the patch also removes tests for modules which have been removed and adds tests for added modules.
>>> webrev: http://cr.openjdk.java.net/~iignatyev/8221870/webrev.00/
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8221870
>>> testing: :ctw[1-3] on linux-x64, windows-x64, macos-x64 and solaris-sparcv9
>>> Thanks,
>>> -- Igor
>>
>

From igor.ignatyev at oracle.com Tue Apr 2 20:39:43 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 2 Apr 2019 13:39:43 -0700
Subject: RFR(T/M) : 8221870 : use driver to run CtwRunner in applications/ctw tests
In-Reply-To: <9e7cdc06-1c68-96e5-612b-8e81ca983390@redhat.com>
References: <9E7CCC1C-6DF7-49B3-AA67-FC9DB3D69047@oracle.com> <9e7cdc06-1c68-96e5-612b-8e81ca983390@redhat.com>
Message-ID: <190504DA-293D-4231-B3CB-4EE17FFF7D81@oracle.com>

Aleksey, thanks for the review, pushed.

-- Igor

> On Apr 2, 2019, at 1:21 PM, Aleksey Shipilev wrote:
>
> On 4/2/19 10:09 PM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev/8221870/webrev.00/
>
> Thank you, looks good to me. It does seem to eliminate one AgentServer per TEST_JOB for me.
>
> -Aleksey
>

From rkennke at redhat.com Tue Apr 2 21:15:44 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 2 Apr 2019 23:15:44 +0200
Subject: RFR: JDK-8221766: Load-reference barriers for Shenandoah
In-Reply-To:
References:
Message-ID:

This was meant to go to hotspot-compiler-dev not compiler-dev. Looping in the correct ML. Sorry.

Roman

> (I am cross-posting this to build-dev and compiler-dev because this
> contains some (trivial-ish) shared build and C2 changes. The C2 changes
> are almost all reversals of Shenandoah-specific paths that have been
> introduced in initial Shenandoah push.)
>
> I would like to propose that we switch to what we came to call 'load
> reference barrier' as new barrier scheme for Shenandoah GC.
>
> The main difference is that instead of ensuring correct invariant when
> we store anything into the heap (e.g. read-barrier before reads,
> write-barrier before writes, plus a bunch of other stuff), we ensure the
> strong invariance on objects when they get loaded, by employing what is
> currently our write-barrier.
>
> The reason why I'm proposing it is:
> - simpler barrier interface
> - easier to get good performance out of it
>   ==> good for upcoming Graal (sup)port
> - reduced maintenance burden (I intend to backport it all the way)
>
> This has a number of advantages:
> - Strong invariant means it's a lot easier to reason about the state of
> GC and objects
> - Much simpler barrier interface. In fact, a lot of stuff that we added
> to barrier interfaces after JDK11 will now become unused: no need for
> barriers on primitives, no need for object equality barriers, no need
> for resolve barriers, etc. Also, some C2 stuff that we added for
> Shenandoah can now be removed again. (Those are what comprise most
> shared C2 changes.)
> - Optimization is much easier: we currently put barriers 'down low'
> close to their uses (which might be inside a hot loop), and then work
> hard to optimize barriers upwards, e.g. out of loops. By using
> load-ref-barriers, we would place them at the outermost site already.
> Look how much code is removed from shenandoahSupport.cpp!
> - No more need for object equals barriers.
> - No more need for 'resolve' barriers.
> - All barriers are now conditional, which opens up opportunity for
> further optimization later on.
> - we can re-enable the fast JNI getfield stuff
> - we no longer need the nmethod initializer that initializes embedded
> oops to to-space
> - We no longer have the problem to use two registers for 'the same'
> value (pre- and post-barrier).
>
> The 'only' optimizations that we do in C2 are:
> - Look upwards and see if barrier input indicates we don't actually need
> the barrier. Would be the case for: constants, nulls, method parameters,
> etc (anything that is not like a load). Even though we insert barriers
> after loads, you'd be surprised to see how many loads actually disappear.
> - Look downwards to check uses of the barrier. If it doesn't feed into
> anything that requires a barrier, we can remove it.
>
> Performance doesn't seem to be negatively impacted at all. Some
> benchmarks benefit positively from it.
>
> Testing: hotspot_gc_shenandoah, SPECjvm2008, SPECjbb2015, all
> of them many times. This patch has baked in shenandoah/jdk for 1.5
> months, undergone our rigorous CI, received various bug-fixes, we have
> had a close look at the generated code to verify it is sane. jdk/submit
> job expected good before push.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8221766
> Webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.00/
>
> Can I please get reviews for this change?
>
> Roman
>
>

From vladimir.kozlov at oracle.com Tue Apr 2 22:17:02 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 2 Apr 2019 15:17:02 -0700
Subject: RFR: JDK-8221766: Load-reference barriers for Shenandoah
In-Reply-To:
References:
Message-ID:

This is nice cleanup :)

4294 lines changed: 977 ins; 2841 del; 476 mod

First is general question. I don't understand why you need (diagnostic) ShenandoahLoadRefBarrier flag if it is new behavior and you can't use old one because you removed it. I am definitely missing something here.

Thank you for thinking about Graal:

>   ==> good for upcoming Graal (sup)port

opto/loopnode.cpp new is_Phi check was added. Please, explain.

I don't see other issues in C2 code.

Regards,
Vladimir

On 4/2/19 2:12 PM, Roman Kennke wrote:
> (I am cross-posting this to build-dev and compiler-dev because this
> contains some (trivial-ish) shared build and C2 changes. The C2 changes
> are almost all reversals of Shenandoah-specific paths that have been
> introduced in initial Shenandoah push.)
>
> I would like to propose that we switch to what we came to call 'load
> reference barrier' as new barrier scheme for Shenandoah GC.
>
> The main difference is that instead of ensuring correct invariant when
> we store anything into the heap (e.g. read-barrier before reads,
> write-barrier before writes, plus a bunch of other stuff), we ensure the
> strong invariance on objects when they get loaded, by employing what is
> currently our write-barrier.
>
> The reason why I'm proposing it is:
> - simpler barrier interface
> - easier to get good performance out of it
>   ==> good for upcoming Graal (sup)port
> - reduced maintenance burden (I intend to backport it all the way)
>
> This has a number of advantages:
> - Strong invariant means it's a lot easier to reason about the state of
> GC and objects
> - Much simpler barrier interface. In fact, a lot of stuff that we added
> to barrier interfaces after JDK11 will now become unused: no need for
> barriers on primitives, no need for object equality barriers, no need
> for resolve barriers, etc. Also, some C2 stuff that we added for
> Shenandoah can now be removed again. (Those are what comprise most
> shared C2 changes.)
> - Optimization is much easier: we currently put barriers 'down low'
> close to their uses (which might be inside a hot loop), and then work
> hard to optimize barriers upwards, e.g. out of loops. By using
> load-ref-barriers, we would place them at the outermost site already.
> Look how much code is removed from shenandoahSupport.cpp!
> - No more need for object equals barriers.
> - No more need for 'resolve' barriers.
> - All barriers are now conditional, which opens up opportunity for
> further optimization later on.
> - we can re-enable the fast JNI getfield stuff
> - we no longer need the nmethod initializer that initializes embedded
> oops to to-space
> - We no longer have the problem to use two registers for 'the same'
> value (pre- and post-barrier).
>
> The 'only' optimizations that we do in C2 are:
> - Look upwards and see if barrier input indicates we don't actually need
> the barrier. Would be the case for: constants, nulls, method parameters,
> etc (anything that is not like a load). Even though we insert barriers
> after loads, you'd be surprised to see how many loads actually disappear.
> - Look downwards to check uses of the barrier. If it doesn't feed into
> anything that requires a barrier, we can remove it.
>
> Performance doesn't seem to be negatively impacted at all. Some
> benchmarks benefit positively from it.
>
> Testing: hotspot_gc_shenandoah, SPECjvm2008, SPECjbb2015, all
> of them many times. This patch has baked in shenandoah/jdk for 1.5
> months, undergone our rigorous CI, received various bug-fixes, we have
> had a close look at the generated code to verify it is sane. jdk/submit
> job expected good before push.
>
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8221766
> Webrev:
> http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.00/
>
> Can I please get reviews for this change?
>
> Roman
>
>

From rkennke at redhat.com Tue Apr 2 22:41:00 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 3 Apr 2019 00:41:00 +0200
Subject: RFR: JDK-8221766: Load-reference barriers for Shenandoah
In-Reply-To:
References:
Message-ID: <2702cb33-4065-d917-d046-41d590a20b93@redhat.com>

Hi Vladimir,

> This is nice cleanup :)
>
> 4294 lines changed: 977 ins; 2841 del; 476 mod

Yeah, right? :-)

> First is general question. I don't understand why you need (diagnostic)
> ShenandoahLoadRefBarrier flag if it is new behavior and you can't use
> old one because you removed it. I am definitely missing something here.

This is added for the same purpose that we had e.g. +/-ShenandoahWriteBarrier before: in order to selectively disable the barrier generation, for testing and diagnostics.

> Thank you for thinking about Graal:
>
> >   ==> good for upcoming Graal (sup)port

:-)

> opto/loopnode.cpp new is_Phi check was added. Please, explain.

I'm not sure. I believe Roland did this. I'll let him comment on it.

> I don't see other issues in C2 code.

:-)

Thanks,
Roman

> Regards,
> Vladimir
>
> On 4/2/19 2:12 PM, Roman Kennke wrote:
>> (I am cross-posting this to build-dev and compiler-dev because this
>> contains some (trivial-ish) shared build and C2 changes. The C2 changes
>> are almost all reversals of Shenandoah-specific paths that have been
>> introduced in initial Shenandoah push.)
>>
>> I would like to propose that we switch to what we came to call 'load
>> reference barrier' as new barrier scheme for Shenandoah GC.
>>
>> The main difference is that instead of ensuring correct invariant when
>> we store anything into the heap (e.g. read-barrier before reads,
>> write-barrier before writes, plus a bunch of other stuff), we ensure the
>> strong invariance on objects when they get loaded, by employing what is
>> currently our write-barrier.
>>
>> The reason why I'm proposing it is:
>> - simpler barrier interface
>> - easier to get good performance out of it
>>    ==> good for upcoming Graal (sup)port
>> - reduced maintenance burden (I intend to backport it all the way)
>>
>> This has a number of advantages:
>> - Strong invariant means it's a lot easier to reason about the state of
>> GC and objects
>> - Much simpler barrier interface. In fact, a lot of stuff that we added
>> to barrier interfaces after JDK11 will now become unused: no need for
>> barriers on primitives, no need for object equality barriers, no need
>> for resolve barriers, etc. Also, some C2 stuff that we added for
>> Shenandoah can now be removed again. (Those are what comprise most
>> shared C2 changes.)
>> - Optimization is much easier: we currently put barriers 'down low'
>> close to their uses (which might be inside a hot loop), and then work
>> hard to optimize barriers upwards, e.g. out of loops. By using
>> load-ref-barriers, we would place them at the outermost site already.
>> Look how much code is removed from shenandoahSupport.cpp!
>> - No more need for object equals barriers.
>> - No more need for 'resolve' barriers.
>> - All barriers are now conditional, which opens up opportunity for
>> further optimization later on.
>> - we can re-enable the fast JNI getfield stuff
>> - we no longer need the nmethod initializer that initializes embedded
>> oops to to-space
>> - We no longer have the problem to use two registers for 'the same'
>> value (pre- and post-barrier).
>>
>> The 'only' optimizations that we do in C2 are:
>> - Look upwards and see if barrier input indicates we don't actually need
>> the barrier. Would be the case for: constants, nulls, method parameters,
>> etc (anything that is not like a load). Even though we insert barriers
>> after loads, you'd be surprised to see how many loads actually disappear.
>> - Look downwards to check uses of the barrier. If it doesn't feed into
>> anything that requires a barrier, we can remove it.
>>
>> Performance doesn't seem to be negatively impacted at all. Some
>> benchmarks benefit positively from it.
>>
>> Testing: hotspot_gc_shenandoah, SPECjvm2008, SPECjbb2015, all
>> of them many times. This patch has baked in shenandoah/jdk for 1.5
>> months, undergone our rigorous CI, received various bug-fixes, we have
>> had a close look at the generated code to verify it is sane. jdk/submit
>> job expected good before push.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8221766
>> Webrev:
>> http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.00/
>>
>> Can I please get reviews for this change?
>>
>> Roman
>>
>>

From vladimir.x.ivanov at oracle.com Wed Apr 3 00:02:11 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 2 Apr 2019 17:02:11 -0700
Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change
In-Reply-To: <959abf54-d1da-95ee-9cf6-6c6d8ec5e4a1@oracle.com>
References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com>
 <392c665f-869c-29af-4fc5-e6f844820846@oracle.com>
 <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com>
 <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com>
 <959abf54-d1da-95ee-9cf6-6c6d8ec5e4a1@oracle.com>
Message-ID: <18115aa8-edaa-31b9-02a6-06721d9fbfc9@oracle.com>

> I agree that we need better regression tests if we go this route. Do we
> have enough regression tests for the is_unaligned_access() case to
> enable that optimization first?

I haven't done any extensive research, but I believe existing tests
provide poor coverage for initializing stores. The tests I encountered
under test/hotspot/jtreg/compiler/unsafe/ don't look applicable here.

Best regards,
Vladimir Ivanov

>> Forbidding mismatched accesses in InitializeNode::can_capture_store
>> (both marked as such and based on actual offset) looks like a safer
>> fix to me: it keeps InitializeNode::complete_stores() exposed only to
>> well-behaved accesses.
>>
>> How much do we lose by not capturing mismatched/unaligned initialized
>> stores? Is it worth optimizing for it?
>>
>
> It does seem like it would be rare that optimizing it would make a
> difference, unless we had a microbenchmark that focuses on it.
>
> dl
>
>> Best regards,
>> Vladimir Ivanov
>>
>>> On 3/27/19 10:12 AM, Vladimir Ivanov wrote:
>>>> First, I'd like to note that it's a good practice to include problem
>>>> & root cause descriptions in the request. Otherwise, reviewers have
>>>> to find that information themselves which complicates review process.
>>>>
>>>> (In this particular case, I found some analysis from the submitter
>>>> [1] in the bug only after carefully reading through it.)
>>>>
>>>> On 27/03/2019 06:44, Rahul Raghavan wrote:
>>>>> Hi,
>>>>>
>>>>> Thank you Vladimir.
>>>>>
>>>>> Yes, tried following fix.
>>>>> (needed to add checks to avoid SIGFPE crash).
>>>>>
>>>>> +  int size_in_bytes = st->memory_size();
>>>>> +  if ((size_in_bytes != 0) && (get_store_offset(st, phase) %
>>>>> size_in_bytes) != 0) {
>>>>> +    return FAIL;
>>>>> +  }
>>>>>
>>>>>
>>>>> - http://cr.openjdk.java.net/~rraghavan/8202414/webrev.02/
>>>>
>>>> It seems the problem is due to mismatched unsafe store being
>>>> captured as an initializing one. Why not check for it explicitly?
>>>>
>>>>    if (st->is_unaligned_access() || st->is_mismatched_access()) {
>>>>      return FAIL;
>>>>    }
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>>
>>>> [1]
>>>>
>>>> For your convenience, our analysis shows the problem may relate to
>>>> array InitializeNode logic.
>>>> It `capture_store` the memory write of Unsafe.putInt.
>>>> Since the putInt occupied offset range [17, 21] from the array pointer,
>>>> then it decided to `clear_memory` of offset range [16, 17] of the
>>>> array pointer.
>>>> This range actually cannot pass the assert "assert((end_offset %
>>>> BytesPerInt) == 0, "odd end offset")".
>>>> While in jvm product mode, without the assert, the compiler falsely
>>>> calculated to clear range [13, 17],
>>>> which will clear the three most significant bytes of the `length` of
>>>> this array.
>>>>
>>>>
>>>>>
>>>>> Confirmed no issues with testing for this revised fix.
>>>>>
>>>>> Thanks,
>>>>> Rahul
>>>>>
>>>>> On 26/03/19 1:03 AM, Vladimir Kozlov wrote:
>>>>>>
>>>>>> Suggestion:
>>>>>>
>>>>>> if ((get_store_offset(st, phase) % st->memory_size()) != 0) {
>>>>>>
>>>>>> Vladimir
>>>>>>
>>>>>>
>>>
>

From felix.yang at huawei.com Wed Apr 3 01:15:28 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Wed, 3 Apr 2019 01:15:28 +0000
Subject: [aarch64-port-dev ] RFR: 8221658: aarch64: add necessary predicate for ubfx patterns
In-Reply-To: <79987274-97d6-bde0-8577-e5046864bdde@redhat.com>
References: <130fbe62-4fac-5a8d-aade-74e340459e23@redhat.com>
 <79987274-97d6-bde0-8577-e5046864bdde@redhat.com>
Message-ID:

Updated webrev: http://cr.openjdk.java.net/~fyang/8221658/webrev.01
Is this one better?
Still some constraints can be removed from predicate by using a match
operand for other patterns, say ubfizwI/ubfiz.
I can propose a separate patch to handle that if you want.

Thanks,
Felix

>
> On 4/1/19 2:19 AM, Yangfei (Felix) wrote:
> > The patch adds the following three constraints for 'rshift' and 'mask' operands:
> >
> > 1. 0 <= rshift <=31/63
> > 2. mask != 0
> > 3. rshift + width <= 32/64 (width = exact_log2(mask+1))
> >
> > Constraint 3 needs to be implemented by adding a predicate as we are
> > checking both 'rshift' and 'mask' operands.
> >
> > Do you want me to implement constraint 1 & 2 using a match operand?
>
> Yes. Please do so wherever possible.

From aph at redhat.com Wed Apr 3 10:29:51 2019
From: aph at redhat.com (Andrew Haley)
Date: Wed, 3 Apr 2019 11:29:51 +0100
Subject: [aarch64-port-dev ] RFR: 8221658: aarch64: add necessary predicate for ubfx patterns
In-Reply-To:
References: <130fbe62-4fac-5a8d-aade-74e340459e23@redhat.com>
 <79987274-97d6-bde0-8577-e5046864bdde@redhat.com>
Message-ID:

On 4/3/19 2:15 AM, Yangfei (Felix) wrote:
> Updated webrev: http://cr.openjdk.java.net/~fyang/8221658/webrev.01
> Is this one better?

It still doesn't look quite right.
According to the Java Language Standard,
shifts are all taken mod 32 or mod 64. Therefore, it is not possible for
the shift size to be out of range, surely?

Have you got a reproducer for this failure that I can see?

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From robbin.ehn at oracle.com Wed Apr 3 10:36:02 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 3 Apr 2019 12:36:02 +0200
Subject: RFR(s): 8218147: make_walkable asserts on multiple calls
Message-ID:

Hi all, please review.

If a JavaThread in native both gets selected for java-stack sampling and a
handshake, both VMThread and JFR sampler will call make_walkable. There is an
assert making sure we do not do this twice. Since we only store _last_Java_pc
from sp, we can allow it to be executed multiple times for both aarch64/x64,
which have the assert.

The asserts come from:
8161598: Kitchensink fails: assert(nm->insts_contains(original_pc)) failed:
original PC must be in nmethod/CompiledMethod

They seem not to be directly connected to the bug.

Issue:
https://bugs.openjdk.java.net/browse/JDK-8218147

Webrev:
http://cr.openjdk.java.net/~rehn/8218147/webrev/

Compiled aarch64, x64 passes t1-3.

Thanks, Robbin

From tobias.hartmann at oracle.com Wed Apr 3 11:17:31 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Wed, 3 Apr 2019 13:17:31 +0200
Subject: RFR (S) 8221853: Data race in compile broker (set_last_compile)
In-Reply-To:
References:
Message-ID: <4168dea9-8964-3b44-af2b-e30cc1ffe144@oracle.com>

Hi Jc,

I would actually prefer to just remove this unused code if no one objects.

Best regards,
Tobias

On 02.04.19 18:52, Jean Christophe Beyler wrote:
> Hi all,
>
> While working on enabling Java TSAN, one non-goal is that if we let it do its work, it does thread
> sanitizing on the JVM. Though this is a non-goal, I saw this one pop up and wanted to know if you
> would like it cleaned up?
>
> Webrev: http://cr.openjdk.java.net/~jcbeyler/8221853/webrev.00/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8221853
>
> I'm not sure the webrev is the way you'd like to go but from what I can see:
>
>    - This is benign as no one was using the data being raced
>    - No one calls print_last_compiled, which uses data only set in set_last_compiled
>    - Because it is debug, the whole code could be wrapped into non product builds
>    - I did add a compile lock for both the printout and the set_last but I could make a new lock
> just for this code instead of using the general compile lock.
>
> Thanks and let me know,
> Jc

From rwestrel at redhat.com Wed Apr 3 11:18:55 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 03 Apr 2019 13:18:55 +0200
Subject: RFR: JDK-8221766: Load-reference barriers for Shenandoah
In-Reply-To:
References:
Message-ID: <87h8bfqqu8.fsf@redhat.com>

Hi Vladimir,

> opto/loopnode.cpp new is_Phi check was added. Please, explain.

When we expand barriers, if we find a null check nearby we move the
barrier close to the null check so there's a better chance of converting
it to an implicit null check. That happens as part of a pass of loop
opts. I think that's where that change comes from but I don't remember
the details. In general we need the control that's assigned to a load to
not be too conservative.

Anyway, that change is not required for correctness. But it looks
reasonable to me.

Roland.

From lutz.schmidt at sap.com Wed Apr 3 13:18:21 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Wed, 3 Apr 2019 13:18:21 +0000
Subject: RFR(XS): 8221482: Initialize VMRegImpl::regName[] earlier to prevent assert during PrintStubCode
In-Reply-To: <0dfb7424-3595-4709-b6ed-33db4bdfc34d@oracle.com>
References: <30549BAC-6DA9-45D5-A4AC-32BC243E7F2B@sap.com>
 <0dfb7424-3595-4709-b6ed-33db4bdfc34d@oracle.com>
Message-ID:

Hi Vladimir,

thanks so much for your clarifying comments. And sorry for reacting with such delay.
I was distracted by other tasks.

I'll go ahead now and push - after rebasing, of course.

Thanks,
Lutz

On 29.03.19, 19:53, "Vladimir Kozlov" wrote:

    On 3/28/19 8:59 PM, David Holmes wrote:
    > Hi Lutz,
    >
    > cc'd the compiler team
    >
    > On 28/03/2019 9:14 pm, Schmidt, Lutz wrote:
    >> Dear Community,
    >>
    >> may I please request reviews for this tiny change. The purpose is to initialize the regName[]
    >> array earlier during VM init.
    >
    > I can see that will fix the assertion for you, but then begs the question as to whether
    > VMRegImpl::set_regName itself has any initialization dependencies. The answer to that is not obvious
    > to me. I _think_ the Register setup only depends on C++ static initialization.
    >
    > Hopefully someone from compiler team can confirm this change is in fact safe.

    The array is static:

    http://hg.openjdk.java.net/jdk/jdk/file/6a1406c718ec/src/hotspot/share/code/vmreg.cpp#l37

    And register's names are encoded:

    http://hg.openjdk.java.net/jdk/jdk/file/6a1406c718ec/src/hotspot/cpu/x86/register_x86.cpp#l41

    There are no initialization dependencies.

    Vladimir

    >
    > Thanks,
    > David
    >
    >> Bug: https://bugs.openjdk.java.net/browse/JDK-8221482
    >> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8221482.01/
    >>
    >> Submit-repo tests pending...
    >>
    >> Thanks,
    >> Lutz
    >>
    >>

From dms at samersoff.net Wed Apr 3 13:21:48 2019
From: dms at samersoff.net (Dmitry Samersoff)
Date: Wed, 3 Apr 2019 16:21:48 +0300
Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered
In-Reply-To: <62b9e1c3-7c76-c3a2-0a8e-4e3ce4f79d9b@arm.com>
References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com>
 <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com>
 <79118967-c5b6-ca5c-7c6b-4adb80a4ed60@arm.com>
 <62b9e1c3-7c76-c3a2-0a8e-4e3ce4f79d9b@arm.com>
Message-ID: <22b74488-fe8c-c00e-2c00-4bc8c5ef7dd5@samersoff.net>

Hello Nick,

Glad to see this cleanup.

3528     __ cmp(rscratch1, zr); // Sets flags for result
3529     __ cbnz(rscratch1, cont);

Should __ br(Assembler::NE, cont); be at line 3529 instead of cbnz?

-Dmitry

On 22.01.2019 12:10, Nick Gasson (Arm Technology China) wrote:
> Hi,
>
> On 21/01/2019 20:27, Andrew Haley wrote:
>>
>> OK, if that's your position: you're writing the patch. Using cmpxchg
>> everywhere will make that rather twisted code much easier to read.
>>
>
> Please see the updated webrev to use cmpxchg in both the lock and unlock
> functions:
>
> http://cr.openjdk.java.net/~ngasson/8217368/webrev.1/
>
> Also includes Derek's cleanup suggestions (although some of them are not
> applicable now).
>
> Testing I've done on this:
>
> * Ran jtreg with assertions enabled (+UseLSE)
>
> * Ran jcstress with both +UseLSE and -UseLSE
>
> * Ran the JMH LockUnlock benchmarks with -UseBiasedLocking to check for
> performance regressions.
>
> The directory below contains the generated assembly from each webrev
> and current hg tip for this simple method:
>
> http://cr.openjdk.java.net/~ngasson/8217368/generated/
>
>     private Object obj = new Object();
>     public int x;
>
>     private void incX() {
>         synchronized (obj) {
>             x++;
>         }
>     }
>
> The output of webrev.1 looks OK to me. Any other suggestions of things
> to test?
>
> Thanks,
> Nick
>

From jcbeyler at google.com Wed Apr 3 16:05:28 2019
From: jcbeyler at google.com (Jean Christophe Beyler)
Date: Wed, 3 Apr 2019 09:05:28 -0700
Subject: RFR (S) 8221853: Data race in compile broker (set_last_compile)
In-Reply-To: <4168dea9-8964-3b44-af2b-e30cc1ffe144@oracle.com>
References: <4168dea9-8964-3b44-af2b-e30cc1ffe144@oracle.com>
Message-ID:

Hi Tobias,

Sounds good to me, here is a webrev that removes it entirely:

Webrev: http://cr.openjdk.java.net/~jcbeyler/8221853/webrev.01/
Bug: https://bugs.openjdk.java.net/browse/JDK-8221853

Let me know what you think,
Jc

On Wed, Apr 3, 2019 at 4:17 AM Tobias Hartmann wrote:

> Hi Jc,
>
> I would actually prefer to just remove this unused code if no one objects.
>
> Best regards,
> Tobias

--
Thanks,
Jc

From vladimir.kozlov at oracle.com Wed Apr 3 16:37:40 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 3 Apr 2019 09:37:40 -0700
Subject: RFR (S) 8221853: Data race in compile broker (set_last_compile)
In-Reply-To:
References: <4168dea9-8964-3b44-af2b-e30cc1ffe144@oracle.com>
Message-ID: <3eea0593-fe96-5894-05da-c0c5f5ef38d6@oracle.com>

Hi Jc,

I agree with removal of print_last_compiled() method and related code.
But you need to keep the part of set_last_compiled() code (guarded by
UsePerfData) which sets values of CompilerCounters. It is used.

Thanks,
Vladimir

On 4/3/19 9:05 AM, Jean Christophe Beyler wrote:
> Hi Tobias,
>
> Sounds good to me, here is a webrev that removes it entirely:
>
> Webrev: http://cr.openjdk.java.net/~jcbeyler/8221853/webrev.01/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8221853
>
> Let me know what you think,
> Jc

From vladimir.kozlov at oracle.com Wed Apr 3 16:45:42 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 3 Apr 2019 09:45:42 -0700
Subject: RFR: JDK-8221766: Load-reference barriers for Shenandoah
In-Reply-To: <87h8bfqqu8.fsf@redhat.com>
References: <87h8bfqqu8.fsf@redhat.com>
Message-ID: <080b7d98-1144-8111-57b7-0c0334d9147c@oracle.com>

I don't think it should be part of this cleanup.
Please, file separate RFE to push this change with separate review and testing.

Thanks,
Vladimir

On 4/3/19 4:18 AM, Roland Westrelin wrote:
>
> Hi Vladimir,
>
>> opto/loopnode.cpp new is_Phi check was added. Please, explain.
>
> When we expand barriers, if we find a null check nearby we move the
> barrier close to the null check so there's a better chance of converting
> it to an implicit null check. That happens as part of a pass of loop
> opts. I think that's where that change comes from but I don't remember
> the details. In general we need the control that's assigned to a load to
> not be too conservative.
>
> Anyway, that change is not required for correctness. But it looks
> reasonable to me.
>
> Roland.
>

From jcbeyler at google.com Wed Apr 3 17:13:27 2019
From: jcbeyler at google.com (Jean Christophe Beyler)
Date: Wed, 3 Apr 2019 10:13:27 -0700
Subject: RFR (S) 8221853: Data race in compile broker (set_last_compile)
In-Reply-To: <3eea0593-fe96-5894-05da-c0c5f5ef38d6@oracle.com>
References: <4168dea9-8964-3b44-af2b-e30cc1ffe144@oracle.com>
 <3eea0593-fe96-5894-05da-c0c5f5ef38d6@oracle.com>
Message-ID:

Hi Vladimir,

Sounds good to me:
Webrev: http://cr.openjdk.java.net/~jcbeyler/8221853/webrev.02/
Bug: https://bugs.openjdk.java.net/browse/JDK-8221853

I cleaned it up a bit and renamed it to "update_compile_perf_data". Let me
know what you think,
Jc

On Wed, Apr 3, 2019 at 9:37 AM Vladimir Kozlov wrote:

> Hi Jc,
>
> I agree with removal of print_last_compiled() method and related code.
> But you need to keep part of set_last_compiled() code (guarded by
> UsePerfData) which set values of CompilerCounters. It
> is used.
>
> Thanks,
> Vladimir

--
Thanks,
Jc

From rkennke at redhat.com Wed Apr 3 17:13:04 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 3 Apr 2019 19:13:04 +0200
Subject: RFR: JDK-8221766: Load-reference barriers for Shenandoah
In-Reply-To: <080b7d98-1144-8111-57b7-0c0334d9147c@oracle.com>
References: <87h8bfqqu8.fsf@redhat.com>
 <080b7d98-1144-8111-57b7-0c0334d9147c@oracle.com>
Message-ID: <0103f3a1-bf2e-bbcf-d21c-d186f8332921@redhat.com>

> I don't think it should be part of this cleanup.

Fair enough.
I have run several tests today, and removing the is_Phi() call doesn't
seem to negatively impact Shenandoah.

Updated webrevs:
Incremental:
http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.01.diff/
Full:
http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.01/

Ok now?

Thanks,
Roman

> Please, file separate RFE to push this change with separate review and
> testing.
>
> Thanks,
> Vladimir

From dean.long at oracle.com Wed Apr 3 17:20:40 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Wed, 3 Apr 2019 10:20:40 -0700
Subject: RFR(s): 8218147: make_walkable asserts on multiple calls
In-Reply-To:
References:
Message-ID:

Does the store need to be atomic?

Any idea what problem the original assert was trying to catch?

If it's already set, should we check that _last_Java_pc matches the new value?

dl

On 4/3/19 3:36 AM, Robbin Ehn wrote:
> Hi all, please review.
>
> If a JavaThread in native both gets selected for java-stack sampling and a
> handshake, both VMThread and JFR sampler will call make_walkable. There is an
> assert making sure we do not do this twice. Since we only store _last_Java_pc
> from sp, we can allow it to be executed multiple times for both aarch64/x64,
> which have the assert.
>
> The asserts come from:
> 8161598: Kitchensink fails: assert(nm->insts_contains(original_pc))
> failed: original PC must be in nmethod/CompiledMethod
>
> They seem not to be directly connected to the bug.
>
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8218147
>
> Webrev:
> http://cr.openjdk.java.net/~rehn/8218147/webrev/
>
> Compiled aarch64, x64 passes t1-3.
>
> Thanks, Robbin

From vladimir.kozlov at oracle.com Wed Apr 3 17:26:47 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 3 Apr 2019 10:26:47 -0700
Subject: RFR (S) 8221853: Data race in compile broker (set_last_compile)
In-Reply-To:
References: <4168dea9-8964-3b44-af2b-e30cc1ffe144@oracle.com>
 <3eea0593-fe96-5894-05da-c0c5f5ef38d6@oracle.com>
Message-ID:

Looks good. Please, run submit testing before push.

Thanks,
Vladimir

On 4/3/19 10:13 AM, Jean Christophe Beyler wrote:
> Hi Vladimir,
>
> Sounds good to me:
> Webrev: http://cr.openjdk.java.net/~jcbeyler/8221853/webrev.02/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8221853
>
> I cleaned it up a bit and renamed it to "update_compile_perf_data". Let me know what you think,
> Jc

From vladimir.kozlov at oracle.com Wed Apr 3 17:24:52 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 3 Apr 2019 10:24:52 -0700
Subject: RFR: JDK-8221766: Load-reference barriers for Shenandoah
In-Reply-To: <0103f3a1-bf2e-bbcf-d21c-d186f8332921@redhat.com>
References: <87h8bfqqu8.fsf@redhat.com>
 <080b7d98-1144-8111-57b7-0c0334d9147c@oracle.com>
 <0103f3a1-bf2e-bbcf-d21c-d186f8332921@redhat.com>
Message-ID: <35a6ebef-f1ad-98c7-4f4c-a66552ee0ec8@oracle.com>

Good (C2 part).

Thanks,
Vladimir

On 4/3/19 10:13 AM, Roman Kennke wrote:
>> I don't think it should be part of this cleanup.
>
> Fair enough.
> I have run several tests today, and removing the is_Phi() call doesn't seem to negatively impact Shenandoah.
>
> Updated webrevs:
> Incremental:
> http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.01.diff/
> Full:
> http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.01/
>
> Ok now?
>
> Thanks,
> Roman

From shade at redhat.com Wed Apr 3 17:33:33 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 3 Apr 2019 19:33:33 +0200
Subject: RFR: JDK-8221766: Load-reference barriers for Shenandoah
In-Reply-To: <0103f3a1-bf2e-bbcf-d21c-d186f8332921@redhat.com>
References: <87h8bfqqu8.fsf@redhat.com>
 <080b7d98-1144-8111-57b7-0c0334d9147c@oracle.com>
 <0103f3a1-bf2e-bbcf-d21c-d186f8332921@redhat.com>
Message-ID: <966f7518-8f30-f978-4985-46011786d156@redhat.com>

On 4/3/19 7:13 PM, Roman Kennke wrote:
> Updated webrevs:
> Incremental:
> http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.01.diff/
> Full:
> http://cr.openjdk.java.net/~rkennke/JDK-8221766/webrev.01/

Shenandoah parts look good.

-Aleksey

From felix.yang at huawei.com Thu Apr 4 02:08:42 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Thu, 4 Apr 2019 02:08:42 +0000
Subject: [aarch64-port-dev ] RFR: 8221658: aarch64: add necessary predicate for ubfx patterns
In-Reply-To:
References: <130fbe62-4fac-5a8d-aade-74e340459e23@redhat.com>
 <79987274-97d6-bde0-8577-e5046864bdde@redhat.com>
Message-ID:

Comments inlined:

> On 4/3/19 2:15 AM, Yangfei (Felix) wrote:
> > Updated webrev: http://cr.openjdk.java.net/~fyang/8221658/webrev.01
> > Is this one better?
>
> It still doesn't look quite right. According to the Java Language Standard,
> shifts are all taken mod 32 or mod 64. Therefore, it is not possible for
> the shift size to be out of range, surely?

I haven't looked into the details about how this is ensured in the JVM, but it deserves a look.

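As a rough illustration of the two points being discussed (a sketch only, not
code from the webrev or from aarch64.ad): the class name ShiftMasking and the
helper fitsUbfx below are made up for this example, and fitsUbfx simply
restates constraints 1-3 from earlier in the thread. The first two methods
show the language-level masking of shift counts (JLS 15.19); the helper shows
that a shift count can be in range while rshift + width still exceeds the
register width, which is the case constraint 3 is meant to exclude.

```java
public class ShiftMasking {
    // Shift distance actually used for an int shift: the low 5 bits of the
    // count, as specified for shift operators in JLS 15.19.
    static int effectiveIntShift(int count) {
        return count & 0x1f;
    }

    // Shift distance actually used for a long shift: the low 6 bits.
    static int effectiveLongShift(int count) {
        return count & 0x3f;
    }

    // Hypothetical helper (an assumption for illustration, not the predicate
    // from the patch). 'bits' is 32 or 64. Returns true when:
    //   1. 0 <= rshift <= bits - 1
    //   2. mask != 0 and mask has the form 2^width - 1
    //   3. rshift + width <= bits
    static boolean fitsUbfx(int rshift, long mask, int bits) {
        boolean lowOnes = mask != 0 && (mask & (mask + 1)) == 0;
        if (!lowOnes) {
            return false;
        }
        int width = Long.bitCount(mask);
        return rshift >= 0 && rshift <= bits - 1 && rshift + width <= bits;
    }

    public static void main(String[] args) {
        int x = 0xF0F0F0F0;
        // A count of 33 on an int behaves like a count of 1.
        System.out.println((x >>> 33) == (x >>> effectiveIntShift(33)));
        // A count of 64 on a long behaves like a count of 0.
        System.out.println((1L << 64) == (1L << effectiveLongShift(64)));
        // Shift in range and field fits: 24 + 8 <= 32.
        System.out.println(fitsUbfx(24, 0xFF, 32));
        // Shift in range but field does not fit: 28 + 8 > 32.
        System.out.println(fitsUbfx(28, 0xFF, 32));
    }
}
```

Whether C2 can actually present such a shift/mask pair to the matcher is the
open question in this thread; the sketch only shows that the pair is
representable at the level of the constants involved.
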
One thing I noticed is that the jdk13 arm port also used an 'immU5' match operand as a shift:

// Valid scale values for addressing modes and shifts
operand immU5() %{
  predicate(0 <= n->get_int() && (n->get_int() <= 31));
  match(ConI);
  op_cost(0);

  format %{ %}
  interface(CONST_INTER);
%}
......
instruct addshrI_reg_imm_reg(iRegI dst, iRegI src1, immU5 src2, iRegI src3) %{
  match(Set dst (AddI (URShiftI src1 src2) src3));

  size(4);
  format %{ "add_32 $dst,$src3,$src1>>>$src2\t! int" %}
  ins_encode %{
    __ add_32($dst$$Register, $src3$$Register, AsmOperand($src1$$Register, lsr, $src2$$constant));
  %}
  ins_pipe(ialu_reg_reg);
%}

> Have you got a reproducer for this failure that I can see?

I have put the fuzz test case and call trace on the JBS: https://bugs.openjdk.java.net/secure/attachment/81950/fuzz-test.tar.bz2
Command line: $ java Start
I can always reproduce the crash with a fastdebug or slowdebug jdk built from the latest aarch64 jdk8u-shenandoah repo.

Thanks,
Felix

From tobias.hartmann at oracle.com Thu Apr 4 06:43:12 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 4 Apr 2019 08:43:12 +0200
Subject: RFR (S) 8221853: Data race in compile broker (set_last_compile)
In-Reply-To:
References: <4168dea9-8964-3b44-af2b-e30cc1ffe144@oracle.com>
 <3eea0593-fe96-5894-05da-c0c5f5ef38d6@oracle.com>
Message-ID:

Looks good to me too.

Best regards,
Tobias

On 03.04.19 19:26, Vladimir Kozlov wrote:
> Looks good. Please, run submit testing before push.
>
> Thanks,
> Vladimir
Hi Jc, >> >> I agree with removal of print_last_compiled() method and related code. >> But you need to keep part of set_last_compiled() code (guarded by UsePerfData) which set >> values of CompilerCounters. It >> is used. >> >> Thanks, >> Vladimir >> >> On 4/3/19 9:05 AM, Jean Christophe Beyler wrote: >> > Hi Tobias, >> > >> > Sounds good to me, here is a webrev that removes it entirely: >> > >> > Webrev: http://cr.openjdk.java.net/~jcbeyler/8221853/webrev.01/ >> > Bug: https://bugs.openjdk.java.net/browse/JDK-8221853 >> > >> > Let me know what you think, >> > Jc >> > >> > On Wed, Apr 3, 2019 at 4:17 AM Tobias Hartmann wrote: >> > >> > Hi Jc, >> > >> > I would actually prefer to just remove this unused code if no one objects. >> > >> > Best regards, >> > Tobias >> > >> > On 02.04.19 18:52, Jean Christophe Beyler wrote: >> > > Hi all, >> > > >> > > While working on enabling Java TSAN, one non-goal is that if we let it do its work, >> it does thread >> > > sanitizing on the JVM. Though this is a non-goal, I saw this one pop up and wanted >> to know if you >> > > would like it cleaned up? >> > > >> > > Webrev: http://cr.openjdk.java.net/~jcbeyler/8221853/webrev.00/ >> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8221853 >> > > >> > > I'm not sure the webrev is the way you'd like to go but from what I can see: >> > > >> > > - This is benign as no one was using the data being raced >> > > - No one calls print_last_compiled, which uses data only set in set_last_compiled >> > > - Because it is debug, the whole code could be wrapped into non product builds >> > > 
- I did add a compile lock for both the printout and the set_last but I could >> make a new lock >> > > just for this code instead of using the general compile lock. >> > > >> > > Thanks and let me know, >> > > Jc >> > >> > >> > >> > -- >> > >> > Thanks, >> > Jc >> >> >> >> -- >> >> Thanks, >> Jc From robbin.ehn at oracle.com Thu Apr 4 10:29:10 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Thu, 4 Apr 2019 12:29:10 +0200 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: References: Message-ID: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> Hi, On 4/3/19 7:20 PM, dean.long at oracle.com wrote: > Does the store need to be atomic? Some people have started to use it for concurrently stored/read values. In this case VMThread and JFR sampler thread can execute this code at the same time when a JavaThread is in native. Since _last_Java_pc is volatile aligned word-size there is no issue, just a gesture. (to make the gesture better it should also be read with Atomic::read) Shall I remove it? > > Any idea what problem the original assert was trying to catch? No... you push it as part of your fix for 8161598 :) I do not see it related, several assert which made sense was added. > > If it's already set, should we check that _last_Java_pc matches the new value? We manually set the pc in several places, so if it's set, it's not certain that it should be the same as in last sp. I can't distinguish between the cases. Thanks, Robbin > > dl > > On 4/3/19 3:36 AM, Robbin Ehn wrote: >> Hi all, please review. >> >> If a JavaThread in native both gets selected for java-stack sampling and a >> handshake both VMThread and JFR sampler will call make_walkable. There is an >> assert making sure we do not do this twice. Since we only store _last_Java_pc >> from sp, we can allow it be executed multiple times for both aarch64/x64 which >> have the assert. 
>> >> The asserts comes from: >> 8161598: Kitchensink fails: assert(nm->insts_contains(original_pc)) failed: >> original PC must be in nmethod/CompiledMethod >> >> They seems not to be directly connected to the bug. >> >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8218147 >> >> Webrev: >> http://cr.openjdk.java.net/~rehn/8218147/webrev/ >> >> Compiled aarch64, x64 passes t1-3. >> >> Thanks, Robbin > From aph at redhat.com Thu Apr 4 18:11:07 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 4 Apr 2019 19:11:07 +0100 Subject: [aarch64-port-dev ] RFR: 8221658: aarch64: add necessary predicate for ubfx patterns In-Reply-To: References: <130fbe62-4fac-5a8d-aade-74e340459e23@redhat.com> <79987274-97d6-bde0-8577-e5046864bdde@redhat.com> Message-ID: <6c9e81fa-5f69-e601-8507-b82d8bf96beb@redhat.com> Try this:

diff -r c763810a9bf5 src/cpu/aarch64/vm/aarch64.ad
--- a/src/cpu/aarch64/vm/aarch64.ad Fri Sep 28 08:48:26 2018 +0800
+++ b/src/cpu/aarch64/vm/aarch64.ad Thu Apr 04 13:47:03 2019 -0400
@@ -12340,7 +12340,7 @@
   ins_cost(INSN_COST);
   format %{ "ubfxw $dst, $src, $mask" %}
   ins_encode %{
-    int rshift = $rshift$$constant;
+    int rshift = $rshift$$constant & 31;
     long mask = $mask$$constant;
     int width = exact_log2(mask+1);
     __ ubfxw(as_Register($dst$$reg),
@@ -12355,7 +12355,7 @@
   ins_cost(INSN_COST);
   format %{ "ubfx $dst, $src, $mask" %}
   ins_encode %{
-    int rshift = $rshift$$constant;
+    int rshift = $rshift$$constant & 63;
     long mask = $mask$$constant;
     int width = exact_log2(mask+1);
     __ ubfx(as_Register($dst$$reg),
@@ -12373,7 +12373,7 @@
   ins_cost(INSN_COST * 2);
   format %{ "ubfx $dst, $src, $mask" %}
   ins_encode %{
-    int rshift = $rshift$$constant;
+    int rshift = $rshift$$constant & 31;
     long mask = $mask$$constant;
     int width = exact_log2(mask+1);
     __ ubfx(as_Register($dst$$reg),
diff -r c763810a9bf5 src/cpu/aarch64/vm/aarch64_ad.m4
--- a/src/cpu/aarch64/vm/aarch64_ad.m4 Fri Sep 28 08:48:26 2018 +0800
+++ b/src/cpu/aarch64/vm/aarch64_ad.m4 Thu Apr 04 13:47:03 2019 -0400
@@ -185,7 +185,7 @@
   ins_cost(INSN_COST);
   format %{ "$3 $dst, $src, $mask" %}
   ins_encode %{
-    int rshift = $rshift$$constant;
+    int rshift = $rshift$$constant & $4;
     long mask = $mask$$constant;
     int width = exact_log2(mask+1);
     __ $3(as_Register($dst$$reg),
@@ -193,8 +193,8 @@
   %}
   ins_pipe(ialu_reg_shift);
 %}')
-BFX_INSN(I,URShift,ubfxw)
-BFX_INSN(L,URShift,ubfx)
+BFX_INSN(I,URShift,ubfxw,31)
+BFX_INSN(L,URShift,ubfx,63)
 
 // We can use ubfx when extending an And with a mask when we know mask
 // is positive. We know that because immI_bitmask guarantees it.
@@ -205,7 +205,7 @@
   ins_cost(INSN_COST * 2);
   format %{ "ubfx $dst, $src, $mask" %}
   ins_encode %{
-    int rshift = $rshift$$constant;
+    int rshift = $rshift$$constant & 31;
     long mask = $mask$$constant;
     int width = exact_log2(mask+1);
     __ ubfx(as_Register($dst$$reg),

-- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ekaterina.pavlova at oracle.com Thu Apr 4 19:50:38 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 4 Apr 2019 12:50:38 -0700 Subject: RFR (T/S) 8216551: GraalUnitTestLauncher should be executed as '@run driver' Message-ID: <9a0d9d58-d385-7a16-54c4-70ce3175b023@oracle.com> Hi All, GraalUnitTestLauncher doesn't do real testing but spawns a new JVM to run graal unit tests. There is no big sense to run GraalUnitTestLauncher in JDK under test and with extra JVM flags used for real testing. So, the idea was to use '@run driver' to launch GraalUnitTestLauncher. However GraalUnitTestLauncher has a code which look for jdk.internal.vm.compiler and jdk.internal.vm.ci modules and this code will not work without -XX:+EnableJVMCI. So, replacing @run main/othervm compiler.graalunit.common.GraalUnitTestLauncher to @run driver compiler.graalunit.common.GraalUnitTestLauncher does not work. The current fix just removes '/othervm' so jtreg will be able to use agent VMs from a pool to run compiler.graalunit.common.GraalUnitTestLauncher. 
Also updated 2 problem list files to match latest Graal bugs status: test/hotspot/jtreg/ProblemList-graal.txt test/jdk/ProblemList-graal.txt Please review the changes. JBS: https://bugs.openjdk.java.net/browse/JDK-8216551 webrev: http://cr.openjdk.java.net/~epavlova//8216551/webrev.00/index.html testing: run graalunit tests in mach5 thanks, -katya From vladimir.kozlov at oracle.com Thu Apr 4 21:37:24 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 4 Apr 2019 14:37:24 -0700 Subject: RFR (T/S) 8216551: GraalUnitTestLauncher should be executed as '@run driver' In-Reply-To: <9a0d9d58-d385-7a16-54c4-70ce3175b023@oracle.com> References: <9a0d9d58-d385-7a16-54c4-70ce3175b023@oracle.com> Message-ID: Looks good. Should we also problem list tests for 8221514? Thanks, Vladimir On 4/4/19 12:50 PM, Ekaterina Pavlova wrote: > Hi All, > > > GraalUnitTestLauncher doesn't do real testing but spawns a new JVM to run graal unit tests. > There is no big sense to run GraalUnitTestLauncher in JDK under test and with extra JVM flags > used for real testing. So, the idea was to use '@run driver' to launch GraalUnitTestLauncher. > > However GraalUnitTestLauncher has a code which look for jdk.internal.vm.compiler and > jdk.internal.vm.ci modules and this code will not work without -XX:+EnableJVMCI. So, replacing > @run main/othervm compiler.graalunit.common.GraalUnitTestLauncher > to > @run driver compiler.graalunit.common.GraalUnitTestLauncher > does not work. > > The current fix just removes '/othervm' so jtreg will be able to use agent VMs from a pool > to run compiler.graalunit.common.GraalUnitTestLauncher. > > Also updated 2 problem list files to match latest Graal bugs status: > test/hotspot/jtreg/ProblemList-graal.txt > test/jdk/ProblemList-graal.txt > > > Please review the changes. > > 
JBS: https://bugs.openjdk.java.net/browse/JDK-8216551 > webrev: http://cr.openjdk.java.net/~epavlova//8216551/webrev.00/index.html > testing: run graalunit tests in mach5 > > thanks, > -katya From igor.ignatyev at oracle.com Thu Apr 4 21:43:01 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 4 Apr 2019 14:43:01 -0700 Subject: RFR (T/S) 8216551: GraalUnitTestLauncher should be executed as '@run driver' In-Reply-To: <9a0d9d58-d385-7a16-54c4-70ce3175b023@oracle.com> References: <9a0d9d58-d385-7a16-54c4-70ce3175b023@oracle.com> Message-ID: <87353AE7-E62F-4A28-A035-FE98832DC224@oracle.com> looks good. -- Igor > On Apr 4, 2019, at 12:50 PM, Ekaterina Pavlova wrote: > > Hi All, > > > GraalUnitTestLauncher doesn't do real testing but spawns a new JVM to run graal unit tests. > There is no big sense to run GraalUnitTestLauncher in JDK under test and with extra JVM flags > used for real testing. So, the idea was to use '@run driver' to launch GraalUnitTestLauncher. > > However GraalUnitTestLauncher has a code which look for jdk.internal.vm.compiler and > jdk.internal.vm.ci modules and this code will not work without -XX:+EnableJVMCI. So, replacing > @run main/othervm compiler.graalunit.common.GraalUnitTestLauncher > to > @run driver compiler.graalunit.common.GraalUnitTestLauncher > does not work. > > The current fix just removes '/othervm' so jtreg will be able to use agent VMs from a pool > to run compiler.graalunit.common.GraalUnitTestLauncher. > > Also updated 2 problem list files to match latest Graal bugs status: > test/hotspot/jtreg/ProblemList-graal.txt > test/jdk/ProblemList-graal.txt > > > Please review the changes. 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8216551 > webrev: http://cr.openjdk.java.net/~epavlova//8216551/webrev.00/index.html > testing: run graalunit tests in mach5 > > thanks, > -katya From dean.long at oracle.com Fri Apr 5 00:16:19 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 4 Apr 2019 17:16:19 -0700 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> Message-ID: <587d1afe-9f86-f801-92de-31711b711034@oracle.com> On 4/4/19 3:29 AM, Robbin Ehn wrote: > Hi, > > On 4/3/19 7:20 PM, dean.long at oracle.com wrote: >> Does the store need to be atomic? > > Some people have started to use it for concurrently stored/read values. > In this case VMThread and JFR sampler thread can execute this code at > the same time when a JavaThread is in native. > Since _last_Java_pc is volatile aligned word-size there is no issue, > just a gesture. (to make the gesture better it should also be read > with Atomic::read) > > Shall I remove it? > I would say yes, unless you want to add the Atomic::read now. >> >> Any idea what problem the original assert was trying to catch? > > No... you push it as part of your fix for 8161598 :) > I do not see it related, several assert which made sense was added. > I don't think I took into account concurrent access when I added those asserts :-) >> >> If it's already set, should we check that _last_Java_pc matches the >> new value? > > We manually set the pc in several places, so if it's set, it's not > certain that > it should be the same as in last sp. > I can't distinguish between the cases. > If we get pc from sp[-1] then it should match, but you're right, we sometimes get pc from somewhere else. dl > Thanks, Robbin > >> >> dl >> >> On 4/3/19 3:36 AM, Robbin Ehn wrote: >>> Hi all, please review. 
>>> >>> If a JavaThread in native both gets selected for java-stack sampling >>> and a >>> handshake both VMThread and JFR sampler will call make_walkable. >>> There is an >>> assert making sure we do not do this twice. Since we only store >>> _last_Java_pc >>> from sp, we can allow it be executed multiple times for both >>> aarch64/x64 which have the assert. >>> >>> The asserts comes from: >>> 8161598: Kitchensink fails: assert(nm->insts_contains(original_pc)) >>> failed: original PC must be in nmethod/CompiledMethod >>> >>> They seems not to be directly connected to the bug. >>> >>> Issue: >>> https://bugs.openjdk.java.net/browse/JDK-8218147 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~rehn/8218147/webrev/ >>> >>> Compiled aarch64, x64 passes t1-3. >>> >>> Thanks, Robbin >> From dean.long at oracle.com Fri Apr 5 00:26:16 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 4 Apr 2019 17:26:16 -0700 Subject: RFR (T/S) 8216551: GraalUnitTestLauncher should be executed as '@run driver' In-Reply-To: <9a0d9d58-d385-7a16-54c4-70ce3175b023@oracle.com> References: <9a0d9d58-d385-7a16-54c4-70ce3175b023@oracle.com> Message-ID: <77bab14e-8456-d557-bd95-dd199a4d1bd9@oracle.com> The timeout overrides, such as timeout=300, need to be preserved. dl On 4/4/19 12:50 PM, Ekaterina Pavlova wrote: > Hi All, > > > GraalUnitTestLauncher doesn't do real testing but spawns a new JVM to > run graal unit tests. > There is no big sense to run GraalUnitTestLauncher in JDK under test > and with extra JVM flags > used for real testing. So, the idea was to use '@run driver' to launch > GraalUnitTestLauncher. > > However GraalUnitTestLauncher has a code which look for > jdk.internal.vm.compiler and > jdk.internal.vm.ci modules and this code will not work without > -XX:+EnableJVMCI. So, replacing > @run main/othervm compiler.graalunit.common.GraalUnitTestLauncher > to > @run driver compiler.graalunit.common.GraalUnitTestLauncher > does not work. 
> > The current fix just removes '/othervm' so jtreg will be able to use > agent VMs from a pool > to run compiler.graalunit.common.GraalUnitTestLauncher. > > Also updated 2 problem list files to match latest Graal bugs status: > test/hotspot/jtreg/ProblemList-graal.txt > test/jdk/ProblemList-graal.txt > > > Please review the changes. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8216551 > webrev: > http://cr.openjdk.java.net/~epavlova//8216551/webrev.00/index.html > testing: run graalunit tests in mach5 > > thanks, > -katya From dean.long at oracle.com Fri Apr 5 07:22:24 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 5 Apr 2019 00:22:24 -0700 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: <587d1afe-9f86-f801-92de-31711b711034@oracle.com> References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> Message-ID: On 4/4/19 5:16 PM, dean.long at oracle.com wrote: > >>> >>> If it's already set, should we check that _last_Java_pc matches the >>> new value? >> >> We manually set the pc in several places, so if it's set, it's not >> certain that >> it should be the same as in last sp. >> I can't distinguish between the cases. >> > > If we get pc from sp[-1] then it should match, but you're right, we > sometimes get pc from somewhere else. How about if we combine the !walkable check and the capture_last_Java_pc() logic into a single method? Then we can do something like:

    if (!walkable()) {
        address pc = (address)_last_Java_sp[-1];
        address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL);
        assert(a == NULL || a == pc, "unexpected PC %p", a);
    }

dl
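The compare-and-swap idea above can be sketched outside HotSpot with std::atomic. This is only a model of the suggestion, not the code from the webrev: the Anchor struct, its fields and make_walkable() are stand-ins invented for the example, and HotSpot itself would use Atomic::cmpxchg on _last_Java_pc rather than std::atomic.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Stand-in for the frame anchor; all names here are hypothetical.
struct Anchor {
    std::atomic<std::uintptr_t> last_pc{0};   // 0 plays the role of NULL
    const std::uintptr_t* last_sp = nullptr;  // last_sp[-1] holds the saved return pc

    bool walkable() const { return last_pc.load() != 0; }

    // Safe to call from several threads (for example a VM thread and a sampler).
    // The first caller publishes the pc; a racing caller either wins the
    // compare-and-swap or observes the value that was stored first.
    // Returns false only if a racing caller stored a different pc.
    bool make_walkable() {
        if (walkable()) return true;
        std::uintptr_t pc = last_sp[-1];
        std::uintptr_t expected = 0;
        if (last_pc.compare_exchange_strong(expected, pc)) return true;
        return expected == pc;  // someone else stored it first; it should match
    }
};
```

Calling make_walkable() twice on the same anchor is harmless in this model, which is the property the original assert ruled out.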
From robbin.ehn at oracle.com Fri Apr 5 12:14:01 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Fri, 5 Apr 2019 14:14:01 +0200 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: <587d1afe-9f86-f801-92de-31711b711034@oracle.com> References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> Message-ID: <859fd182-ab25-86cf-76db-283a0a83037f@oracle.com> Hi, > I would say yes, unless you want to add the Atomic::read now. Sure, full: http://rehn-ws.se.oracle.com/cr_mirror/8218147/v2/webrev/index.html Thanks, Robbin > >>> >>> Any idea what problem the original assert was trying to catch? >> >> No... you push it as part of your fix for 8161598 :) >> I do not see it related, several assert which made sense was added. >> > > I don't think I took into account concurrent access when I added those asserts :-) > >>> >>> If it's already set, should we check that _last_Java_pc matches the new value? >> >> We manually set the pc in several places, so if it's set, it's not certain that >> it should be the same as in last sp. >> I can't distinguish between the cases. >> > > If we get pc from sp[-1] then it should match, but you're right, we sometimes > get pc from somewhere else. > > dl > >> Thanks, Robbin >> >>> >>> dl >>> >>> On 4/3/19 3:36 AM, Robbin Ehn wrote: >>>> Hi all, please review. >>>> >>>> If a JavaThread in native both gets selected for java-stack sampling and a >>>> handshake both VMThread and JFR sampler will call make_walkable. There is an >>>> assert making sure we do not do this twice. Since we only store _last_Java_pc >>>> from sp, we can allow it be executed multiple times for both aarch64/x64 >>>> which have the assert. >>>> >>>> The asserts comes from: >>>> 8161598: Kitchensink fails: assert(nm->insts_contains(original_pc)) failed: >>>> original PC must be in nmethod/CompiledMethod >>>> >>>> They seems not to be directly connected to the bug. 
>>>> >>>> Issue: >>>> https://bugs.openjdk.java.net/browse/JDK-8218147 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~rehn/8218147/webrev/ >>>> >>>> Compiled aarch64, x64 passes t1-3. >>>> >>>> Thanks, Robbin >>> > From dmitrij.pochepko at bell-sw.com Fri Apr 5 14:03:54 2019 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Fri, 5 Apr 2019 17:03:54 +0300 Subject: RFR(XS): 8221995: AARCH64: problems with CAS instructions encoding Message-ID: Hi all, please review small patch for JDK-8221995: AARCH64: problems with CAS instructions encoding webrev: http://cr.openjdk.java.net/~dpochepk/8221995/webrev/ Patch fix 3 problems: - specification allows addressing register to be SP, while current hotspot encoding implementation hits assert in this case - specification allows data register(s) to be ZR, while current hotspot encoding implementation hits assert in this case - all pair CAS instructions (CASP*) are encoded incorrectly in bit 23, which leads to another instructions generation instead (CAS*B and CAS*H) Current code shape doesn't generate CAS* instructions using affected cases. That is why these problems wasn't found before. Testing: I generated code with CAS and CASP with and without patch. Patched version hits no asserts while using zr and sp registers. And casp* instruction now generated correctly. CR: https://bugs.openjdk.java.net/browse/JDK-8221995 Thanks, Dmitrij From robbin.ehn at oracle.com Fri Apr 5 15:43:05 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Fri, 05 Apr 2019 17:43:05 +0200 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> Message-ID: <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> Hi Dean, Sorry, I missed this mail. Yes we can do that. Ignore my other mail, I'll update. 
Thanks, Robbin dean.long at oracle.com wrote: (5 April 2019 09:22:24 CEST) >On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >> >>>> >>>> If it's already set, should we check that _last_Java_pc matches the > >>>> new value? >>> >>> We manually set the pc in several places, so if it's set, it's not >>> certain that >>> it should be the same as in last sp. >>> I can't distinguish between the cases. >>> >> >> If we get pc from sp[-1] then it should match, but you're right, we >> sometimes get pc from somewhere else. > >How about if we combine the !walkable check and the >capture_last_Java_pc() logic into a single method? >Then we can do something like:
>
>     if (!walkable()) {
>         address pc = (address)_last_Java_sp[-1];
>         address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL);
>         assert(a == NULL || a == pc, "unexpected PC %p", a);
>     }
>
>dl
From ekaterina.pavlova at oracle.com Fri Apr 5 16:00:21 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Fri, 5 Apr 2019 09:00:21 -0700 Subject: RFR (T/S) 8216551: GraalUnitTestLauncher should be executed as '@run driver' In-Reply-To: References: <9a0d9d58-d385-7a16-54c4-70ce3175b023@oracle.com> Message-ID: <15ab06fc-1988-2a7a-fdfb-985803921b5a@oracle.com> On 4/4/19 2:37 PM, Vladimir Kozlov wrote: > Looks good. Should we also problem list tests for 8221514? well, these failures are seen only when Graal is executed with -Xcomp flag. We have only ProblemList-graal.txt and ProblemList-Xcomp.txt, we don't have ProblemList-Graal-Xcomp.txt. I would prefer not to create one more problem list file, Tom also said that 8221514 is going to be fixed soon. thanks, -katya > Thanks, > Vladimir > > On 4/4/19 12:50 PM, Ekaterina Pavlova wrote: >> Hi All, >> >> >> GraalUnitTestLauncher doesn't do real testing but spawns a new JVM to run graal unit tests. >> There is no big sense to run GraalUnitTestLauncher in JDK under test and with extra JVM flags >> used for real testing. 
So, the idea was to use '@run driver' to launch GraalUnitTestLauncher. >> >> However GraalUnitTestLauncher has a code which look for jdk.internal.vm.compiler and >> jdk.internal.vm.ci modules and this code will not work without -XX:+EnableJVMCI. So, replacing >> @run main/othervm compiler.graalunit.common.GraalUnitTestLauncher >> to >> @run driver compiler.graalunit.common.GraalUnitTestLauncher >> does not work. >> >> The current fix just removes '/othervm' so jtreg will be able to use agent VMs from a pool >> to run compiler.graalunit.common.GraalUnitTestLauncher. >> >> Also updated 2 problem list files to match latest Graal bugs status: >> test/hotspot/jtreg/ProblemList-graal.txt >> test/jdk/ProblemList-graal.txt >> >> >> Please review the changes. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8216551 >> webrev: http://cr.openjdk.java.net/~epavlova//8216551/webrev.00/index.html >> testing: run graalunit tests in mach5 >> >> thanks, >> -katya From aph at redhat.com Fri Apr 5 16:12:38 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 5 Apr 2019 17:12:38 +0100 Subject: [aarch64-port-dev ] RFR(XS): 8221995: AARCH64: problems with CAS instructions encoding In-Reply-To: References: Message-ID: Hi, On 4/5/19 3:03 PM, Dmitrij Pochepko wrote: > please review small patch for JDK-8221995: AARCH64: problems with CAS > instructions encoding > webrev: http://cr.openjdk.java.net/~dpochepk/8221995/webrev/ > > Patch fix 3 problems: > - specification allows addressing register to be SP, while current > hotspot encoding implementation hits assert in this case > > - specification allows data register(s) to be ZR, while current hotspot > encoding implementation hits assert in this case > - all pair CAS instructions (CASP*) are encoded incorrectly in bit 23, > which leads to another instructions generation instead (CAS*B and CAS*H) > > Current code shape doesn't generate CAS* instructions using affected > cases. 
That is why these problems wasn't found before. > > Testing: > > I generated code with CAS and CASP with and without patch. Patched > version hits no asserts while using zr and sp registers. And casp* > instruction now generated correctly. > CR: https://bugs.openjdk.java.net/browse/JDK-8221995 It's hard to find a use case for CAS into the stack. I guess that ZR as a compare register is reasonable enough, but I think we never generate it. I don't think we use CASP at all. So, I think this makes no difference to the VM, but it's a decent cleanup. OK. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitrij.pochepko at bell-sw.com Fri Apr 5 16:15:25 2019 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Fri, 5 Apr 2019 19:15:25 +0300 Subject: [aarch64-port-dev ] RFR(XS): 8221995: AARCH64: problems with CAS instructions encoding In-Reply-To: References: Message-ID: Thank you for review. On 05/04/2019 7:12 PM, Andrew Haley wrote: > Hi, > > On 4/5/19 3:03 PM, Dmitrij Pochepko wrote: > >> please review small patch for JDK-8221995: AARCH64: problems with CAS >> instructions encoding >> webrev: http://cr.openjdk.java.net/~dpochepk/8221995/webrev/ >> >> Patch fix 3 problems: >> - specification allows addressing register to be SP, while current >> hotspot encoding implementation hits assert in this case >> >> - specification allows data register(s) to be ZR, while current hotspot >> encoding implementation hits assert in this case >> - all pair CAS instructions (CASP*) are encoded incorrectly in bit 23, >> which leads to another instructions generation instead (CAS*B and CAS*H) >> >> Current code shape doesn't generate CAS* instructions using affected >> cases. That is why these problems wasn't found before. >> >> Testing: >> >> I generated code with CAS and CASP with and without patch. Patched >> version hits no asserts while using zr and sp registers. 
And casp* >> instruction now generated correctly. >> CR: https://bugs.openjdk.java.net/browse/JDK-8221995 > It's hard to find a use case for CAS into the stack. I guess that ZR > as a compare register is reasonable enough, but I think we never > generate it. I don't think we use CASP at all. > > So, I think this makes no difference to the VM, but it's a decent > cleanup. OK. > From ekaterina.pavlova at oracle.com Fri Apr 5 16:34:21 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Fri, 5 Apr 2019 09:34:21 -0700 Subject: RFR (T/S) 8216551: GraalUnitTestLauncher should be executed as '@run driver' In-Reply-To: <77bab14e-8456-d557-bd95-dd199a4d1bd9@oracle.com> References: <9a0d9d58-d385-7a16-54c4-70ce3175b023@oracle.com> <77bab14e-8456-d557-bd95-dd199a4d1bd9@oracle.com> Message-ID: <6c9755fe-3dcd-6795-b93c-1ef0228fe61d@oracle.com> Dean, thanks a lot for noticing this! I missed this because these timeouts were added by manually editing JttLangMathALTest.java and JttLangMathMZTest.java which should not be done as they are supposed to be automatically generated only. I added 'timeout' support in generateTests.sh and also added /* DO NOT MODIFY THIS FILE. GENERATED BY generateTests.sh */ to be put in generated tests. I have updated the webrev and retested: http://cr.openjdk.java.net/~epavlova//8216551/webrev.00/index.html thanks, -katya On 4/4/19 5:26 PM, dean.long at oracle.com wrote: > The timeout overrides, such as timeout=300, need to be preserved. > > dl > > On 4/4/19 12:50 PM, Ekaterina Pavlova wrote: >> Hi All, >> >> >> GraalUnitTestLauncher doesn't do real testing but spawns a new JVM to run graal unit tests. >> There is no big sense to run GraalUnitTestLauncher in JDK under test and with extra JVM flags >> used for real testing. So, the idea was to use '@run driver' to launch GraalUnitTestLauncher. 
>> >> However GraalUnitTestLauncher has a code which look for jdk.internal.vm.compiler and >> jdk.internal.vm.ci modules and this code will not work without -XX:+EnableJVMCI. So, replacing >> @run main/othervm compiler.graalunit.common.GraalUnitTestLauncher >> to >> @run driver compiler.graalunit.common.GraalUnitTestLauncher >> does not work. >> >> The current fix just removes '/othervm' so jtreg will be able to use agent VMs from a pool >> to run compiler.graalunit.common.GraalUnitTestLauncher. >> >> Also updated 2 problem list files to match latest Graal bugs status: >> test/hotspot/jtreg/ProblemList-graal.txt >> test/jdk/ProblemList-graal.txt >> >> >> Please review the changes. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8216551 >> webrev: http://cr.openjdk.java.net/~epavlova//8216551/webrev.00/index.html >> testing: run graalunit tests in mach5 >> >> thanks, >> -katya > From vladimir.kozlov at oracle.com Fri Apr 5 17:24:59 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 5 Apr 2019 10:24:59 -0700 Subject: RFR (T/S) 8216551: GraalUnitTestLauncher should be executed as '@run driver' In-Reply-To: <15ab06fc-1988-2a7a-fdfb-985803921b5a@oracle.com> References: <9a0d9d58-d385-7a16-54c4-70ce3175b023@oracle.com> <15ab06fc-1988-2a7a-fdfb-985803921b5a@oracle.com> Message-ID: <653d385a-f432-80e4-361e-9db8036cca87@oracle.com> On 4/5/19 9:00 AM, Ekaterina Pavlova wrote: > On 4/4/19 2:37 PM, Vladimir Kozlov wrote: >> Looks good. Should we also problem list tests for 8221514? > > well, these failures are seen only when Graal is executed with -Xcomp flag. > We have only ProblemList-graal.txt and ProblemList-Xcomp.txt, we don't have ProblemList-Graal-Xcomp.txt. > I would prefer not to create one more problem list file, Tom also said that 8221514 is going to be fixed soon. Okay. 
Thanks, Vladimir > > thanks, > -katya > >> Thanks, >> Vladimir >> >> On 4/4/19 12:50 PM, Ekaterina Pavlova wrote: >>> Hi All, >>> >>> >>> GraalUnitTestLauncher doesn't do real testing but spawns a new JVM to run graal unit tests. >>> There is no big sense to run GraalUnitTestLauncher in JDK under test and with extra JVM flags >>> used for real testing. So, the idea was to use '@run driver' to launch GraalUnitTestLauncher. >>> >>> However GraalUnitTestLauncher has a code which look for jdk.internal.vm.compiler and >>> jdk.internal.vm.ci modules and this code will not work without -XX:+EnableJVMCI. So, replacing >>> @run main/othervm compiler.graalunit.common.GraalUnitTestLauncher >>> to >>> @run driver compiler.graalunit.common.GraalUnitTestLauncher >>> does not work. >>> >>> The current fix just removes '/othervm' so jtreg will be able to use agent VMs from a pool >>> to run compiler.graalunit.common.GraalUnitTestLauncher. >>> >>> Also updated 2 problem list files to match latest Graal bugs status: >>> test/hotspot/jtreg/ProblemList-graal.txt >>> test/jdk/ProblemList-graal.txt >>> >>> >>> Please review the changes. >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8216551 >>> webrev: http://cr.openjdk.java.net/~epavlova//8216551/webrev.00/index.html >>> testing: run graalunit tests in mach5 >>> >>> thanks, >>> -katya > From dean.long at oracle.com Fri Apr 5 19:37:53 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 5 Apr 2019 12:37:53 -0700 Subject: RFR (T/S) 8216551: GraalUnitTestLauncher should be executed as '@run driver' In-Reply-To: <6c9755fe-3dcd-6795-b93c-1ef0228fe61d@oracle.com> References: <9a0d9d58-d385-7a16-54c4-70ce3175b023@oracle.com> <77bab14e-8456-d557-bd95-dd199a4d1bd9@oracle.com> <6c9755fe-3dcd-6795-b93c-1ef0228fe61d@oracle.com> Message-ID: <441109d1-2d43-5697-8583-7dfc6388c723@oracle.com> Thanks for fixing the timeout customization. 
That was on my list of things to do :-)

dl

On 4/5/19 9:34 AM, Ekaterina Pavlova wrote:
> Dean,
>
> thanks a lot for noticing this!
> I missed this because these timeouts were added by manually editing
>  JttLangMathALTest.java and JttLangMathMZTest.java which should not be
> done as they are
> supposed to be automatically generated only.
>
> I added 'timeout' support in generateTests.sh and also added
>  /* DO NOT MODIFY THIS FILE. GENERATED BY generateTests.sh */
> to be put in generated tests.
>
> I have updated the webrev and retested:
>  http://cr.openjdk.java.net/~epavlova//8216551/webrev.00/index.html
>
> thanks,
> -katya
>
> On 4/4/19 5:26 PM, dean.long at oracle.com wrote:
>> The timeout overrides, such as timeout=300, need to be preserved.
>>
>> dl
>>
>> On 4/4/19 12:50 PM, Ekaterina Pavlova wrote:
>>> Hi All,
>>>
>>>
>>> GraalUnitTestLauncher doesn't do real testing but spawns a new JVM
>>> to run graal unit tests.
>>> There is no big sense to run GraalUnitTestLauncher in JDK under test
>>> and with extra JVM flags
>>> used for real testing. So, the idea was to use '@run driver' to
>>> launch GraalUnitTestLauncher.
>>>
>>> However GraalUnitTestLauncher has code which looks for
>>> jdk.internal.vm.compiler and
>>> jdk.internal.vm.ci modules and this code will not work without
>>> -XX:+EnableJVMCI. So, replacing
>>>  @run main/othervm compiler.graalunit.common.GraalUnitTestLauncher
>>> with
>>>  @run driver compiler.graalunit.common.GraalUnitTestLauncher
>>> doesn't work.
>>>
>>> The current fix just removes '/othervm' so jtreg will be able to use
>>> agent VMs from a pool
>>> to run compiler.graalunit.common.GraalUnitTestLauncher.
>>>
>>> Also updated 2 problem list files to match latest Graal bugs status:
>>>  test/hotspot/jtreg/ProblemList-graal.txt
>>>  test/jdk/ProblemList-graal.txt
>>>
>>>
>>> Please review the changes.
>>>
>>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8216551
>>>  webrev:
>>> http://cr.openjdk.java.net/~epavlova//8216551/webrev.00/index.html
>>> testing: run graalunit tests in mach5
>>>
>>> thanks,
>>> -katya
>>
>

From shade at redhat.com Fri Apr 5 23:50:06 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Sat, 6 Apr 2019 01:50:06 +0200
Subject: RFR (XS) 8222032: x86_32 fails with "wrong size of mach node" on AVX-512 machine
Message-ID: <48b81158-5265-0fed-f063-90cf960ab3ef@redhat.com>

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8222032

This apparently happens because MachSpillCopyNode size calculation is not correct when AVX-512 is
involved. x86_64.ad resolves this by calling into MachNode::size, which emits the whole thing into
the scratch buffer and thus cheats^W passes through size asserts fine:
  http://hg.openjdk.java.net/jdk/jdk/file/dfba4e321ab3/src/hotspot/cpu/x86/x86_64.ad#l1504

x86_32.ad can do the same:

diff -r b75026a7ca95 src/hotspot/cpu/x86/x86_32.ad
--- a/src/hotspot/cpu/x86/x86_32.ad     Sat Apr 06 00:30:50 2019 +0200
+++ b/src/hotspot/cpu/x86/x86_32.ad     Sat Apr 06 00:31:00 2019 +0200
@@ -1307,11 +1307,11 @@
  uint MachSpillCopyNode::size(PhaseRegAlloc *ra_) const {
-  return implementation( NULL, ra_, true, NULL );
+  return MachNode::size(ra_);
  }

Testing: Linux x86_32 fastdebug tier{1,2}, jdk-submit (running)

--
Thanks,
-Aleksey

From vladimir.x.ivanov at oracle.com Fri Apr 5 23:54:11 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 5 Apr 2019 16:54:11 -0700
Subject: RFR (XS) 8222032: x86_32 fails with "wrong size of mach node" on AVX-512 machine
In-Reply-To: <48b81158-5265-0fed-f063-90cf960ab3ef@redhat.com>
References: <48b81158-5265-0fed-f063-90cf960ab3ef@redhat.com>
Message-ID:

Looks good.

Best regards,
Vladimir Ivanov

On 05/04/2019 16:50, Aleksey Shipilev wrote:
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8222032
>
> This apparently happens because MachSpillCopyNode size calculation is not correct when AVX-512 is
> involved. x86_64.ad resolves this by calling into MachNode::size, which emits the whole thing into
> the scratch buffer and thus cheats^W passes through size asserts fine:
> http://hg.openjdk.java.net/jdk/jdk/file/dfba4e321ab3/src/hotspot/cpu/x86/x86_64.ad#l1504
>
> x86_32.ad can do the same:
>
> diff -r b75026a7ca95 src/hotspot/cpu/x86/x86_32.ad
> --- a/src/hotspot/cpu/x86/x86_32.ad     Sat Apr 06 00:30:50 2019 +0200
> +++ b/src/hotspot/cpu/x86/x86_32.ad     Sat Apr 06 00:31:00 2019 +0200
> @@ -1307,11 +1307,11 @@
>  uint MachSpillCopyNode::size(PhaseRegAlloc *ra_) const {
> -  return implementation( NULL, ra_, true, NULL );
> +  return MachNode::size(ra_);
>  }
>
> Testing: Linux x86_32 fastdebug tier{1,2}, jdk-submit (running)
>

From vladimir.kozlov at oracle.com Sat Apr 6 00:44:25 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 5 Apr 2019 17:44:25 -0700
Subject: RFR (XS) 8222032: x86_32 fails with "wrong size of mach node" on AVX-512 machine
In-Reply-To:
References: <48b81158-5265-0fed-f063-90cf960ab3ef@redhat.com>
Message-ID: <682f15f6-46f9-3a61-d197-082d72d75c74@oracle.com>

+1

Vladimir K

On 4/5/19 4:54 PM, Vladimir Ivanov wrote:
> Looks good.
>
> Best regards,
> Vladimir Ivanov
>
> On 05/04/2019 16:50, Aleksey Shipilev wrote:
>> Bug:
>>    https://bugs.openjdk.java.net/browse/JDK-8222032
>>
>> This apparently happens because MachSpillCopyNode size calculation is not correct when AVX-512 is
>> involved. x86_64.ad resolves this by calling into MachNode::size, which emits the whole thing into
>> the scratch buffer and thus cheats^W passes through size asserts fine:
>>    http://hg.openjdk.java.net/jdk/jdk/file/dfba4e321ab3/src/hotspot/cpu/x86/x86_64.ad#l1504
>>
>> x86_32.ad can do the same:
>>
>> diff -r b75026a7ca95 src/hotspot/cpu/x86/x86_32.ad
>> --- a/src/hotspot/cpu/x86/x86_32.ad     Sat Apr 06 00:30:50 2019 +0200
>> +++ b/src/hotspot/cpu/x86/x86_32.ad     Sat Apr 06 00:31:00 2019 +0200
>> @@ -1307,11 +1307,11 @@
>>   uint MachSpillCopyNode::size(PhaseRegAlloc *ra_) const {
>> -  return implementation( NULL, ra_, true, NULL );
>> +  return MachNode::size(ra_);
>>   }
>>
>> Testing: Linux x86_32 fastdebug tier{1,2}, jdk-submit (running)
>>

From vladimir.x.ivanov at oracle.com Sat Apr 6 00:47:55 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 5 Apr 2019 17:47:55 -0700
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To:
References:
Message-ID:

> I have updated the patch based on your advice.
> Webrev: http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.01/

What you are proposing is to unconditionally inline constructor calls. I consider such change as too intrusive.

I suggest to focus on "profile.count() == 0" check and make it smarter: when profile info is scarce, try to prove that the call site is actually reachable before giving up.

Playing with a small microbenchmark, I observed the following:

(lldb) p caller_method->print()
(lldb caller_method->interpreter_invocation_count() == 1
caller_method->method_data() != NULL [1]
caller_method->method_data()->is_mature() == true
caller_method->method_data()->invocation_count() == 0
caller_method->method_data()->backedge_count() == 802816

(lldb) p callee_method->print()
holder=java/util/Random signature=(J)V loaded=true arg_size=3 flags=public ident=1095 address=0x00000001008a5dd0>
callee_method->was_executed_more_than(0) == true

In addition, it's possible to prove the call is always executed by looking at CFG or checking that start block is being parsed.
When "profile.count() == 0", but the call site has been reached before, it seems the following conditions should hold: caller_method->interpreter_invocation_count() > 0 AND caller_method->method_data()->invocation_count() == (0 OR 1) AND callee_method->was_executed_more_than(0) == true AND parse->block() == parse->start_block() Some of them can be turned into asserts (e.g., invocation_count() == 0). Best regards, Vladimir Ivanov [1] p caller_method->method_data()->print() 0 bci: 5 CounterData count(0) 16 bci: 15 BranchData taken(0) displacement(200) not taken(751616) 48 bci: 19 ciVirtualCallData count(0) entries(1) java/util/Random(751616) 104 bci: 25 ciVirtualCallData count(0) entries(1) java/util/Random(751616) 160 bci: 43 BranchData taken(161218) displacement(32) not taken(590398) 192 bci: 52 JumpData taken(751615) displacement(-176) --- Extra data: 264 bci: 0 ArgInfoData 0x0 [2] (lldb) p this (Parse *) $33 = 0x000070000eacc6e8 (lldb) p start_block() (Parse::Block *) $31 = 0x00000001008aef00 (lldb) p block() (Parse::Block *) $32 = 0x00000001008aef00 > Testing: > ?- Running scimark.monte_carlo on jdk/x64 and jdk8u/mips64 with > -XX:-TieredCompilation: no performance drop > ?- Running SPECjvm2008 on jdk8u/mips64 with -XX:-TieredCompilation: no > performance regression > ?- Running make test TEST="micro" on jdk/x64: no performance regression > ?- Running make test TEST="tier1 tier2 tier3" JTREG="JOBS=3" > CONF=release on jdk/x64: no regression > > Could you please review it and give me some advice? > Thanks a lot. > > Best regards, > Jie > > > On 2019/3/28 ??2:21, Vladimir Ivanov wrote: >> Hi Jie, >> >> The heuristic quirk looks very similar to the one Sergey reported >> recently: >> >> >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-February/032623.html >> >> >> Overall, tweaking the heuristic to favor inlining doesn't look the >> right thing here. 
profile.count=0 is a sign the profile isn't mature >> enough and it's likely the callee doesn't have enough profiling info >> as well. (And that's what Sergey observed on some of the >> microbenchmarks during his experiments.) >> >> In your particular case (Random::), tweaking the heuristic so >> is_init_with_ea [1] overrules "profile.count > 0" may be a more >> promising approach. After all, the fact that the call site is being >> considered for inlining (and not pruned along with the basic block it >> belongs to) is a strong signal in favor of "profile.count > 0" case. >> (Though it's not guaranteed due to the immaturity of profile data.) >> >> But IMO the root problem is that top-tier compilation happens too >> early: profile data isn't mature enough yet and it will easily lead to >> similar problems later (during compilation). >> >> Best regards, >> Vladimir Ivanov >> >> [1] >> http://hg.openjdk.java.net/jdk/jdk/file/9c84d2865c2d/src/hotspot/share/opto/bytecodeInfo.cpp#l81 >> >> >> On 27/03/2019 03:15, Jie Fu wrote: >>> Hi all, >>> >>> JBS:??? https://bugs.openjdk.java.net/browse/JDK-8221542 >>> Webrev: >>> http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.00/ >>> >>> ## Symptom >>> ~15% performance degradation (from 700 ops/m to 600 ops/m) was >>> observed randomly on x86 while running SPECjvm2008's >>> scimark.monte_carlo with -XX:-TieredCompilation. >>> >>> ## Reproduce >>> It can be always reproduced with the script[1] in less than 5 minutes. >>> >>> ## Reason >>> The drop was caused by a not-inline decision on >>> spec.benchmarks.scimark.utils.Random:: in >>> spec.benchmarks.scimark.monte_carlo.MonteCarlo::integrate. >>> >>> ## Fix >>> It might be better to make a little change to the inline heuristic[2]. >>> >>> For callers without loops, the original heuristic works fine. >>> But for callers with loops, it would be better to make a not-inline >>> decision more conservatively. 
>>> >>> ## Testing >>> - Running scimark.monte_carlo on jdk/x64 with -XX:-TieredCompilation >>> for about 5000 times, no performance drop >>> ?? Also on jdk8u/mips64 with -XX:-TieredCompilation, no performance drop >>> - Running make test TEST="micro" on jdk/x64, no performance regression >>> - Running SPECjvm2008 on jdk8u/x64 with -XX:-TieredCompilation, no >>> performance regression >>> >>> For more detailed info, please see the JBS. >>> >>> Could you please review it? >>> Thanks a lot. >>> >>> Best regards, >>> Jie >>> >>> [1] http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/reproduce.sh >>> [2] >>> http://hg.openjdk.java.net/jdk/jdk/file/0a2d73e02076/src/hotspot/share/opto/bytecodeInfo.cpp#l375 >>> >>> >>> > From sandhya.viswanathan at intel.com Sat Apr 6 01:18:17 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Sat, 6 Apr 2019 01:18:17 +0000 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> Please find below a link to the webrev which enhances super-word auto vectorization for x86. The following additional operations are supported: 1) Absolute for all data types 2) Shifts for byte data types 3) Shift right arithmetic for long data type 4) Byte multiply 5) Negate for float/double JBS: https://bugs.openjdk.java.net/browse/JDK-8222074 Webrev: http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.00/ The compiler jtreg tests pass with UseAVX=0,1,2,3 and KNL. Your review and comments are welcome. Best Regards, Sandhya -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peter.januschke at sap.com Mon Apr 8 08:18:59 2019 From: peter.januschke at sap.com (Januschke, Peter) Date: Mon, 8 Apr 2019 08:18:59 +0000 Subject: RFR(S): 8222103: [testbug] compiler/compilercontrol/jcmd/ClearDirectivesFileStackTest may exceed VM limit Message-ID: Hi, I propose the following fix to the test mentioned in the subject: Problem: The test generates a random number of compiler directives, which might be greater than the value of CompilerDirectivesLimit. This causes the VM to stop execution upon the corresponding capacity check. Fix: set CompilerDirectivesLimit to the max random number used. http://cr.openjdk.java.net/~goetz/wr19/peter/8222103-01 https://bugs.openjdk.java.net/browse/JDK-8222103 Please review, and I please need a sponsor. Best regards Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.ignatyev at oracle.com Mon Apr 8 17:24:52 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 8 Apr 2019 10:24:52 -0700 Subject: RFR(S): 8222103: [testbug] compiler/compilercontrol/jcmd/ClearDirectivesFileStackTest may exceed VM limit In-Reply-To: References: Message-ID: Hi Peter, I don't think it's a test bug, VM shouldn't stop execution if someone requests too many compiler directives, instead it should reject requests which will exceed its capacity. -- Igor > On Apr 8, 2019, at 1:18 AM, Januschke, Peter wrote: > > Hi, > > I propose the following fix to the test mentioned in the subject: > > Problem: > The test generates a random number of compiler directives, which might be greater than the value of CompilerDirectivesLimit. This causes the VM to stop execution upon the corresponding capacity check. > > Fix: set CompilerDirectivesLimit to the max random number used. > > http://cr.openjdk.java.net/~goetz/wr19/peter/8222103-01 > > https://bugs.openjdk.java.net/browse/JDK-8222103 > > Please review, and I please need a sponsor. 
> > Best regards > > Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Mon Apr 8 20:37:41 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 8 Apr 2019 13:37:41 -0700 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> Message-ID: <5b1b85a0-eb32-dc17-c1cb-b6fa5391b817@oracle.com> I tried to submit it for testing and spotted a build failure w/ clang: .../src/hotspot/cpu/x86/x86.ad:1515:26: error: '&&' within '||' [-Werror,-Wlogical-op-parentheses] (vlen == 64) && (VM_Version::supports_avx512bw() == false)) ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .../src/hotspot/cpu/x86/x86.ad:1515:26: note: place parentheses around the '&&' expression to silence this warning (vlen == 64) && (VM_Version::supports_avx512bw() == false)) ^ ( ) I'll let you know how the testing is going. Best regards, Vladimir Ivanov On 05/04/2019 18:18, Viswanathan, Sandhya wrote: > Please find below a link to the webrev which enhances super-word auto > vectorization for x86. > > The following additional operations are supported: > > 1)Absolute for all data types > > 2)Shifts for byte data types > > 3)Shift right arithmetic for long data type > > 4)Byte multiply > > 5)Negate for float/double > > JBS: https://bugs.openjdk.java.net/browse/JDK-8222074 > > Webrev: http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.00/ > > The compiler jtreg tests pass with UseAVX=0,1,2,3 and KNL. > > Your review and comments are welcome. 
> > Best Regards, > > Sandhya > From sandhya.viswanathan at intel.com Mon Apr 8 21:26:24 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Mon, 8 Apr 2019 21:26:24 +0000 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <5b1b85a0-eb32-dc17-c1cb-b6fa5391b817@oracle.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <5b1b85a0-eb32-dc17-c1cb-b6fa5391b817@oracle.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAAFE4@FMSMSX126.amr.corp.intel.com> Hi Vladimir, Yes the intent was to have it like below: if ((vlen == 32 && UseAVX < 2) || ((vlen == 64) && (VM_Version::supports_avx512bw() == false))) I will look forward to the test results. Best Regards, Sandhya -----Original Message----- From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com] Sent: Monday, April 08, 2019 1:38 PM To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov Cc: Rukmannagari, Shravya ; Deshpande, Vivek R Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 I tried to submit it for testing and spotted a build failure w/ clang: .../src/hotspot/cpu/x86/x86.ad:1515:26: error: '&&' within '||' [-Werror,-Wlogical-op-parentheses] (vlen == 64) && (VM_Version::supports_avx512bw() == false)) ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .../src/hotspot/cpu/x86/x86.ad:1515:26: note: place parentheses around the '&&' expression to silence this warning (vlen == 64) && (VM_Version::supports_avx512bw() == false)) ^ ( ) I'll let you know how the testing is going. Best regards, Vladimir Ivanov On 05/04/2019 18:18, Viswanathan, Sandhya wrote: > Please find below a link to the webrev which enhances super-word auto > vectorization for x86. 
> > The following additional operations are supported: > > 1)Absolute for all data types > > 2)Shifts for byte data types > > 3)Shift right arithmetic for long data type > > 4)Byte multiply > > 5)Negate for float/double > > JBS: https://bugs.openjdk.java.net/browse/JDK-8222074 > > Webrev: http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.00/ > > The compiler jtreg tests pass with UseAVX=0,1,2,3 and KNL. > > Your review and comments are welcome. > > Best Regards, > > Sandhya > From felix.yang at huawei.com Tue Apr 9 01:36:02 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Tue, 9 Apr 2019 01:36:02 +0000 Subject: [aarch64-port-dev ] RFR: 8221658: aarch64: add necessary predicate for ubfx patterns In-Reply-To: <6c9e81fa-5f69-e601-8507-b82d8bf96beb@redhat.com> References: <130fbe62-4fac-5a8d-aade-74e340459e23@redhat.com> <79987274-97d6-bde0-8577-e5046864bdde@redhat.com> <6c9e81fa-5f69-e601-8507-b82d8bf96beb@redhat.com> Message-ID: Hi, This can pass the fuzz test and we got the same output as when the test is executed on the x86 platform. New webrev: http://cr.openjdk.java.net/~fyang/8221658/webrev.02/ This also incorporates the following two constraints: 2. mask != 0 3. rshift + width <= 32/64 (width = exact_log2(mask+1)) JTreg tested with a fastdebug build, OK? 
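
The three operand constraints discussed in this thread (0 <= rshift <= 31/63, mask != 0, rshift + width <= 32/64 with width = exact_log2(mask+1)) can be sketched as a standalone check. This is a minimal illustration only, not the code in the webrev: `exact_log2_sketch` and `ubfx_operands_ok` are hypothetical names, and the helper only imitates the way exact_log2 is used in the patterns quoted here.

```cpp
#include <cstdint>

// Hypothetical stand-in for the exact_log2() call used in the ubfx patterns:
// returns log2(x) when x is a power of two, and -1 otherwise.
int exact_log2_sketch(uint64_t x) {
  if (x == 0 || (x & (x - 1)) != 0) return -1;
  int n = 0;
  while ((x >>= 1) != 0) ++n;
  return n;
}

// Hypothetical helper: true when (src >>> rshift) & mask satisfies the three
// constraints from the thread for a register of reg_bits (32 or 64) bits.
bool ubfx_operands_ok(int rshift, uint64_t mask, int reg_bits) {
  if (rshift < 0 || rshift >= reg_bits) return false;  // constraint 1
  if (mask == 0) return false;                         // constraint 2
  // An all-ones 64-bit mask wraps mask + 1 to 0 and is rejected by this sketch.
  int width = exact_log2_sketch(mask + 1);
  if (width < 0) return false;                         // mask is not 2^k - 1
  return rshift + width <= reg_bits;                   // constraint 3
}
```

For example, a shift of 28 with mask 0xFF on a 32-bit value gives width 8 and 28 + 8 > 32, so the pattern must not match; a shift of 24 with the same mask is fine.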

Thanks,
Felix

>
> Try this:
>
> diff -r c763810a9bf5 src/cpu/aarch64/vm/aarch64.ad
> --- a/src/cpu/aarch64/vm/aarch64.ad     Fri Sep 28 08:48:26 2018 +0800
> +++ b/src/cpu/aarch64/vm/aarch64.ad     Thu Apr 04 13:47:03 2019 -0400
> @@ -12340,7 +12340,7 @@
>    ins_cost(INSN_COST);
>    format %{ "ubfxw $dst, $src, $mask" %}
>    ins_encode %{
> -    int rshift = $rshift$$constant;
> +    int rshift = $rshift$$constant & 31;
>      long mask = $mask$$constant;
>      int width = exact_log2(mask+1);
>      __ ubfxw(as_Register($dst$$reg),
> @@ -12355,7 +12355,7 @@
>    ins_cost(INSN_COST);
>    format %{ "ubfx $dst, $src, $mask" %}
>    ins_encode %{
> -    int rshift = $rshift$$constant;
> +    int rshift = $rshift$$constant & 63;
>      long mask = $mask$$constant;
>      int width = exact_log2(mask+1);
>      __ ubfx(as_Register($dst$$reg),
> @@ -12373,7 +12373,7 @@
>    ins_cost(INSN_COST * 2);
>    format %{ "ubfx $dst, $src, $mask" %}
>    ins_encode %{
> -    int rshift = $rshift$$constant;
> +    int rshift = $rshift$$constant & 31;
>      long mask = $mask$$constant;
>      int width = exact_log2(mask+1);
>      __ ubfx(as_Register($dst$$reg),
> diff -r c763810a9bf5 src/cpu/aarch64/vm/aarch64_ad.m4
> --- a/src/cpu/aarch64/vm/aarch64_ad.m4  Fri Sep 28 08:48:26 2018 +0800
> +++ b/src/cpu/aarch64/vm/aarch64_ad.m4  Thu Apr 04 13:47:03 2019 -0400
> @@ -185,7 +185,7 @@
>    ins_cost(INSN_COST);
>    format %{ "$3 $dst, $src, $mask" %}
>    ins_encode %{
> -    int rshift = $rshift$$constant;
> +    int rshift = $rshift$$constant & $4;
>      long mask = $mask$$constant;
>      int width = exact_log2(mask+1);
>      __ $3(as_Register($dst$$reg),
> @@ -193,8 +193,8 @@
>    %}
>    ins_pipe(ialu_reg_shift);
>  %}')
> -BFX_INSN(I,URShift,ubfxw)
> -BFX_INSN(L,URShift,ubfx)
> +BFX_INSN(I,URShift,ubfxw,31)
> +BFX_INSN(L,URShift,ubfx,63)
>
>  // We can use ubfx when extending an And with a mask when we know mask
>  // is positive. We know that because immI_bitmask guarantees it.
> @@ -205,7 +205,7 @@
>    ins_cost(INSN_COST * 2);
>    format %{ "ubfx $dst, $src, $mask" %}
>    ins_encode %{
> -    int rshift = $rshift$$constant;
> +    int rshift = $rshift$$constant & 31;
>      long mask = $mask$$constant;
>      int width = exact_log2(mask+1);
>      __ ubfx(as_Register($dst$$reg),
>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From fujie at loongson.cn Tue Apr 9 03:51:25 2019
From: fujie at loongson.cn (Jie Fu)
Date: Tue, 9 Apr 2019 11:51:25 +0800
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To:
References:
Message-ID: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn>

Hi Vladimir,

Thank you so much for your review and valuable suggestions.

Here is the updated version: http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.02/

Please see comments inline and review.
Thanks a lot.

Best regards,
Jie

> What you are proposing is to unconditionally inline constructor calls.
> I consider such change as too intrusive.
>
> I suggest to focus on "profile.count() == 0" check and make it
> smarter: when profile info is scarce, try to prove that the call site
> is actually reachable before giving up.

OK, I agree.

> In addition, it's possible to prove the call is always executed by
> looking at CFG or checking that start block is being parsed.

Very good idea!
In the updated patch I check whether the call site belongs to the start block.

I also tried to look at the CFG like this, but it failed.
-----------------------------------------------
diff -r 7383a17b4c65 src/hotspot/share/opto/bytecodeInfo.cpp
--- a/src/hotspot/share/opto/bytecodeInfo.cpp   Mon Apr 08 15:27:24 2019 +0800
+++ b/src/hotspot/share/opto/bytecodeInfo.cpp   Mon Apr 08 17:25:04 2019 +0800
@@ -373,9 +373,11 @@
     } else if (forced_inline()) {
       // Inlining was forced by CompilerOracle, ciReplay or annotation
     } else if (profile.count() == 0) {
-      // don't inline unreached call sites
-       set_msg("call site not reached");
-       return false;
+      if (C->cfg()->get_block(jvms->bci()) != C->cfg()->get_block(0)) {
+        // don't inline unreached call sites
+        set_msg("call site not reached");
+        return false;
+      }
     }
   }
-----------------------------------------------
Do you have any comments on how to look at the CFG for more info?
Thanks.

>
> When "profile.count() == 0", but the call site has been reached
> before, it seems the following conditions should hold:
>
>   caller_method->interpreter_invocation_count() > 0

I think this condition is redundant since it always holds for a method to be compiled.

> AND
>   caller_method->method_data()->invocation_count() == (0 OR 1)

I'm not sure if this condition still holds for parallel execution of the caller.

> AND
>  callee_method->was_executed_more_than(0) == true

Even when this rule holds, it still seems hard to say that this particular call site has been reached before.

> AND
>  parse->block() == parse->start_block()
>

Very nice!
This rule is good enough to solve this particular issue.
And it has been implemented in the patch.

> Some of them can be turned into asserts (e.g., invocation_count() == 0).
>
> Best regards,
> Vladimir Ivanov
>
> [2]
> (lldb) p this
> (Parse *) $33 = 0x000070000eacc6e8
>
> (lldb) p start_block()
> (Parse::Block *) $31 = 0x00000001008aef00
>
> (lldb) p block()
> (Parse::Block *) $32 = 0x00000001008aef00

Could you please also provide the backtrace info?
I tried to find the right place to call start_block() and block() directly in my patch, but failed.

Any more comments?
Thank you very much.
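
The start-block rule discussed in this message can be sketched in isolation. The following is a minimal illustration under stated assumptions, not the webrev.02 change: `CallSiteInfo`, `InlineDecision` and `check_call_site_reached` are hypothetical names that only model the values mentioned in the thread (forced inlining, profile.count(), and whether parse->block() == parse->start_block()).

```cpp
// Hypothetical model of the facts the heuristic looks at; none of these
// names are HotSpot APIs.
struct CallSiteInfo {
  bool forced_inline;   // inlining forced by CompilerOracle, ciReplay or annotation
  long profile_count;   // models profile.count() for the call site
  bool in_start_block;  // models parse->block() == parse->start_block()
};

enum class InlineDecision { Consider, RejectNotReached };

// A zero profile count alone does not reject the call site: code in the
// start block runs on every invocation of the caller, so the call site is
// treated as reached even though the (immature) profile says otherwise.
InlineDecision check_call_site_reached(const CallSiteInfo& cs) {
  if (cs.forced_inline) return InlineDecision::Consider;
  if (cs.profile_count > 0) return InlineDecision::Consider;
  if (cs.in_start_block) return InlineDecision::Consider;
  return InlineDecision::RejectNotReached;  // "call site not reached"
}
```

With this shape, only a call site that has a zero count and sits outside the start block is still rejected as not reached.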
From Yang.Zhang at arm.com Tue Apr 9 08:04:01 2019 From: Yang.Zhang at arm.com (Yang Zhang (Arm Technology China)) Date: Tue, 9 Apr 2019 08:04:01 +0000 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> Message-ID: Hi Sandhya Thanks for proposing this enhancement. I have tested this patch in our internal ci. There is a new failure. But I didn't check the details. java/math/BigDecimal/DivideMcTests.java In addition, there are trailing spaces in the following files. src/hotspot/cpu/x86/assembler_x86.cpp src/hotspot/cpu/x86/stubGenerator_x86_32.cpp src/hotspot/cpu/x86/x86.ad src/hotspot/cpu/x86/x86_32.ad src/hotspot/share/opto/superword.cpp In file src/hotspot/share/classfile/vmSymbols.hpp, there are some unaligned lines. In file test/hotspot/jtreg/compiler/c2/cr6340864/TestIntVect.java, there are new test functions. Are these new functions needed by byte/short/long? Regards, Yang -----Original Message----- From: hotspot-compiler-dev On Behalf Of Viswanathan, Sandhya Sent: Saturday, April 6, 2019 9:18 AM To: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov Subject: RFR (M) 8222074: Enhance auto vectorization for x86 Please find below a link to the webrev which enhances super-word auto vectorization for x86. The following additional operations are supported: 1) Absolute for all data types 2) Shifts for byte data types 3) Shift right arithmetic for long data type 4) Byte multiply 5) Negate for float/double JBS: https://bugs.openjdk.java.net/browse/JDK-8222074 Webrev: http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.00/ The compiler jtreg tests pass with UseAVX=0,1,2,3 and KNL. Your review and comments are welcome. 
Best Regards, Sandhya From aph at redhat.com Tue Apr 9 09:20:35 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 9 Apr 2019 10:20:35 +0100 Subject: [aarch64-port-dev ] RFR: 8221658: aarch64: add necessary predicate for ubfx patterns In-Reply-To: References: <130fbe62-4fac-5a8d-aade-74e340459e23@redhat.com> <79987274-97d6-bde0-8577-e5046864bdde@redhat.com> <6c9e81fa-5f69-e601-8507-b82d8bf96beb@redhat.com> Message-ID: <9639da61-26cc-791e-4731-ff833e903c7d@redhat.com> On 4/9/19 2:36 AM, Yangfei (Felix) wrote: > This can pass the fuzz test and we got the same output as when the test is executed on the x86 platform. > New webrev: http://cr.openjdk.java.net/~fyang/8221658/webrev.02/ > This also incorporates the following two constraints: > 2. mask != 0 > 3. rshift + width <= 32/64 (width = exact_log2(mask+1)) > > JTreg tested with a fastdebug build, OK? That looks right, thanks. Because C2 does not mask immediate shifts, there are other lurking bugs. It would be good to systematically mask all immediate shift counts throughout aarch64.ad in a separate patch. Would you like to do that work? Thanks. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From peter.januschke at sap.com Tue Apr 9 09:34:57 2019 From: peter.januschke at sap.com (Januschke, Peter) Date: Tue, 9 Apr 2019 09:34:57 +0000 Subject: RFR(S): 8222103: [testbug] compiler/compilercontrol/jcmd/ClearDirectivesFileStackTest may exceed VM limit In-Reply-To: References: Message-ID: Hi Igor, the current implementation prints a message and then stops: bool DirectivesStack::check_capacity(int request_size, outputStream* st) { if ((request_size + _depth) > CompilerDirectivesLimit) { st->print_cr("Could not add %i more directives. Currently %i/%i directives.", request_size, _depth, CompilerDirectivesLimit); return false; } return true; } Best regards Peter From: Igor Ignatyev Sent: Montag, 8. 
April 2019 19:25 To: Januschke, Peter Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR(S): 8222103: [testbug] compiler/compilercontrol/jcmd/ClearDirectivesFileStackTest may exceed VM limit Hi Peter, I don't think it's a test bug, VM shouldn't stop execution if someone requests too many compiler directives, instead it should reject requests which will exceed its capacity. -- Igor On Apr 8, 2019, at 1:18 AM, Januschke, Peter > wrote: Hi, I propose the following fix to the test mentioned in the subject: Problem: The test generates a random number of compiler directives, which might be greater than the value of CompilerDirectivesLimit. This causes the VM to stop execution upon the corresponding capacity check. Fix: set CompilerDirectivesLimit to the max random number used. http://cr.openjdk.java.net/~goetz/wr19/peter/8222103-01 https://bugs.openjdk.java.net/browse/JDK-8222103 Please review, and I please need a sponsor. Best regards Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From felix.yang at huawei.com Tue Apr 9 10:25:57 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Tue, 9 Apr 2019 10:25:57 +0000 Subject: [aarch64-port-dev ] RFR: 8221658: aarch64: add necessary predicate for ubfx patterns In-Reply-To: <9639da61-26cc-791e-4731-ff833e903c7d@redhat.com> References: <130fbe62-4fac-5a8d-aade-74e340459e23@redhat.com> <79987274-97d6-bde0-8577-e5046864bdde@redhat.com> <6c9e81fa-5f69-e601-8507-b82d8bf96beb@redhat.com> <9639da61-26cc-791e-4731-ff833e903c7d@redhat.com> Message-ID: Thanks for reviewing. Yes, I will take a look at other similar issues. > > On 4/9/19 2:36 AM, Yangfei (Felix) wrote: > > This can pass the fuzz test and we got the same output as when the test is > executed on the x86 platform. > > New webrev: http://cr.openjdk.java.net/~fyang/8221658/webrev.02/ > > This also incorporates the following two constraints: > > 2. mask != 0 > > 3. 
rshift + width <= 32/64 (width = exact_log2(mask+1)) > > > > JTreg tested with a fastdebug build, OK? > > That looks right, thanks. > > Because C2 does not mask immediate shifts, there are other lurking > bugs. It would be good to systematically mask all immediate shift > counts throughout aarch64.ad in a separate patch. > > Would you like to do that work? Thanks. > From nils.eliasson at oracle.com Tue Apr 9 11:55:19 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 9 Apr 2019 13:55:19 +0200 Subject: RFR(S): 8222103: [testbug] compiler/compilercontrol/jcmd/ClearDirectivesFileStackTest may exceed VM limit In-Reply-To: References: Message-ID: Hi, The design is that you add a number of directives together to get a desired behavior. Adding just some of them make no sense. If you add directives by command line, and get an error (syntax or limit) - it will print the error stop the startup. This allows the user to correct the problem and retry. If you add by jcmd, the error is printed on the jcmd console, but the VM continues on unaffected. The user can correct the problem and make another try. I think there is a separate test of the limit. And if that is already covered, testing it in this test too, seems unnecessary. Regards, // Nils On 2019-04-09 11:34, Januschke, Peter wrote: > > Hi Igor, > > the current implementation prints a message and then stops: > > bool DirectivesStack::check_capacity(int request_size, outputStream* st) { > > ? if ((request_size + _depth) > CompilerDirectivesLimit) { > > st->print_cr("Could not add %i more directives. Currently %i/%i > directives.", request_size, _depth, CompilerDirectivesLimit); > > return false; > > ? } > > ? return true; > > } > > Best regards > > Peter > > *From:*Igor Ignatyev > *Sent:* Montag, 8. 
April 2019 19:25 > *To:* Januschke, Peter > *Cc:* hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: RFR(S): 8222103: [testbug] > compiler/compilercontrol/jcmd/ClearDirectivesFileStackTest may exceed > VM limit > > Hi Peter, > > I don't think it's a test bug, VM shouldn't stop execution if someone > requests too many compiler directives, instead it should reject > requests which will exceed its capacity. > > -- Igor > > > > On Apr 8, 2019, at 1:18 AM, Januschke, Peter > > wrote: > > Hi, > > I propose the following fix to the test mentioned in the subject: > > Problem: > > The test generates a random number of compiler directives, which > might be greater than the value of CompilerDirectivesLimit. This > causes the VM to stop execution upon the corresponding capacity check. > > Fix: set CompilerDirectivesLimit to the max random number used. > > http://cr.openjdk.java.net/~goetz/wr19/peter/8222103-01 > > https://bugs.openjdk.java.net/browse/JDK-8222103 > > Please review, and I please need a sponsor. > > Best regards > > Peter > -------------- next part -------------- An HTML attachment was scrubbed... URL: From igor.ignatyev at oracle.com Tue Apr 9 16:30:24 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 9 Apr 2019 09:30:24 -0700 Subject: RFR(S): 8222103: [testbug] compiler/compilercontrol/jcmd/ClearDirectivesFileStackTest may exceed VM limit In-Reply-To: References: Message-ID: <9F82E0AB-45DB-45F4-B453-691F4A1AB244@oracle.com> Nils, thanks for the clarification. Peter, I haven't looked at these tests for quite a while, and, I guess, presence of 'jcmd' in the path got me confused, so I (now obviously) incorrectly assumed that the directives are added by jcmd. in this case, I agree that it is a test bug. I'd however prefer it to be fixed slightly different and instead of adding 'CompilerDirectivesLimit to command line I'd read its value using WhiteBox and limit ClearDirectivesFileStackTest::AMOUNT by it. 
you'll also need to update year in the copyright notice. Thanks, -- Igor > On Apr 9, 2019, at 4:55 AM, Nils Eliasson wrote: > > Hi, > > The design is that you add a number of directives together to get a desired behavior. Adding just some of them make no sense. > > If you add directives by command line, and get an error (syntax or limit) - it will print the error stop the startup. This allows the user to correct the problem and retry. > > If you add by jcmd, the error is printed on the jcmd console, but the VM continues on unaffected. The user can correct the problem and make another try. > > I think there is a separate test of the limit. And if that is already covered, testing it in this test too, seems unnecessary. > > Regards, > > // Nils > > > > On 2019-04-09 11:34, Januschke, Peter wrote: >> Hi Igor, >> >> the current implementation prints a message and then stops: >> >> bool DirectivesStack::check_capacity(int request_size, outputStream* st) { >> if ((request_size + _depth) > CompilerDirectivesLimit) { >> st->print_cr("Could not add %i more directives. Currently %i/%i directives.", request_size, _depth, CompilerDirectivesLimit); >> return false; >> } >> return true; >> } >> >> Best regards >> >> Peter >> >> From: Igor Ignatyev >> Sent: Montag, 8. April 2019 19:25 >> To: Januschke, Peter >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR(S): 8222103: [testbug] compiler/compilercontrol/jcmd/ClearDirectivesFileStackTest may exceed VM limit >> >> Hi Peter, >> >> I don't think it's a test bug, VM shouldn't stop execution if someone requests too many compiler directives, instead it should reject requests which will exceed its capacity. >> >> -- Igor >> >> >> On Apr 8, 2019, at 1:18 AM, Januschke, Peter > wrote: >> >> Hi, >> >> I propose the following fix to the test mentioned in the subject: >> >> Problem: >> The test generates a random number of compiler directives, which might be greater than the value of CompilerDirectivesLimit. 
This causes the VM to stop execution upon the corresponding capacity check.
>>
>> Fix: set CompilerDirectivesLimit to the max random number used.
>>
>> http://cr.openjdk.java.net/~goetz/wr19/peter/8222103-01
>>
>> https://bugs.openjdk.java.net/browse/JDK-8222103
>>
>> Please review, and I please need a sponsor.
>>
>> Best regards
>>
>> Peter
>>

From sandhya.viswanathan at intel.com Tue Apr 9 17:18:16 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Tue, 9 Apr 2019 17:18:16 +0000
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To:
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com>

Hi Yang,

Thanks a lot for trying out the patch in your setup.

When you check the details, please let me know whether the failure in DivideMcTests.java turns out to be due to this patch.

I will fix all the trailing-space and unaligned-line style issues that you pointed out.

TestIntVect is updated to cover some additional support added for "int" in this patch, such as Absolute and subtraction from zero.
There is an additional test for Not, for which we plan to add support in a follow-up patch.

Best Regards,
Sandhya

-----Original Message-----
From: Yang Zhang (Arm Technology China) [mailto:Yang.Zhang at arm.com]
Sent: Tuesday, April 09, 2019 1:04 AM
To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net
Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86

Hi Sandhya

Thanks for proposing this enhancement.
I have tested this patch in our internal ci. There is a new failure. But I didn't check the details.
java/math/BigDecimal/DivideMcTests.java

In addition, there are trailing spaces in the following files. 
src/hotspot/cpu/x86/assembler_x86.cpp src/hotspot/cpu/x86/stubGenerator_x86_32.cpp src/hotspot/cpu/x86/x86.ad src/hotspot/cpu/x86/x86_32.ad src/hotspot/share/opto/superword.cpp In file src/hotspot/share/classfile/vmSymbols.hpp, there are some unaligned lines. In file test/hotspot/jtreg/compiler/c2/cr6340864/TestIntVect.java, there are new test functions. Are these new functions needed by byte/short/long? Regards, Yang -----Original Message----- From: hotspot-compiler-dev On Behalf Of Viswanathan, Sandhya Sent: Saturday, April 6, 2019 9:18 AM To: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov Subject: RFR (M) 8222074: Enhance auto vectorization for x86 Please find below a link to the webrev which enhances super-word auto vectorization for x86. The following additional operations are supported: 1) Absolute for all data types 2) Shifts for byte data types 3) Shift right arithmetic for long data type 4) Byte multiply 5) Negate for float/double JBS: https://bugs.openjdk.java.net/browse/JDK-8222074 Webrev: http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.00/ The compiler jtreg tests pass with UseAVX=0,1,2,3 and KNL. Your review and comments are welcome. Best Regards, Sandhya IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. 
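For readers skimming the thread, the five operation classes listed in the RFR above correspond roughly to scalar loops like the ones below. This is only a minimal sketch: the class and method names are made up for illustration and are not taken from the webrev, and whether C2's superword pass actually vectorizes a given loop depends on the CPU, the JVM flags (for example UseAVX), and the build.

```java
import java.util.Arrays;

// Hypothetical example class; none of these names come from the patch.
public class SuperwordLoopShapes {

    // 1) Absolute value, shown here for int elements.
    static void absInt(int[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Math.abs(src[i]);
        }
    }

    // 2) Shift on byte data. Java promotes the byte to int before
    //    shifting, so the result is narrowed back with a cast.
    static void shiftLeftByte(byte[] src, byte[] dst, int shift) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) (src[i] << shift);
        }
    }

    // 3) Arithmetic (sign-propagating) right shift on long data.
    static void shiftRightLong(long[] src, long[] dst, int shift) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i] >> shift;
        }
    }

    // 4) Byte multiply; the int product is narrowed back to byte.
    static void mulByte(byte[] a, byte[] b, byte[] dst) {
        for (int i = 0; i < a.length; i++) {
            dst[i] = (byte) (a[i] * b[i]);
        }
    }

    // 5) Negate for float (the double case has the same shape).
    static void negFloat(float[] src, float[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = -src[i];
        }
    }

    public static void main(String[] args) {
        int[] absOut = new int[2];
        absInt(new int[] {-3, 4}, absOut);
        System.out.println(Arrays.toString(absOut));

        long[] sraOut = new long[2];
        shiftRightLong(new long[] {-8L, 8L}, sraOut, 1);
        System.out.println(Arrays.toString(sraOut));
    }
}
```

The byte cases are the interesting ones for a backend, since the Java semantics are defined on the promoted int value and only the low 8 bits survive the cast.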
From vladimir.kozlov at oracle.com Tue Apr 9 19:33:51 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 9 Apr 2019 12:33:51 -0700
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com>
Message-ID:

Hi Sandhya,

I looked through the changes and discussed them with Vladimir Ivanov.

In general, the logic of the changes follows our usual pattern - no problem here. There are cases where we can use fewer `TEMP tmp` registers by using the 'dst' register, as in mul4B_reg(). Is it intentional not to use 'dst' there?

But my main concern now is the significant growth in JVM size. And it will get worse when we implement the rest of the Vector API instructions.

The main reason for the size growth is the additional AD instructions, which ADLC compiles into multiple functions. I originally thought that moving 'ins_encode %{ %}' code that has several instructions into the macro-assembler would help if it is used by several AD instructions. But Vladimir I. convinced me that the gain would be insignificant compared to reducing the number of AD instructions.

We came up with several suggestions for how to address this, and it would greatly help if you (Intel) investigated them.

1. I think you should provide JVM size increase data for changes like this. What is the increase for this one?

2. How important is it for Intel to support new vector instructions on CPUs without AVX? Maybe we should stop adding new code for old CPUs, such as vsll16B_reg, for example. That does not mean we can't use SSE instructions in the implementation (for example, vabs8B_reg) - such cases are fine.

3. I still want to see some common instruction patterns in 'ins_encode %{ %}' moved into the macro-assembler. 
For example, the only difference between vs*_reg and vs*_reg_imm is one or 2 instructions, the rest is the same. 4. Most important. The main reason we have a lot AD instructions is to 'match' different vector types for corresponding different vector length. I think we should revisit this approach. Intel CPU does not use parts of vector registers separately - C2 does not use XMM0b, XMM0c, XMM0d parts of xmm0. Even when C2 uses VecS type it use whole zmm register in avx512 but narrowed it by passing length to assembler instruction (or we use an instruction which uses only part of 512 bit register). Vladimir I. suggested to have VecMAX type which can be used to match all different vector length implementation to have only one AD instruction. And use vector length to generate corresponding code. For example, vabs8B_reg() and vabs16B_reg() are almost the same except vectors type VecD vs VecX. There should be no difference in code generation (we need to modify vec_mov_helper() and other similar code to check vector length when it see VecMAX). We can use this approach for already existing instructions too to reduce code size generated from AD files. What do you think? Regards, Vladimir K On 4/9/19 10:18 AM, Viswanathan, Sandhya wrote: > Hi Yang, > > Thanks a lot for trying out the patch in your setup. > > Please do let me know when you check the details if you find the failure in DivideMvTests.java to be due to this patch. > > I will fix all the trailing space and unaligned line style issues that you pointed out. > > The TestInt is updated to cover for some additional support added for "int" in this patch like Absolute and subtraction from zero. > There is an additional test for Not for which we plan to add support in a follow up patch. 
> > Best Regards, > Sandhya > > > -----Original Message----- > From: Yang Zhang (Arm Technology China) [mailto:Yang.Zhang at arm.com] > Sent: Tuesday, April 09, 2019 1:04 AM > To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net > Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 > > Hi Sandhya > > Thanks for proposing this enhancement. > I have tested this patch in our internal ci. There is a new failure. But I didn't check the details. > java/math/BigDecimal/DivideMcTests.java > > In addition, there are trailing spaces in the following files. > src/hotspot/cpu/x86/assembler_x86.cpp > src/hotspot/cpu/x86/stubGenerator_x86_32.cpp > src/hotspot/cpu/x86/x86.ad > src/hotspot/cpu/x86/x86_32.ad > src/hotspot/share/opto/superword.cpp > > In file src/hotspot/share/classfile/vmSymbols.hpp, there are some unaligned lines. > In file test/hotspot/jtreg/compiler/c2/cr6340864/TestIntVect.java, there are new test functions. Are these new functions needed by byte/short/long? > > Regards, > Yang > > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Viswanathan, Sandhya > Sent: Saturday, April 6, 2019 9:18 AM > To: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov > Subject: RFR (M) 8222074: Enhance auto vectorization for x86 > > > Please find below a link to the webrev which enhances super-word auto vectorization for x86. > The following additional operations are supported: > > 1) Absolute for all data types > > 2) Shifts for byte data types > > 3) Shift right arithmetic for long data type > > 4) Byte multiply > > 5) Negate for float/double > > JBS: https://bugs.openjdk.java.net/browse/JDK-8222074 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.00/ > > The compiler jtreg tests pass with UseAVX=0,1,2,3 and KNL. > Your review and comments are welcome. > > Best Regards, > Sandhya > > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. 
If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
>

From john.r.rose at oracle.com Tue Apr 9 19:53:26 2019
From: john.r.rose at oracle.com (John Rose)
Date: Tue, 9 Apr 2019 12:53:26 -0700
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To:
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com>
Message-ID: <2317EEB0-C081-447C-BA29-40E69ACBC66B@oracle.com>

I agree that the AD file combinatorics need to be tamed more aggressively in this way.

I have to wonder if this could be done during incubation, as a cleanup of tech. debt, so we can get experience with the API at the same time.

-- John

On Apr 9, 2019, at 12:33 PM, Vladimir Kozlov wrote:
>
> 4. Most important. The main reason we have a lot AD instructions is to 'match' different vector types for corresponding different vector length. I think we should revisit this approach.
>
> Intel CPU does not use parts of vector registers separately - C2 does not use XMM0b, XMM0c, XMM0d parts of xmm0. Even when C2 uses VecS type it use whole zmm register in avx512 but narrowed it by passing length to assembler instruction (or we use an instruction which uses only part of 512 bit register).
>
> Vladimir I. suggested to have VecMAX type which can be used to match all different vector length implementation to have only one AD instruction. And use vector length to generate corresponding code. For example, vabs8B_reg() and vabs16B_reg() are almost the same except vectors type VecD vs VecX. There should be no difference in code generation (we need to modify vec_mov_helper() and other similar code to check vector length when it see VecMAX).
>
> We can use this approach for already existing instructions too to reduce code size generated from AD files.
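The VecMAX idea quoted above (one rule that takes the vector length as a parameter, instead of one rule per length) can be pictured with a small Java model. This is a sketch under stated assumptions, not HotSpot code: the class, method names, and output string are hypothetical, and the 0/1/2 length codes are only meant to resemble the way an x86 assembler distinguishes 128-, 256-, and 512-bit encodings.

```java
// Hypothetical model of "one rule, length as a parameter".
public class VecMaxSketch {

    // Illustrative encoding codes for 128/256/512-bit operations.
    static final int LEN_128 = 0;
    static final int LEN_256 = 1;
    static final int LEN_512 = 2;

    // Map a vector length in bytes to an encoding code. Vectors
    // shorter than 16 bytes are assumed to use the 128-bit encoding,
    // in line with the remark that partial registers are not
    // allocated separately.
    static int vectorLenCode(int lengthInBytes) {
        switch (lengthInBytes) {
            case 4:
            case 8:
            case 16:
                return LEN_128;
            case 32:
                return LEN_256;
            case 64:
                return LEN_512;
            default:
                throw new IllegalArgumentException(
                        "unsupported vector length: " + lengthInBytes);
        }
    }

    // A single "rule" for byte absolute value. The returned string is
    // a made-up textual form, used only to show that the 8-byte and
    // 16-byte cases produce identical code.
    static String emitAbsBytes(int lengthInBytes) {
        return "abs.b dst, src, len_code=" + vectorLenCode(lengthInBytes);
    }

    public static void main(String[] args) {
        for (int len : new int[] {8, 16, 32, 64}) {
            System.out.println(len + " bytes: " + emitAbsBytes(len));
        }
    }
}
```

In this model the 8-byte and 16-byte variants collapse into the same output, which is the kind of duplication the vabs8B_reg()/vabs16B_reg() remark points at.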
From sandhya.viswanathan at intel.com Tue Apr 9 19:56:18 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Tue, 9 Apr 2019 19:56:18 +0000
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To: <2317EEB0-C081-447C-BA29-40E69ACBC66B@oracle.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <2317EEB0-C081-447C-BA29-40E69ACBC66B@oracle.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB6B8@FMSMSX126.amr.corp.intel.com>

Hi John/VladimirK/VladimirI,

Thanks a lot for your inputs. Let me think about all these points and come back with a proposal.

Best Regards,
Sandhya

-----Original Message-----
From: John Rose [mailto:john.r.rose at oracle.com]
Sent: Tuesday, April 09, 2019 12:53 PM
To: Vladimir Kozlov
Cc: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86

I agree that the AD file combinatorics need to be tamed more aggressively in this way.

I have to wonder if this could be done during incubation, as a cleanup of tech. debt, so we can get experience with the API at the same time.

-- John

On Apr 9, 2019, at 12:33 PM, Vladimir Kozlov wrote:
>
> 4. Most important. The main reason we have a lot AD instructions is to 'match' different vector types for corresponding different vector length. I think we should revisit this approach.
>
> Intel CPU does not use parts of vector registers separately - C2 does not use XMM0b, XMM0c, XMM0d parts of xmm0. Even when C2 uses VecS type it use whole zmm register in avx512 but narrowed it by passing length to assembler instruction (or we use an instruction which uses only part of 512 bit register).
>
> Vladimir I. suggested to have VecMAX type which can be used to match all different vector length implementation to have only one AD instruction. And use vector length to generate corresponding code. 
For example, vabs8B_reg() and vabs16B_reg() are almost the same except vectors type VecD vs VecX. There should be no difference in code generation (we need to modify vec_mov_helper() and other similar code to check vector length when it see VecMAX).
>
> We can use this approach for already existing instructions too to reduce code size generated from AD files.

From daniel.daugherty at oracle.com Tue Apr 9 20:47:54 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 9 Apr 2019 16:47:54 -0400
Subject: RFR(T): 8222229: ProblemList compiler/jsr292/InvokerSignatureMismatch.java
Message-ID: <98636a07-3b50-f0c3-21a7-546b0205a4f6@oracle.com>

Greetings,

I have a trivial fix for the following bug:

    JDK-8222229 ProblemList compiler/jsr292/InvokerSignatureMismatch.java
    https://bugs.openjdk.java.net/browse/JDK-8222229

in order to reduce the noise in the JDK13 CI...

Here's the context diff:

$ hg diff
diff -r dc21be24a8ff closed/test/hotspot/jtreg/ProblemList-graal.txt
--- a/closed/test/hotspot/jtreg/ProblemList-graal.txt    Tue Apr 09 12:13:01 2019 -0700
+++ b/closed/test/hotspot/jtreg/ProblemList-graal.txt    Tue Apr 09 16:39:45 2019 -0400
@@ -17,4 +17,6 @@

 compiler/deoptimization/DeoptReturn.java                8199484 generic-all

+compiler/jsr292/InvokerSignatureMismatch.java           8221577 generic-all
+
 applications/microbenchmarks/other/Test.java#id191      8221585 generic-all

This will reduce the noise in JDK13 CI tier3 until:

    JDK-8221577 [Graal] compiler/jsr292/InvokerSignatureMismatch.java fails with BytecodeParserError
    https://bugs.openjdk.java.net/browse/JDK-8221577

is fixed.

Thanks, in advance, for any questions, comments or suggestions.

Dan

From daniel.daugherty at oracle.com Tue Apr 9 20:55:42 2019
From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Tue, 9 Apr 2019 16:55:42 -0400 Subject: RFR(T): 8222229: ProblemList compiler/jsr292/InvokerSignatureMismatch.java In-Reply-To: <63176C1D-0815-4533-A20A-A916F291A02F@oracle.com> References: <98636a07-3b50-f0c3-21a7-546b0205a4f6@oracle.com> <63176C1D-0815-4533-A20A-A916F291A02F@oracle.com> Message-ID: <50c6c700-9bc7-2a3e-8a54-4f1b65476f2f@oracle.com> Thanks Igor! Dan On 4/9/19 4:55 PM, Igor Ignatyev wrote: > Dan, > > Looks good to me. > > -- Igor > >> On Apr 9, 2019, at 1:47 PM, Daniel D. Daugherty wrote: >> >> Greetings, >> >> I have a trivial fix for the following bug: >> >> JDK-8222229 ProblemList compiler/jsr292/InvokerSignatureMismatch.java >> https://bugs.openjdk.java.net/browse/JDK-8222229 >> >> in order to reduce the noise in the JDK13 CI... >> >> Here's the context diff: >> >> $ hg diff >> diff -r dc21be24a8ff closed/test/hotspot/jtreg/ProblemList-graal.txt >> --- a/closed/test/hotspot/jtreg/ProblemList-graal.txt Tue Apr 09 12:13:01 2019 -0700 >> +++ b/closed/test/hotspot/jtreg/ProblemList-graal.txt Tue Apr 09 16:39:45 2019 -0400 >> @@ -17,4 +17,6 @@ >> >> compiler/deoptimization/DeoptReturn.java 8199484 generic-all >> >> +compiler/jsr292/InvokerSignatureMismatch.java 8221577 generic-all >> + >> applications/microbenchmarks/other/Test.java#id191 8221585 generic-all >> >> This will reduce the noise in JDK13 CI tier3 until: >> >> JDK-8221577 [Graal] compiler/jsr292/InvokerSignatureMismatch.java fails with BytecodeParserError >> https://bugs.openjdk.java.net/browse/JDK-8221577 >> >> is fixed. >> >> Thanks, in advance, for any questions, comments or suggestions. 
>> >> Dan From igor.ignatyev at oracle.com Tue Apr 9 20:55:10 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 9 Apr 2019 13:55:10 -0700 Subject: RFR(T): 8222229: ProblemList compiler/jsr292/InvokerSignatureMismatch.java In-Reply-To: <98636a07-3b50-f0c3-21a7-546b0205a4f6@oracle.com> References: <98636a07-3b50-f0c3-21a7-546b0205a4f6@oracle.com> Message-ID: <63176C1D-0815-4533-A20A-A916F291A02F@oracle.com> Dan, Looks good to me. -- Igor > On Apr 9, 2019, at 1:47 PM, Daniel D. Daugherty wrote: > > Greetings, > > I have a trivial fix for the following bug: > > JDK-8222229 ProblemList compiler/jsr292/InvokerSignatureMismatch.java > https://bugs.openjdk.java.net/browse/JDK-8222229 > > in order to reduce the noise in the JDK13 CI... > > Here's the context diff: > > $ hg diff > diff -r dc21be24a8ff closed/test/hotspot/jtreg/ProblemList-graal.txt > --- a/closed/test/hotspot/jtreg/ProblemList-graal.txt Tue Apr 09 12:13:01 2019 -0700 > +++ b/closed/test/hotspot/jtreg/ProblemList-graal.txt Tue Apr 09 16:39:45 2019 -0400 > @@ -17,4 +17,6 @@ > > compiler/deoptimization/DeoptReturn.java 8199484 generic-all > > +compiler/jsr292/InvokerSignatureMismatch.java 8221577 generic-all > + > applications/microbenchmarks/other/Test.java#id191 8221585 generic-all > > This will reduce the noise in JDK13 CI tier3 until: > > JDK-8221577 [Graal] compiler/jsr292/InvokerSignatureMismatch.java fails with BytecodeParserError > https://bugs.openjdk.java.net/browse/JDK-8221577 > > is fixed. > > Thanks, in advance, for any questions, comments or suggestions. > > Dan From daniel.daugherty at oracle.com Tue Apr 9 21:08:51 2019 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Tue, 9 Apr 2019 17:08:51 -0400 Subject: RFR(T): 8222229: ProblemList compiler/jsr292/InvokerSignatureMismatch.java In-Reply-To: <63176C1D-0815-4533-A20A-A916F291A02F@oracle.com> References: <98636a07-3b50-f0c3-21a7-546b0205a4f6@oracle.com> <63176C1D-0815-4533-A20A-A916F291A02F@oracle.com> Message-ID: <29d44d0d-5ddd-dcd0-7db5-5f6e1ac02cda@oracle.com> Just to be clear... I added these to the closed ProblemList-graal.txt file because this failure was introduced by a closed push that set options for JCK testing... I probably should have done the review on the closed list... :-( Dan On 4/9/19 4:55 PM, Igor Ignatyev wrote: > Dan, > > Looks good to me. > > -- Igor > >> On Apr 9, 2019, at 1:47 PM, Daniel D. Daugherty wrote: >> >> Greetings, >> >> I have a trivial fix for the following bug: >> >> JDK-8222229 ProblemList compiler/jsr292/InvokerSignatureMismatch.java >> https://bugs.openjdk.java.net/browse/JDK-8222229 >> >> in order to reduce the noise in the JDK13 CI... >> >> Here's the context diff: >> >> $ hg diff >> diff -r dc21be24a8ff closed/test/hotspot/jtreg/ProblemList-graal.txt >> --- a/closed/test/hotspot/jtreg/ProblemList-graal.txt Tue Apr 09 12:13:01 2019 -0700 >> +++ b/closed/test/hotspot/jtreg/ProblemList-graal.txt Tue Apr 09 16:39:45 2019 -0400 >> @@ -17,4 +17,6 @@ >> >> compiler/deoptimization/DeoptReturn.java 8199484 generic-all >> >> +compiler/jsr292/InvokerSignatureMismatch.java 8221577 generic-all >> + >> applications/microbenchmarks/other/Test.java#id191 8221585 generic-all >> >> This will reduce the noise in JDK13 CI tier3 until: >> >> JDK-8221577 [Graal] compiler/jsr292/InvokerSignatureMismatch.java fails with BytecodeParserError >> https://bugs.openjdk.java.net/browse/JDK-8221577 >> >> is fixed. >> >> Thanks, in advance, for any questions, comments or suggestions. 
>> >> Dan > From vladimir.x.ivanov at oracle.com Tue Apr 9 22:02:50 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 9 Apr 2019 15:02:50 -0700 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <2317EEB0-C081-447C-BA29-40E69ACBC66B@oracle.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <2317EEB0-C081-447C-BA29-40E69ACBC66B@oracle.com> Message-ID: <9bcc9040-e11a-eb72-9f32-38c381e709cf@oracle.com> On 09/04/2019 12:53, John Rose wrote: > I agree that the AD file combinatorics need to be tamed > more aggressively in this way. > > I have to wonder if this could be done during incubation, > as a cleanup of tech. debt, so we can get experience with > the API at the same time. I believe it depends on how severe static footprint increase will be. Last time I checked [1], Vector API-related changes (w/o SVML stubs) contributed ~3Mb/15% to libjvm.so on Linux. It would be very helpful to have more detailed and up-to-date information on that. Best regards, Vladimir Ivanov [1] http://mail.openjdk.java.net/pipermail/panama-dev/2018-October/002992.html default branch: 22384560 vectorIntrinsic -svml: 25635648 > On Apr 9, 2019, at 12:33 PM, Vladimir Kozlov wrote: >> >> 4. Most important. The main reason we have a lot AD instructions is to 'match' different vector types for corresponding different vector length. I think we should revisit this approach. >> >> Intel CPU does not use parts of vector registers separately - C2 does not use XMM0b, XMM0c, XMM0d parts of xmm0. Even when C2 uses VecS type it use whole zmm register in avx512 but narrowed it by passing length to assembler instruction (or we use an instruction which uses only part of 512 bit register). >> >> Vladimir I. suggested to have VecMAX type which can be used to match all different vector length implementation to have only one AD instruction. 
And use vector length to generate corresponding code. For example, vabs8B_reg() and vabs16B_reg() are almost the same except vectors type VecD vs VecX. There should be no difference in code generation (we need to modify vec_mov_helper() and other similar code to check vector length when it see VecMAX). >> >> We can use this approach for already existing instructions too to reduce code size generated from AD files. > From igor.ignatyev at oracle.com Tue Apr 9 22:03:32 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 9 Apr 2019 15:03:32 -0700 Subject: RFR(T): 8222229: ProblemList compiler/jsr292/InvokerSignatureMismatch.java In-Reply-To: <29d44d0d-5ddd-dcd0-7db5-5f6e1ac02cda@oracle.com> References: <98636a07-3b50-f0c3-21a7-546b0205a4f6@oracle.com> <63176C1D-0815-4533-A20A-A916F291A02F@oracle.com> <29d44d0d-5ddd-dcd0-7db5-5f6e1ac02cda@oracle.com> Message-ID: <9CC6868D-B5DC-4590-A4B3-6E5751D61BA4@oracle.com> oh, I actually expected it to be in open ProblemList b/c the test is open, but closed list should help in our setup too. hopefully 8221577 will be fixed soon, and openjdk community won't be significantly affected by it. -- Igor > On Apr 9, 2019, at 2:08 PM, Daniel D. Daugherty wrote: > > Just to be clear... I added these to the closed ProblemList-graal.txt > file because this failure was introduced by a closed push that set > options for JCK testing... > > I probably should have done the review on the closed list... :-( > > Dan > > > On 4/9/19 4:55 PM, Igor Ignatyev wrote: >> Dan, >> >> Looks good to me. >> >> -- Igor >> >>> On Apr 9, 2019, at 1:47 PM, Daniel D. Daugherty wrote: >>> >>> Greetings, >>> >>> I have a trivial fix for the following bug: >>> >>> JDK-8222229 ProblemList compiler/jsr292/InvokerSignatureMismatch.java >>> https://bugs.openjdk.java.net/browse/JDK-8222229 >>> >>> in order to reduce the noise in the JDK13 CI... 
>>> >>> Here's the context diff: >>> >>> $ hg diff >>> diff -r dc21be24a8ff closed/test/hotspot/jtreg/ProblemList-graal.txt >>> --- a/closed/test/hotspot/jtreg/ProblemList-graal.txt Tue Apr 09 12:13:01 2019 -0700 >>> +++ b/closed/test/hotspot/jtreg/ProblemList-graal.txt Tue Apr 09 16:39:45 2019 -0400 >>> @@ -17,4 +17,6 @@ >>> >>> compiler/deoptimization/DeoptReturn.java 8199484 generic-all >>> >>> +compiler/jsr292/InvokerSignatureMismatch.java 8221577 generic-all >>> + >>> applications/microbenchmarks/other/Test.java#id191 8221585 generic-all >>> >>> This will reduce the noise in JDK13 CI tier3 until: >>> >>> JDK-8221577 [Graal] compiler/jsr292/InvokerSignatureMismatch.java fails with BytecodeParserError >>> https://bugs.openjdk.java.net/browse/JDK-8221577 >>> >>> is fixed. >>> >>> Thanks, in advance, for any questions, comments or suggestions. >>> >>> Dan >> > From john.r.rose at oracle.com Tue Apr 9 22:24:13 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 9 Apr 2019 15:24:13 -0700 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <9bcc9040-e11a-eb72-9f32-38c381e709cf@oracle.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <2317EEB0-C081-447C-BA29-40E69ACBC66B@oracle.com> <9bcc9040-e11a-eb72-9f32-38c381e709cf@oracle.com> Message-ID: On Apr 9, 2019, at 3:02 PM, Vladimir Ivanov wrote: > > On 09/04/2019 12:53, John Rose wrote: >> I agree that the AD file combinatorics need to be tamed >> more aggressively in this way. >> I have to wonder if this could be done during incubation, >> as a cleanup of tech. debt, so we can get experience with >> the API at the same time. > > I believe it depends on how severe static footprint increase will be. > > Last time I checked [1], Vector API-related changes (w/o SVML stubs) contributed ~3Mb/15% to libjvm.so on Linux. 
>
> It would be very helpful to have more detailed and up-to-date information on that.

I agree. And not to pre-empt what the numbers actually say, but I think a parametric, data-driven backend is worth considering, either as a real design, or as an ideal model for what we need to tame the complexity of vector opcodes.

Most vector operations are parameterized by vector size, lane size, and per-lane operation, and those three parameters are largely independent of each other. This suggests to me that we could remove intermediate layers of distinction by plumbing uOp and bOp straight through to the backend, equipped with a numeric code derived from the FUnOp/FBinOp instance, plus bit sizes for the vector and the lane.

    vsize : {64,128,256,...,MAX_VSIZE}
    lsize : {8,16,32,64} and maybe {1,2,4,128,...}
    op : { and, or, xor, ... , iadd, isub, icmp, ..., fadd, fsub, fcmp, ..., crc32, clmul, aes_enc_step, ... }

Maybe that's too monolithic to actually implement, but I think it is an ideal for taming ISA complexity, by breaking it into partially independent components.

And I *do* have an agenda here to make room for the "snowflake" operations which don't make sense for scalars but are reasonably portable. Such operations are slowly growing in number, and have important applications. So the set of opcodes is not as simple as one would think at first glance; it's not your grandpa's calculator.

-- John

P.S. Another degree of freedom is the presence of masking, and how to materialize the unmasked lanes: zero, destination, or second source. Also the kind of the mask: bitmask, vector-mask, or index range for loops. Not sure how to slice all that, but it seems that identifying the degrees of freedom helps us design a useful framework.

P.P.S. Another place where small-integer codes might shine (besides opcodes) is with shuffle generation. The shiftEL operation is really a cross-lane shift, parameterized by a small integer between zero and the number of lanes (minus one). 
This is almost but not quite a constant shuffle operation; it's a shuffle across a preset small family of shuffles, and it could be refactored as a selection from a small family of "shuffle ops" which either materialize into shuffle constants or are pattern-matched into a special hardware instruction. And there are lots of other shuffle ops that can be special-cased, if you (a) look at use cases for special shuffles, then (b) figure out how to factor and parameterize the subspace of relevant shuffles, and then (c) go find relevant special instructions and write the code generator with built-in strength reduction when the relevant shuffle instances appear. From daniel.daugherty at oracle.com Tue Apr 9 22:28:21 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 9 Apr 2019 18:28:21 -0400 Subject: RFR(T): 8222229: ProblemList compiler/jsr292/InvokerSignatureMismatch.java In-Reply-To: <9CC6868D-B5DC-4590-A4B3-6E5751D61BA4@oracle.com> References: <98636a07-3b50-f0c3-21a7-546b0205a4f6@oracle.com> <63176C1D-0815-4533-A20A-A916F291A02F@oracle.com> <29d44d0d-5ddd-dcd0-7db5-5f6e1ac02cda@oracle.com> <9CC6868D-B5DC-4590-A4B3-6E5751D61BA4@oracle.com> Message-ID: <76b49226-8d96-9fe2-d918-f7103d6b5647@oracle.com> It needs to be on the open ProblemList-graal.txt so I've copied the same entry there. I'll need to backout the entry on the closed ProblemList-graal.txt... Sigh... Dan On 4/9/19 6:03 PM, Igor Ignatyev wrote: > oh, I actually expected it to be in open ProblemList b/c the test is open, but closed list should help in our setup too. hopefully 8221577 will be fixed soon, and openjdk community won't be significantly affected by it. > > -- Igor > >> On Apr 9, 2019, at 2:08 PM, Daniel D. Daugherty wrote: >> >> Just to be clear... I added these to the closed ProblemList-graal.txt >> file because this failure was introduced by a closed push that set >> options for JCK testing... >> >> I probably should have done the review on the closed list... 
:-( >> >> Dan >> >> >> On 4/9/19 4:55 PM, Igor Ignatyev wrote: >>> Dan, >>> >>> Looks good to me. >>> >>> -- Igor >>> >>>> On Apr 9, 2019, at 1:47 PM, Daniel D. Daugherty wrote: >>>> >>>> Greetings, >>>> >>>> I have a trivial fix for the following bug: >>>> >>>> JDK-8222229 ProblemList compiler/jsr292/InvokerSignatureMismatch.java >>>> https://bugs.openjdk.java.net/browse/JDK-8222229 >>>> >>>> in order to reduce the noise in the JDK13 CI... >>>> >>>> Here's the context diff: >>>> >>>> $ hg diff >>>> diff -r dc21be24a8ff closed/test/hotspot/jtreg/ProblemList-graal.txt >>>> --- a/closed/test/hotspot/jtreg/ProblemList-graal.txt Tue Apr 09 12:13:01 2019 -0700 >>>> +++ b/closed/test/hotspot/jtreg/ProblemList-graal.txt Tue Apr 09 16:39:45 2019 -0400 >>>> @@ -17,4 +17,6 @@ >>>> >>>> compiler/deoptimization/DeoptReturn.java 8199484 generic-all >>>> >>>> +compiler/jsr292/InvokerSignatureMismatch.java 8221577 generic-all >>>> + >>>> applications/microbenchmarks/other/Test.java#id191 8221585 generic-all >>>> >>>> This will reduce the noise in JDK13 CI tier3 until: >>>> >>>> JDK-8221577 [Graal] compiler/jsr292/InvokerSignatureMismatch.java fails with BytecodeParserError >>>> https://bugs.openjdk.java.net/browse/JDK-8221577 >>>> >>>> is fixed. >>>> >>>> Thanks, in advance, for any questions, comments or suggestions. >>>> >>>> Dan > From vladimir.x.ivanov at oracle.com Tue Apr 9 22:30:07 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 9 Apr 2019 15:30:07 -0700 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <5b1b85a0-eb32-dc17-c1cb-b6fa5391b817@oracle.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <5b1b85a0-eb32-dc17-c1cb-b6fa5391b817@oracle.com> Message-ID: <1b906bea-a8fd-23db-86be-f2ab6e9cae8c@oracle.com> Sandhya, The testing is still running, but I see some new failures. 
In particular: ====================================================================== compiler/c2/5049410/Test5049410.java # assert(false) failed: bad AD file # Internal Error (.../src/hotspot/share/opto/matcher.cpp:1587) # assert(false) failed: bad AD file ----------System.err:(70/3923)---------- stdout: [o599 RShiftVB === _ o595 o59 [[o601 ]] #vectors[4]:{byte} --N: o599 RShiftVB === _ o595 o59 [[o601 ]] #vectors[4]:{byte} --N: o595 LoadVector === o223 o470 o464 |o29 [[o608 o606 o604 o602 o600 o598 o596 o599 ]] @byte[int:>=0]:NotNull:exact+any *, idx=5; mismatched #vectors[4]:{byte} VECS 0 VECS LEGVECS 0 LEGVECS --N: o59 ConI === o0 [[o611 o599 o251 o285 o292 o293 o252 o244 ]] #int:1 IMMI 10 IMMI IMMI1 0 IMMI1 IMMI2 0 IMMI2 IMMI8 5 IMMI8 IMMU8 5 IMMU8 IMMI16 10 IMMI16 IMMU31 0 IMMU31 RREGI 100 loadConI RAX_REGI 100 loadConI RBX_REGI 100 loadConI RCX_REGI 100 loadConI RDX_REGI 100 loadConI RDI_REGI 100 loadConI NO_RCX_REGI 100 loadConI NO_RAX_RDX_REGI 100 loadConI STACKSLOTI 200 storeSSI ====================================================================== java/lang/Math/HypotTests.java ----------System.err:(1574/80451)---------- Failure for Math.hypot: For inputs -3.0000400027E10 (-0x1.bf0a71a6cp34) and -4.00002E10 (-0x1.2a0653a8p35) expected 5.0000399999E10 (0x1.748831cfep35) got 5.000040001660008E10 (0x1.748831d21333ep35); difference greater than ulp tolerance 1.0 Failure for Math.hypot: For inputs -4.00002E10 (-0x1.2a0653a8p35) and -3.0000400027E10 (-0x1.bf0a71a6cp34) expected 5.0000399999E10 (0x1.748831cfep35) got 5.000040001660008E10 (0x1.748831d21333ep35); difference greater than ulp tolerance 1.0 Failure for StrictMath.hypot: For inputs -3.0000400027E10 (-0x1.bf0a71a6cp34) and -4.00002E10 (-0x1.2a0653a8p35) expected 5.0000399999E10 (0x1.748831cfep35) got 5.000040001660008E10 (0x1.748831d21333ep35); difference greater than ulp tolerance 1.0 ====================================================================== java/math/BigDecimal/DivideMcTests.java 
----------System.err:(215/24574)---------- Unexpected result from 3.61167296280301E+38 / 224198292018431; expected 1.6109279559123938E+24 got 1.6109279534086685E+24 Unexpected result from 3.61167296280301E+38 / 9.87673128759528E+18; expected 3.6567492398412267E+19 got 3.6567492393890272E+19 Unexpected result from 3.61167296280301E+38 / 27062777463.0281; expected 1.3345536938095539E+28 got 1.3345536926616795E+28 Unexpected result from 3.61167296280301E+38 / 9.83114227763768E+22; expected 3673706331174018.0 got 3673706328176239.7 ... ====================================================================== java/lang/Double/ParseDouble.java ----------System.err:(15/891)---------- java.lang.RuntimeException: Double.parseDouble failed. String:0x.100p1 Result:0.125 at ParseDouble.fail(ParseDouble.java:39) at ParseDouble.check(ParseDouble.java:116) at ParseDouble.testParsing(ParseDouble.java:564) at ParseDouble.main(ParseDouble.java:767) ====================================================================== java/lang/Float/ParseFloat.java ----------System.err:(15/906)---------- java.lang.RuntimeException: Float.parseFloat failed. String:0.249999992549419402915276829 Result:0.24999999 at ParseFloat.fail(ParseFloat.java:38) at ParseFloat.check(ParseFloat.java:111) at ParseFloat.testPowers(ParseFloat.java:314) at ParseFloat.main(ParseFloat.java:328) ====================================================================== java/lang/Math/DivModTests.java ----------System.out:(1/89)---------- FAIL: Long.floorMod(-9223372036854775807, 3) = 2 is different than BigDecimal result: 0 ----------System.err:(12/729)---------- java.lang.RuntimeException: 1 errors found in DivMod methods. at DivModTests.main(DivModTests.java:49) ====================================================================== java/math/BigDecimal/DivideTests.java ----------System.err:(23/1552)---------- 16 / 3125 threw an exception. 
java.lang.ArithmeticException: Non-terminating decimal expansion; no exact representable decimal result. at java.base/java.math.BigDecimal.divide(BigDecimal.java:1722) at DivideTests.powersOf2and5(DivideTests.java:157) at DivideTests.main(DivideTests.java:415) ====================================================================== java/math/BigDecimal/RangeTests.java ----------System.out:(8/942)---------- Sum:79228162514264337593543950335 + 4611686018427387903 == 79228162518876023607676370944; expected 79228162518876023611971338238 Sum:4611686018427387903 + 79228162514264337593543950335 == 79228162518876023607676370944; expected 79228162518876023611971338238 Sum:-79228162514264337593543950335 + 4611686018427387903 == -79228162509652651579411529726; expected -79228162509652651575116562432 Sum:4611686018427387903 + -79228162514264337593543950335 == -79228162509652651579411529726; expected -79228162509652651575116562432 Sum:-9223372036854775808 + 4611686018427387903 == -4611686022722355199; expected -4611686018427387905 Sum:4611686018427387903 + -9223372036854775808 == -4611686022722355199; expected -4611686018427387905 Sum:9223372036854775808 + -4611686018427387903 == 4611686018427387903; expected 4611686018427387905 Sum:-4611686018427387903 + 9223372036854775808 == 4611686018427387903; expected 4611686018427387905 ----------System.err:(12/729)---------- java.lang.RuntimeException: Incurred 8 failures while testing. at RangeTests.main(RangeTests.java:238) ====================================================================== java/math/BigDecimal/StringConstructor.java ----------System.out:(1/48)---------- Seed from RandomFactory = -6241457317850693905L ----------System.err:(14/795)---------- bd string: scale: 1 -653335158894.7 bd doppel: scale: 1 -653335159489.9 java.lang.RuntimeException: String constructor failure. 
======================================================================
java/util/Timer/DelayOverflow.java

----------System.err:(55/3697)----------
8893 not equal to 1
java.lang.Exception: Stack trace
    at java.base/java.lang.Thread.dumpStack(Thread.java:1384)
    at DelayOverflow.fail(DelayOverflow.java:100)
    at DelayOverflow.fail(DelayOverflow.java:101)
    at DelayOverflow.equal(DelayOverflow.java:106)
    at DelayOverflow.test(DelayOverflow.java:87)
    at DelayOverflow.instanceMain(DelayOverflow.java:113)
...

Best regards,
Vladimir Ivanov

On 08/04/2019 13:37, Vladimir Ivanov wrote:
> I tried to submit it for testing and spotted a build failure w/ clang:
>
> .../src/hotspot/cpu/x86/x86.ad:1515:26: error: '&&' within '||'
> [-Werror,-Wlogical-op-parentheses]
>              (vlen == 64) && (VM_Version::supports_avx512bw() == false))
>              ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> .../src/hotspot/cpu/x86/x86.ad:1515:26: note: place parentheses around
> the '&&' expression to silence this warning
>              (vlen == 64) && (VM_Version::supports_avx512bw() == false))
>                           ^
>              (                                                         )
>
> I'll let you know how the testing is going.
>
> Best regards,
> Vladimir Ivanov
>
> On 05/04/2019 18:18, Viswanathan, Sandhya wrote:
>> Please find below a link to the webrev which enhances super-word auto
>> vectorization for x86.
>>
>> The following additional operations are supported:
>>
>> 1) Absolute for all data types
>>
>> 2) Shifts for byte data types
>>
>> 3) Shift right arithmetic for long data type
>>
>> 4) Byte multiply
>>
>> 5) Negate for float/double
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8222074
>>
>> Webrev: http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.00/
>>
>> The compiler jtreg tests pass with UseAVX=0,1,2,3 and KNL.
>>
>> Your review and comments are welcome.
>>
>> Best Regards,
>>
>> Sandhya
>>

From daniel.daugherty at oracle.com  Tue Apr 9 22:35:43 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 9 Apr 2019 18:35:43 -0400
Subject: RFR(T): 8222229: ProblemList compiler/jsr292/InvokerSignatureMismatch.java
In-Reply-To: <76b49226-8d96-9fe2-d918-f7103d6b5647@oracle.com>
References: <98636a07-3b50-f0c3-21a7-546b0205a4f6@oracle.com>
 <63176C1D-0815-4533-A20A-A916F291A02F@oracle.com>
 <29d44d0d-5ddd-dcd0-7db5-5f6e1ac02cda@oracle.com>
 <9CC6868D-B5DC-4590-A4B3-6E5751D61BA4@oracle.com>
 <76b49226-8d96-9fe2-d918-f7103d6b5647@oracle.com>
Message-ID: <86c6f4b3-5c2d-578e-bb86-b9f8567f9183@oracle.com>

Igor,

I used "hg backout" and the following bug ID:

    JDK-8222235 backout compiler/jsr292/InvokerSignatureMismatch.java entry from closed ProblemList-graal.txt
    https://bugs.openjdk.java.net/browse/JDK-8222235

to backout the errant changeset from closed.

Dan

On 4/9/19 6:28 PM, Daniel D. Daugherty wrote:
> It needs to be on the open ProblemList-graal.txt so I've copied the
> same entry there. I'll need to backout the entry on the closed
> ProblemList-graal.txt... Sigh...
>
> Dan
>
>
> On 4/9/19 6:03 PM, Igor Ignatyev wrote:
>> oh, I actually expected it to be in open ProblemList b/c the test is
>> open, but closed list should help in our setup too. hopefully 8221577
>> will be fixed soon, and openjdk community won't be significantly
>> affected by it.
>>
>> -- Igor
>>
>>> On Apr 9, 2019, at 2:08 PM, Daniel D. Daugherty
>>> wrote:
>>>
>>> Just to be clear... I added these to the closed ProblemList-graal.txt
>>> file because this failure was introduced by a closed push that set
>>> options for JCK testing...
>>>
>>> I probably should have done the review on the closed list... :-(
>>>
>>> Dan
>>>
>>>
>>> On 4/9/19 4:55 PM, Igor Ignatyev wrote:
>>>> Dan,
>>>>
>>>> Looks good to me.
>>>>
>>>> -- Igor
>>>>
>>>>> On Apr 9, 2019, at 1:47 PM, Daniel D. Daugherty
>>>>> wrote:
>>>>>
>>>>> Greetings,
>>>>>
>>>>> I have a trivial fix for the following bug:
>>>>>
>>>>>     JDK-8222229 ProblemList
>>>>> compiler/jsr292/InvokerSignatureMismatch.java
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8222229
>>>>>
>>>>> in order to reduce the noise in the JDK13 CI...
>>>>>
>>>>> Here's the context diff:
>>>>>
>>>>> $ hg diff
>>>>> diff -r dc21be24a8ff closed/test/hotspot/jtreg/ProblemList-graal.txt
>>>>> --- a/closed/test/hotspot/jtreg/ProblemList-graal.txt Tue Apr 09
>>>>> 12:13:01 2019 -0700
>>>>> +++ b/closed/test/hotspot/jtreg/ProblemList-graal.txt Tue Apr 09
>>>>> 16:39:45 2019 -0400
>>>>> @@ -17,4 +17,6 @@
>>>>>
>>>>>   compiler/deoptimization/DeoptReturn.java 8199484 generic-all
>>>>>
>>>>> +compiler/jsr292/InvokerSignatureMismatch.java 8221577 generic-all
>>>>> +
>>>>>   applications/microbenchmarks/other/Test.java#id191 8221585
>>>>> generic-all
>>>>>
>>>>> This will reduce the noise in JDK13 CI tier3 until:
>>>>>
>>>>>     JDK-8221577 [Graal]
>>>>> compiler/jsr292/InvokerSignatureMismatch.java fails with
>>>>> BytecodeParserError
>>>>>     https://bugs.openjdk.java.net/browse/JDK-8221577
>>>>>
>>>>> is fixed.
>>>>>
>>>>> Thanks, in advance, for any questions, comments or suggestions.
>>>>> >>>>> Dan >> > From sandhya.viswanathan at intel.com Tue Apr 9 23:33:52 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 9 Apr 2019 23:33:52 +0000 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <9bcc9040-e11a-eb72-9f32-38c381e709cf@oracle.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <2317EEB0-C081-447C-BA29-40E69ACBC66B@oracle.com> <9bcc9040-e11a-eb72-9f32-38c381e709cf@oracle.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB829@FMSMSX126.amr.corp.intel.com> The current sizes are as follows: libjvm.so : 24358297 libjvm.so + 8222074 : 24718043 (1.5%) libjvm.so (panama vapi ): 28222649 (16%) This patch (8222074) adds about 1.5% to the overall size of libjvm.so. Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Vladimir Ivanov Sent: Tuesday, April 09, 2019 3:03 PM To: John Rose ; Vladimir Kozlov Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 On 09/04/2019 12:53, John Rose wrote: > I agree that the AD file combinatorics need to be tamed more > aggressively in this way. > > I have to wonder if this could be done during incubation, as a cleanup > of tech. debt, so we can get experience with the API at the same time. I believe it depends on how severe static footprint increase will be. Last time I checked [1], Vector API-related changes (w/o SVML stubs) contributed ~3Mb/15% to libjvm.so on Linux. It would be very helpful to have more detailed and up-to-date information on that. Best regards, Vladimir Ivanov [1] http://mail.openjdk.java.net/pipermail/panama-dev/2018-October/002992.html default branch: 22384560 vectorIntrinsic -svml: 25635648 > On Apr 9, 2019, at 12:33 PM, Vladimir Kozlov wrote: >> >> 4. Most important. 
The main reason we have a lot AD instructions is to 'match' different vector types for corresponding different vector length. I think we should revisit this approach. >> >> Intel CPU does not use parts of vector registers separately - C2 does not use XMM0b, XMM0c, XMM0d parts of xmm0. Even when C2 uses VecS type it use whole zmm register in avx512 but narrowed it by passing length to assembler instruction (or we use an instruction which uses only part of 512 bit register). >> >> Vladimir I. suggested to have VecMAX type which can be used to match all different vector length implementation to have only one AD instruction. And use vector length to generate corresponding code. For example, vabs8B_reg() and vabs16B_reg() are almost the same except vectors type VecD vs VecX. There should be no difference in code generation (we need to modify vec_mov_helper() and other similar code to check vector length when it see VecMAX). >> >> We can use this approach for already existing instructions too to reduce code size generated from AD files. > From sandhya.viswanathan at intel.com Tue Apr 9 23:59:16 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 9 Apr 2019 23:59:16 +0000 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com> Hi Vladimir, Please see my answers in your email below. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, April 09, 2019 12:34 PM To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 Hi Sandhya, I looked through changes and had discussion with Vladimir Ivanov about them. 
In general, the logic of the changes follows our usual pattern - no problem here.

There are cases where we could use fewer `TEMP tmp` registers by using the 'dst' register, as in mul4B_reg(). Is it intentional not to use 'dst' there?

But my main concern now is the significant growth in JVM size. And it will get worse when we implement the rest of the Vector API instructions.

The main reason for the size growth is the additional AD instructions, which ADLC compiles into multiple functions. I originally thought that moving 'ins_encode %{ %}' code that has several instructions into the macro-assembler would help if it is used by several AD instructions. But Vladimir I. convinced me that the gain would be insignificant compared to reducing the number of AD instructions.

We came up with several suggestions for how to address this, and it would greatly help if you (Intel) investigated them.

1. I think you should provide JVM size increase data for changes like this. What is the increase for this one?

Sandhya >> The size increase is about 1.5%:
libjvm.so : 24358297
libjvm.so + 8222074 : 24718043 (1.5%)

2. How important is it for Intel to support new vector instructions for CPUs without AVX? Maybe we should stop adding new code for old CPUs, such as vsll16B_reg, for example. That does not mean we can't use SSE instructions in the implementation (for example, vabs8B_reg) - such cases are fine.

Sandhya >> Definitely AVX is the higher priority; we can try to merge as many rules as possible along the same lines as vabs8B_reg.

3. I still want to see some common instruction patterns in 'ins_encode %{ %}' moved into the macro-assembler. For example, the only difference between vs*_reg and vs*_reg_imm is one or two instructions; the rest is the same.

Sandhya >> This is due to a peculiarity of shift count handling in superword: only when the count is not an immediate does it go through RShiftCntV/LShiftCntV. It should be possible to merge these rules with some tweaks to superword.cpp. Since this was a common part of the code, I didn't want to change it, so as to minimize the effect on other architectures. I can take a closer look and clean that up.

4. Most important. The main reason we have a lot of AD instructions is to 'match' the different vector types for the corresponding vector lengths. I think we should revisit this approach.

Intel CPUs do not use parts of vector registers separately - C2 does not use the XMM0b, XMM0c, XMM0d parts of xmm0. Even when C2 uses the VecS type, it uses the whole zmm register on AVX-512 but narrows it by passing the length to the assembler instruction (or we use an instruction that uses only part of the 512-bit register).

Vladimir I. suggested having a VecMAX type that can be used to match all the different vector length implementations, so that there is only one AD instruction, and using the vector length to generate the corresponding code. For example, vabs8B_reg() and vabs16B_reg() are almost the same except for the vector type, VecD vs VecX. There should be no difference in code generation (we need to modify vec_mov_helper() and other similar code to check the vector length when it sees VecMAX).

We can use this approach for already existing instructions too, to reduce the code size generated from AD files.

What do you think?

Sandhya >> Using vecMAX will lead to spill/fill code using the largest vector width, which is not recommended on Intel architecture. The better way to club or merge would then be, as John suggested, with binary/unary ops for the same register type. We will need to take into account the temporaries needed etc. while clubbing the rules, so as not to degrade the generated code unnecessarily. Some questions that would help me: how much redesign do we want? Only for new code? Or also the existing code? Also, is the libjvm size our criterion, or the ad file size, or both? I will need a lot of support from you and Vladimir Ivanov if we need to get this redesign through reviews and sponsorship over the coming month or so. I will explore a small prototype and submit that first, and we could then go from there.

Regards,
Vladimir K

On 4/9/19 10:18 AM, Viswanathan, Sandhya wrote:
> Hi Yang,
>
> Thanks a lot for trying out the patch in your setup.
>
> Please do let me know, when you check the details, if you find the failure in DivideMcTests.java to be due to this patch.
>
> I will fix all the trailing space and unaligned line style issues that you pointed out.
>
> The TestInt is updated to cover some additional support added for "int" in this patch, like Absolute and subtraction from zero.
> There is an additional test for Not, for which we plan to add support in a follow-up patch.
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: Yang Zhang (Arm Technology China) [mailto:Yang.Zhang at arm.com]
> Sent: Tuesday, April 09, 2019 1:04 AM
> To: Viswanathan, Sandhya ;
> hotspot-compiler-dev at openjdk.java.net
> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86
>
> Hi Sandhya
>
> Thanks for proposing this enhancement.
> I have tested this patch in our internal ci. There is a new failure. But I didn't check the details.
> java/math/BigDecimal/DivideMcTests.java
>
> In addition, there are trailing spaces in the following files.
> src/hotspot/cpu/x86/assembler_x86.cpp
> src/hotspot/cpu/x86/stubGenerator_x86_32.cpp
> src/hotspot/cpu/x86/x86.ad
> src/hotspot/cpu/x86/x86_32.ad
> src/hotspot/share/opto/superword.cpp
>
> In file src/hotspot/share/classfile/vmSymbols.hpp, there are some unaligned lines.
> In file test/hotspot/jtreg/compiler/c2/cr6340864/TestIntVect.java, there are new test functions. Are these new functions needed by byte/short/long?
> > Regards, > Yang > > > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of > Viswanathan, Sandhya > Sent: Saturday, April 6, 2019 9:18 AM > To: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov > > Subject: RFR (M) 8222074: Enhance auto vectorization for x86 > > > Please find below a link to the webrev which enhances super-word auto vectorization for x86. > The following additional operations are supported: > > 1) Absolute for all data types > > 2) Shifts for byte data types > > 3) Shift right arithmetic for long data type > > 4) Byte multiply > > 5) Negate for float/double > > JBS: https://bugs.openjdk.java.net/browse/JDK-8222074 > Webrev: http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.00/ > > The compiler jtreg tests pass with UseAVX=0,1,2,3 and KNL. > Your review and comments are welcome. > > Best Regards, > Sandhya > > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. > From sandhya.viswanathan at intel.com Wed Apr 10 00:46:18 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 10 Apr 2019 00:46:18 +0000 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB87A@FMSMSX126.amr.corp.intel.com> Hi VladimirK, Another thing to think of is do we really need a drastic redesign or a through cleanup would give us what we are looking for? E.g. this patch only increased the libjvm size by 1.5% which can be further reduced by appropriate shiftcnt handling and reducing the SSE only rules. 
Also there is lot of common code, which can be modularized as an assembler method and invoked with appropriate parameters from the ad file. Some of this can be done for other existing rules as well.

We could keep today's libjvm size as baseline and have some budget for auto vectorizer enahancements and rest of the vector api code to come in.

Do let me know what you think and I will proceed appropriately.

Best Regards,
Sandhya

From vladimir.kozlov at oracle.com  Wed Apr 10 00:57:37 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 9 Apr 2019 17:57:37 -0700
Subject: RFR (M) 8222074: Enhance auto vectorization for x86
In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com>
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com>
 <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com>
Message-ID: <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com>

On 4/9/19 4:59 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
>
> Please see my answers in your email below.

My comments below too.

>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Tuesday, April 09, 2019 12:34 PM
> To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86
>
> Hi Sandhya,
>
> I looked through changes and had discussion with Vladimir Ivanov about them.
> In general logic of changes follow out usual pattern - no problem here.
>
> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there?

What about this question?

>
> But my main concern now is JVM size significant grow. And it will be worse when we implementing the rest of Vector API instructions.
>
> The main reason for size grow is additional AD instructions which are compiled by APLC into multiply functions. I originally thought that moving 'ins_encode %{ %}' code which have several instructions into macro-assembler will help if it is used by several AD instructions.
But Vladimir I. convinced me that the gain would be insignificant compared to reducing the number of AD instructions. > > We came up with several suggestions for how we can address it, and it would greatly help if you (Intel) investigated them. > > 1. I think you should provide JVM size increase data for changes like this. What is the increase for this one? > > Sandhya >> The size increase is about 1.5%: > libjvm.so : 24358297 > libjvm.so + 8222074 : 24718043 (1.5%) I think that is not small - you implemented only 4 operations. Maybe we should follow Intel's rule: you can add new instructions only if you reduce power consumption (size in our case). ;-) > > 2. How important is it for Intel to support new vector instructions for CPUs without AVX? Maybe we should stop adding new code for old CPUs, such as vsll16B_reg, for example. It does not mean we can't use SSE instructions in the implementation (for example, vabs8B_reg) - such cases are fine. > > Sandhya >> Definitely AVX is the higher priority; we can try to merge as many rules as possible along similar lines to vabs8B_reg. Okay. > > 3. I still want to see some common instruction patterns in 'ins_encode %{ %}' moved into the macro-assembler. For example, the only difference between vs*_reg and vs*_reg_imm is one or two instructions; the rest is the same. > > Sandhya >> This is due to the peculiarity of shift count handling in superword: only when the count is not an immediate does it go through RShiftCntV/LShiftCntV. It should be possible to merge these rules with some tweaks to superword.cpp. Since this is a common part of the code, I didn't want to change it, so as to minimize the effect on other architectures. I can take a closer look and clean that up. Okay. > > 4. Most important. The main reason we have a lot of AD instructions is to 'match' the different vector types for the corresponding vector lengths. I think we should revisit this approach. > > Intel CPUs do not use parts of vector registers separately - C2 does not use the XMM0b, XMM0c, XMM0d parts of xmm0.
Even when C2 uses the VecS type, it uses the whole zmm register in AVX-512 but narrows it by passing the length to the assembler instruction (or we use an instruction which uses only part of the 512-bit register). > > Vladimir I. suggested having a VecMAX type which can be used to match all the different vector length implementations, so that there is only one AD instruction, and using the vector length to generate the corresponding code. For example, vabs8B_reg() and vabs16B_reg() are almost the same except for the vector type, VecD vs VecX. There should be no difference in code generation (we need to modify vec_mov_helper() and other similar code to check the vector length when it sees VecMAX). > > We can use this approach for already existing instructions too, to reduce the code size generated from AD files. > > What do you think? > > Sandhya >> Using vecMAX will lead to spill/fill code using the largest vector width, which is not recommended on Intel architecture. That is why I added the comment about vec_mov_helper(). This function is used for generating spills. You definitely should not save the whole 512-bit register but only the part corresponding to the byte size of the vector. Note, when I talked about vector length I meant length_in_bytes(). > The better way to club or merge then would be as John suggested, with binary/unary op for the same register type. We will need to take into account the temporaries needed etc. while clubbing the rules so as to not degrade the generated code unnecessarily. I am not sure this approach is simpler in the short term, but I can't say what would be better in the long run. This needs to be explored. > Some questions that will help me: how much redesign do we want? Only for new code? Or also the existing code? In the long term we should update all existing code too. > Also, is the libjvm size our criterion, or the ad file size, or both? The main goal is to reduce libjvm size. But we should keep other platforms in mind. > I will need a lot of support from you and Vladimir Ivanov if we need to get this redesign through reviews and sponsorship over the coming month or so. I will explore a small prototype and submit that first and we could then go from there. Thanks, Vladimir > > Regards, > Vladimir K From vladimir.kozlov at oracle.com Wed Apr 10 01:06:36 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 9 Apr 2019 18:06:36 -0700 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB87A@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB87A@FMSMSX126.amr.corp.intel.com> Message-ID: On 4/9/19 5:46 PM, Viswanathan, Sandhya wrote: > Hi VladimirK, > > Another thing to think of is do we really need a drastic redesign or a thorough cleanup would give us what we are looking for? E.g.
this patch only increased the libjvm size by 1.5%, which can be further reduced by appropriate shiftcnt handling and reducing the SSE-only rules. Also there is a lot of common code, which can be modularized as an assembler method and invoked with appropriate parameters from the ad file. Some of this can be done for other existing rules as well. I can accept the current patch without redesign if you can significantly reduce the size increase. But the redesign should happen before we introduce the whole Vector API changes. Regards, Vladimir > > We could keep today's libjvm size as the baseline and have some budget for auto vectorizer enhancements and the rest of the vector api code to come in. > > Do let me know what you think and I will proceed appropriately. > > Best Regards, > Sandhya From sandhya.viswanathan at intel.com Wed Apr 10 01:04:59 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 10 Apr 2019 01:04:59 +0000 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com> <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com> Hi Vladimir, Yes, I missed the question below: >> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there? No, it is not intentional; we can use the dst register in those cases and reduce the tmps. Thanks a lot for clarifying on vec_mov_helper(), it is much clearer now what you have in mind. But the generated code often differs with vector size, so it may not help in reducing the libjvm code size.
I will explore all possible ways to reduce the code size increase. Best Regards, Sandhya From sandhya.viswanathan at intel.com Wed Apr 10 01:11:37 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 10 Apr 2019 01:11:37 +0000 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB87A@FMSMSX126.amr.corp.intel.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB8AB@FMSMSX126.amr.corp.intel.com> Hi Vladimir, >> I can accept the current patch without redesign if you can significantly reduce size increase. Thanks a lot. >> But redesign should happen before we introduce whole Vector API changes. Point well taken on the whole Vector API changes.
Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Tuesday, April 09, 2019 6:07 PM To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 On 4/9/19 5:46 PM, Viswanathan, Sandhya wrote: > Hi VladimirK, > > Another thing to think of is do we really need a drastic redesign or a through cleanup would give us what we are looking for? E.g. this patch only increased the libjvm size by 1.5% which can be further reduced by appropriate shiftcnt handling and reducing the SSE only rules. Also there is lot of common code, which can be modularized as an assembler method and invoked with appropriate parameters from the ad file. Some of this can be done for other existing rules as well. I can accept the current patch without redesign if you can significantly reduce size increase. But redesign should happened before we introduce whole Vector API changes. Regards, Vladimir > > We could keep today's libjvm size as baseline and have some budget for auto vectorizer enahancements and rest of the vector api code to come in. > > Do let me know what you think and I will proceed appropriately. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Viswanathan, Sandhya > Sent: Tuesday, April 09, 2019 4:59 PM > To: 'Vladimir Kozlov' ; hotspot-compiler-dev at openjdk.java.net > Cc: 'Vladimir Ivanov' ; John Rose > Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 > > Hi Vladimir, > > Please see my answers in your email below. > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 09, 2019 12:34 PM > To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 > > Hi Sandhya, > > I looked through changes and had discussion with Vladimir Ivanov about them. 
> In general logic of changes follow out usual pattern - no problem here. > > There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there? > > But my main concern now is JVM size significant grow. And it will be worse when we implementing the rest of Vector API instructions. > > The main reason for size grow is additional AD instructions which are compiled by APLC into multiply functions. I originally thought that moving 'ins_encode %{ %}' code which have several instructions into macro-assembler will help if it is used by several AD instructions. But Vladimir I. convinced me that it will be insignificant comparing to reducing number of AD instructions. > > We came up with several suggestion how we can address it and it will greatly help if you (Intel) investigate them. > > 1. I think you should provide JVM size increase data for changes like this. What is increase for this one? > > Sandhya >> The size increase is about 1.5%: > libjvm.so : 24358297 > libjvm.so + 8222074 : 24718043 (1.5%) > > 2. How is important for Intel to support new vector instructions for CPU without AVX? May be we should stop new code for old CPUs such as vsll16B_reg, for exaxmple. It does not mean we can't use SSE instructions in implementation (for example, vabs8B_reg) - such cases are fine. > > Sandhya >> Definitely AVX is higher priority, we can try to merge as many rules as possible on similar lines as vabs8B_reg. > > 3. I still want to see some common instructions pattern in 'ins_encode %{ %}' be moved into macro-assembler. For example, the only difference between vs*_reg and vs*_reg_imm is one or 2 instructions, the rest is the same. > > Sandhya >> This is due to the peculiarity of Shift count handling in the superword, only when the count is not immediate it goes through RShiftCntV/LShiftCntV. It should be possible to merge these rules with some tweaks to superword.cpp. 
Since this was common part of the code, I didn?t want to change so as to minimize effect on other architecture. I can take a closer look and clean that up. > > 4. Most important. The main reason we have a lot AD instructions is to 'match' different vector types for corresponding different vector length. I think we should revisit this approach. > > Intel CPU does not use parts of vector registers separately - C2 does not use XMM0b, XMM0c, XMM0d parts of xmm0. Even when C2 uses VecS type it use whole zmm register in avx512 but narrowed it by passing length to assembler instruction (or we use an instruction which uses only part of 512 bit register). > > Vladimir I. suggested to have VecMAX type which can be used to match all different vector length implementation to have only one AD instruction. And use vector length to generate corresponding code. For example, vabs8B_reg() and vabs16B_reg() are almost the same except vectors type VecD vs VecX. There should be no difference in code generation (we need to modify vec_mov_helper() and other similar code to check vector length when it see VecMAX). > > We can use this approach for already existing instructions too to reduce code size generated from AD files. > > What do you think? > > Sandhya >> Using vecMAX will lead to spill/fill code using the largest vector width which is not recommended on Intel architecture. The better way to club or merge then would be as John suggested with binary/unary op for same register type. We will need to take into account the temporaries needed etc while clubbing the rules so as to not degrade the generated code unnecessarily. Some questions that will help me is how much redesign we want? Only for new code? Or also the existing code? Also is the libjvm size our criteria or ad file size or both? I will need a lot of support from you and Vlaidmir Ivanov if we need to get this redesign through in reviews and sponsorship over the coming month or so. 
I will explore a small prototype and submit that first and we could then go from there. > > Regards, > Vladimir K > > On 4/9/19 10:18 AM, Viswanathan, Sandhya wrote: >> Hi Yang, >> >> Thanks a lot for trying out the patch in your setup. >> >> Please do let me know when you check the details if you find the failure in DivideMcTests.java to be due to this patch. >> >> I will fix all the trailing space and unaligned line style issues that you pointed out. >> >> The TestInt is updated to cover some additional support added for "int" in this patch, like Absolute and subtraction from zero. >> There is an additional test for Not, for which we plan to add support in a follow-up patch. >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Yang Zhang (Arm Technology China) [mailto:Yang.Zhang at arm.com] >> Sent: Tuesday, April 09, 2019 1:04 AM >> To: Viswanathan, Sandhya ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 >> >> Hi Sandhya >> >> Thanks for proposing this enhancement. >> I have tested this patch in our internal ci. There is a new failure. But I didn't check the details. >> java/math/BigDecimal/DivideMcTests.java >> >> In addition, there are trailing spaces in the following files. >> src/hotspot/cpu/x86/assembler_x86.cpp >> src/hotspot/cpu/x86/stubGenerator_x86_32.cpp >> src/hotspot/cpu/x86/x86.ad >> src/hotspot/cpu/x86/x86_32.ad >> src/hotspot/share/opto/superword.cpp >> >> In file src/hotspot/share/classfile/vmSymbols.hpp, there are some unaligned lines. >> In file test/hotspot/jtreg/compiler/c2/cr6340864/TestIntVect.java, there are new test functions. Are these new functions needed by byte/short/long?
>> >> Regards, >> Yang >> >> >> -----Original Message----- >> From: hotspot-compiler-dev >> On Behalf Of >> Viswanathan, Sandhya >> Sent: Saturday, April 6, 2019 9:18 AM >> To: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov >> >> Subject: RFR (M) 8222074: Enhance auto vectorization for x86 >> >> >> Please find below a link to the webrev which enhances super-word auto vectorization for x86. >> The following additional operations are supported: >> >> 1) Absolute for all data types >> >> 2) Shifts for byte data types >> >> 3) Shift right arithmetic for long data type >> >> 4) Byte multiply >> >> 5) Negate for float/double >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8222074 >> Webrev: http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.00/ >> >> The compiler jtreg tests pass with UseAVX=0,1,2,3 and KNL. >> Your review and comments are welcome. >> >> Best Regards, >> Sandhya >> >> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. 
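For reference, the libjvm.so figures quoted in this thread can be checked with a few lines of Python. This is only a sketch of the arithmetic; the constant and function names are illustrative and do not come from the patch or from any HotSpot tooling.

```python
# Byte sizes of libjvm.so as quoted in the thread
# (baseline build, and the same build with the 8222074 patch applied).
BASELINE_BYTES = 24358297
PATCHED_BYTES = 24718043


def size_increase(before, after):
    """Return the absolute growth in bytes and the relative growth in percent."""
    delta = after - before
    return delta, 100.0 * delta / before


delta_bytes, delta_percent = size_increase(BASELINE_BYTES, PATCHED_BYTES)
print(f"growth: {delta_bytes} bytes ({delta_percent:.1f}%)")
```

The same helper could be pointed at the sizes of any two builds to put a number on a proposed size budget.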
>> From vladimir.kozlov at oracle.com Wed Apr 10 01:19:10 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 9 Apr 2019 18:19:10 -0700 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com> <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com> Message-ID: On 4/9/19 6:04 PM, Viswanathan, Sandhya wrote: > Hi Vladimir, > > Yes, I missed the question below: >>> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there? > > No it is not intentional, we can use the dst register in those cases and reduced the tmps. > > Thanks a lot for clarifying on vec_mov_helper(), it is much clearer now what you have in mind. But the code generated many times differs with vector size so it may not help in reducing the libjvm code size. I will explore all possible ways to reduce code size increase. You can "cheat" ;-) by optimizing existing code to get under size budget for new code. Do it as separate RFE. Vladimir > > Best Regards, > Sandhya > > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, April 09, 2019 5:58 PM > To: Viswanathan, Sandhya ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 > > On 4/9/19 4:59 PM, Viswanathan, Sandhya wrote: >> Hi Vladimir, >> >> Please see my answers in your email below. > > My comments below too. 
> >> >> Best Regards, >> Sandhya >> >> >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, April 09, 2019 12:34 PM >> To: Viswanathan, Sandhya ; >> hotspot-compiler-dev at openjdk.java.net >> Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 >> >> Hi Sandhya, >> >> I looked through the changes and had a discussion with Vladimir Ivanov about them. >> In general the logic of the changes follows our usual pattern - no problem here. >> >> There are cases where we can use fewer `TEMP tmp` registers by using the 'dst' register, like in mul4B_reg(). Is it intentional to not use 'dst' there? > > What about this question? > >> >> But my main concern now is the significant growth in JVM size. And it will be worse when we implement the rest of the Vector API instructions. >> >> The main reason for the size growth is the additional AD instructions, which are compiled by ADLC into multiple functions. I originally thought that moving 'ins_encode %{ %}' code which has several instructions into the macro-assembler would help if it is used by several AD instructions. But Vladimir I. convinced me that it would be insignificant compared to reducing the number of AD instructions. >> >> We came up with several suggestions for how we can address it, and it will greatly help if you (Intel) investigate them. >> >> 1. I think you should provide JVM size increase data for changes like this. What is the increase for this one? >> >> Sandhya >> The size increase is about 1.5%: >> libjvm.so : 24358297 >> libjvm.so + 8222074 : 24718043 (1.5%) > > I think it is not a little - you implemented only 4 operations. > > Maybe we should follow Intel's rule: you can add new instructions only if you reduce power consumption (size in our case). ;-) > >> >> 2. How important is it for Intel to support new vector instructions for CPUs without AVX? Maybe we should stop adding new code for old CPUs, such as vsll16B_reg, for example.
It does not mean we can't use SSE instructions in the implementation (for example, vabs8B_reg) - such cases are fine. >> >> Sandhya >> Definitely AVX is higher priority, we can try to merge as many rules as possible along similar lines to vabs8B_reg. > > Okay. > >> >> 3. I still want to see some common instruction patterns in 'ins_encode %{ %}' moved into the macro-assembler. For example, the only difference between vs*_reg and vs*_reg_imm is one or two instructions, the rest is the same. >> >> Sandhya >> This is due to the peculiarity of shift count handling in superword: only when the count is not an immediate does it go through RShiftCntV/LShiftCntV. It should be possible to merge these rules with some tweaks to superword.cpp. Since this was a common part of the code, I didn't want to change it, so as to minimize the effect on other architectures. I can take a closer look and clean that up. > > Okay. > >> >> 4. Most important. The main reason we have a lot of AD instructions is to 'match' different vector types for the corresponding different vector lengths. I think we should revisit this approach. >> >> Intel CPUs do not use parts of vector registers separately - C2 does not use the XMM0b, XMM0c, XMM0d parts of xmm0. Even when C2 uses the VecS type it uses the whole zmm register in avx512 but narrows it by passing the length to the assembler instruction (or we use an instruction which uses only part of the 512 bit register). >> >> Vladimir I. suggested having a VecMAX type which can be used to match all the different vector length implementations, so as to have only one AD instruction, and to use the vector length to generate the corresponding code. For example, vabs8B_reg() and vabs16B_reg() are almost the same except for the vector type, VecD vs VecX. There should be no difference in code generation (we need to modify vec_mov_helper() and other similar code to check the vector length when it sees VecMAX). >> >> We can use this approach for already existing instructions too, to reduce the code size generated from AD files. >> >> What do you think?
>> >> Sandhya >> Using vecMAX will lead to spill/fill code using the largest >> vector width which is not recommended on Intel > > That is why I added the comment about vec_mov_helper(). This function is used for generating spills. You definitely should not save the whole 512-bit register but only the part corresponding to the byte size of the vector. > Note, when I talked about vector length I meant length_in_bytes(). > >> architecture. The better way to club or merge then would be as John suggested with binary/unary op for same register type. We will need to take into account the temporaries needed etc. while clubbing the rules so as to not degrade the generated code > > I am not sure this approach is simpler in the short term, but I can't say what would be better in the long run. This needs to be explored. > >> unnecessarily. Some questions that will help me: how much redesign do we want? Only for new code? Or also the existing code? Also > > In the long term we should update all existing code too. > >> is the libjvm size our criterion, or the ad file size, or both? I will need a lot of support from you and Vladimir Ivanov if we need to > > The main goal is to reduce libjvm size. But we should keep other platforms in mind. > >> get this redesign through in reviews and sponsorship over the coming month or so. I will explore a small prototype and submit that first and we could then go from there. > > Thanks, > Vladimir > >> >> Regards, >> Vladimir K >> >> On 4/9/19 10:18 AM, Viswanathan, Sandhya wrote: >>> Hi Yang, >>> >>> Thanks a lot for trying out the patch in your setup. >>> >>> Please do let me know when you check the details if you find the failure in DivideMcTests.java to be due to this patch. >>> >>> I will fix all the trailing space and unaligned line style issues that you pointed out. >>> >>> The TestInt is updated to cover some additional support added for "int" in this patch, like Absolute and subtraction from zero.
>>> There is an additional test for Not for which we plan to add support in a follow up patch. >>> >>> Best Regards, >>> Sandhya >>> >>> >>> -----Original Message----- >>> From: Yang Zhang (Arm Technology China) [mailto:Yang.Zhang at arm.com] >>> Sent: Tuesday, April 09, 2019 1:04 AM >>> To: Viswanathan, Sandhya ; >>> hotspot-compiler-dev at openjdk.java.net >>> Subject: RE: RFR (M) 8222074: Enhance auto vectorization for x86 >>> >>> Hi Sandhya >>> >>> Thanks for proposing this enhancement. >>> I have tested this patch in our internal ci. There is a new failure. But I didn't check the details. >>> java/math/BigDecimal/DivideMcTests.java >>> >>> In addition, there are trailing spaces in the following files. >>> src/hotspot/cpu/x86/assembler_x86.cpp >>> src/hotspot/cpu/x86/stubGenerator_x86_32.cpp >>> src/hotspot/cpu/x86/x86.ad >>> src/hotspot/cpu/x86/x86_32.ad >>> src/hotspot/share/opto/superword.cpp >>> >>> In file src/hotspot/share/classfile/vmSymbols.hpp, there are some unaligned lines. >>> In file test/hotspot/jtreg/compiler/c2/cr6340864/TestIntVect.java, there are new test functions. Are these new functions needed by byte/short/long? >>> >>> Regards, >>> Yang >>> >>> >>> -----Original Message----- >>> From: hotspot-compiler-dev >>> On Behalf Of >>> Viswanathan, Sandhya >>> Sent: Saturday, April 6, 2019 9:18 AM >>> To: hotspot-compiler-dev at openjdk.java.net; Vladimir Kozlov >>> >>> Subject: RFR (M) 8222074: Enhance auto vectorization for x86 >>> >>> >>> Please find below a link to the webrev which enhances super-word auto vectorization for x86. 
>>> The following additional operations are supported: >>> >>> 1) Absolute for all data types >>> >>> 2) Shifts for byte data types >>> >>> 3) Shift right arithmetic for long data type >>> >>> 4) Byte multiply >>> >>> 5) Negate for float/double >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8222074 >>> Webrev: http://cr.openjdk.java.net/~sviswanathan/8222074/webrev.00/ >>> >>> The compiler jtreg tests pass with UseAVX=0,1,2,3 and KNL. >>> Your review and comments are welcome. >>> >>> Best Regards, >>> Sandhya >>> >>> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. >>> From jcbeyler at google.com Wed Apr 10 02:36:08 2019 From: jcbeyler at google.com (Jean Christophe Beyler) Date: Tue, 9 Apr 2019 19:36:08 -0700 Subject: RFR (S) 8221853: Data race in compile broker (set_last_compile) In-Reply-To: References: <4168dea9-8964-3b44-af2b-e30cc1ffe144@oracle.com> <3eea0593-fe96-5894-05da-c0c5f5ef38d6@oracle.com> Message-ID: Thanks Tobias and Vladimir, It passed the submit repo so I pushed it. Thanks again, Jc On Wed, Apr 3, 2019 at 11:42 PM Tobias Hartmann wrote: > Looks good to me too. > > Best regards, > Tobias > > On 03.04.19 19:26, Vladimir Kozlov wrote: > > Looks good. Please, run submit testing before push. 
> > > > Thanks, > > Vladimir > > > > On 4/3/19 10:13 AM, Jean Christophe Beyler wrote: > >> Hi Vladimir, > >> > >> Sounds good to me: > >> Webrev: http://cr.openjdk.java.net/~jcbeyler/8221853/webrev.02/ > >> Bug: https://bugs.openjdk.java.net/browse/JDK-8221853 > >> > >> I cleaned it up a bit and renamed it to "update_compile_perf_data" let > me know what you think, > >> Jc > >> > >> > >> On Wed, Apr 3, 2019 at 9:37 AM Vladimir Kozlov < > vladimir.kozlov at oracle.com > >> > wrote: > >> > >> Hi Jc, > >> > >> I agree with removal of print_last_compiled() method and related > code. > >> But you need to keep part of set_last_compiled() code (guarded by > UsePerfData) which set > >> values of CompilerCounters. It > >> is used. > >> > >> Thanks, > >> Vladimir > >> > >> On 4/3/19 9:05 AM, Jean Christophe Beyler wrote: > >> > Hi Tobias, > >> > > >> > Sounds good to me, here is a webrev that removes it entirely: > >> > > >> > Webrev: http://cr.openjdk.java.net/~jcbeyler/8221853/webrev.01/ > >> > Bug: https://bugs.openjdk.java.net/browse/JDK-8221853 > >> > > >> > Let me know what you think, > >> > Jc > >> > > >> > On Wed, Apr 3, 2019 at 4:17 AM Tobias Hartmann < > tobias.hartmann at oracle.com > >> > >> tobias.hartmann at oracle.com>>> wrote: > >> > > >> > Hi Jc, > >> > > >> > I would actually prefer to just remove this unused code if > no one objects. > >> > > >> > Best regards, > >> > Tobias > >> > > >> > On 02.04.19 18:52, Jean Christophe Beyler wrote: > >> > > Hi all, > >> > > > >> > > While working on enabling Java TSAN, one non-goal is that > if we let it do its work, > >> it does thread > >> > > sanitizing on the JVM. Though this is a non-goal, I saw > this one pop up and wanted > >> to know if you > >> > > would like it cleaned up? 
> >> > > > >> > > Webrev: > http://cr.openjdk.java.net/~jcbeyler/8221853/webrev.00/ > >> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8221853 > >> > > > >> > > I'm not sure the webrev is the way you'd like to go but > from what I can see: > >> > > > >> > > - This is benign as no one was using the data being > raced > >> > > - No one calls print_last_compiled, which uses data > only set in set_last_compiled > >> > > - Because it is debug, the whole code could be wrapped > into non product builds > >> > > - I did add a compile lock for both the printout and > the set_last but I could > >> make a new lock > >> > > just for this code instead of using the general compile > lock. > >> > > > >> > > Thanks and let me know, > >> > > Jc > >> > > >> > > >> > > >> > -- > >> > > >> > Thanks, > >> > Jc > >> > >> > >> > >> -- > >> > >> Thanks, > >> Jc > -- Thanks, Jc -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsrbnd at gmail.com Wed Apr 10 11:10:04 2019 From: bsrbnd at gmail.com (B. Blaser) Date: Wed, 10 Apr 2019 13:10:04 +0200 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com> <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com> Message-ID: Hi Sandhya and Vladimir K., On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya wrote: > > Hi Vladimir, > > Yes, I missed the question below: > >> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there? > > No it is not intentional, we can use the dst register in those cases and reduced the tmps. 
I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg(): 7349 format %{"pmovsxbw $tmp,$src1\n\t" 7350 "pmovsxbw $tmp2,$src2\n\t" I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think? Thanks, Bernard From sgehwolf at redhat.com Wed Apr 10 11:18:36 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Wed, 10 Apr 2019 13:18:36 +0200 Subject: [8u] RFR: 8221355: Performance regression after JDK-8155635 backport into 8u Message-ID: <464fcc0f70569534892d11976a857cd3eb7a9494.camel@redhat.com> Hi, Could I please get a review of this 8u212 performance regression fix? webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8221355/01/webrev/ Bug: https://bugs.openjdk.java.net/browse/JDK-8221355 Before this fix: $ time ./bin/java UnsafeGetObject 1204 ms 1212 ms 1237 ms 1289 ms 1348 ms real 0m6.354s user 0m6.330s sys 0m0.023s After this fix: $ time ./bin/java UnsafeGetObject 54 ms 51 ms 43 ms 45 ms 44 ms real 0m0.302s user 0m0.303s sys 0m0.009s UnsafeGetObject.java is from: https://bugs.openjdk.java.net/browse/JDK-8181822 Testing: Sanity testing with java.{,io,lang,math,net,nio,security,util} tests. Thoughts? Thanks, Severin From shade at redhat.com Wed Apr 10 11:25:02 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 10 Apr 2019 13:25:02 +0200 Subject: [8u] RFR: 8221355: Performance regression after JDK-8155635 backport into 8u In-Reply-To: <464fcc0f70569534892d11976a857cd3eb7a9494.camel@redhat.com> References: <464fcc0f70569534892d11976a857cd3eb7a9494.camel@redhat.com> Message-ID: <7f2abab8-0998-8e12-0c33-faa35216b6dc@redhat.com> On 4/10/19 1:18 PM, Severin Gehwolf wrote: > Could I please get a review of this 8u212 performance regression fix? 
> webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8221355/01/webrev/ This is not the same patch that Oracle apparently pushed, or that I tested myself. Compare with: http://cr.openjdk.java.net/~shade/8221355/8221355-01.patch The difference is not critical, but better match? It should definitely be in 8u-dev (next CPU). I'll leave the decision for 8u (current CPU) to 8u maintainers. -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From sgehwolf at redhat.com Wed Apr 10 12:42:31 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Wed, 10 Apr 2019 14:42:31 +0200 Subject: [8u] RFR: 8221355: Performance regression after JDK-8155635 backport into 8u In-Reply-To: <7f2abab8-0998-8e12-0c33-faa35216b6dc@redhat.com> References: <464fcc0f70569534892d11976a857cd3eb7a9494.camel@redhat.com> <7f2abab8-0998-8e12-0c33-faa35216b6dc@redhat.com> Message-ID: <02c9168afbbe33980567a3e3898a3c7f924ad205.camel@redhat.com> Hi Aleksey, Thanks for the review! On Wed, 2019-04-10 at 13:25 +0200, Aleksey Shipilev wrote: > On 4/10/19 1:18 PM, Severin Gehwolf wrote: > > Could I please get a review of this 8u212 performance regression fix? > > webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8221355/01/webrev/ > > This is not the same patch that Oracle apparently pushed, or that I tested myself. Compare with: > http://cr.openjdk.java.net/~shade/8221355/8221355-01.patch > > The difference is not critical, but better match? Is there a reason to keep unused local variable 'can_access_non_heap' around in JDK 8? It makes sense for the JDK 9 change as its being used further down in the same function: http://hg.openjdk.java.net/jdk-updates/jdk9u/hotspot/file/22d7a88dbe78/src/share/vm/opto/library_call.cpp#l2493 That's not the case for JDK 8u, though. Thoughts? 
The left over comment is misleading, though, which I've now removed: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8221355/02/webrev/ > It should definitely be in 8u-dev (next CPU). I'll leave the decision for 8u (current CPU) to 8u > maintainers. Yes, I'd be aiming this for 8u-dev for now. Thanks, Severin From shade at redhat.com Wed Apr 10 12:55:29 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 10 Apr 2019 14:55:29 +0200 Subject: [8u] RFR: 8221355: Performance regression after JDK-8155635 backport into 8u In-Reply-To: <02c9168afbbe33980567a3e3898a3c7f924ad205.camel@redhat.com> References: <464fcc0f70569534892d11976a857cd3eb7a9494.camel@redhat.com> <7f2abab8-0998-8e12-0c33-faa35216b6dc@redhat.com> <02c9168afbbe33980567a3e3898a3c7f924ad205.camel@redhat.com> Message-ID: On 4/10/19 2:42 PM, Severin Gehwolf wrote: > Is there a reason to keep unused local variable 'can_access_non_heap' > around in JDK 8? It makes sense for the JDK 9 change as its being used > further down in the same function: > http://hg.openjdk.java.net/jdk-updates/jdk9u/hotspot/file/22d7a88dbe78/src/share/vm/opto/library_call.cpp#l2493 > > That's not the case for JDK 8u, though. Thoughts? Right. I don't think there is a reason to keep the can_access_non_heap. > The left over comment is misleading, though, which I've now removed: > http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8221355/02/webrev/ ...but now I am confused how that patch is supposed to work. It seems to me it accepts the accesses to off-heap when object passed it is actually null, but not transparently-null for the compiler? That is probably acceptable, but maybe someone savvy in this code can take a look? Vladimir I, Roland? -- Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From rwestrel at redhat.com Wed Apr 10 13:16:41 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 10 Apr 2019 15:16:41 +0200 Subject: [8u] RFR: 8221355: Performance regression after JDK-8155635 backport into 8u In-Reply-To: References: <464fcc0f70569534892d11976a857cd3eb7a9494.camel@redhat.com> <7f2abab8-0998-8e12-0c33-faa35216b6dc@redhat.com> <02c9168afbbe33980567a3e3898a3c7f924ad205.camel@redhat.com> Message-ID: <87r2aaov9i.fsf@redhat.com> >> The left over comment is misleading, though, which I've now removed: >> http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8221355/02/webrev/ > > ...but now I am confused how that patch is supposed to work. It seems to me it accepts the accesses > to off-heap when object passed it is actually null, but not transparently-null for the compiler? > That is probably acceptable, but maybe someone savvy in this code can take a look? Vladimir I, Roland? When base is known to be null, this patch causes c2 to not intrinsify object accesses and the native method is called. Without this patch, c2 doesn't intrinsify object accesses unless it knows base to be not null which is indeed too conservative. That looks good to me. Roland. From gnu.andrew at redhat.com Wed Apr 10 15:26:50 2019 From: gnu.andrew at redhat.com (Andrew John Hughes) Date: Wed, 10 Apr 2019 16:26:50 +0100 Subject: [8u] RFR: 8221355: Performance regression after JDK-8155635 backport into 8u In-Reply-To: <7f2abab8-0998-8e12-0c33-faa35216b6dc@redhat.com> References: <464fcc0f70569534892d11976a857cd3eb7a9494.camel@redhat.com> <7f2abab8-0998-8e12-0c33-faa35216b6dc@redhat.com> Message-ID: On 10/04/2019 12:25, Aleksey Shipilev wrote: > On 4/10/19 1:18 PM, Severin Gehwolf wrote: >> Could I please get a review of this 8u212 performance regression fix? 
>> webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8221355/01/webrev/ > > This is not the same patch that Oracle apparently pushed, or that I tested myself. Compare with: > http://cr.openjdk.java.net/~shade/8221355/8221355-01.patch > > The difference is not critical, but better match? > > It should definitely be in 8u-dev (next CPU). I'll leave the decision for 8u (current CPU) to 8u > maintainers. > > -Aleksey > I think it should be 8u212 to avoid a performance regression between OpenJDK 8u212 and Oracle's 8u212. Once Severin has pushed his patch, I'll pull that into my local version of jdk8u and the result will be tagged jdk8u212-b04. -- Andrew :) Senior Free Java Software Engineer Red Hat, Inc. (http://www.redhat.com) PGP Key: ed25519/0xCFDA0F9B35964222 (hkp://keys.gnupg.net) Fingerprint = 5132 579D D154 0ED2 3E04 C5A0 CFDA 0F9B 3596 4222 https://keybase.io/gnu_andrew -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: OpenPGP digital signature URL: From sgehwolf at redhat.com Wed Apr 10 15:32:29 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Wed, 10 Apr 2019 17:32:29 +0200 Subject: [8u] RFR: 8221355: Performance regression after JDK-8155635 backport into 8u In-Reply-To: <87r2aaov9i.fsf@redhat.com> References: <464fcc0f70569534892d11976a857cd3eb7a9494.camel@redhat.com> <7f2abab8-0998-8e12-0c33-faa35216b6dc@redhat.com> <02c9168afbbe33980567a3e3898a3c7f924ad205.camel@redhat.com> <87r2aaov9i.fsf@redhat.com> Message-ID: On Wed, 2019-04-10 at 15:16 +0200, Roland Westrelin wrote: > > > The left over comment is misleading, though, which I've now removed: > > > http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8221355/02/webrev/ > > > > ...but now I am confused how that patch is supposed to work. It seems to me it accepts the accesses > > to off-heap when object passed it is actually null, but not transparently-null for the compiler? 
> > That is probably acceptable, but maybe someone savvy in this code can take a look? Vladimir I, Roland? > > When base is known to be null, this patch causes c2 to not intrinsify > object accesses and the native method is called. Without this patch, c2 > doesn't intrinsify object accesses unless it knows base to be not null > which is indeed too conservative. > > That looks good to me. Thanks for the review! Cheers, Severin From sandhya.viswanathan at intel.com Wed Apr 10 15:36:26 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 10 Apr 2019 15:36:26 +0000 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com> <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com> Hi Bernard, One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src. Best Regards, Sandhya -----Original Message----- From: B. Blaser [mailto:bsrbnd at gmail.com] Sent: Wednesday, April 10, 2019 4:10 AM To: Viswanathan, Sandhya Cc: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 Hi Sandhya and Vladimir K., On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya wrote: > > Hi Vladimir, > > Yes, I missed the question below: > >> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there? > > No it is not intentional, we can use the dst register in those cases and reduced the tmps. 
I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg(): 7349 format %{"pmovsxbw $tmp,$src1\n\t" 7350 "pmovsxbw $tmp2,$src2\n\t" I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think? Thanks, Bernard From sgehwolf at redhat.com Wed Apr 10 15:50:57 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Wed, 10 Apr 2019 17:50:57 +0200 Subject: [8u] RFR: 8221355: Performance regression after JDK-8155635 backport into 8u In-Reply-To: References: <464fcc0f70569534892d11976a857cd3eb7a9494.camel@redhat.com> <7f2abab8-0998-8e12-0c33-faa35216b6dc@redhat.com> Message-ID: <841cb14a37079b216ba3c64bea17e6f47a76d54d.camel@redhat.com> On Wed, 2019-04-10 at 16:26 +0100, Andrew John Hughes wrote: > > On 10/04/2019 12:25, Aleksey Shipilev wrote: > > On 4/10/19 1:18 PM, Severin Gehwolf wrote: > > > Could I please get a review of this 8u212 performance regression fix? > > > webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8221355/01/webrev/ > > > > This is not the same patch that Oracle apparently pushed, or that I tested myself. Compare with: > > http://cr.openjdk.java.net/~shade/8221355/8221355-01.patch > > > > The difference is not critical, but better match? > > > > It should definitely be in 8u-dev (next CPU). I'll leave the decision for 8u (current CPU) to 8u > > maintainers. > > > > -Aleksey > > > > I think it should be 8u212 to avoid a performance regression between > OpenJDK 8u212 and Oracle's 8u212. > > Once Severin has pushed his patch, I'll pull that into my local version > of jdk8u and the result will be tagged jdk8u212-b04. 
Pushed: http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/0cbfe6c38b2e Thanks, Severin From vladimir.kozlov at oracle.com Wed Apr 10 16:58:47 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 10 Apr 2019 09:58:47 -0700 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com> <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com> Message-ID: <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com> On 4/10/19 8:36 AM, Viswanathan, Sandhya wrote: > Hi Bernard, > > One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src. Yes, we use this way. Or, in mul4B_reg() case, we can use $dst instead $tmp2 to avoid overwriting $src2 before we get value from it if $dst = $src2. On other hand, mul32B_reg_avx() and other have 'TEMP dst' effect but $dst is used only for final result. It is a little mess which may cause ineffective use of registers in compiled code. Thanks, Vladimir > > Best Regards, > Sandhya > > > -----Original Message----- > From: B. Blaser [mailto:bsrbnd at gmail.com] > Sent: Wednesday, April 10, 2019 4:10 AM > To: Viswanathan, Sandhya > Cc: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 > > Hi Sandhya and Vladimir K., > > On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya wrote: >> >> Hi Vladimir, >> >> Yes, I missed the question below: >>>> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). 
Is it intentional to not use 'dst' there? >> >> No it is not intentional, we can use the dst register in those cases and reduced the tmps. > > I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg(): > > 7349 format %{"pmovsxbw $tmp,$src1\n\t" > 7350 "pmovsxbw $tmp2,$src2\n\t" > > I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think? > > Thanks, > Bernard > From sandhya.viswanathan at intel.com Wed Apr 10 17:21:48 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 10 Apr 2019 17:21:48 +0000 Subject: RFR (M) 8222074: Enhance auto vectorization for x86 In-Reply-To: <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A99813@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB5C2@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB845@FMSMSX126.amr.corp.intel.com> <21eeec09-624f-2dbd-b2f5-86d512233fe0@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AAB898@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABCE7@FMSMSX126.amr.corp.intel.com> <4a77b7c0-fc1a-441c-d018-70568876c4f4@oracle.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1AABDA2@FMSMSX126.amr.corp.intel.com> Yes good catch, in mul32B_reg_avx(), the last two instructions are the only place where dst is used: __ vpackuswb($dst$$XMMRegister, $tmp2$$XMMRegister, $tmp1$$XMMRegister, vector_len); __ vpermq($dst$$XMMRegister, $dst$$XMMRegister, 0xD8, vector_len); Here dst can be same as tmp2 or tmp1 in packuswb() and so the effect TEMP dst is not required. Best Regards, Sandhya -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Wednesday, April 10, 2019 9:59 AM To: Viswanathan, Sandhya ; B. 
Blaser Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 On 4/10/19 8:36 AM, Viswanathan, Sandhya wrote: > Hi Bernard, > > One could add TEMP dst in effect() to let the register allocator know that dst needs to be different from src. Yes, we use this way. Or, in mul4B_reg() case, we can use $dst instead $tmp2 to avoid overwriting $src2 before we get value from it if $dst = $src2. On other hand, mul32B_reg_avx() and other have 'TEMP dst' effect but $dst is used only for final result. It is a little mess which may cause ineffective use of registers in compiled code. Thanks, Vladimir > > Best Regards, > Sandhya > > > -----Original Message----- > From: B. Blaser [mailto:bsrbnd at gmail.com] > Sent: Wednesday, April 10, 2019 4:10 AM > To: Viswanathan, Sandhya > Cc: Vladimir Kozlov ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR (M) 8222074: Enhance auto vectorization for x86 > > Hi Sandhya and Vladimir K., > > On Wed, 10 Apr 2019 at 03:06, Viswanathan, Sandhya wrote: >> >> Hi Vladimir, >> >> Yes, I missed the question below: >>>> There are cases where we can use less `TEMP tmp` registers by using 'dst' register like in mul4B_reg(). Is it intentional to not use 'dst' there? >> >> No it is not intentional, we can use the dst register in those cases and reduced the tmps. > > I guess we have to be careful using $dst instead of $tmp registers as the allocator sometimes provides identical $src & $dst. Also, I'm not sure this would be possible in the case of mul4B_reg(): > > 7349 format %{"pmovsxbw $tmp,$src1\n\t" > 7350 "pmovsxbw $tmp2,$src2\n\t" > > I believe this couldn't work if you use $dst instead of $tmp and $dst = $src2, what do you think? 
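[Editorial sketch] The aliasing hazard debated in this thread can be illustrated with a toy Java model. It assumes, purely for illustration, that mul4B_reg() reduces to "widen, widen, multiply, narrow"; the real instruction sequence in the .ad file is longer and is not reproduced here. The class, its register slots and its helper methods are hypothetical and are not HotSpot code.

```java
/**
 * Toy model only: "registers" are slots in an array, each holding four lanes.
 * Instruction names in the comments refer to the thread; nothing here is HotSpot code.
 */
final class RegisterAliasingSketch {
    static final int SRC1 = 0, SRC2 = 1, TMP = 2, TMP2 = 3, FRESH_DST = 4;

    // Models pmovsxbw dst,src: read every lane of src, then overwrite dst.
    static void widen(int[][] regs, int dst, int src) {
        int[] out = new int[4];
        for (int i = 0; i < 4; i++) out[i] = (byte) regs[src][i];
        regs[dst] = out;
    }

    // Models pmullw dst,src: lane-wise 16-bit multiply into dst.
    static void multiply(int[][] regs, int dst, int src) {
        int[] out = new int[4];
        for (int i = 0; i < 4; i++) out[i] = (short) (regs[dst][i] * regs[src][i]);
        regs[dst] = out;
    }

    // Models the final narrowing step: keep the low byte of each lane.
    static void narrow(int[][] regs, int dst, int src) {
        int[] out = new int[4];
        for (int i = 0; i < 4; i++) out[i] = (byte) regs[src][i];
        regs[dst] = out;
    }

    // Builds a fresh register file with the two inputs loaded.
    static int[][] load(int[] a, int[] b) {
        int[][] regs = new int[5][4];
        regs[SRC1] = a.clone();
        regs[SRC2] = b.clone();
        return regs;
    }

    /** $dst takes the place of $tmp, so it is written before $src2 is read. */
    static int[] dstReplacesTmp(int[] a, int[] b, int dst) {
        int[][] regs = load(a, b);
        widen(regs, dst, SRC1);     // clobbers $src2 when dst == SRC2
        widen(regs, TMP2, SRC2);
        multiply(regs, dst, TMP2);
        narrow(regs, dst, dst);
        return regs[dst];
    }

    /** $dst takes the place of $tmp2, so its first write is the read of $src2. */
    static int[] dstReplacesTmp2(int[] a, int[] b, int dst) {
        int[][] regs = load(a, b);
        widen(regs, TMP, SRC1);
        widen(regs, dst, SRC2);     // reads $src2 before overwriting it
        multiply(regs, TMP, dst);
        narrow(regs, dst, TMP);
        return regs[dst];
    }
}
```

In this model, passing SRC2 as dst to dstReplacesTmp() yields the squares of the first input instead of the product, while dstReplacesTmp2() stays correct whether dst aliases SRC1, SRC2, or neither. Adding TEMP dst to effect() corresponds to always passing FRESH_DST.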
>
> Thanks,
> Bernard
>
From gil at azul.com Wed Apr 10 19:29:35 2019
From: gil at azul.com (Gil Tene)
Date: Wed, 10 Apr 2019 19:29:35 +0000
Subject: [8u] RFR: 8221355: Performance regression after JDK-8155635 backport into 8u
In-Reply-To:
References: <464fcc0f70569534892d11976a857cd3eb7a9494.camel@redhat.com> <7f2abab8-0998-8e12-0c33-faa35216b6dc@redhat.com>
Message-ID:

Before we jump ahead and integrate this into the upcoming April 8u212, and bump the
build number from b03 to b04, I'd like to point out that the Oracle backport
(https://bugs.openjdk.java.net/browse/JDK-8221954) appears to be to their 8u212 b31,
which is likely a BPR, and not the initial version of 8u212 that will be out in a week.

So the premise that not including this in the initial version of OpenJDK 8u212 will lead to a
performance regression compared to Oracle's 8u212 is not quite right. The difference will
happen when Oracle publishes their 8u212-b31.

Given the time crunch, the risk, and the fact that Oracle's April 16 update will likely NOT
include this fix to the regression initially introduced in 8u202, I would advocate to still build
a 8u212-b03, which would not include this back-port, and to either do a b04 later, or push
this to July.

-- Gil.

> On Apr 10, 2019, at 8:26 AM, Andrew John Hughes wrote:
>
>
>
> On 10/04/2019 12:25, Aleksey Shipilev wrote:
>> On 4/10/19 1:18 PM, Severin Gehwolf wrote:
>>> Could I please get a review of this 8u212 performance regression fix?
>>> webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8221355/01/webrev/
>>
>> This is not the same patch that Oracle apparently pushed, or that I tested myself. Compare with:
>> http://cr.openjdk.java.net/~shade/8221355/8221355-01.patch
>>
>> The difference is not critical, but better match?
>>
>> It should definitely be in 8u-dev (next CPU). I'll leave the decision for 8u (current CPU) to 8u
>> maintainers.
>> >> -Aleksey >> > > > I think it should be 8u212 to avoid a performance regression between > OpenJDK 8u212 and Oracle's 8u212. > > Once Severin has pushed his patch, I'll pull that into my local version > of jdk8u and the result will be tagged jdk8u212-b04. > -- > Andrew :) > > Senior Free Java Software Engineer > Red Hat, Inc. (http://www.redhat.com) > > PGP Key: ed25519/0xCFDA0F9B35964222 (hkp://keys.gnupg.net) > Fingerprint = 5132 579D D154 0ED2 3E04 C5A0 CFDA 0F9B 3596 4222 > https://keybase.io/gnu_andrew -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From vladimir.x.ivanov at oracle.com Thu Apr 11 01:03:05 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 10 Apr 2019 18:03:05 -0700 Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision In-Reply-To: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> Message-ID: > http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.02/ Much better. One more question: what happens if there's an exception being thrown earlier in the block (e.g., NPE or from a call) and it makes the call site unreachable? Inspecting relevant counters I mentioned before may help detecting such case. Best regards, Vladimir Ivanov >> What you are proposing is to unconditionally inline constructor calls. >> I consider such change as too intrusive. >> >> I suggest to focus on "profile.count() == 0" check and make it >> smarter: when profile info is scarce, try to prove that the call site >> is actually reachable before giving up. > OK, I agree. >> In addition, it's possible to prove the call is always executed by >> looking at CFG or checking that start block is being parsed. > Very good idea! > I have checked that if the call site belongs to a start block in the > updated patch. 
> I had tried to look at the CFG like this, but failed.
> -----------------------------------------------
> diff -r 7383a17b4c65 src/hotspot/share/opto/bytecodeInfo.cpp
> --- a/src/hotspot/share/opto/bytecodeInfo.cpp   Mon Apr 08 15:27:24 2019 +0800
> +++ b/src/hotspot/share/opto/bytecodeInfo.cpp   Mon Apr 08 17:25:04 2019 +0800
> @@ -373,9 +373,11 @@
>      } else if (forced_inline()) {
>        // Inlining was forced by CompilerOracle, ciReplay or annotation
>      } else if (profile.count() == 0) {
> -      // don't inline unreached call sites
> -       set_msg("call site not reached");
> -       return false;
> +      if (C->cfg()->get_block(jvms->bci()) != C->cfg()->get_block(0)) {
> +        // don't inline unreached call sites
> +        set_msg("call site not reached");
> +        return false;
> +      }
>      }
>    }
> -----------------------------------------------
> Do you have any comments on how to look at the CFG for more info?
> Thanks.
>>
>> When "profile.count() == 0", but the call site has been reached
>> before, it seems the following conditions should hold:
>>
>>   caller_method->interpreter_invocation_count() > 0
> I think this condition is redundant since it always holds for a method
> to be compiled.
>> AND
>>   caller_method->method_data()->invocation_count() == (0 OR 1)
> I'm not sure if this condition still holds for parallel execution of the
> caller.
>> AND
>>   callee_method->was_executed_more_than(0) == true
> Even though this rule is true, it seems still hard to say that the
> particular call site had been reached before.
>> AND
>>   parse->block() == parse->start_block()
>>
> Very nice!
> This rule is good enough to solve this particular issue.
> And it has been implemented in the patch.
>> Some of them can be turned into asserts (e.g., invocation_count() == 0).
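[Editorial sketch] The rule agreed on above ("parse->block() == parse->start_block()") can be modelled in a few lines of Java. This is not the contents of either webrev, which are not shown in this thread; the class, method and parameter names are hypothetical, and the model ignores the case where an exception thrown earlier in the start block makes the call site unreachable.

```java
/** Hypothetical model of the "call site not reached" check; not HotSpot code. */
final class InlineDecisionSketch {
    /** Behaviour before the change: a zero profile count always rejects the call site. */
    static String rejectReasonBefore(boolean forcedInline, int profileCount) {
        if (forcedInline) return null;              // forced by CompilerOracle, ciReplay or annotation
        if (profileCount == 0) return "call site not reached";
        return null;                                // null means: do not reject on these grounds
    }

    /** Proposed rule: a zero count is ignored when the call site sits in the start block. */
    static String rejectReasonAfter(boolean forcedInline, int profileCount,
                                    boolean callSiteInStartBlock) {
        if (forcedInline) return null;
        if (profileCount == 0 && !callSiteInStartBlock) return "call site not reached";
        return null;
    }
}
```

The only case in which the two variants differ is a zero profile count for a call site in the start block: the first variant rejects it, the second lets the remaining inlining heuristics decide.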
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [2]
>> (lldb) p this
>> (Parse *) $33 = 0x000070000eacc6e8
>>
>> (lldb) p start_block()
>> (Parse::Block *) $31 = 0x00000001008aef00
>>
>> (lldb) p block()
>> (Parse::Block *) $32 = 0x00000001008aef00
> Could you please also provide the backtrace info?
> I had tried to find the right place to directly call start_block() and
> block() in my patch, but failed.

From fujie at loongson.cn Thu Apr 11 01:51:47 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 11 Apr 2019 09:51:47 +0800
Subject: RFR:8222302:[TESTBUG]test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java fails on any other CPU
Message-ID:

Hi all,

JBS:    https://bugs.openjdk.java.net/browse/JDK-8222302
Webrev: http://cr.openjdk.java.net/~jiefu/8222302/webrev.00/

TestUseSHAOptionOnUnsupportedCPU.java fails on any other CPU (not AArch64, PPC, S390x, SPARC or X86).
It is designed to test "UseSHASpecificTestCaseForUnsupportedCPU"[1] and "GenericTestCaseForOtherCPU"[2] on any other CPU[3].
But when they run on any other CPU (e.g., mips), an exception[4] is always thrown, which causes the failure.
So there seems to be a logical bug in it.

The change has been tested on mips and x86.

Could you please review it?

Thanks a lot.
Best regards, Jie [1] http://hg.openjdk.java.net/jdk/jdk/file/bf07e140c49c/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java#l56 [2] http://hg.openjdk.java.net/jdk/jdk/file/bf07e140c49c/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHAOptionOnUnsupportedCPU.java#l58 [3] http://hg.openjdk.java.net/jdk/jdk/file/bf07e140c49c/test/hotspot/jtreg/compiler/intrinsics/sha/cli/testcases/GenericTestCaseForOtherCPU.java#l34 [4] http://hg.openjdk.java.net/jdk/jdk/file/bf07e140c49c/test/hotspot/jtreg/compiler/intrinsics/sha/cli/SHAOptionsBase.java#l92 From fujie at loongson.cn Thu Apr 11 03:24:59 2019 From: fujie at loongson.cn (Jie Fu) Date: Thu, 11 Apr 2019 11:24:59 +0800 Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision In-Reply-To: References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> Message-ID: <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn> Hi Vladimir, > >> http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.02/ > > Much better. > > One more question: what happens if there's an exception being thrown > earlier in the block (e.g., NPE or from a call) and it makes the call > site unreachable? Inspecting relevant counters I mentioned before may > help detecting such case. Good catch! Fixed in http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.03/ Please review and give some comments. Thanks a lot. Best regards, Jie From doug.simon at oracle.com Thu Apr 11 11:20:00 2019 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 11 Apr 2019 13:20:00 +0200 Subject: Epsilon + Graal In-Reply-To: <15ec222f-22a0-49b3-4c7d-29d477dd3c19@redhat.com> References: <15ec222f-22a0-49b3-4c7d-29d477dd3c19@redhat.com> Message-ID: <8CC4C60B-9F75-4D42-8FE2-992968A95AF6@oracle.com> Hi Aleksey, It would be great to see support for Epsilon in Graal. 
More inline below:

> On 11 Apr 2019, at 11:40, Aleksey Shipilev wrote:
>
> Hi,
>
> I am tinkering with Epsilon support for Graal (targeting AOT binaries with no GC). This patch
> applies to jdk/jdk:
> http://cr.openjdk.java.net/~shade/epsilon/graal-support.patch
>
> ...and passes this test suite:
>
> $ CONF=linux-x86_64-server-fastdebug make images run-test TEST=compiler/aot
> TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC"
>
> ...and does this:
>
> $ build/linux-x86_64-server-release/images/jdk/bin/jaotc -J-XX:+UseEpsilonGC --info HelloWorld.class
> --output hello-epsilon.so
> $ build/linux-x86_64-server-release/images/jdk/bin/java -XX:+UnlockExperimentalVMOptions
> -XX:+UseEpsilonGC -Xlog:gc -XX:+UseAOT -XX:AOTLibrary=./hello-epsilon.so HelloWorld
> HelloWorld
> [0.004s][info][gc] Resizeable heap; starting at 2009M, max: 30718M, step: 128M
> [0.004s][info][gc] Using TLAB allocation; max: 4096K
> [0.004s][info][gc] Elastic TLABs enabled; elasticity: 1.10x
> [0.004s][info][gc] Elastic TLABs decay enabled; decay time: 1000ms
> [0.004s][info][gc] Using Epsilon
> [0.038s][info][gc] Heap: 30718M reserved, 2009M (6.54%) committed, 698K (0.00%) used
>
> real 0m0.048s
> user 0m0.064s
> sys 0m0.014s
>
>
> I am confused what to do next. Some process questions:
>
> a) Where do I propose the patch? As GitHub PR to oracle/graal, is that right?

Yes, that's where the Graal changes should go.

> b) The change requires adjustments in JVMCI, how is that handled? I assume JVMCI and Graal changes
> are done independently? In that case, there is a bit of circularity here: I cannot put JVMCI change
> in without breaking runs with Epsilon for a while, and cannot put Epsilon changes in before JVMCI is
> updated?

I think you can make the Graal changes independently of the JVMCI changes with this in GraalHotSpotVMConfig:

    public final boolean useEpsilonGC = getFlag("UseEpsilonGC", Boolean.class, false);

That means the JVMCI patch can be submitted separately.
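[Editorial sketch] The reason a default value decouples the two patches can be shown with a minimal stand-in for a flag lookup. FlagLookupSketch below is hypothetical and is not the Graal or JVMCI config class; it only assumes that a lookup with a "not present" default returns that default when the VM side does not expose the flag.

```java
import java.util.Map;

/** Hypothetical stand-in for a VM flag lookup that tolerates missing flags. */
final class FlagLookupSketch {
    private final Map<String, Object> vmFlags;

    FlagLookupSketch(Map<String, Object> vmFlags) {
        this.vmFlags = vmFlags;
    }

    /** Returns the flag value, or notPresent when the VM does not expose the flag. */
    <T> T getFlag(String name, Class<T> type, T notPresent) {
        Object value = vmFlags.get(name);
        return value == null ? notPresent : type.cast(value);
    }
}
```

With such a lookup, compiler code reading "UseEpsilonGC" keeps working against a VM that does not yet export the flag (it sees false), and picks up the real value once the VM side is updated.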

> c) Pretty sure current patch fails some write barrier verification, because verification assumes
> either G1 or CardTable-based BarrierSet. Do we expect to clean up verification before the Epsilon
> patch, or can it be done within the patch?

Roman has volunteered to look into this so hopefully you can co-ordinate with him. I don't see any problem with doing it all in one PR.

> d) How do we run Graal tests (especially given the need for JVMCI adjustments)? Are they run
> automatically on PR proposal?

There are a few tests run in the Travis gate on a GitHub PR but I doubt these would be enough for what you want. We perform a bunch more testing when integrating a Graal PR internally. Any issues discovered there will be posted to the GitHub PR, hopefully with commands to reproduce.

One process option is to submit a normal JDK webrev with both JVMCI and Graal changes at the same time as submitting a Graal GitHub PR. This allows you to do whatever testing you want in the normal OpenJDK workflow. During the periodic Graal syncs to OpenJDK (which are thankfully becoming more frequent thanks to Jesper Wilhelmsson), the Graal changes in OpenJDK will simply be overwritten.

Hope that helps!

-Doug

From shade at redhat.com Thu Apr 11 18:06:21 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 11 Apr 2019 20:06:21 +0200
Subject: Epsilon + Graal
In-Reply-To: <8CC4C60B-9F75-4D42-8FE2-992968A95AF6@oracle.com>
References: <15ec222f-22a0-49b3-4c7d-29d477dd3c19@redhat.com> <8CC4C60B-9F75-4D42-8FE2-992968A95AF6@oracle.com>
Message-ID: <885935f1-62a1-4acc-5817-24c956d4647b@redhat.com>

Hi Doug,

Some more questions, if you will:

On 4/11/19 1:20 PM, Doug Simon wrote:
>> b) The change requires adjustments in JVMCI, how is that handled? I assume JVMCI and Graal changes
>> are done independently?
In that case, there is a bit of circularity here: I cannot put JVMCI change
>> in without breaking runs with Epsilon for a while, and cannot put Epsilon changes in before JVMCI is
>> updated?
>
> I think you can make the Graal changes independently of the JVMCI changes with this in
> GraalHotSpotVMConfig:
>
>     public final boolean useEpsilonGC = getFlag("UseEpsilonGC", Boolean.class, false);
>
> That means the JVMCI patch can be submitted separately.

Yes, but that would mean I cannot run Graal tests with Epsilon enabled, or?

> One process option is to submit a normal JDK webrev with both JVMCI and Graal changes at the same
> time as submitting a Graal GitHub PR. This allows you to do whatever testing you want in the normal
> OpenJDK workflow. During the periodic Graal syncs to OpenJDK (which are thankfully becoming more
> frequent thanks to Jesper Wilhelmsson), the Graal changes in OpenJDK will simply be overwritten.

Oh, that's nice. So, can I develop the change in jdk/jdk, and then PR the Graal subset of it to oracle/graal github? That would definitely work better for my workflow. Is there a way to run Graal unit tests from jdk/jdk?

Thanks,
-Aleksey

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From doug.simon at oracle.com Thu Apr 11 18:33:32 2019 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 11 Apr 2019 20:33:32 +0200 Subject: Epsilon + Graal In-Reply-To: <885935f1-62a1-4acc-5817-24c956d4647b@redhat.com> References: <15ec222f-22a0-49b3-4c7d-29d477dd3c19@redhat.com> <8CC4C60B-9F75-4D42-8FE2-992968A95AF6@oracle.com> <885935f1-62a1-4acc-5817-24c956d4647b@redhat.com> Message-ID: <11CE0B87-6627-482F-A996-1E9305F46BCB@oracle.com> > On 11 Apr 2019, at 20:06, Aleksey Shipilev wrote: > > Hi Doug, > > Some more questions, if you will: > > On 4/11/19 1:20 PM, Doug Simon wrote: >>> b) The change requires adjustments in JVMCI, how is that handled? I assume JVMCI and Graal changes >>> are done independently? In that case, there is a bit of circularity here: I cannot put JVMCI change >>> in without breaking runs with Epsilon for a while, and cannot put Epsilon changes in before JVMCI is >>> updated? >> >> I think you can make the Graal changes independently of the JVMCI changes with this in >> GraalHotSpotVMConfig: >> >> public final boolean useEpsilonGC = getFlag("UseEpsilonGC", Boolean.class, false); >> >> That means the JVMCI patch can be submitted separately. > > Yes, but that would mean I cannot run Graal tests with Epsilon enabled, or? Correct. >> One process option is to submit a normal JDK webrev with both JVMCI and Graal changes at the same >> time as submitting a Graal GitHub PR. This allows you to do whatever testing you want in the normal >> OpenJDK workflow. During the periodic Graal syncs to OpenJDK (which are thankfully becoming more >> frequent thanks to Jesper Wilhelmsson) , the Graal changes in OpenJDK will simply be overwritten. > > Oh, that's nice. So, can I develop the change in jdk/jdk, and then PR the Graal subset of it to > oracle/graal github? That would definitely work better for my workflow. 
Is there a way to run Graal
> unit tests from jdk/jdk?

Yes, although I've never mastered it. There is test/hotspot/jtreg/compiler/graalunit/README.md. I'm not sure how complete or up to date it is. I've cc'ed Katya who may be able to help with any missing info.

-Doug

From vladimir.kozlov at oracle.com Thu Apr 11 19:17:02 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 11 Apr 2019 12:17:02 -0700
Subject: Epsilon + Graal
In-Reply-To: <11CE0B87-6627-482F-A996-1E9305F46BCB@oracle.com>
References: <15ec222f-22a0-49b3-4c7d-29d477dd3c19@redhat.com> <8CC4C60B-9F75-4D42-8FE2-992968A95AF6@oracle.com> <885935f1-62a1-4acc-5817-24c956d4647b@redhat.com> <11CE0B87-6627-482F-A996-1E9305F46BCB@oracle.com>
Message-ID: <0f82ac60-f7de-03ea-f8f2-c85585b7a3ed@oracle.com>

For testing graalunit in JDK I do:

MYDIR=$PWD
make images CONF=fastdebug
make test-image-hotspot-jtreg-graal CONF=fastdebug
cd open/test/hotspot/jtreg

Run jtreg with -Dgraalunit.libs=$MYDIR/build/fastdebug/images/test/hotspot/jtreg/graal/ compiler/graalunit

There is also a 'make test' command to run in the top directory but I forgot the flags for it.

Vladimir

On 4/11/19 11:33 AM, Doug Simon wrote:
>
>
>> On 11 Apr 2019, at 20:06, Aleksey Shipilev wrote:
>>
>> Hi Doug,
>>
>> Some more questions, if you will:
>>
>> On 4/11/19 1:20 PM, Doug Simon wrote:
>>>> b) The change requires adjustments in JVMCI, how is that handled? I assume JVMCI and Graal changes
>>>> are done independently? In that case, there is a bit of circularity here: I cannot put JVMCI change
>>>> in without breaking runs with Epsilon for a while, and cannot put Epsilon changes in before JVMCI is
>>>> updated?
>>>
>>> I think you can make the Graal changes independently of the JVMCI changes with this in
>>> GraalHotSpotVMConfig:
>>>
>>> public final boolean useEpsilonGC = getFlag("UseEpsilonGC", Boolean.class, false);
>>>
>>> That means the JVMCI patch can be submitted separately.
>>
>> Yes, but that would mean I cannot run Graal tests with Epsilon enabled, or?
>
> Correct.
>
>>> One process option is to submit a normal JDK webrev with both JVMCI and Graal changes at the same
>>> time as submitting a Graal GitHub PR. This allows you to do whatever testing you want in the normal
>>> OpenJDK workflow. During the periodic Graal syncs to OpenJDK (which are thankfully becoming more
>>> frequent thanks to Jesper Wilhelmsson), the Graal changes in OpenJDK will simply be overwritten.
>>
>> Oh, that's nice. So, can I develop the change in jdk/jdk, and then PR the Graal subset of it to
>> oracle/graal github? That would definitely work better for my workflow. Is there a way to run Graal
>> unit tests from jdk/jdk?
>
> Yes, although I've never mastered it. There is test/hotspot/jtreg/compiler/graalunit/README.md. I'm not sure how complete or up to date it is. I've cc'ed Katya who may be able to help with any missing info.
>
> -Doug
>

From lutz.schmidt at sap.com Thu Apr 11 21:24:22 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Thu, 11 Apr 2019 21:24:22 +0000
Subject: RFR(L): 8213084: Rework and enhance Print[Opto]Assembly output
Message-ID: <7814AA8A-B0CC-48B3-8889-0F9EB3E3EB5C@sap.com>

Dear All,

this topic was discussed back in Nov/Dec 2018:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-November/031552.html
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-December/031641.html

Purpose of the discussion was to find out if my ideas are at all regarded useful and desirable. The result was mixed, some pro, some con. I let the input from back then influence my work of the last months. In particular, output verbosity can be controlled in a wide range now. In addition to the general -XX:+Print* switches, the amount of output can be adjusted by newly introduced -XX:PrintAssemblyOptions.
Here is the list (with default settings):

PrintAssemblyOptions help:
  hsdis-print-raw          test plugin by requesting raw output (deprecated)
  hsdis-print-raw-xml      test plugin by requesting raw xml (deprecated)
  hsdis-print-pc           turn off PC printing (on by default) (deprecated)
  hsdis-print-bytes        turn on instruction byte output (deprecated)

  hsdis-show-pc            toggle printing current pc,        currently ON
  hsdis-show-offset        toggle printing current offset,    currently OFF
  hsdis-show-bytes         toggle printing instruction bytes, currently OFF
  hsdis-show-data-hex      toggle formatting data as hex,     currently ON
  hsdis-show-data-int      toggle formatting data as int,     currently OFF
  hsdis-show-data-float    toggle formatting data as float,   currently OFF
  hsdis-show-structs       toggle compiler data structures,   currently OFF
  hsdis-show-comment       toggle instruction comments,       currently OFF
  hsdis-show-block-comment toggle block comments,             currently OFF
  hsdis-align-instr        toggle instruction alignment,      currently OFF

Finally, I have pushed my changes to a state where I can dare to request your comments and reviews. I would like to suggest and request that we first focus on the effects (i.e. the generated output) of the changes. Once we got that adjusted and accepted, we can check the actual implementation and add improvements there. Sounds like a plan?

Here is what you get:

The machine code generated by the JVM can be printed in three different formats:
  - Hexadecimal.
    This is basically a hex dump of the memory range containing the code.
    This format is always available (PRODUCT and not-PRODUCT builds), regardless
    of the availability of a disassembler library. It applies to all sorts of code,
    be it blobs, stubs, compiled nmethods, ...
    This format seems useless at first glance, but it is not. In an upcoming, separate
    enhancement, the JVM will be made capable of reading files containing such code blocks
    and disassembling them post mortem. The most prominent example is an hs_err* file.
  - Disassembled.
    This is an assembly listing of the instructions as found in the memory range
    occupied by the blob, stub, compiled nmethod ... As a prerequisite, a suitable
    disassembler library (hsdis-.so) must be available at runtime.
    Most often, that will only be the case in test environments. If no disassembler
    library is available, hexadecimal output is used as fallback.
  - OptoAssembly.
    This is a meta code listing created only by the C2 compiler. As it is somewhat
    closer to the Java code, it may be helpful in linking assembly code to Java code.

All three formats can be merged with additional information, most prominently compiler-internal "knowledge" about blocks, related bytecodes, statistics counters, and much more.

Following the code itself, compiler-internal data structures, like oop maps, relocations, scopes, dependencies, exception handlers, are printed to aid in debugging.

The full set of information is available in non-PRODUCT builds. PRODUCT builds do not support OptoAssembly output. Data structures are unavailable as well.

So how does the output actually look like? Here are a few small snippets (linuxx86_64) to give you an idea. The complete output of an entire C2-compiled method, in multiple verbosity variants, is available here:
http://cr.openjdk.java.net/~lucy/webrevs/8213084/

OptoAssembly output for reference (always on with PrintAssembly):
=================================================================

036     B2: #   out( B7 B3 ) <- in( B1 )  Freq: 1
036     movl    RBP, [RSI + #12 (8-bit)]        # compressed ptr !
Field: java/lang/String.value (constant)
039     movl    R11, [RBP + #12 (8-bit)]        # range
03d     NullCheck RBP

03d     B3: #   out( B6 B4 ) <- in( B2 )  Freq: 0.999999
03d     cmpl    RDX, R11        # unsigned
040     jnb,us  B6  P=0.000000 C=5375.000000

PrintAssembly with no disassembler library available:
=====================================================

[Code]
[Entry Point]
  0x00007fc74d1d7b20: 448b 5608 49c1 e203 493b c20f 856f 69e7 ff90 9090 9090 9090 9090 9090 9090 9090
[Verified Entry Point]
  0x00007fc74d1d7b40: 8984 2400 a0fe ff55 4883 ec20 440f be5e 1445 85db 7521 8b6e 0c44 8b5d 0c41 3bd3
  0x00007fc74d1d7b60: 732c 0fb6 4415 1048 83c4 205d 4d8b 9728 0100 0041 8502 c348 8bee 8914 2444 895c
  0x00007fc74d1d7b80: 2404 be4d ffff ffe8 1483 e7ff 0f0b bee5 ffff ff89 5424 04e8 0483 e7ff 0f0b bef6
  0x00007fc74d1d7ba0: ffff ff89 5424 04e8 f482 e7ff 0f0b f4f4 f4f4 f4f4 f4f4 f4f4 f4f4 f4f4 f4f4 f4f4
[Exception Handler]
  0x00007fc74d1d7bc0: e95b 0df5 ffe8 0000 0000 4883 2c24 05e9 0c7d e7ff
[End]

PrintAssembly with minimal verbosity:
=====================================

  0x00007f0434b89bd6:   mov    0xc(%rsi),%ebp
  0x00007f0434b89bd9:   mov    0xc(%rbp),%r11d
  0x00007f0434b89bdd:   cmp    %r11d,%edx
  0x00007f0434b89be0:   jae    0x00007f0434b89c0e

PrintAssembly (previous plus code offsets from code begin):
===========================================================

  0x00007f63c11d7956 (+0x36):   mov    0xc(%rsi),%ebp
  0x00007f63c11d7959 (+0x39):   mov    0xc(%rbp),%r11d
  0x00007f63c11d795d (+0x3d):   cmp    %r11d,%edx
  0x00007f63c11d7960 (+0x40):   jae    0x00007f63c11d798e

PrintAssembly (previous plus block comments):
===========================================================

  ;; B2: #      out( B7 B3 ) <- in( B1 )  Freq: 1
  0x00007f48211d76d6 (+0x36):   mov    0xc(%rsi),%ebp
  0x00007f48211d76d9 (+0x39):   mov    0xc(%rbp),%r11d
  ;; B3: #      out( B6 B4 ) <- in( B2 )  Freq: 0.999999
  0x00007f48211d76dd (+0x3d):   cmp    %r11d,%edx
  0x00007f48211d76e0 (+0x40):   jae    0x00007f48211d770e

PrintAssembly (previous plus instruction comments):
===========================================================

  ;; B2: #      out( B7 B3 ) <- in( B1 )  Freq: 1
  0x00007fc3e11d7a56 (+0x36):   mov    0xc(%rsi),%ebp    ;*getfield value {reexecute=0 rethrow=0 return_oop=0}
                                                         ; - java.lang.String::charAt@8 (line 702)
  0x00007fc3e11d7a59 (+0x39):   mov    0xc(%rbp),%r11d   ; implicit exception: dispatches to 0x00007fc3e11d7a9e
  ;; B3: #      out( B6 B4 ) <- in( B2 )  Freq: 0.999999
  0x00007fc3e11d7a5d (+0x3d):   cmp    %r11d,%edx
  0x00007fc3e11d7a60 (+0x40):   jae    0x00007fc3e11d7a8e

For completeness, here are the links to
Bug:    https://bugs.openjdk.java.net/browse/JDK-8213084
Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8213084.00/

But please, as mentioned above, first focus on the output. The nitty details of the implementation I would like to discuss after the output format has received some support.

Thank you so much for your time!
Lutz

From vladimir.x.ivanov at oracle.com Thu Apr 11 23:42:55 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 11 Apr 2019 16:42:55 -0700
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To: <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn>
References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn>
Message-ID:

> Fixed in http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.03/

I like it. What do you think about the following version?
http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.00/

Best regards,
Vladimir Ivanov

From fujie at loongson.cn Fri Apr 12 02:27:46 2019
From: fujie at loongson.cn (Jie Fu)
Date: Fri, 12 Apr 2019 10:27:46 +0800
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To:
References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn>
Message-ID: <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn>

Hi Vladimir,

>> Fixed in
>> http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.03/
>
> I like it. What do you think about the following version?
>
>   http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.00/

It is clearer and easier to understand.
I prefer your version.

One question:
I'm not sure if the following condition still holds with parallel execution of the caller.
---------------------------------------------
if (caller_method->was_executed_more_than(1)) return false; // trust profile
---------------------------------------------
For example, assuming that the caller method was executed concurrently by 12 threads, is it possible that
caller_method->interpreter_invocation_count()=3 && profile.count()=0 && no exception thrown earlier?

Thanks a lot.
Best regards,
Jie

From shade at redhat.com Fri Apr 12 13:31:41 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 12 Apr 2019 15:31:41 +0200
Subject: RFR (XS) 8222397: x86_32 tests with UseSHA1Intrinsics SEGV due to garbled registers
Message-ID: <36a5e8e0-b91c-67c6-931e-e3ab92f9c4d3@redhat.com>

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8222397

Fix:
  http://cr.openjdk.java.net/~shade/8222397/webrev.01/

This basically does what is already done in generate_sha256_implCompress: save the registers before
they are foobared by the runtime call.

Testing: tier{1,2} on Linux {x86_64, x86_32}, jdk-submit

--
Thanks,
-Aleksey

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From dmitrij.pochepko at bell-sw.com Fri Apr 12 15:24:42 2019 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Fri, 12 Apr 2019 18:24:42 +0300 Subject: RFR(XXS): 8222412: AARCH64: lse atomics encoding is not accepting zr as source Message-ID: <7b2c42d6-8309-23ab-776c-132c1a2f4baf@bell-sw.com> Hi all, please review small fix for 8222412: AARCH64: lse atomics encoding is not accepting zr as source webrev: http://cr.openjdk.java.net/~dpochepk/8222412/webrev.01/ Current encoding for lse atomics hits assert when trying to use zr as source register while it is allowed by spec. Current vm doesn't use atomics with zr and this problem is not triggered. Testing: I generated lse atomics with zr as source register. No assert observed with patched vm. CR: https://bugs.openjdk.java.net/browse/JDK-8222412 Thanks, Dmitrij -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kozlov at oracle.com Fri Apr 12 15:49:19 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 12 Apr 2019 08:49:19 -0700 Subject: RFR (XS) 8222397: x86_32 tests with UseSHA1Intrinsics SEGV due to garbled registers In-Reply-To: <36a5e8e0-b91c-67c6-931e-e3ab92f9c4d3@redhat.com> References: <36a5e8e0-b91c-67c6-931e-e3ab92f9c4d3@redhat.com> Message-ID: <64f89d09-7718-9cf4-4b15-66c3c0757ea4@oracle.com> Hi Aleksey, With change you don't need push/pop limit which is rdi. And please add comment (false /*restoring*/). Thanks, Vladimir On 4/12/19 6:31 AM, Aleksey, Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8222397 > > Fix: > http://cr.openjdk.java.net/~shade/8222397/webrev.01/ > > This basically does what is already done in generate_sha256_implCompress: save the registers before > they are foobared by the runtime call. 
> > Testing: tier{1,2} on Linux {x86_64, x86_32}, jdk-submit > From gnu.andrew at redhat.com Fri Apr 12 17:01:36 2019 From: gnu.andrew at redhat.com (Andrew John Hughes) Date: Fri, 12 Apr 2019 18:01:36 +0100 Subject: [8u] RFR: 8221355: Performance regression after JDK-8155635 backport into 8u In-Reply-To: References: <464fcc0f70569534892d11976a857cd3eb7a9494.camel@redhat.com> <7f2abab8-0998-8e12-0c33-faa35216b6dc@redhat.com> Message-ID: On 10/04/2019 20:29, Gil Tene wrote: > Before we jump ahead and integrate this into the upcoming April 8u212, and bump the > build number from b03 to b04, I'd like to point out that the Oracle backport > (https://bugs.openjdk.java.net/browse/JDK-8221954) appears to be to their 8u212 b31, > which is likely a BPR, and not the initial version of 8u212 that will be out in a week. > > So the premise that not including this in the initial version of OpenJDK 8u212 will lead to a > performance regression compared to Oracle's 8u212 is not quite right. The difference will > happen when Oracle publishes their 8u212-b31. > > Given the time crunch, the risk, and the fact that Oracle's April 16 update will likely NOT > include this fix to the regression initially introduced in 8u202, I would advocate to still build > a 8u212-b03, which would not include this back-port, and to either do a b04 later, or push > this to July. > > -- Gil. > We'll be pushing up to jdk8u212-b04 on Tuesday. OpenJDK builds are being created for each build, but b04 may have to be released shortly after the unembargo. Others producing builds ahead of time are free to make their own call on this. If you don't have time to build & test b04, there is no issue with building b03 as that contains all the security updates. I don't see how that's any different to Oracle's situation, where they have clearly tagged a build of 8u212 with this fix in it, but, judging by the build number, it may not be available on Tuesday.
We can always, of course, make the decision to point jdk8u212-ga at jdk8u212-b03. Again, there's precedent for that in the OpenJDK codebase from Oracle's tenure. -- Andrew :) Senior Free Java Software Engineer Red Hat, Inc. (http://www.redhat.com) PGP Key: ed25519/0xCFDA0F9B35964222 (hkp://keys.gnupg.net) Fingerprint = 5132 579D D154 0ED2 3E04 C5A0 CFDA 0F9B 3596 4222 https://keybase.io/gnu_andrew -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: OpenPGP digital signature URL: From ekaterina.pavlova at oracle.com Fri Apr 12 19:32:50 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Fri, 12 Apr 2019 12:32:50 -0700 Subject: RFR (T/XXS) 8208066: compiler/graalunit/JttThreadsTest.java failed with org.junit.runners.model.TestTimedOutException: test timed out after 20 seconds Message-ID: <986e9e53-3cad-d4dc-a729-7e9972d2389b@oracle.com> Hi All, some org.graalvm.compiler.jtt.threads tests set default timeout by calling createTimeoutSeconds(..). This default timeout could be not enough on slow machines or when running tests in slow configurations. Graal unit tests timeout should take into account timeout factor. Please review this small change which sets Graal unit tests timeout factor based on jtreg harness settings. JBS: https://bugs.openjdk.java.net/browse/JDK-8208066 webrev: http://cr.openjdk.java.net/~epavlova//8208066/webrev.00/index.html testing: run graalunit tests in default and -Xcomp configurations. 
thanks, -katya From igor.ignatyev at oracle.com Fri Apr 12 21:14:12 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 12 Apr 2019 14:14:12 -0700 Subject: RFR (T/XXS) 8208066: compiler/graalunit/JttThreadsTest.java failed with org.junit.runners.model.TestTimedOutException: test timed out after 20 seconds In-Reply-To: <986e9e53-3cad-d4dc-a729-7e9972d2389b@oracle.com> References: <986e9e53-3cad-d4dc-a729-7e9972d2389b@oracle.com> Message-ID: <32164A22-7E6E-44A3-A6F5-4617CB77CA86@oracle.com> Looks good. -- Igor > On Apr 12, 2019, at 12:32 PM, Ekaterina Pavlova wrote: > > Hi All, > > some org.graalvm.compiler.jtt.threads tests set default timeout by calling createTimeoutSeconds(..). > This default timeout could be not enough on slow machines or when running tests in slow configurations. > Graal unit tests timeout should take into account timeout factor. > Please review this small change which sets Graal unit tests timeout factor based on jtreg harness settings. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8208066 > webrev: http://cr.openjdk.java.net/~epavlova//8208066/webrev.00/index.html > testing: run graalunit tests in default and -Xcomp configurations. > > thanks, > -katya From dms at samersoff.net Sun Apr 14 09:38:48 2019 From: dms at samersoff.net (Dmitry Samersoff) Date: Sun, 14 Apr 2019 12:38:48 +0300 Subject: RFR (XS) 8222397: x86_32 tests with UseSHA1Intrinsics SEGV due to garbled registers In-Reply-To: <64f89d09-7718-9cf4-4b15-66c3c0757ea4@oracle.com> References: <36a5e8e0-b91c-67c6-931e-e3ab92f9c4d3@redhat.com> <64f89d09-7718-9cf4-4b15-66c3c0757ea4@oracle.com> Message-ID: <73dea74f-b0d2-a51d-b603-fcd09cbd1f32@samersoff.net> Aleksey, Please, keep empty lines around handleSOERegisters. -Dmitry On 12.04.2019 18:49, Vladimir Kozlov wrote: > Hi Aleksey, > > With change you don't need push/pop limit which is rdi. And please add > comment (false /*restoring*/). > > Thanks, > Vladimir > > On 4/12/19 6:31 AM, Aleksey, Shipilev wrote: >> Bug: >> 
https://bugs.openjdk.java.net/browse/JDK-8222397 >> >> Fix: >> http://cr.openjdk.java.net/~shade/8222397/webrev.01/ >> >> This basically does what is already done in >> generate_sha256_implCompress: save the registers before >> they are foobared by the runtime call. >> >> Testing: tier{1,2} on Linux {x86_64, x86_32}, jdk-submit >> From dms at samersoff.net Sun Apr 14 14:28:08 2019 From: dms at samersoff.net (Dmitry Samersoff) Date: Sun, 14 Apr 2019 17:28:08 +0300 Subject: RFR(XXS): 8222412: AARCH64: lse atomics encoding is not accepting zr as source In-Reply-To: <7b2c42d6-8309-23ab-776c-132c1a2f4baf@bell-sw.com> References: <7b2c42d6-8309-23ab-776c-132c1a2f4baf@bell-sw.com> Message-ID: <5dc18dc6-6657-35b6-27c4-a0f32ec2b8a2@samersoff.net> Dmitrij, The patch looks good to me. -Dmitry On 12.04.2019 18:24, Dmitrij Pochepko wrote: > Hi all, > > please review small fix for 8222412: AARCH64: lse atomics encoding is > not accepting zr as source > > webrev: http://cr.openjdk.java.net/~dpochepk/8222412/webrev.01/ > > Current encoding for lse atomics hits assert when trying to use zr as > source register while it is allowed by spec. Current vm doesn't use > atomics with zr and this problem is not triggered. > > > Testing: > > I generated lse atomics with zr as source register. No assert observed > with patched vm. > > > CR: https://bugs.openjdk.java.net/browse/JDK-8222412 > > Thanks, > Dmitrij > > From shade at redhat.com Sun Apr 14 17:36:43 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Sun, 14 Apr 2019 19:36:43 +0200 Subject: RFR (XS) 8222397: x86_32 tests with UseSHA1Intrinsics SEGV due to garbled registers In-Reply-To: <73dea74f-b0d2-a51d-b603-fcd09cbd1f32@samersoff.net> References: <36a5e8e0-b91c-67c6-931e-e3ab92f9c4d3@redhat.com> <64f89d09-7718-9cf4-4b15-66c3c0757ea4@oracle.com> <73dea74f-b0d2-a51d-b603-fcd09cbd1f32@samersoff.net> Message-ID: On 4/14/19 11:38 AM, Dmitry Samersoff wrote: > Please, keep empty lines around handleSOERegisters.
I'd rather match the style, and make sure it leans to the same block as in other methods. > On 12.04.2019 18:49, Vladimir Kozlov wrote: >> With change you don't need push/pop limit which is rdi. And please add >> comment (false /*restoring*/). Right! Updated. New webrev: http://cr.openjdk.java.net/~shade/8222397/webrev.02 Passes the same testing as the original patch. -Aleksey From tobias.hartmann at oracle.com Mon Apr 15 08:34:53 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 15 Apr 2019 10:34:53 +0200 Subject: [13] RFR(T): 8222418: compiler/arguments/TestScavengeRootsInCode.java times out Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8222418 http://cr.openjdk.java.net/~thartmann/8222418/webrev.00/ The test sets -XX:-TieredCompilation -Xcomp and should therefore not be executed with Graal as JIT. Otherwise all Graal methods will be compiled by Graal itself running in interpreter mode which is very slow and causes the test to time out. Thanks, Tobias From tobias.hartmann at oracle.com Mon Apr 15 08:34:54 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 15 Apr 2019 10:34:54 +0200 Subject: [13] RFR(T): 8222417: compiler/loopopts/TestOverunrolling.java times out Message-ID: <477c16f2-11a2-2740-a780-ded7794ebc27@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8222417 http://cr.openjdk.java.net/~thartmann/8222417/webrev.00/ The test sets -XX:-TieredCompilation -Xcomp and should therefore not be executed with Graal as JIT. Otherwise all Graal methods will be compiled by Graal itself running in interpreter mode which is very slow and causes the test to time out. 
Thanks, Tobias From robbin.ehn at oracle.com Mon Apr 15 08:58:34 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 15 Apr 2019 10:58:34 +0200 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> Message-ID: Hi, please review. After re-examining this issue: Threads in native must always have their stack walkable. The JFR sampler should never need to make a stack walkable (for a native sample). I managed to reproduce this reliably locally, with changes to the JFR sampler and hundreds of threads running code similar to that in the bug. (Looping, creating an array with a negative size.) I found a place where we don't properly look at the suspend flags. The Java thread can thus escape native and make its stack unwalkable, and later it tries to make it walkable at the same time as the JFR sampler. By removing some kind of fast check and instead always calling check_safepoint_and_suspend_for_native_trans (which has the JFR native trans suspend check), I can no longer reproduce the failure. And it passes t1-5. Code: http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/ Issue: https://bugs.openjdk.java.net/browse/JDK-8218147 Thanks, Robbin On 4/5/19 5:43 PM, Robbin Ehn wrote: > Hi Dean, > > Sorry, I missed this mail. > Yes we can do that. > Ignore my other mail, I'll update. > > Thanks, Robbin > > > dean.long at oracle.com wrote: (5 April 2019 09:22:24 CEST) >> On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >>> >>>>> >>>>> If it's already set, should we check that _last_Java_pc matches the >> >>>>> new value? >>>> >>>> We manually set the pc in several places, so if it's set, it's not >>>> certain that >>>> it should be the same as in last sp. >>>> I can't distinguish between the cases. 
>>>> >>> >>> If we get pc from sp[-1] then it should match, but you're right, we >>> sometimes get pc from somewhere else. >> >> How about if we combine the !walkable check and the >> capture_last_Java_pc() logic into a single method? >> Then we can do something like: >> >> if (!walkable()) { >> address pc = (address)_last_Java_sp[-1]; >> address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL); >> assert(a == NULL || a == pc, "unexpected PC %p", a); >> } >> >> dl From shade at redhat.com Mon Apr 15 09:59:34 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Apr 2019 11:59:34 +0200 Subject: Epsilon + Graal In-Reply-To: <11CE0B87-6627-482F-A996-1E9305F46BCB@oracle.com> References: <15ec222f-22a0-49b3-4c7d-29d477dd3c19@redhat.com> <8CC4C60B-9F75-4D42-8FE2-992968A95AF6@oracle.com> <885935f1-62a1-4acc-5817-24c956d4647b@redhat.com> <11CE0B87-6627-482F-A996-1E9305F46BCB@oracle.com> Message-ID: On 4/11/19 8:33 PM, Doug Simon wrote: >> Oh, that's nice. So, can I develop the change in jdk/jdk, and then PR the Graal subset of it >> to oracle/graal github? That would definitely work better for my workflow. Is there a way to >> run Graal unit tests from jdk/jdk? > > Yes, although I've never mastered it. There is test/hotspot/jtreg/compiler/graalunit/README.md. > I'm not sure how complete or up to date it is. I've cc'ed Katya who may be able to help with any > missing info. Seems to work like this: $ mkdir graal-test-libs $ cd graal-test-libs $ wget (JARs mentioned in README.md) $ cd .. $ sh ./configure ... --with-graalunit-lib=graal-test-libs/ $ make run-test TEST=compiler/graalunit TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Djvmci.Compiler=graal" With one little wrinkle: https://bugs.openjdk.java.net/browse/JDK-8222482 -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From richard.reingruber at sap.com Mon Apr 15 10:09:01 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Mon, 15 Apr 2019 10:09:01 +0000 Subject: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Message-ID: Hi, please review and sponsor this small enhancement of c2 array clearing on s390. The c2 instruction forms inlineCallClearArrayConstBig and inlineCallClearArray use the register pair R4,R5 as source operand to a move long extended (mvcle) instruction that clears the destination array. To do so the source length (R5) is set to 0 and 0 is used for padding. The s390 manual (Principles of Operation[1]) states that if the source length is 0, then the value in the register used for the source address is not changed and no access exceptions for that operand are recognized. In other words: it is completely ignored. This allows to take any odd register for the source length and to remove the source address operand completely: Bug: https://bugs.openjdk.java.net/browse/JDK-8222271 Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8222271/webrev/ Thanks, Richard. [1] https://www.ibm.com/support/libraryserver/download/dz9zr006.pdf#G13.1223008 From shade at redhat.com Mon Apr 15 10:16:08 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Apr 2019 12:16:08 +0200 Subject: Epsilon + Graal In-Reply-To: References: <15ec222f-22a0-49b3-4c7d-29d477dd3c19@redhat.com> <8CC4C60B-9F75-4D42-8FE2-992968A95AF6@oracle.com> <885935f1-62a1-4acc-5817-24c956d4647b@redhat.com> <11CE0B87-6627-482F-A996-1E9305F46BCB@oracle.com> Message-ID: <1e64cf85-84e8-edab-8fe7-a7a4a51b8cd9@redhat.com> On 4/15/19 11:59 AM, Aleksey Shipilev wrote: > On 4/11/19 8:33 PM, Doug Simon wrote: >>> Oh, that's nice. So, can I develop the change in jdk/jdk, and then PR the Graal subset of it >>> to oracle/graal github? 
That would definitely work better for my workflow. Is there a way to >>> run Graal unit tests from jdk/jdk? >> >> Yes, although I've never mastered it. There is test/hotspot/jtreg/compiler/graalunit/README.md. >> I'm not sure how complete or up to date it is. I've cc'ed Katya who may be able to help with any >> missing info. > Seems to work like this: > > $ mkdir graal-test-libs > $ cd graal-test-libs > $ wget (JARs mentioned in README.md) > $ cd .. > > $ sh ./configure ... --with-graalunit-lib=graal-test-libs/ > $ make run-test TEST=compiler/graalunit TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions > -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Djvmci.Compiler=graal" > > With one little wrinkle: > https://bugs.openjdk.java.net/browse/JDK-8222482 ...well, maybe with another one: https://bugs.openjdk.java.net/browse/JDK-8222483 -Aleksey From dmitrij.pochepko at bell-sw.com Mon Apr 15 10:20:22 2019 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Mon, 15 Apr 2019 13:20:22 +0300 Subject: RFR(XXS): 8222412: AARCH64: lse atomics encoding is not accepting zr as source In-Reply-To: <5dc18dc6-6657-35b6-27c4-a0f32ec2b8a2@samersoff.net> References: <7b2c42d6-8309-23ab-776c-132c1a2f4baf@bell-sw.com> <5dc18dc6-6657-35b6-27c4-a0f32ec2b8a2@samersoff.net> Message-ID: Thank you for review! On 14/04/2019 5:28 PM, Dmitry Samersoff wrote: > Dmitrij, > > The patch looks good to me. > > -Dmitry > > On 12.04.2019 18:24, Dmitrij Pochepko wrote: >> Hi all, >> >> please review small fix for 8222412: AARCH64: lse atomics encoding is >> not accepting zr as source >> >> webrev: http://cr.openjdk.java.net/~dpochepk/8222412/webrev.01/ >> >> Current encoding for lse atomics hits assert when trying to use zr as >> source register while it is allowed by spec.
Current vm doesn't use >> atomics with zr and this problem is not triggered. >> >> >> Testing: >> >> I generated lse atomics with zr as source register. No assert observed >> with patched vm. >> >> >> CR: https://bugs.openjdk.java.net/browse/JDK-8222412 >> >> Thanks, >> Dmitrij >> >> From dms at samersoff.net Mon Apr 15 11:39:21 2019 From: dms at samersoff.net (Dmitry Samersoff) Date: Mon, 15 Apr 2019 14:39:21 +0300 Subject: RFR (XS) 8222397: x86_32 tests with UseSHA1Intrinsics SEGV due to garbled registers In-Reply-To: References: <36a5e8e0-b91c-67c6-931e-e3ab92f9c4d3@redhat.com> <64f89d09-7718-9cf4-4b15-66c3c0757ea4@oracle.com> <73dea74f-b0d2-a51d-b603-fcd09cbd1f32@samersoff.net> Message-ID: <33aa5a1c-c3a9-9b15-06a2-7577eef750ab@samersoff.net> Aleksey, > New webrev: > http://cr.openjdk.java.net/~shade/8222397/webrev.02 Looks good to me. -Dmitry On 14.04.2019 20:36, Aleksey Shipilev wrote: > On 4/14/19 11:38 AM, Dmitry Samersoff wrote: >> Please, keep empty lines around handleSOERegisters. > > I'd rather match the style, and make sure it leans to the same block as in other methods. > >> On 12.04.2019 18:49, Vladimir Kozlov wrote: >>> With change you don't need push/pop limit which is rdi. And please add >>> comment (false /*restoring*/). > > Right! Updated. > > New webrev: > http://cr.openjdk.java.net/~shade/8222397/webrev.02 > > Passes the same testing as the original patch. > > -Aleksey > From martin.doerr at sap.com Mon Apr 15 12:47:11 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 15 Apr 2019 12:47:11 +0000 Subject: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays In-Reply-To: References: Message-ID: Hi Richard, your change looks good. Thanks for improving. I think the function parameter names "srcL" and "src_len" are confusing (already in current implementation). Maybe "tmpL" and "odd_tmp_reg" would be better? What do you think? 
Thanks, Martin -----Original Message----- From: hotspot-compiler-dev On Behalf Of Reingruber, Richard Sent: Montag, 15. April 2019 12:09 To: hotspot-compiler-dev at openjdk.java.net Subject: [CAUTION] RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi, please review and sponsor this small enhancement of c2 array clearing on s390. The c2 instruction forms inlineCallClearArrayConstBig and inlineCallClearArray use the register pair R4,R5 as source operand to a move long extended (mvcle) instruction that clears the destination array. To do so the source length (R5) is set to 0 and 0 is used for padding. The s390 manual (Principles of Operation[1]) states that if the source length is 0, then the value in the register used for the source address is not changed and no access exceptions for that operand are recognized. In other words: it is completely ignored. This allows to take any odd register for the source length and to remove the source address operand completely: Bug: https://bugs.openjdk.java.net/browse/JDK-8222271 Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8222271/webrev/ Thanks, Richard. 
[1] https://www.ibm.com/support/libraryserver/download/dz9zr006.pdf#G13.1223008 From shade at redhat.com Mon Apr 15 14:40:36 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Apr 2019 16:40:36 +0200 Subject: Epsilon + Graal In-Reply-To: <1e64cf85-84e8-edab-8fe7-a7a4a51b8cd9@redhat.com> References: <15ec222f-22a0-49b3-4c7d-29d477dd3c19@redhat.com> <8CC4C60B-9F75-4D42-8FE2-992968A95AF6@oracle.com> <885935f1-62a1-4acc-5817-24c956d4647b@redhat.com> <11CE0B87-6627-482F-A996-1E9305F46BCB@oracle.com> <1e64cf85-84e8-edab-8fe7-a7a4a51b8cd9@redhat.com> Message-ID: <9eef414e-f583-8b9e-6734-39506a5a425c@redhat.com> On 4/15/19 12:16 PM, Aleksey Shipilev wrote: >> With one little wrinkle: >> https://bugs.openjdk.java.net/browse/JDK-8222482 > > ...well, maybe with another one: > https://bugs.openjdk.java.net/browse/JDK-8222483 Ignoring these two issues, the following patch passes Graal unit tests with: $ CONF=linux-x86_64-server-fastdebug make run-test TEST=compiler/graalunit/ TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -Djvmci.Compiler=graal -XX:+UseEpsilonGC -Xmx50g" TEST_JOBS=1 Patch: http://cr.openjdk.java.net/~shade/epsilon/graal-initial/webrev.01/ I assume I can RFR it for jdk/jdk, and simultaneously PR src/jdk.internal.vm.compiler parts to oracle/graal GitHub? -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From doug.simon at oracle.com Mon Apr 15 14:51:45 2019 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 15 Apr 2019 16:51:45 +0200 Subject: Epsilon + Graal In-Reply-To: <9eef414e-f583-8b9e-6734-39506a5a425c@redhat.com> References: <15ec222f-22a0-49b3-4c7d-29d477dd3c19@redhat.com> <8CC4C60B-9F75-4D42-8FE2-992968A95AF6@oracle.com> <885935f1-62a1-4acc-5817-24c956d4647b@redhat.com> <11CE0B87-6627-482F-A996-1E9305F46BCB@oracle.com> <1e64cf85-84e8-edab-8fe7-a7a4a51b8cd9@redhat.com> <9eef414e-f583-8b9e-6734-39506a5a425c@redhat.com> Message-ID: <09836F0E-C0A6-4C79-8ED9-8FAED0968DBC@oracle.com> > On 15 Apr 2019, at 16:40, Aleksey Shipilev wrote: > > On 4/15/19 12:16 PM, Aleksey Shipilev wrote: >>> With one little wrinkle: >>> https://bugs.openjdk.java.net/browse/JDK-8222482 >> >> ...well, maybe with another one: >> https://bugs.openjdk.java.net/browse/JDK-8222483 > > Ignoring these two issues, the following patch passes Graal unit tests with: > > $ CONF=linux-x86_64-server-fastdebug make run-test TEST=compiler/graalunit/ > TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler > -Djvmci.Compiler=graal -XX:+UseEpsilonGC -Xmx50g" TEST_JOBS=1 > > Patch: > http://cr.openjdk.java.net/~shade/epsilon/graal-initial/webrev.01/ > > I assume I can RFR it for jdk/jdk, and simultaneously PR src/jdk.internal.vm.compiler parts to > oracle/graal GitHub? Yes, please go ahead. -Doug From richard.reingruber at sap.com Mon Apr 15 15:40:52 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Mon, 15 Apr 2019 15:40:52 +0000 Subject: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays In-Reply-To: References: Message-ID: Hi Martin, thanks for looking at this. > I think the function parameter names "srcL" and "src_len" are confusing (already in current implementation). 
> Maybe "tmpL" and "odd_tmp_reg" would be better? What do you think? You are absolutely right. I've changed the names. This is the new webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8222271/webrev.1/ Thanks, Richard. -----Original Message----- From: Doerr, Martin Sent: Montag, 15. April 2019 14:47 To: Reingruber, Richard ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Richard, your change looks good. Thanks for improving. I think the function parameter names "srcL" and "src_len" are confusing (already in current implementation). Maybe "tmpL" and "odd_tmp_reg" would be better? What do you think? Thanks, Martin -----Original Message----- From: hotspot-compiler-dev On Behalf Of Reingruber, Richard Sent: Montag, 15. April 2019 12:09 To: hotspot-compiler-dev at openjdk.java.net Subject: [CAUTION] RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi, please review and sponsor this small enhancement of c2 array clearing on s390. The c2 instruction forms inlineCallClearArrayConstBig and inlineCallClearArray use the register pair R4,R5 as source operand to a move long extended (mvcle) instruction that clears the destination array. To do so the source length (R5) is set to 0 and 0 is used for padding. The s390 manual (Principles of Operation[1]) states that if the source length is 0, then the value in the register used for the source address is not changed and no access exceptions for that operand are recognized. In other words: it is completely ignored. This allows to take any odd register for the source length and to remove the source address operand completely: Bug: https://bugs.openjdk.java.net/browse/JDK-8222271 Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8222271/webrev/ Thanks, Richard. 
[1] https://www.ibm.com/support/libraryserver/download/dz9zr006.pdf#G13.1223008 From martin.doerr at sap.com Mon Apr 15 15:49:38 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 15 Apr 2019 15:49:38 +0000 Subject: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays In-Reply-To: References: Message-ID: Hi Richard, thanks for the update. Looks good. I can push it after you got a 2nd review. Best regards, Martin -----Original Message----- From: Reingruber, Richard Sent: Montag, 15. April 2019 17:41 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Martin, thanks for looking at this. > I think the function parameter names "srcL" and "src_len" are confusing (already in current implementation). > Maybe "tmpL" and "odd_tmp_reg" would be better? What do you think? You are absolutely right. I've changed the names. This is the new webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8222271/webrev.1/ Thanks, Richard. -----Original Message----- From: Doerr, Martin Sent: Montag, 15. April 2019 14:47 To: Reingruber, Richard ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Richard, your change looks good. Thanks for improving. I think the function parameter names "srcL" and "src_len" are confusing (already in current implementation). Maybe "tmpL" and "odd_tmp_reg" would be better? What do you think? Thanks, Martin -----Original Message----- From: hotspot-compiler-dev On Behalf Of Reingruber, Richard Sent: Montag, 15. April 2019 12:09 To: hotspot-compiler-dev at openjdk.java.net Subject: [CAUTION] RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi, please review and sponsor this small enhancement of c2 array clearing on s390. 
The c2 instruction forms inlineCallClearArrayConstBig and inlineCallClearArray use the register pair R4,R5 as source operand to a move long extended (mvcle) instruction that clears the destination array. To do so the source length (R5) is set to 0 and 0 is used for padding. The s390 manual (Principles of Operation[1]) states that if the source length is 0, then the value in the register used for the source address is not changed and no access exceptions for that operand are recognized. In other words: it is completely ignored. This allows to take any odd register for the source length and to remove the source address operand completely: Bug: https://bugs.openjdk.java.net/browse/JDK-8222271 Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8222271/webrev/ Thanks, Richard. [1] https://www.ibm.com/support/libraryserver/download/dz9zr006.pdf#G13.1223008 From vladimir.kozlov at oracle.com Mon Apr 15 15:51:41 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 15 Apr 2019 08:51:41 -0700 Subject: RFR (XS) 8222397: x86_32 tests with UseSHA1Intrinsics SEGV due to garbled registers In-Reply-To: <33aa5a1c-c3a9-9b15-06a2-7577eef750ab@samersoff.net> References: <36a5e8e0-b91c-67c6-931e-e3ab92f9c4d3@redhat.com> <64f89d09-7718-9cf4-4b15-66c3c0757ea4@oracle.com> <73dea74f-b0d2-a51d-b603-fcd09cbd1f32@samersoff.net> <33aa5a1c-c3a9-9b15-06a2-7577eef750ab@samersoff.net> Message-ID: <81A3FEF4-69E3-4FAB-90D4-9780BDCC206B@oracle.com> +1 Thanks Vladimir > On Apr 15, 2019, at 4:39 AM, Dmitry Samersoff wrote: > > Aleksey, > >> New webrev: >> http://cr.openjdk.java.net/~shade/8222397/webrev.02 > > Looks good to me. > > -Dmitry > >> On 14.04.2019 20:36, Aleksey Shipilev wrote: >>> On 4/14/19 11:38 AM, Dmitry Samersoff wrote: >>> Please, keep empty lines around handleSOERegisters. >> >> I'd rather match the style, and make sure it leans to the same block as in other methods. 
>> >>>> On 12.04.2019 18:49, Vladimir Kozlov wrote: >>>> With change you don't need push/pop limit which is rdi. And please add >>>> comment (false /*restoring*/). >> >> Right! Updated. >> >> New webrev: >> http://cr.openjdk.java.net/~shade/8222397/webrev.02 >> >> Passes the same testing as the original patch. >> >> -Aleksey >> From vladimir.kozlov at oracle.com Mon Apr 15 15:56:06 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 15 Apr 2019 08:56:06 -0700 Subject: [13] RFR(T): 8222417: compiler/loopopts/TestOverunrolling.java times out In-Reply-To: <477c16f2-11a2-2740-a780-ded7794ebc27@oracle.com> References: <477c16f2-11a2-2740-a780-ded7794ebc27@oracle.com> Message-ID: <93E0CC12-A81C-4643-8CD9-88FAF6CF1D63@oracle.com> Good. Thanks Vladimir > On Apr 15, 2019, at 1:34 AM, Tobias Hartmann wrote: > > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8222417 > http://cr.openjdk.java.net/~thartmann/8222417/webrev.00/ > > The test sets -XX:-TieredCompilation -Xcomp and should therefore not be executed with Graal as JIT. > Otherwise all Graal methods will be compiled by Graal itself running in interpreter mode which is > very slow and causes the test to time out. > > Thanks, > Tobias From richard.reingruber at sap.com Mon Apr 15 15:56:53 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Mon, 15 Apr 2019 15:56:53 +0000 Subject: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays In-Reply-To: References: Message-ID: Thanks Martin. Cheers, Richard. -----Original Message----- From: Doerr, Martin Sent: Montag, 15. April 2019 17:50 To: Reingruber, Richard ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Richard, thanks for the update. Looks good. I can push it after you got a 2nd review. 
Best regards, Martin -----Original Message----- From: Reingruber, Richard Sent: Montag, 15. April 2019 17:41 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Martin, thanks for looking at this. > I think the function parameter names "srcL" and "src_len" are confusing (already in current implementation). > Maybe "tmpL" and "odd_tmp_reg" would be better? What do you think? You are absolutely right. I've changed the names. This is the new webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8222271/webrev.1/ Thanks, Richard. -----Original Message----- From: Doerr, Martin Sent: Montag, 15. April 2019 14:47 To: Reingruber, Richard ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Richard, your change looks good. Thanks for improving. I think the function parameter names "srcL" and "src_len" are confusing (already in current implementation). Maybe "tmpL" and "odd_tmp_reg" would be better? What do you think? Thanks, Martin -----Original Message----- From: hotspot-compiler-dev On Behalf Of Reingruber, Richard Sent: Montag, 15. April 2019 12:09 To: hotspot-compiler-dev at openjdk.java.net Subject: [CAUTION] RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi, please review and sponsor this small enhancement of c2 array clearing on s390. The c2 instruction forms inlineCallClearArrayConstBig and inlineCallClearArray use the register pair R4,R5 as source operand to a move long extended (mvcle) instruction that clears the destination array. To do so the source length (R5) is set to 0 and 0 is used for padding. 
The s390 manual (Principles of Operation[1]) states that if the source length is 0, then the value in the register used for the source address is not changed and no access exceptions for that operand are recognized. In other words: it is completely ignored. This allows to take any odd register for the source length and to remove the source address operand completely: Bug: https://bugs.openjdk.java.net/browse/JDK-8222271 Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8222271/webrev/ Thanks, Richard. [1] https://www.ibm.com/support/libraryserver/download/dz9zr006.pdf#G13.1223008 From vladimir.kozlov at oracle.com Mon Apr 15 15:55:37 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 15 Apr 2019 08:55:37 -0700 Subject: [13] RFR(T): 8222418: compiler/arguments/TestScavengeRootsInCode.java times out In-Reply-To: References: Message-ID: <4AD32BF2-C4A0-43EB-9510-2E7A108C330A@oracle.com> Good. Thanks Vladimir > On Apr 15, 2019, at 1:34 AM, Tobias Hartmann wrote: > > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8222418 > http://cr.openjdk.java.net/~thartmann/8222418/webrev.00/ > > The test sets -XX:-TieredCompilation -Xcomp and should therefore not be executed with Graal as JIT. > Otherwise all Graal methods will be compiled by Graal itself running in interpreter mode which is > very slow and causes the test to time out. 
> > Thanks, > Tobias From adinn at redhat.com Mon Apr 15 16:15:51 2019 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 15 Apr 2019 17:15:51 +0100 Subject: RFR(XXS): 8222412: AARCH64: lse atomics encoding is not accepting zr as source In-Reply-To: <7b2c42d6-8309-23ab-776c-132c1a2f4baf@bell-sw.com> References: <7b2c42d6-8309-23ab-776c-132c1a2f4baf@bell-sw.com> Message-ID: <69d9357f-4945-7461-cf08-f5ac646b1591@redhat.com> Hello Dmitrij, On 12/04/2019 16:24, Dmitrij Pochepko wrote: > please review small fix for 8222412: AARCH64: lse atomics encoding is > not accepting zr as source > > webrev: http://cr.openjdk.java.net/~dpochepk/8222412/webrev.01/ > > Current encoding for lse atomics hits assert when trying to use zr as > source register while it is allowed by spec. Current vm doesn't use > atomics with zr and this problem is not triggered. I think this part of your comment is critical: "Current vm doesn't use atomics with zr and this problem is not triggered." In which case I have to ask why are you spending time fixing this and asking others to spend time reviewing it? I agree that this detail is indeed wrong. In related news the whole internet is broken yet that's no reason for anyone to spend their time trying to fix it. Do you have a use case for this fixed behaviour? That comment may sound harsh but this fix is the nadir (at least I hope it is) of a trajectory that has presented change after change offering little by way of motivation and, in direct consequence, little by way of benefit. It is all very nice to receive contributions gratis but they really need to be worth more than the cost of accepting them. > Testing: > > I generated lse atomics with zr as source register. No assert observed > with patched vm. I'd much prefer for this to be tested through being used in the VM. Perhaps you might re-present the patch as part of a larger patch that justifies its inclusion by fixing an actual breakage or performance problem. 
If you can do that I'd be happy to pass it as reviewed. > CR: https://bugs.openjdk.java.net/browse/JDK-8222412 Also, could you please downgrade the priority of this defect to P5 so it represent the true state of affairs. regards, Andrew Dinn ----------- From tobias.hartmann at oracle.com Mon Apr 15 16:40:39 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 15 Apr 2019 18:40:39 +0200 Subject: [13] RFR(T): 8222417: compiler/loopopts/TestOverunrolling.java times out In-Reply-To: <93E0CC12-A81C-4643-8CD9-88FAF6CF1D63@oracle.com> References: <477c16f2-11a2-2740-a780-ded7794ebc27@oracle.com> <93E0CC12-A81C-4643-8CD9-88FAF6CF1D63@oracle.com> Message-ID: <3a2cec9f-2cc8-a4ff-f75f-4b6673ed9699@oracle.com> Thanks, Vladimir. Best regards, Tobias On 15.04.19 17:56, Vladimir Kozlov wrote: > Good. > > Thanks > Vladimir > >> On Apr 15, 2019, at 1:34 AM, Tobias Hartmann wrote: >> >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8222417 >> http://cr.openjdk.java.net/~thartmann/8222417/webrev.00/ >> >> The test sets -XX:-TieredCompilation -Xcomp and should therefore not be executed with Graal as JIT. >> Otherwise all Graal methods will be compiled by Graal itself running in interpreter mode which is >> very slow and causes the test to time out. >> >> Thanks, >> Tobias > From tobias.hartmann at oracle.com Mon Apr 15 16:40:46 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 15 Apr 2019 18:40:46 +0200 Subject: [13] RFR(T): 8222418: compiler/arguments/TestScavengeRootsInCode.java times out In-Reply-To: <4AD32BF2-C4A0-43EB-9510-2E7A108C330A@oracle.com> References: <4AD32BF2-C4A0-43EB-9510-2E7A108C330A@oracle.com> Message-ID: <2a405a6f-ee82-e2b1-128f-b0eb6683b8ef@oracle.com> Thanks, Vladimir. Best regards, Tobias On 15.04.19 17:55, Vladimir Kozlov wrote: > Good. 
> > Thanks > Vladimir > >> On Apr 15, 2019, at 1:34 AM, Tobias Hartmann wrote: >> >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8222418 >> http://cr.openjdk.java.net/~thartmann/8222418/webrev.00/ >> >> The test sets -XX:-TieredCompilation -Xcomp and should therefore not be executed with Graal as JIT. >> Otherwise all Graal methods will be compiled by Graal itself running in interpreter mode which is >> very slow and causes the test to time out. >> >> Thanks, >> Tobias > From rkennke at redhat.com Mon Apr 15 20:04:10 2019 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 15 Apr 2019 22:04:10 +0200 Subject: RFR: JDK-8222079: Don't use memset to initialize fields decode_env constructor in disassembler.cpp Message-ID: Recent gcc (I use version 9) complains about using memset to initialize fields of decode_env. Let's use proper field initializers instead. Bug: https://bugs.openjdk.java.net/browse/JDK-8222079 Webrev: http://cr.openjdk.java.net/~rkennke/JDK-8222079/webrev.01/ Can I please get a review? Thanks, Roman From david.holmes at oracle.com Tue Apr 16 02:00:58 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 16 Apr 2019 12:00:58 +1000 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> Message-ID: Hi Robbin, On 15/04/2019 6:58 pm, Robbin Ehn wrote: > Hi, please review. > > After reexamine this issue: > Threads in native must always have their stack walkable. > JFR sampler should never need to make a stack walkable (for native sample). > > I manage to locally reproduce reliable with changes to JFR sampler and > having > hundreds of threads running similar code as the in the bug. > (Looping creating an array with negative size.) > > I found a place where we don't proper look at the suspend flags. 
> The java thread can thus escape native and make it's stack unwalkable > and later > it tries to make it walkable at the same time as the JFR sampler. > > By removing some kind of fast check and instead always call the > check_safepoint_and_suspend_for_native_trans I can no longer reproduce. Sorry but I can't see how this can fix anything: - if (SafepointMechanism::should_block(thread) || thread->is_suspend_after_native()) { JavaThread::check_safepoint_and_suspend_for_native_trans(thread); - } All you are doing is changing the timing of the race between the thread re-entering the VM/Java and the request for a suspend or safepoint. If there is a race between the sampler logic acting on the thread, and the thread acting on itself then that race has to be precluded somehow. Thanks, David ----- > (which have the JFR native trans suspend check) > And it passes t1-5. > > Code: > http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/ > Issue: > https://bugs.openjdk.java.net/browse/JDK-8218147 > > Thanks, Robbin > > On 4/5/19 5:43 PM, Robbin Ehn wrote: >> Hi Dean, >> >> Sorry, I missed this mail. >> Yes we can do that. >> Ignore my other mail, I'll update. >> >> Thanks, Robbin >> >> >> dean.long at oracle.com skrev: (5 april 2019 09:22:24 CEST) >>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >>>> >>>>>> >>>>>> If it's already set, should we check that _last_Java_pc matches the >>> >>>>>> new value? >>>>> >>>>> We manually set the pc in several places, so if it's set, it's not >>>>> certain that >>>>> it should be the same as in last sp. >>>>> I can't distinguish between the cases. >>>>> >>>> >>>> If we get pc from sp[-1] then it should match, but you're right, we >>>> sometimes get pc from somewhere else. >>> >>> How about if we combine the !walkable check and the >>> capture_last_Java_pc() logic into a single method? >>> Then we can do something like: >>> >>> ???? if (!walkable()) { >>> ???????? address pc = (address)_last_Java_sp[-1]; >>> ???????? 
address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL); >>> ???????? assert(a == NULL || a == pc, "unexpected PC %p", a); >>> ???? } >>> >>> dl From igor.ignatyev at oracle.com Tue Apr 16 03:41:59 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 15 Apr 2019 20:41:59 -0700 Subject: [13] RFR(T): 8222417: compiler/loopopts/TestOverunrolling.java times out In-Reply-To: <477c16f2-11a2-2740-a780-ded7794ebc27@oracle.com> References: <477c16f2-11a2-2740-a780-ded7794ebc27@oracle.com> Message-ID: <2561E67B-9E53-4FD9-9C9C-2707D17BCCD3@oracle.com> Hi Tobias, although I agree that this test shouldn't be executed w/ Graal as JIT in the current setup, I don't think '@requires !vm.graal.enabled' is a right choice here b/c this test still can/should be run w/ libgraal, so I'd suggest to put this test and other tests (e.g. TestScavengeRootsInCode.java) into graal-specific problem list under an umbrella bug saying smth like 'Graal is very slow w/ -XX:-TieredCompilation -Xcomp'. Thanks, -- Igor > On Apr 15, 2019, at 1:34 AM, Tobias Hartmann wrote: > > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8222417 > http://cr.openjdk.java.net/~thartmann/8222417/webrev.00/ > > The test sets -XX:-TieredCompilation -Xcomp and should therefore not be executed with Graal as JIT. > Otherwise all Graal methods will be compiled by Graal itself running in interpreter mode which is > very slow and causes the test to time out. > > Thanks, > Tobias -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robbin.ehn at oracle.com Tue Apr 16 06:51:17 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 16 Apr 2019 08:51:17 +0200 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> Message-ID: Hi David, On 4/16/19 4:00 AM, David Holmes wrote: > Hi Robbin, > > On 15/04/2019 6:58 pm, Robbin Ehn wrote: >> Hi, please review. >> >> After reexamine this issue: >> Threads in native must always have their stack walkable. >> JFR sampler should never need to make a stack walkable (for native sample). >> >> I manage to locally reproduce reliable with changes to JFR sampler and having >> hundreds of threads running similar code as the in the bug. >> (Looping creating an array with negative size.) >> >> I found a place where we don't proper look at the suspend flags. >> The java thread can thus escape native and make it's stack unwalkable and later >> it tries to make it walkable at the same time as the JFR sampler. >> >> By removing some kind of fast check and instead always call the >> check_safepoint_and_suspend_for_native_trans I can no longer reproduce. > > Sorry but I can't see how this can fix anything: > > -???? if (SafepointMechanism::should_block(thread) || > thread->is_suspend_after_native()) { > ??????? JavaThread::check_safepoint_and_suspend_for_native_trans(thread); > -???? 
} > In check_safepoint_and_suspend_for_native_trans we check _trace_flag and stop with macro: JFR_ONLY(SUSPEND_THREAD_CONDITIONAL(thread);) This method does not do the right thing: bool is_suspend_after_native() const { return (_suspend_flags & (_external_suspend | _deopt_suspend)) != 0; } If we want to keep the out-of-line double checking, this is an alternative: diff -r 51211b2d6514 src/hotspot/share/runtime/thread.hpp --- a/src/hotspot/share/runtime/thread.hpp Tue Apr 16 08:38:32 2019 +0200 +++ b/src/hotspot/share/runtime/thread.hpp Tue Apr 16 08:44:21 2019 +0200 @@ -1417,3 +1417,3 @@ bool is_suspend_after_native() const { - return (_suspend_flags & (_external_suspend | _deopt_suspend)) != 0; + return (_suspend_flags & (_external_suspend | _deopt_suspend | _trace_flag)) != 0; } So we are missing checking that bit completely in this transition code. Thanks, Robbin > All you are doing is changing the timing of the race between the thread > re-entering the VM/Java and the request for a suspend or safepoint. > > If there is a race between the sampler logic acting on the thread, and the > thread acting on itself then that race has to be precluded somehow. > > Thanks, > David > ----- > >> (which have the JFR native trans suspend check) >> And it passes t1-5. >> >> Code: >> http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/ >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8218147 >> >> Thanks, Robbin >> >> On 4/5/19 5:43 PM, Robbin Ehn wrote: >>> Hi Dean, >>> >>> Sorry, I missed this mail. >>> Yes we can do that. >>> Ignore my other mail, I'll update. >>> >>> Thanks, Robbin >>> >>> >>> dean.long at oracle.com skrev: (5 april 2019 09:22:24 CEST) >>>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >>>>> >>>>>>> >>>>>>> If it's already set, should we check that _last_Java_pc matches the >>>> >>>>>>> new value? >>>>>> >>>>>> We manually set the pc in several places, so if it's set, it's not >>>>>> certain that >>>>>> it should be the same as in last sp. 
>>>>>> I can't distinguish between the cases. >>>>>> >>>>> >>>>> If we get pc from sp[-1] then it should match, but you're right, we >>>>> sometimes get pc from somewhere else. >>>> >>>> How about if we combine the !walkable check and the >>>> capture_last_Java_pc() logic into a single method? >>>> Then we can do something like: >>>> >>>> ???? if (!walkable()) { >>>> ???????? address pc = (address)_last_Java_sp[-1]; >>>> ???????? address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL); >>>> ???????? assert(a == NULL || a == pc, "unexpected PC %p", a); >>>> ???? } >>>> >>>> dl From david.holmes at oracle.com Tue Apr 16 07:27:19 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 16 Apr 2019 17:27:19 +1000 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> Message-ID: <3d0d841d-3bcf-ce3c-1d44-096c9aff2e72@oracle.com> On 16/04/2019 4:51 pm, Robbin Ehn wrote: > Hi David, > > On 4/16/19 4:00 AM, David Holmes wrote: >> Hi Robbin, >> >> On 15/04/2019 6:58 pm, Robbin Ehn wrote: >>> Hi, please review. >>> >>> After reexamine this issue: >>> Threads in native must always have their stack walkable. >>> JFR sampler should never need to make a stack walkable (for native >>> sample). >>> >>> I manage to locally reproduce reliable with changes to JFR sampler >>> and having >>> hundreds of threads running similar code as the in the bug. >>> (Looping creating an array with negative size.) >>> >>> I found a place where we don't proper look at the suspend flags. >>> The java thread can thus escape native and make it's stack unwalkable >>> and later >>> it tries to make it walkable at the same time as the JFR sampler. >>> >>> By removing some kind of fast check and instead always call the >>> check_safepoint_and_suspend_for_native_trans I can no longer reproduce. 
>> >> Sorry but I can't see how this can fix anything: >> >> - if (SafepointMechanism::should_block(thread) || >> thread->is_suspend_after_native()) { >> >> JavaThread::check_safepoint_and_suspend_for_native_trans(thread); >> - } >> > > In check_safepoint_and_suspend_for_native_trans we check _trace_flag and > stop with macro: > JFR_ONLY(SUSPEND_THREAD_CONDITIONAL(thread);) Ah I see - needed to expand that macro. > This method does not do the right thing: > bool is_suspend_after_native() const { > return (_suspend_flags & (_external_suspend | _deopt_suspend)) != 0; > } > > If we want to keep the out-of-line double checking, this is an alternative: > diff -r 51211b2d6514 src/hotspot/share/runtime/thread.hpp > --- a/src/hotspot/share/runtime/thread.hpp Tue Apr 16 08:38:32 2019 > +0200 > +++ b/src/hotspot/share/runtime/thread.hpp Tue Apr 16 08:44:21 2019 > +0200 > @@ -1417,3 +1417,3 @@ > bool is_suspend_after_native() const { > - return (_suspend_flags & (_external_suspend | _deopt_suspend)) != 0; > + return (_suspend_flags & (_external_suspend | _deopt_suspend | > _trace_flag)) != 0; > } Right. Should that use JFR_ONLY? I was a little concerned that thread->set_trace_flag() may not ensure visibility of the flag update, but then realized it should be covered by the fence: static inline void transition_from_native(JavaThread *thread, JavaThreadState to) { // Change to transition state and ensure it is seen by the VM thread. thread->set_thread_state_fence(_thread_in_native_trans); and the comment should say: // Change to transition state and ensure it is seen by other threads, // and we will see any _suspend_flag changes below. However it also seems to me that in JfrThreadSampleClosure::do_sample_thread we need a storeload() barrier after: thread->set_trace_flag(); to ensure it's not reordered with the reads of thread->thread_state()? Thanks, David ----- > So we are missing checking that bit completely in this transition code.
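The missing bit discussed above can be illustrated with a small standalone sketch. It is not HotSpot code: the flag names mirror the ones quoted in the thread, but the bit values and the two free functions are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values for illustration only; the real constants are
// defined in HotSpot's thread.hpp and may differ.
enum SuspendFlags : uint32_t {
  external_suspend = 1u << 0,
  deopt_suspend    = 1u << 1,
  trace_flag       = 1u << 2,
};

// Fast-path check as quoted in the thread: the mask ignores trace_flag.
inline bool is_suspend_after_native_old(uint32_t flags) {
  return (flags & (external_suspend | deopt_suspend)) != 0;
}

// Variant following the proposed diff: trace_flag is part of the mask.
inline bool is_suspend_after_native_new(uint32_t flags) {
  return (flags & (external_suspend | deopt_suspend | trace_flag)) != 0;
}

// Small self-check: a thread with only trace_flag set slips past the old
// fast path but is caught by the new one.
inline void self_check() {
  assert(!is_suspend_after_native_old(trace_flag));
  assert(is_suspend_after_native_new(trace_flag));
}
```

With only the trace bit set, the old predicate returns false, so the caller would skip the slow path that contains the JFR suspend check; the widened mask sends the thread into that path.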
> > Thanks, Robbin > > >> All you are doing is changing the timing of the race between the >> thread re-entering the VM/Java and the request for a suspend or >> safepoint. >> >> If there is a race between the sampler logic acting on the thread, and >> the thread acting on itself then that race has to be precluded somehow. >> >> Thanks, >> David >> ----- >> >>> (which have the JFR native trans suspend check) >>> And it passes t1-5. >>> >>> Code: >>> http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/ >>> Issue: >>> https://bugs.openjdk.java.net/browse/JDK-8218147 >>> >>> Thanks, Robbin >>> >>> On 4/5/19 5:43 PM, Robbin Ehn wrote: >>>> Hi Dean, >>>> >>>> Sorry, I missed this mail. >>>> Yes we can do that. >>>> Ignore my other mail, I'll update. >>>> >>>> Thanks, Robbin >>>> >>>> >>>> dean.long at oracle.com skrev: (5 april 2019 09:22:24 CEST) >>>>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >>>>>> >>>>>>>> >>>>>>>> If it's already set, should we check that _last_Java_pc matches the >>>>> >>>>>>>> new value? >>>>>>> >>>>>>> We manually set the pc in several places, so if it's set, it's not >>>>>>> certain that >>>>>>> it should be the same as in last sp. >>>>>>> I can't distinguish between the cases. >>>>>>> >>>>>> >>>>>> If we get pc from sp[-1] then it should match, but you're right, we >>>>>> sometimes get pc from somewhere else. >>>>> >>>>> How about if we combine the !walkable check and the >>>>> capture_last_Java_pc() logic into a single method? >>>>> Then we can do something like: >>>>> >>>>> ???? if (!walkable()) { >>>>> ???????? address pc = (address)_last_Java_sp[-1]; >>>>> ???????? address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL); >>>>> ???????? assert(a == NULL || a == pc, "unexpected PC %p", a); >>>>> ???? 
} >>>>> >>>>> dl From tobias.hartmann at oracle.com Tue Apr 16 07:33:35 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 16 Apr 2019 09:33:35 +0200 Subject: [13] RFR(T): 8222417: compiler/loopopts/TestOverunrolling.java times out In-Reply-To: <2561E67B-9E53-4FD9-9C9C-2707D17BCCD3@oracle.com> References: <477c16f2-11a2-2740-a780-ded7794ebc27@oracle.com> <2561E67B-9E53-4FD9-9C9C-2707D17BCCD3@oracle.com> Message-ID: <6fac0535-ae63-193a-c3f9-bcb116a1da9f@oracle.com> Hi Igor, that's fine with me but then we should make sure to also change all the numerous other fixes that added '@requires !vm.graal.enabled' for the same reason. For example, JDK-8198924. I've created an umbrella bug for this: https://bugs.openjdk.java.net/browse/JDK-8222524 Here are the new webrevs: http://cr.openjdk.java.net/~thartmann/8222417/webrev.01/ http://cr.openjdk.java.net/~thartmann/8222418/webrev.01/ Thanks, Tobias On 16.04.19 05:41, Igor Ignatyev wrote: > Hi Tobias, > > although I agree that this test shouldn't be executed w/ Graal as JIT in the current setup, I don't > think '@requires !vm.graal.enabled' is a right choice here b/c this test still can/should be run w/ > libgraal, so I'd suggest to put this test and other tests (e.g. TestScavengeRootsInCode.java) into > graal-specific problem list under an umbrella bug saying smth like 'Graal is very slow w/ > -XX:-TieredCompilation -Xcomp'. > > Thanks, > -- Igor > >> On Apr 15, 2019, at 1:34 AM, Tobias Hartmann wrote: >> >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8222417 >> http://cr.openjdk.java.net/~thartmann/8222417/webrev.00/ >> >> The test sets -XX:-TieredCompilation -Xcomp and should therefore not be executed with Graal as JIT. >> Otherwise all Graal methods will be compiled by Graal itself running in interpreter mode which is >> very slow and causes the test to time out.
>> >> Thanks, >> Tobias > From robbin.ehn at oracle.com Tue Apr 16 07:56:42 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 16 Apr 2019 09:56:42 +0200 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: <3d0d841d-3bcf-ce3c-1d44-096c9aff2e72@oracle.com> References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> <3d0d841d-3bcf-ce3c-1d44-096c9aff2e72@oracle.com> Message-ID: Hi David, *truncated* On 4/16/19 9:27 AM, David Holmes wrote: >> If we want to keep the out-of-line double checking, this is an alternative: >> diff -r 51211b2d6514 src/hotspot/share/runtime/thread.hpp >> --- a/src/hotspot/share/runtime/thread.hpp??? Tue Apr 16 08:38:32 2019 +0200 >> +++ b/src/hotspot/share/runtime/thread.hpp??? Tue Apr 16 08:44:21 2019 +0200 >> @@ -1417,3 +1417,3 @@ >> ??? bool is_suspend_after_native() const { >> -??? return (_suspend_flags & (_external_suspend | _deopt_suspend)) != 0; >> +??? return (_suspend_flags & (_external_suspend | _deopt_suspend | >> _trace_flag)) != 0; >> ??? } > So you prefer this patch? > Right. Should that use JFR_ONLY? _trace_flag is always present, so we don't need it. And I'm not sure how to get that macro into that method in a nice way? > > I was a little concerned that thread->set_trace_flag() may not ensure visibility > of the flag update, but then realized it should be covered by the fence: > > ?static inline void transition_from_native(JavaThread *thread, JavaThreadState > to) { > ??? // Change to transition state and ensure it is seen by the VM thread. > ??? thread->set_thread_state_fence(_thread_in_native_trans); > > and the comment should say: > > // Change to transition state and ensure it is seen by other thread, > // and we will see any _suspend_flag changes below. > > However it also seems to me that in JfrThreadSampleClosure::do_sample_thread we > need a storeload() barrier after: > > ? 
thread->set_trace_flag(); > > to ensure its not reordered with the reads of thread->thread_state() ? Setting/clearing suspend flags is always done with Atomic::cmpxchg, since there can be multiple threads manipulating the bit pattern. I can add a comment about it. Thanks, Robbin > > Thanks, > David > ----- > > >> So we are missing checking that bit completely in this transition code. >> >> Thanks, Robbin >> >> >>> All you are doing is changing the timing of the race between the thread >>> re-entering the VM/Java and the request for a suspend or safepoint. >>> >>> If there is a race between the sampler logic acting on the thread, and the >>> thread acting on itself then that race has to be precluded somehow. >>> >>> Thanks, >>> David >>> ----- >>> >>>> (which have the JFR native trans suspend check) >>>> And it passes t1-5. >>>> >>>> Code: >>>> http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/ >>>> Issue: >>>> https://bugs.openjdk.java.net/browse/JDK-8218147 >>>> >>>> Thanks, Robbin >>>> >>>> On 4/5/19 5:43 PM, Robbin Ehn wrote: >>>>> Hi Dean, >>>>> >>>>> Sorry, I missed this mail. >>>>> Yes we can do that. >>>>> Ignore my other mail, I'll update. >>>>> >>>>> Thanks, Robbin >>>>> >>>>> >>>>> dean.long at oracle.com skrev: (5 april 2019 09:22:24 CEST) >>>>>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >>>>>>> >>>>>>>>> >>>>>>>>> If it's already set, should we check that _last_Java_pc matches the >>>>>> >>>>>>>>> new value? >>>>>>>> >>>>>>>> We manually set the pc in several places, so if it's set, it's not >>>>>>>> certain that >>>>>>>> it should be the same as in last sp. >>>>>>>> I can't distinguish between the cases. >>>>>>>> >>>>>>> >>>>>>> If we get pc from sp[-1] then it should match, but you're right, we >>>>>>> sometimes get pc from somewhere else. >>>>>> >>>>>> How about if we combine the !walkable check and the >>>>>> capture_last_Java_pc() logic into a single method? >>>>>> Then we can do something like: >>>>>> >>>>>> ???? 
if (!walkable()) { >>>>>> ???????? address pc = (address)_last_Java_sp[-1]; >>>>>> ???????? address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL); >>>>>> ???????? assert(a == NULL || a == pc, "unexpected PC %p", a); >>>>>> ???? } >>>>>> >>>>>> dl From david.holmes at oracle.com Tue Apr 16 08:21:44 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 16 Apr 2019 18:21:44 +1000 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> <3d0d841d-3bcf-ce3c-1d44-096c9aff2e72@oracle.com> Message-ID: <40b529d0-aeca-24b8-636a-518b7445b593@oracle.com> On 16/04/2019 5:56 pm, Robbin Ehn wrote: > Hi David, > > *truncated* > > On 4/16/19 9:27 AM, David Holmes wrote: >>> If we want to keep the out-of-line double checking, this is an >>> alternative: >>> diff -r 51211b2d6514 src/hotspot/share/runtime/thread.hpp >>> --- a/src/hotspot/share/runtime/thread.hpp??? Tue Apr 16 08:38:32 >>> 2019 +0200 >>> +++ b/src/hotspot/share/runtime/thread.hpp??? Tue Apr 16 08:44:21 >>> 2019 +0200 >>> @@ -1417,3 +1417,3 @@ >>> ??? bool is_suspend_after_native() const { >>> -??? return (_suspend_flags & (_external_suspend | _deopt_suspend)) >>> != 0; >>> +??? return (_suspend_flags & (_external_suspend | _deopt_suspend | >>> _trace_flag)) != 0; >>> ??? } >> > > So you prefer this patch? Yes > >> Right. Should that use JFR_ONLY? > > _trace_flag is always present, so we don't need it. Okay ... not clear what will set it other than JFR ... > And I'm not sure how to get that macro into that method in a nice way? 
Define nice ;-) return (_suspend_flags & (_external_suspend | _deopt_suspend JFR_ONLY(| _trace_flag))) != 0; >> >> I was a little concerned that thread->set_trace_flag() may not ensure >> visibility of the flag update, but then realized it should be covered >> by the fence: >> >> static inline void transition_from_native(JavaThread *thread, >> JavaThreadState to) { >> // Change to transition state and ensure it is seen by the VM >> thread. >> thread->set_thread_state_fence(_thread_in_native_trans); >> >> and the comment should say: >> >> // Change to transition state and ensure it is seen by other thread, >> // and we will see any _suspend_flag changes below. >> >> However it also seems to me that in >> JfrThreadSampleClosure::do_sample_thread we need a storeload() barrier >> after: >> >> thread->set_trace_flag(); >> >> to ensure its not reordered with the reads of thread->thread_state() ? > > Setting/clearing suspend flags is always done with Atomic::cmpxchg, > since there can be multiple threads manipulating the bit pattern. > I can add a comment about it. Missed that - thanks. David ----- > Thanks, Robbin > >> >> Thanks, >> David >> ----- >> >> >>> So we are missing checking that bit completely in this transition code. >>> >>> Thanks, Robbin >>> >>> >>>> All you are doing is changing the timing of the race between the >>>> thread re-entering the VM/Java and the request for a suspend or >>>> safepoint. >>>> >>>> If there is a race between the sampler logic acting on the thread, >>>> and the thread acting on itself then that race has to be precluded >>>> somehow. >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> (which have the JFR native trans suspend check) >>>>> And it passes t1-5.
>>>>> >>>>> Code: >>>>> http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/ >>>>> Issue: >>>>> https://bugs.openjdk.java.net/browse/JDK-8218147 >>>>> >>>>> Thanks, Robbin >>>>> >>>>> On 4/5/19 5:43 PM, Robbin Ehn wrote: >>>>>> Hi Dean, >>>>>> >>>>>> Sorry, I missed this mail. >>>>>> Yes we can do that. >>>>>> Ignore my other mail, I'll update. >>>>>> >>>>>> Thanks, Robbin >>>>>> >>>>>> >>>>>> dean.long at oracle.com wrote: (5 April 2019 09:22:24 CEST) >>>>>>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >>>>>>>> >>>>>>>>>> >>>>>>>>>> If it's already set, should we check that _last_Java_pc >>>>>>>>>> matches the >>>>>>> >>>>>>>>>> new value? >>>>>>>>> >>>>>>>>> We manually set the pc in several places, so if it's set, it's not >>>>>>>>> certain that >>>>>>>>> it should be the same as in last sp. >>>>>>>>> I can't distinguish between the cases. >>>>>>>>> >>>>>>>> >>>>>>>> If we get pc from sp[-1] then it should match, but you're right, we >>>>>>>> sometimes get pc from somewhere else. >>>>>>> >>>>>>> How about if we combine the !walkable check and the >>>>>>> capture_last_Java_pc() logic into a single method? >>>>>>> Then we can do something like: >>>>>>> >>>>>>> if (!walkable()) { >>>>>>> address pc = (address)_last_Java_sp[-1]; >>>>>>> address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL); >>>>>>> assert(a == NULL || a == pc, "unexpected PC %p", a); >>>>>>> } >>>>>>> >>>>>>> dl From nils.eliasson at oracle.com Tue Apr 16 08:44:53 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 16 Apr 2019 10:44:53 +0200 Subject: RFR(XS): 8218468: Load barrier slow path node should be MachTypeNode Message-ID: <43ba5700-d3c4-7dc4-a09d-5961b80aad0f@oracle.com> Hi, We have a number of assert failures in RunThese testing. This is caused by LoadBarrierSlowReg and LoadBarrierWeakSlowReg not being MachTypeNodes. This patch fixes that. I am also adding includes for ZGC and Shenandoah.
Bug: https://bugs.openjdk.java.net/browse/JDK-8218468 Webrev: http://cr.openjdk.java.net/~neliasso/8218468/webrev.01/ Please review, Nils Eliasson From shade at redhat.com Tue Apr 16 08:47:06 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Apr 2019 10:47:06 +0200 Subject: RFR(XS): 8218468: Load barrier slow path node should be MachTypeNode In-Reply-To: <43ba5700-d3c4-7dc4-a09d-5961b80aad0f@oracle.com> References: <43ba5700-d3c4-7dc4-a09d-5961b80aad0f@oracle.com> Message-ID: On 4/16/19 10:44 AM, Nils Eliasson wrote: > Webrev: http://cr.openjdk.java.net/~neliasso/8218468/webrev.01/ Shenandoah part looks good. -Aleksey From per.liden at oracle.com Tue Apr 16 08:54:00 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 16 Apr 2019 10:54:00 +0200 Subject: RFR(XS): 8218468: Load barrier slow path node should be MachTypeNode In-Reply-To: <43ba5700-d3c4-7dc4-a09d-5961b80aad0f@oracle.com> References: <43ba5700-d3c4-7dc4-a09d-5961b80aad0f@oracle.com> Message-ID: Looks good! /Per On 4/16/19 10:44 AM, Nils Eliasson wrote: > Hi, > > We have a number of assert failures in RunThese testing. This is caused > by LoadBarrierSlowReg and LoadBarrierWeakSlowReg not being MachTypeNodes. > > This patches fixes that. > > I am also adding includes for ZGC and Shenandoah.
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8218468 > > Webrev: http://cr.openjdk.java.net/~neliasso/8218468/webrev.01/ > > Please review, > > Nils Eliasson > From robbin.ehn at oracle.com Tue Apr 16 08:59:51 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 16 Apr 2019 10:59:51 +0200 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: <40b529d0-aeca-24b8-636a-518b7445b593@oracle.com> References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> <3d0d841d-3bcf-ce3c-1d44-096c9aff2e72@oracle.com> <40b529d0-aeca-24b8-636a-518b7445b593@oracle.com> Message-ID: <6f53161b-479b-6dff-5ead-18aaa4170bf3@oracle.com> Hi David, >> And I'm not sure how to get that macro into that method in a nice way? > > Define nice ;-) > > return (_suspend_flags & (_external_suspend | _deopt_suspend JFR_ONLY(| > _trace_flag))) != 0; Sure! I'll re-test and sent out a v4, thanks! /Robbin > >>> >>> I was a little concerned that thread->set_trace_flag() may not ensure >>> visibility of the flag update, but then realized it should be covered by the >>> fence: >>> >>> static inline void transition_from_native(JavaThread *thread, >>> JavaThreadState to) { >>> // Change to transition state and ensure it is seen by the VM thread. >>> thread->set_thread_state_fence(_thread_in_native_trans); >>> >>> and the comment should say: >>> >>> // Change to transition state and ensure it is seen by other thread, >>> // and we will see any _suspend_flag changes below. >>> >>> However it also seems to me that in JfrThreadSampleClosure::do_sample_thread >>> we need a storeload() barrier after: >>> >>> thread->set_trace_flag(); >>> >>> to ensure its not reordered with the reads of thread->thread_state() ? >> >> Setting/clearing suspend flags is always done with Atomic::cmpxchg, since >> there can be multiple threads manipulating the bit pattern.
>> I can add a comment about it. > > Missed that - thanks. > > David > ----- > >> Thanks, Robbin >> >>> >>> Thanks, >>> David >>> ----- >>> >>> >>>> So we are missing checking that bit completely in this transition code. >>>> >>>> Thanks, Robbin >>>> >>>> >>>>> All you are doing is changing the timing of the race between the thread >>>>> re-entering the VM/Java and the request for a suspend or safepoint. >>>>> >>>>> If there is a race between the sampler logic acting on the thread, and the >>>>> thread acting on itself then that race has to be precluded somehow. >>>>> >>>>> Thanks, >>>>> David >>>>> ----- >>>>> >>>>>> (which have the JFR native trans suspend check) >>>>>> And it passes t1-5. >>>>>> >>>>>> Code: >>>>>> http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/ >>>>>> Issue: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8218147 >>>>>> >>>>>> Thanks, Robbin >>>>>> >>>>>> On 4/5/19 5:43 PM, Robbin Ehn wrote: >>>>>>> Hi Dean, >>>>>>> >>>>>>> Sorry, I missed this mail. >>>>>>> Yes we can do that. >>>>>>> Ignore my other mail, I'll update. >>>>>>> >>>>>>> Thanks, Robbin >>>>>>> >>>>>>> >>>>>>> dean.long at oracle.com skrev: (5 april 2019 09:22:24 CEST) >>>>>>>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> If it's already set, should we check that _last_Java_pc matches the >>>>>>>> >>>>>>>>>>> new value? >>>>>>>>>> >>>>>>>>>> We manually set the pc in several places, so if it's set, it's not >>>>>>>>>> certain that >>>>>>>>>> it should be the same as in last sp. >>>>>>>>>> I can't distinguish between the cases. >>>>>>>>>> >>>>>>>>> >>>>>>>>> If we get pc from sp[-1] then it should match, but you're right, we >>>>>>>>> sometimes get pc from somewhere else. >>>>>>>> >>>>>>>> How about if we combine the !walkable check and the >>>>>>>> capture_last_Java_pc() logic into a single method? >>>>>>>> Then we can do something like: >>>>>>>> >>>>>>>> if (!walkable()) { >>>>>>>>
address pc = (address)_last_Java_sp[-1]; >>>>>>>> address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL); >>>>>>>> assert(a == NULL || a == pc, "unexpected PC %p", a); >>>>>>>> } >>>>>>>> >>>>>>>> dl From lutz.schmidt at sap.com Tue Apr 16 09:52:20 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 16 Apr 2019 09:52:20 +0000 Subject: [CAUTION] RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Message-ID: Hi Richard, your change looks good. Please note that I am NOT a reviewer. Thanks for detecting and fixing this inefficiency. Regards, Lutz On 15.04.19, 17:56, "hotspot-compiler-dev on behalf of Reingruber, Richard" wrote: Thanks Martin. Cheers, Richard. -----Original Message----- From: Doerr, Martin Sent: Montag, 15. April 2019 17:50 To: Reingruber, Richard ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Richard, thanks for the update. Looks good. I can push it after you got a 2nd review. Best regards, Martin -----Original Message----- From: Reingruber, Richard Sent: Montag, 15. April 2019 17:41 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Martin, thanks for looking at this. > I think the function parameter names "srcL" and "src_len" are confusing (already in current implementation). > Maybe "tmpL" and "odd_tmp_reg" would be better? What do you think? You are absolutely right. I've changed the names. This is the new webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8222271/webrev.1/ Thanks, Richard. -----Original Message----- From: Doerr, Martin Sent: Montag, 15.
April 2019 14:47 To: Reingruber, Richard ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Richard, your change looks good. Thanks for improving. I think the function parameter names "srcL" and "src_len" are confusing (already in current implementation). Maybe "tmpL" and "odd_tmp_reg" would be better? What do you think? Thanks, Martin -----Original Message----- From: hotspot-compiler-dev On Behalf Of Reingruber, Richard Sent: Montag, 15. April 2019 12:09 To: hotspot-compiler-dev at openjdk.java.net Subject: [CAUTION] RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi, please review and sponsor this small enhancement of c2 array clearing on s390. The c2 instruction forms inlineCallClearArrayConstBig and inlineCallClearArray use the register pair R4,R5 as source operand to a move long extended (mvcle) instruction that clears the destination array. To do so the source length (R5) is set to 0 and 0 is used for padding. The s390 manual (Principles of Operation[1]) states that if the source length is 0, then the value in the register used for the source address is not changed and no access exceptions for that operand are recognized. In other words: it is completely ignored. This allows to take any odd register for the source length and to remove the source address operand completely: Bug: https://bugs.openjdk.java.net/browse/JDK-8222271 Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8222271/webrev/ Thanks, Richard. [1] https://www.ibm.com/support/libraryserver/download/dz9zr006.pdf#G13.1223008 From richard.reingruber at sap.com Tue Apr 16 11:02:47 2019 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Tue, 16 Apr 2019 11:02:47 +0000 Subject: [CAUTION] RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays In-Reply-To: References: Message-ID: Thanks Lutz. 
Regards, Richard. -----Original Message----- From: Schmidt, Lutz Sent: Dienstag, 16. April 2019 11:52 To: Reingruber, Richard ; Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: Re: [CAUTION] RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Richard, your change looks good. Please note that I am NOT a reviewer. Thanks for detecting and fixing this inefficiency. Regards, Lutz On 15.04.19, 17:56, "hotspot-compiler-dev on behalf of Reingruber, Richard" wrote: Thanks Martin. Cheers, Richard. -----Original Message----- From: Doerr, Martin Sent: Montag, 15. April 2019 17:50 To: Reingruber, Richard ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Richard, thanks for the update. Looks good. I can push it after you got a 2nd review. Best regards, Martin -----Original Message----- From: Reingruber, Richard Sent: Montag, 15. April 2019 17:41 To: Doerr, Martin ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Martin, thanks for looking at this. > I think the function parameter names "srcL" and "src_len" are confusing (already in current implementation). > Maybe "tmpL" and "odd_tmp_reg" would be better? What do you think? You are absolutely right. I've changed the names. This is the new webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8222271/webrev.1/ Thanks, Richard. -----Original Message----- From: Doerr, Martin Sent: Montag, 15. April 2019 14:47 To: Reingruber, Richard ; hotspot-compiler-dev at openjdk.java.net Subject: RE: RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi Richard, your change looks good. Thanks for improving. I think the function parameter names "srcL" and "src_len" are confusing (already in current implementation).
Maybe "tmpL" and "odd_tmp_reg" would be better? What do you think? Thanks, Martin -----Original Message----- From: hotspot-compiler-dev On Behalf Of Reingruber, Richard Sent: Montag, 15. April 2019 12:09 To: hotspot-compiler-dev at openjdk.java.net Subject: [CAUTION] RFR(xs): 8222271: [s390] optimize register usage in C2 instruction forms for clearing arrays Hi, please review and sponsor this small enhancement of c2 array clearing on s390. The c2 instruction forms inlineCallClearArrayConstBig and inlineCallClearArray use the register pair R4,R5 as source operand to a move long extended (mvcle) instruction that clears the destination array. To do so the source length (R5) is set to 0 and 0 is used for padding. The s390 manual (Principles of Operation[1]) states that if the source length is 0, then the value in the register used for the source address is not changed and no access exceptions for that operand are recognized. In other words: it is completely ignored. This allows to take any odd register for the source length and to remove the source address operand completely: Bug: https://bugs.openjdk.java.net/browse/JDK-8222271 Webrev: http://cr.openjdk.java.net/~rrich/webrevs/2019/8222271/webrev/ Thanks, Richard. [1] https://www.ibm.com/support/libraryserver/download/dz9zr006.pdf#G13.1223008 From shade at redhat.com Tue Apr 16 11:38:19 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Apr 2019 13:38:19 +0200 Subject: [11u] RFR 8188133: C2: Static field accesses in clinit can trigger deoptimizations Message-ID: <56a46cfd-14c4-e1c6-c76b-542f6f0c8504@redhat.com> Original bug: https://bugs.openjdk.java.net/browse/JDK-8188133 Original fix: http://hg.openjdk.java.net/jdk/jdk/rev/d620a4a1d5ed The patch does not apply to 11u cleanly due to a different patch context in bytecodeInfo.cpp. The changed lines in the patch itself seem to be the same as the original. 
11u webrev: http://cr.openjdk.java.net/~shade/8188133/wevrev.11u.01/ Testing: benchmark from the bug, tier1 -- Thanks, -Aleksey From robbin.ehn at oracle.com Tue Apr 16 12:06:41 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 16 Apr 2019 14:06:41 +0200 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> Message-ID: <53a81521-1630-802f-ce9e-b597b62f6ac5@oracle.com> Hi, here is v4. http://cr.openjdk.java.net/~rehn/8218147/v4/webrev/index.html Re-prod test and t1-t2. Thanks, Robbin On 4/15/19 10:58 AM, Robbin Ehn wrote: > Hi, please review. > > After reexamine this issue: > Threads in native must always have their stack walkable. > JFR sampler should never need to make a stack walkable (for native sample). > > I manage to locally reproduce reliable with changes to JFR sampler and having > hundreds of threads running similar code as the in the bug. > (Looping creating an array with negative size.) > > I found a place where we don't proper look at the suspend flags. > The java thread can thus escape native and make it's stack unwalkable and later > it tries to make it walkable at the same time as the JFR sampler. > > By removing some kind of fast check and instead always call the > check_safepoint_and_suspend_for_native_trans I can no longer reproduce. > (which have the JFR native trans suspend check) > And it passes t1-5. > > Code: > http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/ > Issue: > https://bugs.openjdk.java.net/browse/JDK-8218147 > > Thanks, Robbin > > On 4/5/19 5:43 PM, Robbin Ehn wrote: >> Hi Dean, >> >> Sorry, I missed this mail. >> Yes we can do that.
>> Ignore my other mail, I'll update. >> >> Thanks, Robbin >> >> >> dean.long at oracle.com skrev: (5 april 2019 09:22:24 CEST) >>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >>>> >>>>>> >>>>>> If it's already set, should we check that _last_Java_pc matches the >>> >>>>>> new value? >>>>> >>>>> We manually set the pc in several places, so if it's set, it's not >>>>> certain that >>>>> it should be the same as in last sp. >>>>> I can't distinguish between the cases. >>>>> >>>> >>>> If we get pc from sp[-1] then it should match, but you're right, we >>>> sometimes get pc from somewhere else. >>> >>> How about if we combine the !walkable check and the >>> capture_last_Java_pc() logic into a single method? >>> Then we can do something like: >>> >>> if (!walkable()) { >>> address pc = (address)_last_Java_sp[-1]; >>> address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL); >>> assert(a == NULL || a == pc, "unexpected PC %p", a); >>> } >>> >>> dl From martin.doerr at sap.com Tue Apr 16 12:13:38 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 16 Apr 2019 12:13:38 +0000 Subject: [11u] RFR 8188133: C2: Static field accesses in clinit can trigger deoptimizations In-Reply-To: <56a46cfd-14c4-e1c6-c76b-542f6f0c8504@redhat.com> References: <56a46cfd-14c4-e1c6-c76b-542f6f0c8504@redhat.com> Message-ID: Hi Aleksey, this looks good. Thanks for backporting. Best regards, Martin -----Original Message----- From: hotspot-compiler-dev On Behalf Of Aleksey Shipilev Sent: Dienstag, 16. April 2019 13:38 To: hotspot compiler ; jdk-updates-dev at openjdk.java.net Subject: [11u] RFR 8188133: C2: Static field accesses in clinit can trigger deoptimizations Original bug: https://bugs.openjdk.java.net/browse/JDK-8188133 Original fix: http://hg.openjdk.java.net/jdk/jdk/rev/d620a4a1d5ed The patch does not apply to 11u cleanly due to a different patch context in bytecodeInfo.cpp.
The changed lines in the patch itself seem to be the same as the original. 11u webrev: http://cr.openjdk.java.net/~shade/8188133/wevrev.11u.01/ Testing: benchmark from the bug, tier1 -- Thanks, -Aleksey From david.holmes at oracle.com Tue Apr 16 13:00:04 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 16 Apr 2019 23:00:04 +1000 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: <53a81521-1630-802f-ce9e-b597b62f6ac5@oracle.com> References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> <53a81521-1630-802f-ce9e-b597b62f6ac5@oracle.com> Message-ID: Looks good to me. Thanks, David On 16/04/2019 10:06 pm, Robbin Ehn wrote: > Hi, here is v4. > > http://cr.openjdk.java.net/~rehn/8218147/v4/webrev/index.html > > Re-prod test and t1-t2. > > Thanks, Robbin > > On 4/15/19 10:58 AM, Robbin Ehn wrote: >> Hi, please review. >> >> After reexamine this issue: >> Threads in native must always have their stack walkable. >> JFR sampler should never need to make a stack walkable (for native >> sample). >> >> I manage to locally reproduce reliable with changes to JFR sampler and >> having >> hundreds of threads running similar code as the in the bug. >> (Looping creating an array with negative size.) >> >> I found a place where we don't proper look at the suspend flags. >> The java thread can thus escape native and make it's stack unwalkable >> and later >> it tries to make it walkable at the same time as the JFR sampler. >> >> By removing some kind of fast check and instead always call the >> check_safepoint_and_suspend_for_native_trans I can no longer reproduce. >> (which have the JFR native trans suspend check) >> And it passes t1-5. 
>> >> Code: >> http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/ >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8218147 >> >> Thanks, Robbin >> >> On 4/5/19 5:43 PM, Robbin Ehn wrote: >>> Hi Dean, >>> >>> Sorry, I missed this mail. >>> Yes we can do that. >>> Ignore my other mail, I'll update. >>> >>> Thanks, Robbin >>> >>> >>> dean.long at oracle.com skrev: (5 april 2019 09:22:24 CEST) >>>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >>>>> >>>>>>> >>>>>>> If it's already set, should we check that _last_Java_pc matches the >>>> >>>>>>> new value? >>>>>> >>>>>> We manually set the pc in several places, so if it's set, it's not >>>>>> certain that >>>>>> it should be the same as in last sp. >>>>>> I can't distinguish between the cases. >>>>>> >>>>> >>>>> If we get pc from sp[-1] then it should match, but you're right, we >>>>> sometimes get pc from somewhere else. >>>> >>>> How about if we combine the !walkable check and the >>>> capture_last_Java_pc() logic into a single method? >>>> Then we can do something like: >>>> >>>> if (!walkable()) { >>>> address pc = (address)_last_Java_sp[-1]; >>>> address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL); >>>> assert(a == NULL || a == pc, "unexpected PC %p", a); >>>> } >>>> >>>> dl From daniel.daugherty at oracle.com Tue Apr 16 13:22:08 2019 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 16 Apr 2019 09:22:08 -0400 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: <53a81521-1630-802f-ce9e-b597b62f6ac5@oracle.com> References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> <53a81521-1630-802f-ce9e-b597b62f6ac5@oracle.com> Message-ID: <2799bfbb-ae0d-12f9-35f7-a0392a6b941e@oracle.com> On 4/16/19 8:06 AM, Robbin Ehn wrote: > Hi, here is v4.
> > http://cr.openjdk.java.net/~rehn/8218147/v4/webrev/index.html src/hotspot/share/jfr/periodic/sampling/jfrThreadSampler.cpp L362: thread->set_trace_flag(); // Provides StoreLoad, needed to keep read of thread state not floating up. Typo: s/not floating/from floating/ src/hotspot/share/runtime/thread.hpp No comments. Thumbs up! Dan > > Re-prod test and t1-t2. > > Thanks, Robbin > > On 4/15/19 10:58 AM, Robbin Ehn wrote: >> Hi, please review. >> >> After reexamine this issue: >> Threads in native must always have their stack walkable. >> JFR sampler should never need to make a stack walkable (for native >> sample). >> >> I manage to locally reproduce reliable with changes to JFR sampler >> and having >> hundreds of threads running similar code as the in the bug. >> (Looping creating an array with negative size.) >> >> I found a place where we don't proper look at the suspend flags. >> The java thread can thus escape native and make it's stack unwalkable >> and later >> it tries to make it walkable at the same time as the JFR sampler. >> >> By removing some kind of fast check and instead always call the >> check_safepoint_and_suspend_for_native_trans I can no longer reproduce. >> (which have the JFR native trans suspend check) >> And it passes t1-5. >> >> Code: >> http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/ >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8218147 >> >> Thanks, Robbin >> >> On 4/5/19 5:43 PM, Robbin Ehn wrote: >>> Hi Dean, >>> >>> Sorry, I missed this mail. >>> Yes we can do that. >>> Ignore my other mail, I'll update. >>> >>> Thanks, Robbin >>> >>> >>> dean.long at oracle.com skrev: (5 april 2019 09:22:24 CEST) >>>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >>>>> >>>>>>> >>>>>>> If it's already set, should we check that _last_Java_pc matches the >>>> >>>>>>> new value?
>>>>>> >>>>>> We manually set the pc in several places, so if it's set, it's not >>>>>> certain that >>>>>> it should be the same as in last sp. >>>>>> I can't distinguish between the cases. >>>>>> >>>>> >>>>> If we get pc from sp[-1] then it should match, but you're right, we >>>>> sometimes get pc from somewhere else. >>>> >>>> How about if we combine the !walkable check and the >>>> capture_last_Java_pc() logic into a single method? >>>> Then we can do something like: >>>> >>>> if (!walkable()) { >>>> address pc = (address)_last_Java_sp[-1]; >>>> address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL); >>>> assert(a == NULL || a == pc, "unexpected PC %p", a); >>>> } >>>> >>>> dl From robbin.ehn at oracle.com Tue Apr 16 13:30:30 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 16 Apr 2019 15:30:30 +0200 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> <53a81521-1630-802f-ce9e-b597b62f6ac5@oracle.com> Message-ID: <5a8c7b88-bda7-ad73-3c2a-ef390a814de4@oracle.com> Thanks David, Robbin On 4/16/19 3:00 PM, David Holmes wrote: > Looks good to me. > > Thanks, > David > > On 16/04/2019 10:06 pm, Robbin Ehn wrote: >> Hi, here is v4. >> >> http://cr.openjdk.java.net/~rehn/8218147/v4/webrev/index.html >> >> Re-prod test and t1-t2. >> >> Thanks, Robbin >> >> On 4/15/19 10:58 AM, Robbin Ehn wrote: >>> Hi, please review. >>> >>> After reexamine this issue: >>> Threads in native must always have their stack walkable. >>> JFR sampler should never need to make a stack walkable (for native sample). >>> >>> I manage to locally reproduce reliable with changes to JFR sampler and having >>> hundreds of threads running similar code as the in the bug. >>> (Looping creating an array with negative size.)
>>> >>> I found a place where we don't proper look at the suspend flags. >>> The java thread can thus escape native and make it's stack unwalkable and later >>> it tries to make it walkable at the same time as the JFR sampler. >>> >>> By removing some kind of fast check and instead always call the >>> check_safepoint_and_suspend_for_native_trans I can no longer reproduce. >>> (which have the JFR native trans suspend check) >>> And it passes t1-5. >>> >>> Code: >>> http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/ >>> Issue: >>> https://bugs.openjdk.java.net/browse/JDK-8218147 >>> >>> Thanks, Robbin >>> >>> On 4/5/19 5:43 PM, Robbin Ehn wrote: >>>> Hi Dean, >>>> >>>> Sorry, I missed this mail. >>>> Yes we can do that. >>>> Ignore my other mail, I'll update. >>>> >>>> Thanks, Robbin >>>> >>>> >>>> dean.long at oracle.com skrev: (5 april 2019 09:22:24 CEST) >>>>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >>>>>> >>>>>>>> >>>>>>>> If it's already set, should we check that _last_Java_pc matches the >>>>> >>>>>>>> new value? >>>>>>> >>>>>>> We manually set the pc in several places, so if it's set, it's not >>>>>>> certain that >>>>>>> it should be the same as in last sp. >>>>>>> I can't distinguish between the cases. >>>>>>> >>>>>> >>>>>> If we get pc from sp[-1] then it should match, but you're right, we >>>>>> sometimes get pc from somewhere else. >>>>> >>>>> How about if we combine the !walkable check and the >>>>> capture_last_Java_pc() logic into a single method? >>>>> Then we can do something like: >>>>> >>>>> if (!walkable()) { >>>>> address pc = (address)_last_Java_sp[-1]; >>>>> address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL); >>>>> assert(a == NULL || a == pc, "unexpected PC %p", a); >>>>>
} >>>>> >>>>> dl From robbin.ehn at oracle.com Tue Apr 16 13:31:23 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 16 Apr 2019 15:31:23 +0200 Subject: RFR(s): 8218147: make_walkable asserts on multiple calls In-Reply-To: <2799bfbb-ae0d-12f9-35f7-a0392a6b941e@oracle.com> References: <48b0d1d0-3b12-e08d-f606-bf96a8445bf8@oracle.com> <587d1afe-9f86-f801-92de-31711b711034@oracle.com> <24A2B80F-D3D4-486A-9E29-8D48D49DF030@oracle.com> <53a81521-1630-802f-ce9e-b597b62f6ac5@oracle.com> <2799bfbb-ae0d-12f9-35f7-a0392a6b941e@oracle.com> Message-ID: Hi Dan, On 4/16/19 3:22 PM, Daniel D. Daugherty wrote: > On 4/16/19 8:06 AM, Robbin Ehn wrote: >> Hi, here is v4. >> >> http://cr.openjdk.java.net/~rehn/8218147/v4/webrev/index.html > > src/hotspot/share/jfr/periodic/sampling/jfrThreadSampler.cpp > L362: thread->set_trace_flag(); // Provides StoreLoad, needed to keep > read of thread state not floating up. > Typo: s/not floating/from floating/ Fixed! > > src/hotspot/share/runtime/thread.hpp > No comments. > > Thumbs up! > Thanks Dan! /Robbin > Dan > > >> >> Re-prod test and t1-t2. >> >> Thanks, Robbin >> >> On 4/15/19 10:58 AM, Robbin Ehn wrote: >>> Hi, please review. >>> >>> After reexamine this issue: >>> Threads in native must always have their stack walkable. >>> JFR sampler should never need to make a stack walkable (for native sample). >>> >>> I manage to locally reproduce reliable with changes to JFR sampler and having >>> hundreds of threads running similar code as the in the bug. >>> (Looping creating an array with negative size.) >>> >>> I found a place where we don't proper look at the suspend flags. >>> The java thread can thus escape native and make it's stack unwalkable and later >>> it tries to make it walkable at the same time as the JFR sampler. >>> >>> By removing some kind of fast check and instead always call the >>> check_safepoint_and_suspend_for_native_trans I can no longer reproduce.
>>> (which have the JFR native trans suspend check) >>> And it passes t1-5. >>> >>> Code: >>> http://cr.openjdk.java.net/~rehn/8218147/v3/webrev/ >>> Issue: >>> https://bugs.openjdk.java.net/browse/JDK-8218147 >>> >>> Thanks, Robbin >>> >>> On 4/5/19 5:43 PM, Robbin Ehn wrote: >>>> Hi Dean, >>>> >>>> Sorry, I missed this mail. >>>> Yes we can do that. >>>> Ignore my other mail, I'll update. >>>> >>>> Thanks, Robbin >>>> >>>> >>>> dean.long at oracle.com skrev: (5 april 2019 09:22:24 CEST) >>>>> On 4/4/19 5:16 PM, dean.long at oracle.com wrote: >>>>>> >>>>>>>> >>>>>>>> If it's already set, should we check that _last_Java_pc matches the >>>>> >>>>>>>> new value? >>>>>>> >>>>>>> We manually set the pc in several places, so if it's set, it's not >>>>>>> certain that >>>>>>> it should be the same as in last sp. >>>>>>> I can't distinguish between the cases. >>>>>>> >>>>>> >>>>>> If we get pc from sp[-1] then it should match, but you're right, we >>>>>> sometimes get pc from somewhere else. >>>>> >>>>> How about if we combine the !walkable check and the >>>>> capture_last_Java_pc() logic into a single method? >>>>> Then we can do something like: >>>>> >>>>> if (!walkable()) { >>>>> address pc = (address)_last_Java_sp[-1]; >>>>> address a = Atomic::cmpxchg(pc, &_last_Java_pc, NULL); >>>>> assert(a == NULL || a == pc, "unexpected PC %p", a); >>>>> } >>>>> >>>>> dl > From derekw at marvell.com Tue Apr 16 14:15:22 2019 From: derekw at marvell.com (Derek White) Date: Tue, 16 Apr 2019 14:15:22 +0000 Subject: [EXT] Re: [aarch64-port-dev ] RFR(XXS): 8222412: AARCH64: lse atomics encoding is not accepting zr as source In-Reply-To: <69d9357f-4945-7461-cf08-f5ac646b1591@redhat.com> References: <7b2c42d6-8309-23ab-776c-132c1a2f4baf@bell-sw.com> <69d9357f-4945-7461-cf08-f5ac646b1591@redhat.com> Message-ID: Hi Andrew, I asked Dmitrij to look at instruction encodings. The instruction encodings are the basis to the back-end of the aarch64 port.
There have been at least 10 latent bugs [1] reported in these encodings. These were unimportant because they weren't used, until they were. These encoding bugs take little effort to find, less to review, and are a small investment to make future changes in the aarch64 port easier. The effort is much lower to find and fix them at once, instead of piecemeal over a few years. Furthermore, this isn't an open universe of issues - there is a well-defined, fix set of instructions to check. We've appreciated your detailed and thoughtful reviews immensely, and I might agree that these bugs aren't worth *your* time reviewing. But I think some of your concerns reflect a lack of review capacity in the aarch64 port. The rest of the JDK happily accepts checkins that are nothing more than fixing punctuation in code comments. In the meantime, we can batch up the encoding bugs in a bundle or two. Thanks again for your reviews and all your contributions to the aarch64 port, - Derek [1] https://bugs.openjdk.java.net/browse/JDK-8210578?jql=text%20~%20%22encoding%20aarch64%22 > -----Original Message----- > From: aarch64-port-dev On > Behalf Of Andrew Dinn > Sent: Monday, April 15, 2019 12:16 PM > To: Dmitrij Pochepko ; Andrew Haley > ; aarch64-port-dev at openjdk.java.net; hotspot compiler > > Subject: [EXT] Re: [aarch64-port-dev ] RFR(XXS): 8222412: AARCH64: lse > atomics encoding is not accepting zr as source > > External Email > > ---------------------------------------------------------------------- > Hello Dmitrij, > > On 12/04/2019 16:24, Dmitrij Pochepko wrote: > > please review small fix for 8222412: AARCH64: lse atomics encoding is > > not accepting zr as source > > > > webrev: http://cr.openjdk.java.net/~dpochepk/8222412/webrev.01/ > > > > Current encoding for lse atomics hits assert when trying to use zr as > > source register while it is allowed by spec. Current vm doesn't use > > atomics with zr and this problem is not triggered. 
> > I think this part of your comment is critical: > > "Current vm doesn't use atomics with zr and this problem is not triggered." > > In which case I have to ask why are you spending time fixing this and asking > others to spend time reviewing it? I agree that this detail is indeed wrong. In > related news the whole internet is broken yet that's no reason for anyone to > spend their time trying to fix it. Do you have a use case for this fixed > behaviour? > > That comment may sound harsh but this fix is the nadir (at least I hope it is) > of a trajectory that has presented change after change offering little by way > of motivation and, in direct consequence, little by way of benefit. It is all > very nice to receive contributions gratis but they really need to be worth > more than the cost of accepting them. > > > Testing: > > > > I generated lse atomics with zr as source register. No assert observed > > with patched vm. > > I'd much prefer for this to be tested through being used in the VM. > Perhaps you might re-present the patch as part of a larger patch that justifies > its inclusion by fixing an actual breakage or performance problem. If you can > do that I'd be happy to pass it as reviewed. > > > CR: https://bugs.openjdk.java.net/browse/JDK-8222412 > > Also, could you please downgrade the priority of this defect to P5 so it > represent the true state of affairs. > > regards, > > > Andrew Dinn > ----------- From vladimir.kozlov at oracle.com Tue Apr 16 14:19:06 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 16 Apr 2019 07:19:06 -0700 Subject: [11u] RFR 8188133: C2: Static field accesses in clinit can trigger deoptimizations In-Reply-To: <56a46cfd-14c4-e1c6-c76b-542f6f0c8504@redhat.com> References: <56a46cfd-14c4-e1c6-c76b-542f6f0c8504@redhat.com> Message-ID: <45401d25-5b1f-5dc0-0efe-56bee8e520e2@oracle.com> Looks good. 
Vladimir K

On 4/16/19 4:38 AM, Aleksey Shipilev wrote:
> Original bug:
> https://bugs.openjdk.java.net/browse/JDK-8188133
>
> Original fix:
> http://hg.openjdk.java.net/jdk/jdk/rev/d620a4a1d5ed
>
> The patch does not apply to 11u cleanly due to a different patch context in bytecodeInfo.cpp. The
> changed lines in the patch itself seem to be the same as the original.
>
> 11u webrev:
> http://cr.openjdk.java.net/~shade/8188133/wevrev.11u.01/
>
> Testing: benchmark from the bug, tier1
>

From vladimir.kozlov at oracle.com Tue Apr 16 14:28:20 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 16 Apr 2019 07:28:20 -0700
Subject: RFR(XS): 8218468: Load barrier slow path node should be MachTypeNode
In-Reply-To: <43ba5700-d3c4-7dc4-a09d-5961b80aad0f@oracle.com>
References: <43ba5700-d3c4-7dc4-a09d-5961b80aad0f@oracle.com>
Message-ID:

Good.

Thanks,
Vladimir

On 4/16/19 1:44 AM, Nils Eliasson wrote:
> Hi,
>
> We have a number of assert failures in RunThese testing. This is caused by LoadBarrierSlowReg and
> LoadBarrierWeakSlowReg not being MachTypeNodes.
>
> This patch fixes that.
>
> I am also adding includes for ZGC and Shenandoah.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8218468
>
> Webrev: http://cr.openjdk.java.net/~neliasso/8218468/webrev.01/
>
> Please review,
>
> Nils Eliasson
>

From aph at redhat.com Tue Apr 16 15:40:39 2019
From: aph at redhat.com (Andrew Haley)
Date: Tue, 16 Apr 2019 16:40:39 +0100
Subject: [EXT] Re: [aarch64-port-dev ] RFR(XXS): 8222412: AARCH64: lse atomics encoding is not accepting zr as source
In-Reply-To:
References: <7b2c42d6-8309-23ab-776c-132c1a2f4baf@bell-sw.com> <69d9357f-4945-7461-cf08-f5ac646b1591@redhat.com>
Message-ID: <577ed183-cdd9-9616-9e3d-878a0b7a285c@redhat.com>

On 4/16/19 3:15 PM, Derek White wrote:
> Hi Andrew,
>
> I asked Dmitrij to look at instruction encodings.
>
> The instruction encodings are the basis to the back-end of the
> aarch64 port.
> There have been at least 10 latent bugs [1] reported
> in these encodings. These were unimportant because they weren't
> used, until they were.

Firstly: some instructions use r31 for sp, some for zr. It is not a
bug when the assembler does not specify which, but asserts when either
is used. That is deliberate. It is because I do not want the assembler
to be untested: the first time anyone tries to use those instructions
for real we'll get an assert, and then the programmer will have to
actually check that the right thing happens.

So, let's have a look at what we've had.

JDK-8210578. Fixed by adinn, actually a bug.
JDK-8191769. Fixed by dpochepk. actually a bug, but unused.
JDK-8221995. Fixed by dpochepk. actually a bug, but only applies to CASP, which we are unlikely ever to use.
JDK-8214961. Fixed by dpochepk. not a bug
JDK-8205474. Fixed by dpochepk. actually a bug, but unused.
JDK-8194256. Fixed by dpochepk. actually a bug, but unused.
JDK-8222412. not a bug
JDK-8202395. Fixed by dpochepk. actually a bug, but unused.
JDK-8201185. Fixed by dpochepk. not a bug
JDK-8221765. actually a bug
JDK-8195859. Fixed by adinn, actually a bug.

> These encoding bugs take little effort to find, less to review, and
> are a small investment to make future changes in the aarch64 port
> easier. The effort is much lower to find and fix them at once,
> instead of piecemeal over a few years. Furthermore, this isn't an
> open universe of issues - there is a well-defined, fix set of
> instructions to check.

I'd rather the ones where ZR/SP was deliberately left unspecified were
not fixed, really. I don't really mind if someone checks *really*
*carefully*, but I don't have the time.

> We've appreciated your detailed and thoughtful reviews immensely,
> and I might agree that these bugs aren't worth *your* time
> reviewing. But I think some of your concerns reflect a lack of
> review capacity in the aarch64 port.

I disagree.
> The rest of the JDK happily > accepts checkins that are nothing more than fixing punctuation in > code comments. > > In the meantime, we can batch up the encoding bugs in a bundle or two. One other thing that you might like to consider. When I first wrote the assembler I checked all of the instructions against the equivalent code generated by GAS. The script is here: http://hg.openjdk.java.net/aarch64-port/jdk8/file/9a781f9c1338/test/aarch64-asmtest.py Unfortunately, as the assembler was extended this script was never updated, so it only checks the encoding of the core instructions. If you wanted to do something really useful you could extend this script to test the encoding of all the instructions we generate, including the SIMD ones. It would provide a truly independent verification of the assembler. And, once it's done, you can submit a single patch containing all of the fixes you find. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From igor.ignatyev at oracle.com Tue Apr 16 19:45:13 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 16 Apr 2019 12:45:13 -0700 Subject: [13] RFR(T): 8222417: compiler/loopopts/TestOverunrolling.java times out In-Reply-To: <6fac0535-ae63-193a-c3f9-bcb116a1da9f@oracle.com> References: <477c16f2-11a2-2740-a780-ded7794ebc27@oracle.com> <2561E67B-9E53-4FD9-9C9C-2707D17BCCD3@oracle.com> <6fac0535-ae63-193a-c3f9-bcb116a1da9f@oracle.com> Message-ID: <22983B81-2597-47A0-95AD-4637BA68C8F9@oracle.com> > On Apr 16, 2019, at 12:33 AM, Tobias Hartmann wrote: > > Hi Igor, > > that's fine with me but then we should make sure to also change all the numerous other fixes that > added '@requires !vm.graal.enabled' for the same reason. For example, JDK-8198924. that's true, and there is already a task to do that -- JDK-8207267. please note that JDK-8198924 was integrated before we introduced ProblemList-graal, so we didn't really have a choice when. 
> > > I've created an umbrella bug for this: > https://bugs.openjdk.java.net/browse/JDK-8222524 I guess we should close it as a dup of 8207267. > > Here are the new webrevs: > http://cr.openjdk.java.net/~thartmann/8222417/webrev.01/ > http://cr.openjdk.java.net/~thartmann/8222418/webrev.01/ both look good to me. Thanks, -- Igor > > Thanks, > Tobias > > On 16.04.19 05:41, Igor Ignatyev wrote: >> Hi Tobias, >> >> although I agree that this test shouldn't be executed w/ Graal as JIT in the current setup, I don't >> think '@requires !vm.graal.enabled' is a right choice here b/c this test still can/should be run w/ >> libgraal, so I'd suggest to put this test and other tests (e.g. TestScavengeRootsInCode.java) into >> graal-specific problem list under an umbrella bug saying smth like 'Graal is very slow w/ >> -XX:-TieredCompilation -Xcomp'. >> >> Thanks, >> -- Igor >> >>> On Apr 15, 2019, at 1:34 AM, Tobias Hartmann >> > wrote: >>> >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8222417 >>> http://cr.openjdk.java.net/~thartmann/8222417/webrev.00/ >>> >>> The test sets -XX:-TieredCompilation -Xcomp and should therefore not be executed with Graal as JIT. >>> Otherwise all Graal methods will be compiled by Graal itself running in interpreter mode which is >>> very slow and causes the test to time out. 
>>> >>> Thanks, >>> Tobias >> From tobias.hartmann at oracle.com Wed Apr 17 06:16:17 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 17 Apr 2019 08:16:17 +0200 Subject: [13] RFR(T): 8222417: compiler/loopopts/TestOverunrolling.java times out In-Reply-To: <22983B81-2597-47A0-95AD-4637BA68C8F9@oracle.com> References: <477c16f2-11a2-2740-a780-ded7794ebc27@oracle.com> <2561E67B-9E53-4FD9-9C9C-2707D17BCCD3@oracle.com> <6fac0535-ae63-193a-c3f9-bcb116a1da9f@oracle.com> <22983B81-2597-47A0-95AD-4637BA68C8F9@oracle.com> Message-ID: <4712e773-d73c-23ea-d201-852f977217b0@oracle.com> Hi Igor, On 16.04.19 21:45, Igor Ignatyev wrote: > that's true, and there is already a task to do that -- JDK-8207267. please note that JDK-8198924 was integrated before we introduced ProblemList-graal, so we didn't really have a choice when. Right, I've missed that. > I guess we should close it as a dup of 8207267. Done. >> http://cr.openjdk.java.net/~thartmann/8222417/webrev.01/ >> http://cr.openjdk.java.net/~thartmann/8222418/webrev.01/ > both look good to me. Thanks, pushed (changed the bug ID to 8207267). 
Best regards,
Tobias

From vladimir.x.ivanov at oracle.com Wed Apr 17 07:33:51 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 17 Apr 2019 00:33:51 -0700
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To: <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn>
References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn> <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn>
Message-ID: <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com>

Though I don't consider parallel execution case as problematic,
I got a better idea while browsing the code :-)

  http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.01

It's inspired by AbstractInterpreter::is_not_reached() and piggybacks
on constant pool entry resolution state to determine whether a call
was executed in interpreter before.

(The change in cpCache.cpp fixes a latent bug in
ConstantPoolCacheEntry::method_if_resolved().)

Best regards,
Vladimir Ivanov

On 11/04/2019 19:27, Jie Fu wrote:
> Hi Vladimir,
>
>>> Fixed in
>>> http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.03/
>>
>> I like it. What do you think about the following version?
>>
>>   http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.00/
> It is clearer and easier to understand.
> I prefer your version.
>
> One question: I'm not sure if the following condition still holds with
> parallel execution of the caller.
> ---------------------------------------------
> if (caller_method->was_executed_more_than(1))  return false; // trust
> profile
> ---------------------------------------------
>
> For example, assuming that the caller method was executed concurrently
> by 12 threads, is it possible that
> caller_method->interpreter_invocation_count()=3 && profile.count()=0 &&
> no exception thrown earlier?
> Thanks a lot.
>
> Best regards,
> Jie
>

From fujie at loongson.cn Wed Apr 17 08:01:13 2019
From: fujie at loongson.cn (Jie Fu)
Date: Wed, 17 Apr 2019 16:01:13 +0800
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To: <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com>
References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn> <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn> <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com>
Message-ID: <3a0f21ac-6de7-56ec-bc1c-5296d782dd88@loongson.cn>

Cool!
I'd like to spend some time to study your patch.

Thanks Vladimir.

On 2019/4/17 3:33 PM, Vladimir Ivanov wrote:
> Though I don't consider parallel execution case as problematic,
> I got a better idea while browsing the code :-)
>
>   http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.01
>
> It's inspired by AbstractInterpreter::is_not_reached() and piggybacks
> on constant pool entry resolution state to determine whether a call
> was executed in interpreter before.
>
> (The change in cpCache.cpp fixes a latent bug in
> ConstantPoolCacheEntry::method_if_resolved().)
>
> Best regards,
> Vladimir Ivanov
>
> On 11/04/2019 19:27, Jie Fu wrote:
>> Hi Vladimir,
>>
>>>> Fixed in
>>>> http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.03/
>>>
>>> I like it. What do you think about the following version?
>>>
>>> http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.00/
>> It is clearer and easier to understand.
>> I prefer your version.
>>
>> One question: I'm not sure if the following condition still holds
>> with parallel execution of the caller.
>> ---------------------------------------------
>> if (caller_method->was_executed_more_than(1))  return false; // trust
>> profile
>> ---------------------------------------------
>>
>> For example, assuming that the caller method was executed
>> concurrently by 12 threads, is it possible that
>> caller_method->interpreter_invocation_count()=3 && profile.count()=0
>> && no exception thrown earlier?
>> Thanks a lot.
>>
>> Best regards,
>> Jie
>>

From robbin.ehn at oracle.com Wed Apr 17 10:09:19 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 17 Apr 2019 12:09:19 +0200
Subject: RFR(s): 8222640: Remove deopt suspend
In-Reply-To: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com>
References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com>
Message-ID: <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com>

Adding compiler.

/Robbin

On 4/17/19 10:35 AM, Robbin Ehn wrote:
> Hi all, please consider this change.
>
> The code for deopt suspend is no longer needed since today the register window
> is always flushed when this code executes. Exactly when this code was needed is
> not clear, entered via duke changeset 1. I did not dig since we no longer have
> such use case.
>
> Webrev:
> http://cr.openjdk.java.net/~rehn/8222640/webrev/
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8222640
>
> Passes t1-5.
>
> Thanks, Robbin

From robbin.ehn at oracle.com Wed Apr 17 14:32:40 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 17 Apr 2019 16:32:40 +0200
Subject: RFR(s): 8222640: Remove deopt suspend
In-Reply-To: <6b46fdbf-0dbb-ce9b-047a-fc6d502653e6@oracle.com>
References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com> <6b46fdbf-0dbb-ce9b-047a-fc6d502653e6@oracle.com>
Message-ID: <8bdd971b-9e15-5f6f-b14f-e955c735c327@oracle.com>

Hi Dan, thanks for digging!

Yes, I have already forwarded the mail to compiler, added here also.

Thanks, Robbin

On 2019-04-17 15:46, Daniel D. Daugherty wrote:
> On 4/17/19 4:35 AM, Robbin Ehn wrote:
>> Hi all, please consider this change.
>>
>> The code for deopt suspend is no longer needed since today the register window
>> is always flushed when this code executes. Exactly when this code was needed is not clear, entered via duke changeset
>> 1. I did not dig since we no longer have such use case.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~rehn/8222640/webrev/
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8222640
>>
>> Passes t1-5.
>>
>> Thanks, Robbin
>
> Since this code was added by the Compiler team, I think you're going
> to want at least one Compiler team member to chime in on this review...
>
>
> I was going to add a historical comment to your bug, but JBS appears to
> be down at the moment... This code was added by this delta:
>
> $ sp -r1.795.1.1 src/share/vm/runtime/thread.cpp
> src/share/vm/runtime/SCCS/s.thread.cpp:
>
> D 1.795.1.1 06/12/07 10:06:52 sgoldman 2086 2084 00031/00010/04023
> MRs:
> COMMENTS:
> 6463133 - patchless deopt. Support specialized deopt suspend for register window
> based machines. Pass registerMap to revoke_bias to prevent redundant stack
> walks.  frames now cache the codeBlob.
>
>
> Looks like 6463133 was not a bug that I was tracking way back then
> so I don't have an email folder for it. I did find Steve Goldman's
> push message for it, but the fix for 6463133 is included with four
> other bug fixes:
>
> ---------------------------------------------------------
>
> Job ID:                 20061207101238.sgoldman.6463133_deopt-M
> Original workspace:     gretch:/disk2/ws/6463133_deopt-M
> Submitter:              sgoldman
> Archived data: /net/prt-data.east/archives/main/c2_baseline/2006/20061207101238.sgoldman.6463133_deopt-M/
> Webrev:
> http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2006/20061207101238.sgoldman.6463133_deopt-M/workspace/webrevs/webrev-2006.12.07/index.html
>
>
> Fixed 6463133: Deoptimization should not use code patching
> Fixed 6490483: Java support for pstack broken
> Fixed 6490489: java pstack support for server on x86 corrupts stack frame info
> Fixed 6490492: java support for pstack rarely gets server compiled frames correct.
> Fixed 6500866: jvm crash running forte stress kit.
>
> This converts deoptimization to no longer do patching of code
> and now only patches return address. This made a rather large
> change to the frame object so that now a frame always carries
> along the codeBlob it refers to if it in fact does refer to
> a codeBlob. This removes lots of redundant CodeCache::find_blob
> calls. The testing of this fix which obviously changes the way
> frames look on the stack discovered that both SA and pstack support
> have been broken. pstack support has been been broken for years.
> As part of the SA changes I found that the fix for:
>
> 6252656: Putative invariant for TLABS _start+_size==_end+alignment_reserve() not being maintained
>
> didn't properly update SA and I've added that fix.
>
> There is a discussion of patchless deopt here
>
> http://j2se.sfbay/web/bin/view/HotspotCompilers/PatchlessDeopt
>
> which is currently (Dec. 7, 2006) out of date but which I will clean up.
>
> One other notable change with this putback is that there is no longer a separate
> exception handler for each codeblob. Now the exception handler (and the new
> deopt handler) are stored in the stub (read uncommon code) area.
>
>
> Reviewed by: Tom
>
> Fix verified (y/n): yes
>
> Verification testing:
>
>     PRT with various stress options. NSK and JDI tests with and without stress
>     options. Dan's forte stress tests on sparc/x86/amd64. Lots of various hand
>     testing of SA (in addition to sasanity) pstack, and dtrace.
> [end sgoldman Thu Dec  7 13:56:55 2006 EDT]
>
> sgoldman Mon Dec 11 15:28:48 2006 PDT
> -------------------------------------
>
>
> Since this is an integration push from c2_baseline -> main/baseline, I do
> not have the list of files modified. I would have to find the c2_baseline
> TeamWare workspace if it still exists. However, I'm not sure it would help
> much since 6463133 is combined with 4 other fixes...
>
> I did a search for all of the files that mention 6463133 in their SCCS
> history and that list is 83 files long. Ouch. I've attached that list to
> this email.
>
> Dan

From jesper.wilhelmsson at oracle.com Wed Apr 17 22:13:08 2019
From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com)
Date: Thu, 18 Apr 2019 00:13:08 +0200
Subject: RFR: JDK-8221598 - Update Graal
Message-ID:

Hi,

Please review the patch to integrate recent Graal changes into OpenJDK.
Graal tip to integrate: 20f370437efb6b2a3f455a238da6141dc101d38c

Bug: https://bugs.openjdk.java.net/browse/JDK-8221598
Webrev: http://cr.openjdk.java.net/~jwilhelm/8221598/webrev.00/

Thanks,
/Jesper

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Message signed with OpenPGP
URL:

From dean.long at oracle.com Thu Apr 18 04:22:55 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Wed, 17 Apr 2019 21:22:55 -0700
Subject: RFR(s): 8222640: Remove deopt suspend
In-Reply-To: <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com>
References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com> <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com>
Message-ID:

In frame::deoptimize(), can we assert that we have an anchor frame and that it is walkable?

dl

On 4/17/19 3:09 AM, Robbin Ehn wrote:
> Adding compiler.
> > /Robbin > > On 4/17/19 10:35 AM, Robbin Ehn wrote: >> Hi all, please consider this change. >> >> The code for deopt suspend is no longer needed since today the >> register window >> is always flushed when this code executes. Exactly when this code was >> needed is not clear, entered via duke changeset 1. I did not dig >> since we no longer have such use case. >> >> Webrev: >> http://cr.openjdk.java.net/~rehn/8222640/webrev/ >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8222640 >> >> Passes t1-5. >> >> Thanks, Robbin From robbin.ehn at oracle.com Thu Apr 18 06:56:54 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Thu, 18 Apr 2019 08:56:54 +0200 Subject: RFR(s): 8222640: Remove deopt suspend In-Reply-To: References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com> <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com> Message-ID: <8059b5bf-c55d-97f3-6e3c-4486a5b8dc93@oracle.com> Hi Dean, On 2019-04-18 06:22, dean.long at oracle.com wrote: > In frame::deoptimize(), can we assert that we have an anchor frame and that it is walkable? > I'll add this assert and re-run tests, thanks! /Robbin > dl > > On 4/17/19 3:09 AM, Robbin Ehn wrote: >> Adding compiler. >> >> /Robbin >> >> On 4/17/19 10:35 AM, Robbin Ehn wrote: >>> Hi all, please consider this change. >>> >>> The code for deopt suspend is no longer needed since today the register window >>> is always flushed when this code executes. Exactly when this code was needed is not clear, entered via duke changeset >>> 1. I did not dig since we no longer have such use case. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~rehn/8222640/webrev/ >>> Issue: >>> https://bugs.openjdk.java.net/browse/JDK-8222640 >>> >>> Passes t1-5. 
>>>
>>> Thanks, Robbin
>

From fujie at loongson.cn Thu Apr 18 08:18:50 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 18 Apr 2019 16:18:50 +0800
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To: <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com>
References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn> <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn> <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com>
Message-ID:

Hi Vladimir,

The patch[1] seems unreasonable for the following test case:
----------------------------------------------------------
// Assumed import: the original mail does not show one; java.util.Random
// provides the Random(long) constructor and nextDouble() used below.
import java.util.Random;

public class MonteCarlo {
    public static void main(String[] args) {
        double sum = 0.0;
        MonteCarlo mc = new MonteCarlo();

        for(int i = 1; i < 3000; i++) {
            sum += mc.integrate(i);
        }

        System.out.println("sum = " + sum);
    }

    public final double integrate(int n) {
        Random R = null;
        if (n > 0) {
          R = new Random(1);
        } else {
          // This call site is not reached.
          // But AbstractInterpreter::is_not_reached(...) returns false for it.
          R = new Random(2);
        }

        int underCurve = 0;
        for (int count = 0; count < 1000000; count++) {

            double x = R.nextDouble();
            double y = R.nextDouble();

            if ( x*x + y*y <= 1.0) {
                underCurve ++;
            }
        }
        return underCurve;
    }
}
----------------------------------------------------------

In patch[1], AbstractInterpreter::is_not_reached(...) is somewhat just
like callee_method->was_executed_more_than(0).
So I still prefer your previous patch[2].

What do you think?
Thanks.
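
The concern in the test case above can be illustrated with a toy model. The sketch below is not HotSpot code: `SharedResolutionToy` and its methods are hypothetical, and it assumes that both `new Random(...)` call sites refer to the same constant pool entry, so that resolution state is recorded once per target rather than once per call site. Under that assumption, a check based on resolution state cannot tell the executed call site from the unexecuted one.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model, not HotSpot code: every name here is hypothetical.
class SharedResolutionToy {
    // Assumption: resolution state is kept per referenced method
    // (one shared entry), not per call site.
    private final Set<String> resolvedEntries = new HashSet<>();

    // Executing any call site resolves the shared entry for its target.
    void executeCallSite(String target) {
        resolvedEntries.add(target);
    }

    // A resolution-based check can only answer per target, so it cannot
    // distinguish two call sites that invoke the same method.
    boolean looksNotReached(String target) {
        return !resolvedEntries.contains(target);
    }

    // Mirrors the test case: only the "n > 0" branch ever runs.
    static boolean elseBranchLooksNotReached() {
        SharedResolutionToy toy = new SharedResolutionToy();
        toy.executeCallSite("Random.<init>");        // new Random(1), executed
        // new Random(2) is never executed, but it has the same target.
        return toy.looksNotReached("Random.<init>");
    }

    public static void main(String[] args) {
        System.out.println("else branch looks not reached: "
                + elseBranchLooksNotReached());
    }
}
```

Run as written, the sketch prints `else branch looks not reached: false`, which mirrors the observation that the unreached call site is not reported as such.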

Best regards,
Jie

[1] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.01/
[2] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.00/

On 2019/4/17 3:33 PM, Vladimir Ivanov wrote:
> Though I don't consider parallel execution case as problematic,
> I got a better idea while browsing the code :-)
>
>   http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.01
>
> It's inspired by AbstractInterpreter::is_not_reached() and piggybacks
> on constant pool entry resolution state to determine whether a call
> was executed in interpreter before.
>
> (The change in cpCache.cpp fixes a latent bug in
> ConstantPoolCacheEntry::method_if_resolved().)
>
> Best regards,
> Vladimir Ivanov
>
> On 11/04/2019 19:27, Jie Fu wrote:
>> Hi Vladimir,
>>
>>>> Fixed in
>>>> http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.03/
>>>
>>> I like it. What do you think about the following version?
>>>
>>> http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.00/
>> It is clearer and easier to understand.
>> I prefer your version.
>>
>> One question: I'm not sure if the following condition still holds
>> with parallel execution of the caller.
>> ---------------------------------------------
>> if (caller_method->was_executed_more_than(1))  return false; // trust
>> profile
>> ---------------------------------------------
>>
>> For example, assuming that the caller method was executed
>> concurrently by 12 threads, is it possible that
>> caller_method->interpreter_invocation_count()=3 && profile.count()=0
>> && no exception thrown earlier?
>> Thanks a lot.
>>
>> Best regards,
>> Jie
>>

From fujie at loongson.cn Thu Apr 18 08:28:58 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 18 Apr 2019 16:28:58 +0800
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To:
References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn> <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn> <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com>
Message-ID:

I'm sorry I missed the running script for my test case.
--------------------------------------
#!/bin/bash

JDK=/home/fool/jdk-dev/build/linux-x86_64-server-release/images/jdk

${JDK}/bin/javac MonteCarlo.java

${JDK}/bin/java \
  -XX:+PrintCompilation \
  -XX:-TieredCompilation \
  -XX:CICompilerCount=1 \
  -XX:+UnlockDiagnosticVMOptions \
  -XX:+PrintInlining \
  -XX:-UseOnStackReplacement \
  MonteCarlo
--------------------------------------

On 2019/4/18 4:18 PM, Jie Fu wrote:
> Hi Vladimir,
>
> The patch[1] seems unreasonable for the following test case:
> ----------------------------------------------------------
> public class MonteCarlo {
>     public static void main(String[] args) {
>         double sum = 0.0;
>         MonteCarlo mc = new MonteCarlo();
>
>         for(int i = 1; i < 3000; i++) {
>             sum += mc.integrate(i);
>         }
>
>         System.out.println("sum = " + sum);
>     }
>
>     public final double integrate(int n) {
>         Random R = null;
>         if (n > 0) {
>           R = new Random(1);
>         } else {
>           // This call site is not reached.
>           // But AbstractInterpreter::is_not_reached(...) returns
> false for it.
>           R = new Random(2);
>         }
>
>         int underCurve = 0;
>         for (int count = 0; count < 1000000; count++) {
>
>             double x = R.nextDouble();
>             double y = R.nextDouble();
>
>             if ( x*x + y*y <= 1.0) {
>                 underCurve ++;
>             }
>         }
>         return underCurve;
>     }
> }
> ----------------------------------------------------------
>
> In patch[1], AbstractInterpreter::is_not_reached(...) is somewhat just
> like callee_method->was_executed_more_than(0).
> So I still prefer your previous patch[2].
>
> What do you think?
> Thanks.
>
> Best regards,
> Jie
>
> [1] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.01/
> [2] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.00/
>
>
> On 2019/4/17 3:33 PM, Vladimir Ivanov wrote:
>> Though I don't consider parallel execution case as problematic,
>> I got a better idea while browsing the code :-)
>>
>>   http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.01
>>
>> It's inspired by AbstractInterpreter::is_not_reached() and piggybacks
>> on constant pool entry resolution state to determine whether a call
>> was executed in interpreter before.
>>
>> (The change in cpCache.cpp fixes a latent bug in
>> ConstantPoolCacheEntry::method_if_resolved().)
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 11/04/2019 19:27, Jie Fu wrote:
>>> Hi Vladimir,
>>>
>>>>> Fixed in
>>>>> http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.03/
>>>>
>>>> I like it. What do you think about the following version?
>>>>
>>>> http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.00/
>>> It is clearer and easier to understand.
>>> I prefer your version.
>>>
>>> One question: I'm not sure if the following condition still holds
>>> with parallel execution of the caller.
>>> ---------------------------------------------
>>> if (caller_method->was_executed_more_than(1))  return false; //
>>> trust profile
>>> ---------------------------------------------
>>>
>>> For example, assuming that the caller method was executed
>>> concurrently by 12 threads, is it possible that
>>> caller_method->interpreter_invocation_count()=3 && profile.count()=0
>>> && no exception thrown earlier?
>>> Thanks a lot.
>>>
>>> Best regards,
>>> Jie
>>>
>

From fujie at loongson.cn Thu Apr 18 09:54:20 2019
From: fujie at loongson.cn (Jie Fu)
Date: Thu, 18 Apr 2019 17:54:20 +0800
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To: <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com>
References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn> <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn> <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com>
Message-ID: <61b32258-0107-787e-68da-ce42c3223cd3@loongson.cn>

Hi Vladimir,

> Though I don't consider parallel execution case as problematic,
> I got a better idea while browsing the code :-)
>
>   http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.01

Aha!
I've found a way to show you that the following condition in patch[1]
does NOT hold with the parallel execution of the caller.
-----------------------------------------------
if (caller_method->was_executed_more_than(1))  return false; // trust profile
-----------------------------------------------

Step 1: Apply this patch
-----------------------------------------------
diff -r 5de35f58f70c src/hotspot/share/opto/bytecodeInfo.cpp
--- a/src/hotspot/share/opto/bytecodeInfo.cpp   Thu Apr 18 02:45:02 2019 +0200
+++ b/src/hotspot/share/opto/bytecodeInfo.cpp   Thu Apr 18 17:32:16 2019 +0800
@@ -374,6 +374,8 @@
       // Inlining was forced by CompilerOracle, ciReplay or annotation
     } else if (profile.count() == 0) {
       // don't inline unreached call sites
+      tty->print_cr("caller_method count = %d, was_executed_more_than(1) is %s",
+           caller_method->interpreter_invocation_count(), caller_method->was_executed_more_than(1) ? "true" : "false");
        set_msg("call site not reached");
        return false;
     }
-----------------------------------------------

Step 2: Run SPECjvm2008's scimark.monte_carlo with the reproduce script[2] on a machine with high parallelism.

Step 3: Just wait and see the result.

For example, I ran it on an i7-8700 machine with just 12 threads.
Here is the result showing that profile.count is 0 && caller_method->was_executed_more_than(1) is true.
-----------------------------------------------
  Benchmark:   scimark.monte_carlo
  Run mode:    timed run
  Test type:   multi
  Threads:     12
  Warmup:      120s
  Iterations:  1
  Run length:  240s
    275   72             java.lang.StringBuilder::append (8 bytes)   made not entrant
    275   99             java.io.File:: (47 bytes) made not entrant

Warmup (120s) begins: Thu Apr 18 17:25:33 CST 2019
    281  113  s spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes)
    282  114 % spec.benchmarks.scimark.monte_carlo.MonteCarlo::integrate @ 15 (68 bytes)
              s             @ 22 spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes) inline (hot)
              s             @ 28 spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes) inline (hot)
    432  114 % spec.benchmarks.scimark.monte_carlo.MonteCarlo::integrate @ 15 (68 bytes)   made not entrant
    433  115 spec.benchmarks.scimark.monte_carlo.MonteCarlo::integrate (68 bytes)
caller_method count = 13, was_executed_more_than(1) is true
                            @ 6 spec.benchmarks.scimark.utils.Random:: (53 bytes) call site not reached
              s             @ 22 spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes) inline (hot)
              s             @ 28 spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes) inline (hot)
    436  116 % spec.benchmarks.scimark.monte_carlo.MonteCarlo::integrate @ 15 (68 bytes)
              s             @ 22 spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes) inline (hot)
              s             @ 28 spec.benchmarks.scimark.utils.Random::nextDouble (124 bytes) inline (hot)
-----------------------------------------------

So do you agree to remove that condition in your patch[1]?
Thanks a lot.

Best regards,
Jie

[1] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.00/
[2] http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/reproduce.sh

From rwestrel at redhat.com Thu Apr 18 15:46:13 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Thu, 18 Apr 2019 17:46:13 +0200
Subject: RFR(S): 8222738: Shenandoah: assert(is_Proj()) failed when running cometd benchmarks
Message-ID: <87zhonnwoq.fsf@redhat.com>

http://cr.openjdk.java.net/~roland/8222738/webrev.00/

The failure occurs because Shenandoah barrier expansion code tries to expand a barrier whose control is a call node. That happens because one of the uses of the barrier is a CastPP whose control dominates the call while one of its inputs is dependent on the return from the call. That, in turn, occurs because the null check that the CastPP depends on is optimized out by ConnectionGraph::optimize_ptr_compare() during EA. Then the CastPP depends on another unrelated check and further optimization of that check causes the CastPP control to change to something that dominates the call.

Barrier expansion already has logic to deal with a barrier on the control projection of a call because it's shared between the exception and fallthrough paths. The fix I propose is to simply piggyback on that logic (by making clones for the exception and fallthrough paths). The conditions under which this particular bug shows up seem rare enough that going with the simplest fix is the wisest thing to do, even though it causes a barrier to be emitted in both the exception path and the fallthrough path when it's possible that only one of those paths really needs a barrier.

Roland.

From dmitrij.pochepko at bell-sw.com Thu Apr 18 16:54:36 2019
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Thu, 18 Apr 2019 19:54:36 +0300
Subject: RFR(XXS): 8222412: AARCH64: lse atomics encoding is not accepting zr as source
In-Reply-To: <7b2c42d6-8309-23ab-776c-132c1a2f4baf@bell-sw.com>
References: <7b2c42d6-8309-23ab-776c-132c1a2f4baf@bell-sw.com>
Message-ID: <3b26d696-5c40-a5cd-5112-2898316f55cc@bell-sw.com>

Hi all,

I'm withdrawing this patch and will rework it according to the comments.

I'll use the updated aarch64_asmtest.py instruction generator, and all problems found by it will be fixed in one batch unless the patch turns out to be too large for comfortable review. Since the generator can only check positive cases, I'm also going to handle negative cases separately, where incorrect parameters for instruction generation are expected to be refused (mostly various missing asserts), as a separate patch.

Thanks,
Dmitrij

On 12/04/2019 6:24 PM, Dmitrij Pochepko wrote:
>
> Hi all,
>
> please review small fix for 8222412: AARCH64: lse atomics encoding is
> not accepting zr as source
>
> webrev: http://cr.openjdk.java.net/~dpochepk/8222412/webrev.01/
>
> Current encoding for lse atomics hits assert when trying to use zr as
> source register while it is allowed by spec. Current vm doesn't use
> atomics with zr and this problem is not triggered.
>
>
> Testing:
>
> I generated lse atomics with zr as source register. No assert observed
> with patched vm.
>
>
> CR: https://bugs.openjdk.java.net/browse/JDK-8222412
>
> Thanks,
> Dmitrij
>
>

From xxinliu at amazon.com Thu Apr 18 19:46:01 2019
From: xxinliu at amazon.com (Liu, Xin)
Date: Thu, 18 Apr 2019 19:46:01 +0000
Subject: 8222670 patch review: prevent downgraded tasks from recompiling
Message-ID:

Hi, hotspot-compiler group,

Could you review this webrev for JDK-8222670?
https://cr.openjdk.java.net/~xliu/8222670/webrev.01/

Inside our company, we saw some services suffer from recompilation and codecache bloat. E.g.
the attachment was extracted from a real application's logs. It happened randomly and eventually it would fill up the whole codecache.

I wrote a testcase to simulate this problem: Level2RecompilationTest.java
I don't know how to enqueue an OSR method. I know hard-wired bci=15 sounds stupid. An arbitrary bci doesn't work. It may crash in analysis. Any suggestions?

This patch detects a pre-compiled nmethod when downgrading from level 3 to level 2. It will drop the task if the method has already been compiled.

Thanks,
--lx

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lvl2_recomp_spring.log.zip
Type: application/zip
Size: 51797 bytes
Desc: lvl2_recomp_spring.log.zip

From ekaterina.pavlova at oracle.com Thu Apr 18 22:38:53 2019
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Thu, 18 Apr 2019 15:38:53 -0700
Subject: RFR (T/XXS) 8222747: [Graal] mx_subprocess files miss testing VM flags
Message-ID:

Hi,

Please review small change which fixes command line written in 'mx_subprocess.cmd' file used by some Graal unit tests.

    JBS: https://bugs.openjdk.java.net/browse/JDK-8222747
 webrev: http://cr.openjdk.java.net/~epavlova//8222747/webrev.00/index.html
testing: run graalunit tests in all testing configurations

Thanks,
-katya

From vladimir.kozlov at oracle.com Fri Apr 19 01:54:06 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 18 Apr 2019 18:54:06 -0700
Subject: RFR (T/XXS) 8222747: [Graal] mx_subprocess files miss testing VM flags
In-Reply-To:
References:
Message-ID: <19daec7b-f44c-8de6-9be1-34ce7473c5aa@oracle.com>

Good.

Thanks,
Vladimir

On 4/18/19 3:38 PM, Ekaterina Pavlova wrote:
> Hi,
>
> Please review small change which fixes command line written in 'mx_subprocess.cmd' file used by some Graal unit tests.
>
>
>    JBS: https://bugs.openjdk.java.net/browse/JDK-8222747
> webrev: http://cr.openjdk.java.net/~epavlova//8222747/webrev.00/index.html
> testing: run graalunit tests in all testing configurations
>
>
> Thanks,
> -katya
>

From sgehwolf at redhat.com Fri Apr 19 16:30:05 2019
From: sgehwolf at redhat.com (Severin Gehwolf)
Date: Fri, 19 Apr 2019 18:30:05 +0200
Subject: 8222670 patch review: prevent downgraded tasks from recompiling
In-Reply-To:
References:
Message-ID: <99aae03d0315482c723abda2f2cb530b4b52f82d.camel@redhat.com>

On Thu, 2019-04-18 at 19:46 +0000, Liu, Xin wrote:
> Hi, hotspot-compiler group,
>
> Could you review this webrev for JDK-8222670?
> https://cr.openjdk.java.net/~xliu/8222670/webrev.01/

+++ new/test/hotspot/jtreg/compiler/tiered/TieredLevelsTest.java 2019-04-18 12:18:38.000000000 -0700
@@ -89,7 +89,7 @@
 && actual == COMP_LEVEL_LIMITED_PROFILE) {
 // for simple method full_profile may be replaced by limited_profile
 if (IS_VERBOSE) {
- System.out.printf("Level check: full profiling was replaced "
+ System.out.println("Level check: full profiling was replaced "
 + "by limited profiling. Expected: %d, actual:%d",
 expected, actual);

This seems to be an unintended change, is it?

Thanks,
Severin

From vladimir.kozlov at oracle.com Fri Apr 19 17:15:41 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 19 Apr 2019 10:15:41 -0700
Subject: RFR: JDK-8221598 - Update Graal
In-Reply-To:
References:
Message-ID:

Changes look good.

I looked at the test results and most of them are timeouts because Graal was run with -Xcomp.
I was not able to identify serious issues because there were >200 failed tests - difficult to search.
Someone has to look at the results and see if there are new failures.

Thanks,
Vladimir

On 4/17/19 3:13 PM, jesper.wilhelmsson at oracle.com wrote:
> Hi,
>
> Please review the patch to integrate recent Graal changes into OpenJDK.
> Graal tip to integrate: 20f370437efb6b2a3f455a238da6141dc101d38c
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8221598
> Webrev: http://cr.openjdk.java.net/~jwilhelm/8221598/webrev.00/
>
> Thanks,
> /Jesper
>

From dean.long at oracle.com Fri Apr 19 18:29:42 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Fri, 19 Apr 2019 11:29:42 -0700
Subject: RFR: JDK-8221598 - Update Graal
In-Reply-To:
References:
Message-ID:

I only see one failure in tiers 1-4, and it looks like JDK-8222550.

dl

On 4/19/19 10:15 AM, Vladimir Kozlov wrote:
> Changes look good.
>
> I looked at the test results and most of them are timeouts because Graal
> was run with -Xcomp.
> I was not able to identify serious issues because there were >200
> failed tests - difficult to search.
> Someone has to look at the results and see if there are new failures.
>
> Thanks,
> Vladimir
>
> On 4/17/19 3:13 PM, jesper.wilhelmsson at oracle.com wrote:
>> Hi,
>>
>> Please review the patch to integrate recent Graal changes into OpenJDK.
>> Graal tip to integrate: 20f370437efb6b2a3f455a238da6141dc101d38c
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8221598
>> Webrev: http://cr.openjdk.java.net/~jwilhelm/8221598/webrev.00/
>>
>> Thanks,
>> /Jesper
>>

From vladimir.kozlov at oracle.com Fri Apr 19 19:06:20 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 19 Apr 2019 12:06:20 -0700
Subject: RFR: JDK-8221598 - Update Graal
In-Reply-To:
References:
Message-ID: <2976f9cb-90f5-731e-f30f-ee354032d6f7@oracle.com>

On 4/19/19 11:29 AM, dean.long at oracle.com wrote:
> I only see one failure in tiers 1-4, and it looks like JDK-8222550.

Okay. We can push it then.

Vladimir

>
> dl
>
> On 4/19/19 10:15 AM, Vladimir Kozlov wrote:
>> Changes look good.
>>
>> I looked at the test results and most of them are timeouts because Graal was run with -Xcomp.
>> I was not able to identify serious issues because there were >200 failed tests - difficult to search.
>> Someone has to look at the results and see if there are new failures.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/17/19 3:13 PM, jesper.wilhelmsson at oracle.com wrote:
>>> Hi,
>>>
>>> Please review the patch to integrate recent Graal changes into OpenJDK.
>>> Graal tip to integrate: 20f370437efb6b2a3f455a238da6141dc101d38c
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8221598
>>> Webrev: http://cr.openjdk.java.net/~jwilhelm/8221598/webrev.00/
>>>
>>> Thanks,
>>> /Jesper
>>>
>

From vladimir.x.ivanov at oracle.com Fri Apr 19 23:19:23 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 19 Apr 2019 16:19:23 -0700
Subject: RFC: Recuperate slowdown of long-running static initialization
Message-ID: <6c94c636-92e7-190a-421e-b68819cd2c3b@oracle.com>

Hi,

Recent changes severely affected how static initializers are executed, and for long-running initializers this manifested as a severe slowdown. As an example, it manifests as a 3x slowdown for some Clojure applications (JDK-8219233 [1]). The root cause is that until a class is fully initialized, every invocation of a static method on it goes through method resolution.

There were some changes (JDK-8219974 [2]) to partially recuperate the slowdown, but they had limited effect.

I have been experimenting with a comprehensive fix and ended up with the following:

  http://cr.openjdk.java.net/~vlivanov/8219233/webrev.02/

(Unfortunately, I had to go with platform-specific changes and the patch contains only the x86_64 part. On other platforms the original behavior is preserved.)

The idea is to put an initialization barrier on entry into static methods. If the wrong thread enters it, the thread is blocked until class initialization is finished (and an exception is thrown if initialization finishes with an error).
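
The blocking behavior described above is visible from plain Java code. What follows is only a minimal sketch of the language-level semantics that such a barrier has to preserve, not code from the webrev; the class and method names (InitBarrierDemo, Holder, step, sum) are made up for illustration. One thread runs a long static initializer that keeps calling a static method of the class being initialized, while a second thread calls into the same class and is held back until initialization completes.

```java
import java.util.concurrent.CountDownLatch;

public class InitBarrierDemo {
    // Signals that Holder's static initializer has started running.
    static final CountDownLatch initStarted = new CountDownLatch(1);

    static class Holder {
        static final long SUM;

        static {
            initStarted.countDown();          // let the second thread try to enter
            long s = 0;
            for (int i = 0; i < 1_000_000; i++) {
                s += step(i);                 // static call on a class that is still being initialized
            }
            SUM = s;
        }

        static long step(int i) { return i & 1; }

        static long sum() { return SUM; }
    }

    // Returns the value seen by the initializing thread and by the second thread.
    static long[] run() {
        long[] seen = new long[2];
        Thread other = new Thread(() -> {
            try {
                initStarted.await();          // wait until initialization is in progress
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            seen[1] = Holder.sum();           // "wrong" thread: waits until <clinit> is done
        });
        other.start();
        seen[0] = Holder.sum();               // this thread triggers and runs the initializer
        try {
            other.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        long[] seen = run();
        System.out.println(seen[0] + " " + seen[1]); // prints "500000 500000"
    }
}
```

The second thread can never observe a half-initialized SUM: its call to Holder.sum() does not proceed until the initializer has finished, which is the guarantee the entry barrier has to keep while making the calls inside the initializer cheap.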

The barrier is as simple as:

   if (holder->is_not_initialized() &&
       !holder->is_reentrant_initialization(current_thread)) {
     // trigger call site re-resolution and block there
   }

Performance experiments demonstrated that even though generated code contributes the most overhead, interpreter overhead is visible as well (~20%).

   (1) original (always reresolve):        ~12,0s ( 1x)
   (2) C1/C2 - barriers; int - reresolve:   ~3,8s (~3x)
   (3) int/C1/C2 - barriers:                ~3,2s (-20%)

Based on that, I decided to implement barriers both in the JIT compilers (C1/C2) and in the interpreter.

For C1/C2 I made a decision to put the barrier at the callee side (in the nmethod prologue). Though it looks attractive to put it on the caller side (before the call), it poses major implementation challenges for C1, where unresolved calls are eagerly compiled.

For the interpreter, on the other hand, it's much simpler to implement the barrier on the caller side: throwing an exception on method entry is much more complicated than doing that as part of method resolution during the call.

So, here's the correspondence between barriers and the transitions they cover:

   (1) from interpreter (barrier on caller side)
        * all transitions: interpreter, compiled (i2c), native, aot, ...

   (2) from compiled (barrier on callee side)
        to compiled, to native (barrier in native wrapper on entry)

   (3) c2i bypasses both barriers (interpreter and compiled) and requires a dedicated barrier in c2i

   (4) to Graal/AOT:
        from interpreter: covered by interpreter barrier
        from compiled: current patch doesn't cover Graal and AOT, so call site patching is disabled for them, leading to repeated call site resolution until the method holder is fully initialized.

I'd like to hear opinions about the patch and the decisions I made before publishing it for review.

For example, is it worth changing the template interpreter? The change itself is small and localized, and the performance improvement is noticeable, but still it resides in platform-specific code.
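
For reference, the exception case mentioned in connection with the barrier has a well-defined Java-level shape. The following is a small sketch with hypothetical names (InitFailureDemo, Broken), independent of the patch: the first use of a class whose static initializer throws reports ExceptionInInitializerError, and every later use of the now erroneous class reports NoClassDefFoundError. Any barrier placement has to keep this behavior for calls that arrive after a failed initialization.

```java
public class InitFailureDemo {
    static class Broken {
        // Not a compile-time constant, so it is computed during class initialization.
        static final int VALUE = compute();

        static int compute() {
            throw new IllegalStateException("init failed");
        }

        static int value() { return VALUE; }
    }

    // Calls a static method on Broken and reports what the caller observes.
    static String attempt() {
        try {
            return "ok " + Broken.value();
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt()); // first use: ExceptionInInitializerError
        System.out.println(attempt()); // later use: NoClassDefFoundError
    }
}
```

Both results follow from the class initialization procedure in the Java Language Specification, so they do not depend on whether the call goes through the interpreter or compiled code.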
Regarding the implementation of barriers in generated code, nmethod entry barriers (introduced by 8210498 [3]) look like a perfect fit (and I even experimented with them), but I decided to leave it aside for now: mainly to ease backports (8210498 was introduced in 12), but also to ease support on other platforms (as of now, nmethod entry barriers are supported solely on x86_64). As a followup work, the implementations can be unified in 13/12u. Entry barriers support in Graal/AOT is left for future work as well. Once the support is there, call site patching restrictions should be relaxed. Thanks! Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8219233 [2] https://bugs.openjdk.java.net/browse/JDK-8219974 [3] https://bugs.openjdk.java.net/browse/JDK-8210498 From vladimir.x.ivanov at oracle.com Sat Apr 20 01:26:38 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 19 Apr 2019 18:26:38 -0700 Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision In-Reply-To: References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn> <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn> <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com> Message-ID: Good catch, Jie! The resolution state is usually shared between call sites referring to the same constant pool entry, so is_not_reached() is definite only when invoke is not reached. I'm OK with dropping was_executed_more_than(1) check. I hoped it could help catching the case when exception is thrown before the call, but the check itself causes more problems than I thought. Here's updated version: http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.02 After some explorations I decided to keep original behavior for immature profiles (profile.count == -1). 

Best regards,
Vladimir Ivanov

On 18/04/2019 01:18, Jie Fu wrote:
> Hi Vladimir,
>
> The patch[1] seems unreasonable for the following test case:
> ----------------------------------------------------------
> public class MonteCarlo {
>     public static void main(String[] args) {
>         double sum = 0.0;
>         MonteCarlo mc = new MonteCarlo();
>
>         for(int i = 1; i < 3000; i++) {
>             sum += mc.integrate(i);
>         }
>
>         System.out.println("sum = " + sum);
>     }
>
>     public final double integrate(int n) {
>         Random R = null;
>         if (n > 0) {
>           R = new Random(1);
>         } else {
>           // This call site is not reached.
>           // But AbstractInterpreter::is_not_reached(...) returns false
> for it.
>           R = new Random(2);
>         }
>
>         int underCurve = 0;
>         for (int count = 0; count < 1000000; count++) {
>
>             double x = R.nextDouble();
>             double y = R.nextDouble();
>
>             if ( x*x + y*y <= 1.0) {
>                 underCurve ++;
>             }
>         }
>         return underCurve;
>     }
> }
> ----------------------------------------------------------
>
> In patch[1], AbstractInterpreter::is_not_reached(...) is somewhat just
> like callee_method->was_executed_more_than(0).
> So I still prefer your previous patch[2].
>
> What do you think?
> Thanks.
>
> Best regards,
> Jie
>
> [1] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.01/
> [2] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.00/
>
>
> On 2019/4/17 3:33, Vladimir Ivanov wrote:
>> Though I don't consider parallel execution case as problematic,
>> I got a better idea while browsing the code :-)
>>
>>   http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.01
>>
>> It's inspired by AbstractInterpreter::is_not_reached() and piggybacks
>> on constant pool entry resolution state to determine whether a call
>> was executed in interpreter before.
>>
>> (The change in cpCache.cpp fixes a latent bug in
>> ConstantPoolCacheEntry::method_if_resolved().)
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 11/04/2019 19:27, Jie Fu wrote:
>>> Hi Vladimir,
>>>
>>>>> Fixed in
>>>>> http://cr.openjdk.java.net/~jiefu/monte_carlo-perf-drop/webrev.03/
>>>>
>>>> I like it. What do you think about the following version?
>>>>
>>>> http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.00/
>>> It is more clearer and easier to understand.
>>> I prefer your version.
>>>
>>> One question: I'm not sure if the following condition still holds
>>> with parallel execution of the caller.
>>> ---------------------------------------------
>>> if (caller_method->was_executed_more_than(1))  return false; // trust
>>> profile
>>> ---------------------------------------------
>>>
>>> For example, assuming that the caller methods was executed
>>> concurrently by 12 threads, is it possible that
>>> caller_method->interpreter_invocation_count()=3 && profile.count()=0
>>> && no exception thrown earlier?
>>> Thanks a lot.
>>>
>>> Best regards,
>>> Jie
>>>
>

From vladimir.x.ivanov at oracle.com Sat Apr 20 01:44:59 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 19 Apr 2019 18:44:59 -0700
Subject: Can we dump more pretty CFG file format for C1 compiler HIR
In-Reply-To:
References:
Message-ID:

(BCCing hotspot-dev)

> `-XX:+PrintCFGToFile` could dump SSA-based HIR into a file, but its format is messy:
> ```cfg
> begin_states
> begin_locals
> size 3
> method "static jint com.github.kelthuzadx.HelloWorld.jerkSum(jint)"
> 1 i7 "[R570|I]"
> end_locals
> end_states
> begin_HIR
> .22 0 i15 ireturn i7 <|@
> end_HIR
> begin_LIR
> 42 label [label:0x000002136a396400] <|@
> 44 move [R570|I] [rax|I] <|@
> 46 return [rax|I] <|@
> end_LIR
> end_block
> end_cfg
> begin_intervals
> name "Before Register Allocation"
> 3 fixed "[rax|I]" 3 570 [0, 1[ [44, 46[ "no spill store"
> 4 fixed "[rdx|I]" 4 -1 [0, 4[ "no definition"
> 569 int 569 4 [4, 10[ 4 M 10 S "no spill store"
> 570 int 570 569 [10, 26[ [38, 44[ 10 M 26 S 38 M 41 L 44 S "no optimization"
> 571 int 571 573 [12, 30[ [36, 42[ 12 M 18 M 28 S 30 S 36 M 41 L "no optimization"
> 572 int 572 570 [26, 38[ 26 M 28 M 38 S "no spill store"
> 573 int 573 571 [30, 36[ 30 M 32 M 36 S "no spill store"
> end_intervals
>
>
> ```
> So can we produce more readable format?

Do you have any particular format in mind or are you just looking for something more readable?

Are you satisfied with what -XX:+PrintIR [1] / -XX:+PrintLIR [2] produce?

BTW it is confusing to see -XX:+PrintCFGToFile dumping HIR/LIR info as well when -XX:+PrintCFG dumps exclusively CFG.

Best regards,
Vladimir Ivanov

[1] -XX:+PrintIR

IR after parsing
B1 [0, 0] -> B0 sux: B0
empty stack
inlining depth 0
__bci__use__tid____instr____________________________________
. 0 0 13 std entry B0

B0 (SV) [0, 12] pred: B1
empty stack
inlining depth 0
__bci__use__tid____instr____________________________________
0 0 a5
4 0 l7 16L
. 9 0 i8 compareAndSetLong(a1, l7, l2, l3)
12 0 i9 1
12 0 i10 i8 & i9
.
12 0 i11 ireturn i10

[2] -XX:+PrintLIR

LIR:
B1 [0, 0] sux: B0
__id_Instruction___________________________________________
0 label [label:0x00007ffd2688a6c0]
2 std_entry

B0 std [0, 12] preds: B1
__id_Instruction___________________________________________
12 label [label:0x00007ffd268895e0]
14 leal [Base:[rsi|L] Disp: 16|J] [rsirsi|J]
16 move [rdxrdx|J] [raxrax|J]
18 move [rcxrcx|J] [rbxrbx|J]
20 cas_long [rsirsi|J] [raxrax|J] [rbxrbx|J]
22 cmove [EQ] [int:1|I] [int:0|I] [rax|I]
26 logic_and [rax|I] [int:1|I] [rax|I]
30 return [rax|I]

From fujie at loongson.cn Sat Apr 20 02:57:34 2019
From: fujie at loongson.cn (Jie Fu)
Date: Sat, 20 Apr 2019 10:57:34 +0800
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To:
References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn> <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn> <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com>
Message-ID: <5275854c-ab35-f160-f6f0-6ab9ac86e3d0@loongson.cn>

Hi Vladimir,

> I'm OK with dropping was_executed_more_than(1) check. I hoped it could
> help catching the case when exception is thrown before the call, but
> the check itself causes more problems than I thought.

Thank you.

> After some explorations I decided to keep original behavior for
> immature profiles (profile.count == -1).

I agree.

I have two questions here.

1. What's the difference of the following two if statements?
-------------------------------------------------
+  if (!callee_method->was_executed_more_than(0))  return true; // callee was never executed
+
+  if (caller_method->is_not_reached(caller_bci))  return true; // call site not resolved
-------------------------------------------------
I think only one of them is needed.

2. Does the assert in InlineTree::is_not_reached(...) make sense?
Since we have
-------------------------------------------------
if (profile.count() > 0)
  return false; // reachable according to profile
-------------------------------------------------
and
-------------------------------------------------
if (profile.count() == -1) {...}
-------------------------------------------------
before
-------------------------------------------------
assert(profile.count() == 0, "sanity");
-------------------------------------------------
is the assert redundant?

What do you think?
Thanks.

Best regards,
Jie

From vladimir.x.ivanov at oracle.com Sat Apr 20 03:18:08 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 19 Apr 2019 20:18:08 -0700
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To: <5275854c-ab35-f160-f6f0-6ab9ac86e3d0@loongson.cn>
References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn> <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn> <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com> <5275854c-ab35-f160-f6f0-6ab9ac86e3d0@loongson.cn>
Message-ID: <8bc507fe-b6db-d697-8821-0547860de232@oracle.com>

>> After some explorations I decided to keep original behavior for
>> immature profiles (profile.count == -1).
>
> I agree.
>
> I have two questions here.
>
> 1. What's the difference of the following two if statements?
> -------------------------------------------------
> +  if (!callee_method->was_executed_more_than(0))  return true; //
> callee was never executed
> +
> +  if (caller_method->is_not_reached(caller_bci))  return true; // call
> site not resolved
> -------------------------------------------------
> I think only one of them is needed.

The checks are complementary: one inspects callee and the other looks at call site.

"!callee_method->was_executed_more_than(0)" ensures that callee was executed at least once.

"caller_method->is_not_reached(caller_bci)" inspects the state of the call site. If corresponding CP entry is not resolved, then the call site isn't reached.
If is_not_reached() returns false, it's not a definitive answer: there's still a chance the site is not reached - consider the case of virtual calls where callee_method may differ for the same resolved method.

> 2. Does the assert in InlineTree::is_not_reached(...) make sense?
> Since we have
> -------------------------------------------------
> if (profile.count() > 0)   return false; // reachable according to profile
> -------------------------------------------------
> and
> -------------------------------------------------
> if (profile.count() == -1) {...}
> -------------------------------------------------
> before
> -------------------------------------------------
> assert(profile.count() == 0, "sanity");
> -------------------------------------------------
> is the assert redundant?

Asserts are intended to be redundant :-) But still catch bugs from time to time.

This one, in particular, checks invariant on profile.count() >= -1 (which is not very useful by itself), but also stresses that "profile.count() == 0" case is being processed.

Best regards,
Vladimir Ivanov

From fujie at loongson.cn Sat Apr 20 03:35:26 2019
From: fujie at loongson.cn (Jie Fu)
Date: Sat, 20 Apr 2019 11:35:26 +0800
Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision
In-Reply-To: <8bc507fe-b6db-d697-8821-0547860de232@oracle.com>
References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn> <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn> <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com> <5275854c-ab35-f160-f6f0-6ab9ac86e3d0@loongson.cn> <8bc507fe-b6db-d697-8821-0547860de232@oracle.com>
Message-ID: <1a398a1f-ed52-2197-5886-d9d5fd872974@loongson.cn>

Ah, I got it.
I like your patch and have learned a lot from you.
Thank you so much, Vladimir.

Any comments from other reviewers?
Thanks.

Best regards,
Jie

On 2019/4/20 11:18, Vladimir Ivanov wrote:
>
>>> After some explorations I decided to keep original behavior for
>>> immature profiles (profile.count == -1).
>>
>> I agree.
>>
>> I have two questions here.
>>
>> 1. What's the difference of the following two if statements?
>> -------------------------------------------------
>> +  if (!callee_method->was_executed_more_than(0))  return true; //
>> callee was never executed
>> +
>> +  if (caller_method->is_not_reached(caller_bci))  return true; //
>> call site not resolved
>> -------------------------------------------------
>> I think only one of them is needed.
>
> The checks are complementary: one inspects callee and the other looks
> at call site.
>
> "!callee_method->was_executed_more_than(0)" ensures that callee was
> executed at least once.
>
> "caller_method->is_not_reached(caller_bci)" inspects the state of the
> call site. If corresponding CP entry is not resolved, then the call
> site isn't reached. If is_not_reached() returns false, it's not a
> definitive answer: there's still a chance the site is not reached -
> consider the case of virtual calls where callee_method may differ for
> the same resolved method.
>
>> 2. Does the assert in InlineTree::is_not_reached(...) make sense?
>> Since we have
>> -------------------------------------------------
>> if (profile.count() > 0)   return false; // reachable according to
>> profile
>> -------------------------------------------------
>> and
>> -------------------------------------------------
>> if (profile.count() == -1) {...}
>> -------------------------------------------------
>> before
>> -------------------------------------------------
>> assert(profile.count() == 0, "sanity");
>> -------------------------------------------------
>> is the assert redundant?
>
> Asserts are intended to be redundant :-) But still catch bugs from
> time to time.
>
> This one, in particular, checks invariant on profile.count() >= -1
> (which is not very useful by itself), but also stresses that
> "profile.count() == 0" case is being processed.
>
> Best regards,
> Vladimir Ivanov

From xxinliu at amazon.com Sat Apr 20 06:19:57 2019
From: xxinliu at amazon.com (Liu, Xin)
Date: Sat, 20 Apr 2019 06:19:57 +0000
Subject: 8222670 patch review: prevent downgraded tasks from recompiling
In-Reply-To: <99aae03d0315482c723abda2f2cb530b4b52f82d.camel@redhat.com>
References: <99aae03d0315482c723abda2f2cb530b4b52f82d.camel@redhat.com>
Message-ID:

Hi Severin,

Thanks for reviewing. Yes, that change is irrelevant. I have reverted it. Please check it out.
https://cr.openjdk.java.net/~xliu/8222670/webrev.02/

Please note that I added an assertion in InstanceKlass::add_osr_nmethod(nmethod* n) in this webrev. In my understanding, duplicated OSR nmethods are a potential memory leak in the codecache. If there's no higher level of OSR compilation, those duplicates will stay in the codecache forever. Further, it doesn't make sense to recompile with the same level and the same bci.

With this assertion, the following tests in tier1 failed.
test/hotspot/jtreg/compiler/intrinsics/unsafe/DirectByteBufferTest.java
test/hotspot/jtreg/compiler/intrinsics/unsafe/HeapByteBufferTest.java
test/jdk/java/util/stream/test/org/openjdk/tests/java/util/stream/ToArrayOpTest.java
test/jdk/tools/pack200/Pack200Test.java
test/jdk/java/util/Arrays/SortingNearlySortedPrimitive.java

All crashes happen as I described in JDK-8222670. E.g. duplicated OSR compilations occur for level 2.

Program received signal SIGSEGV, Segmentation fault.
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/instanceKlass.cpp:2972
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/src/src/hotspot/share/oops/instanceKlass.cpp:2972), pid=8347, tid=8361
# assert(prev == __null || !prev->is_in_use()) failed: redundunt OSR recompilation detected.
memory leak in CodeCache!
#
# JRE version: OpenJDK Runtime Environment (13.0) (slowdebug build 13-internal+0-adhoc..src)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 13-internal+0-adhoc..src, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0xb3dbb4] InstanceKlass::add_osr_nmethod(nmethod*)+0xc4
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /build/JTwork/scratch/hs_err_pid8347.log

Program received signal SIGSEGV, Segmentation fault.
Compiled method (c1) 19032 752 % 2 ByteBufferTest::stepUsingAccessors @ 382 (633 bytes)
 total in heap [0x00007fffd8f9ff90,0x00007fffd8fac628] = 50840
 relocation [0x00007fffd8fa0110,0x00007fffd8fa1388] = 4728
 main code [0x00007fffd8fa13a0,0x00007fffd8fa7f80] = 27616
 stub code [0x00007fffd8fa7f80,0x00007fffd8fa86c0] = 1856
 oops [0x00007fffd8fa86c0,0x00007fffd8fa86c8] = 8
 metadata [0x00007fffd8fa86c8,0x00007fffd8fa8800] = 312
 scopes data [0x00007fffd8fa8800,0x00007fffd8fa9ff8] = 6136
 scopes pcs [0x00007fffd8fa9ff8,0x00007fffd8fac408] = 9232
 dependencies [0x00007fffd8fac408,0x00007fffd8fac418] = 16
 nul chk table [0x00007fffd8fac418,0x00007fffd8fac628] = 528
Compiled method (c1) 19032 752 % 2 ByteBufferTest::stepUsingAccessors @ 382 (633 bytes)
 total in heap [0x00007fffd8f9ff90,0x00007fffd8fac628] = 50840
 relocation [0x00007fffd8fa0110,0x00007fffd8fa1388] = 4728
 main code [0x00007fffd8fa13a0,0x00007fffd8fa7f80] = 27616
 stub code [0x00007fffd8fa7f80,0x00007fffd8fa86c0] = 1856
 oops [0x00007fffd8fa86c0,0x00007fffd8fa86c8] = 8
 metadata [0x00007fffd8fa86c8,0x00007fffd8fa8800] = 312
 scopes data [0x00007fffd8fa8800,0x00007fffd8fa9ff8] = 6136
 scopes pcs [0x00007fffd8fa9ff8,0x00007fffd8fac408] = 9232
 dependencies [0x00007fffd8fac408,0x00007fffd8fac418] = 16
 nul chk table [0x00007fffd8fac418,0x00007fffd8fac628] = 528

Thanks,
--lx

On 4/19/19,
9:31 AM, "Severin Gehwolf" wrote: On Thu, 2019-04-18 at 19:46 +0000, Liu, Xin wrote: > Hi, hotspot-compiler group, > > Could you review this webrev for JDK-8222670? > https://cr.openjdk.java.net/~xliu/8222670/webrev.01/ +++ new/test/hotspot/jtreg/compiler/tiered/TieredLevelsTest.java 2019-04-18 12:18:38.000000000 -0700 @@ -89,7 +89,7 @@ && actual == COMP_LEVEL_LIMITED_PROFILE) { // for simple method full_profile may be replaced by limited_profile if (IS_VERBOSE) { - System.out.printf("Level check: full profiling was replaced " + System.out.println("Level check: full profiling was replaced " + "by limited profiling. Expected: %d, actual:%d", expected, actual); This seems an unintended change, is it? Thanks, Severin From felix.yang at huawei.com Mon Apr 22 03:02:22 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 22 Apr 2019 03:02:22 +0000 Subject: RFR: 8222785: aarch64: add necessary masking for immediate shift counts Message-ID: Hi, Please review this patch adding necessary masking for immediate shift counts in aarch64.ad. Bug: https://bugs.openjdk.java.net/browse/JDK-8222785 Webrev: http://cr.openjdk.java.net/~fyang/webrev.00/ Previous discussion is here: https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2019-April/007173.html As C2 does not mask immediate shifts, it's necessary to mask immediate shift counts in aarch64.ad. Jtreg tested with an aarch64 fastdebug build. Thanks, Felix From aph at redhat.com Tue Apr 23 09:51:44 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 23 Apr 2019 10:51:44 +0100 Subject: [aarch64-port-dev ] RFR: 8222785: aarch64: add necessary masking for immediate shift counts In-Reply-To: References: Message-ID: On 4/22/19 4:02 AM, Yangfei (Felix) wrote: > Please review this patch adding necessary masking for immediate shift counts in aarch64.ad. > Bug: https://bugs.openjdk.java.net/browse/JDK-8222785 > Webrev: http://cr.openjdk.java.net/~fyang/webrev.00/ > Thank you, that's a very welcome tidy-up. 
I still don't really understand why C2 doesn't do the masking itself, but it's not worth pursuing. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From robbin.ehn at oracle.com Tue Apr 23 19:38:13 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Tue, 23 Apr 2019 21:38:13 +0200 Subject: RFR(s): 8222640: Remove deopt suspend In-Reply-To: References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com> <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com> Message-ID: <8d4cffa7-9980-dabd-b446-be8a07efca7f@oracle.com> Hi Dean, Is this what you had in mind: diff -r 295029840379 src/hotspot/share/runtime/frame.cpp --- a/src/hotspot/share/runtime/frame.cpp Tue Apr 23 09:58:55 2019 +0200 +++ b/src/hotspot/share/runtime/frame.cpp Tue Apr 23 21:32:00 2019 +0200 @@ -272,4 +272,6 @@ void frame::deoptimize(JavaThread* thread) { + assert(thread->frame_anchor()->has_last_Java_frame() && + thread->frame_anchor()->walkable(), "must be"); // Schedule deoptimization of an nmethod activation with this frame. assert(_cb != NULL && _cb->is_compiled(), "must be"); Passes t1-5. v2: http://cr.openjdk.java.net/~rehn/8222640/2/webrev/ Inc: http://cr.openjdk.java.net/~rehn/8222640/2/inc/webrev/ Thanks, Robbin On 2019-04-18 06:22, dean.long at oracle.com wrote: > In frame::deoptimize(), can we assert that we have an anchor frame and that it is walkable? > > dl > > On 4/17/19 3:09 AM, Robbin Ehn wrote: >> Adding compiler. >> >> /Robbin >> >> On 4/17/19 10:35 AM, Robbin Ehn wrote: >>> Hi all, please consider this change. >>> >>> The code for deopt suspend is no longer needed since today the register window >>> is always flushed when this code executes. Exactly when this code was needed is not clear, entered via duke changeset >>> 1. I did not dig since we no longer have such use case. >>> >>> Webrev: >>> http://cr.openjdk.java.net/~rehn/8222640/webrev/ >>> Issue: >>> https://bugs.openjdk.java.net/browse/JDK-8222640 >>> >>> Passes t1-5. 
>>> >>> Thanks, Robbin > From jesper.wilhelmsson at oracle.com Tue Apr 23 20:50:09 2019 From: jesper.wilhelmsson at oracle.com (jesper.wilhelmsson at oracle.com) Date: Tue, 23 Apr 2019 22:50:09 +0200 Subject: RFR: JDK-8221598 - Update Graal In-Reply-To: <2976f9cb-90f5-731e-f30f-ee354032d6f7@oracle.com> References: <2976f9cb-90f5-731e-f30f-ee354032d6f7@oracle.com> Message-ID: Thank you! /Jesper > On 19 Apr 2019, at 21:06, Vladimir Kozlov wrote: > > On 4/19/19 11:29 AM, dean.long at oracle.com wrote: >> I only see one failure in tiers 1-4, and it looks like JDK-8222550. > > Okay. We can push it then. > > Vladimir > >> dl >> On 4/19/19 10:15 AM, Vladimir Kozlov wrote: >>> Changes looks good. >>> >>> I looked on tests results and most of them are timeouts because Graal was run with -Xcomp. >>> I was not able to identify serious issues because there were >200 failed tests - difficult to search. >>> Someone have to look on results and see if there are new failures. >>> >>> Thanks, >>> Vladimir >>> >>> On 4/17/19 3:13 PM, jesper.wilhelmsson at oracle.com wrote: >>>> Hi, >>>> >>>> Please review the patch to integrate recent Graal changes into OpenJDK. >>>> Graal tip to integrate: 20f370437efb6b2a3f455a238da6141dc101d38c >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8221598 >>>> Webrev: http://cr.openjdk.java.net/~jwilhelm/8221598/webrev.00/ >>>> >>>> Thanks, >>>> /Jesper >>>> -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Message signed with OpenPGP
URL:

From dean.long at oracle.com Tue Apr 23 21:17:42 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Tue, 23 Apr 2019 14:17:42 -0700
Subject: RFR(s): 8222640: Remove deopt suspend
In-Reply-To: <8d4cffa7-9980-dabd-b446-be8a07efca7f@oracle.com>
References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com>
 <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com>
 <8d4cffa7-9980-dabd-b446-be8a07efca7f@oracle.com>
Message-ID: <7010d33e-13d1-77a4-2a36-0cbe03209bed@oracle.com>

Yes, looks good!

dl

On 4/23/19 12:38 PM, Robbin Ehn wrote:
> Hi Dean,
>
> Is this what you had in mind:
> diff -r 295029840379 src/hotspot/share/runtime/frame.cpp
> --- a/src/hotspot/share/runtime/frame.cpp       Tue Apr 23 09:58:55
> 2019 +0200
> +++ b/src/hotspot/share/runtime/frame.cpp       Tue Apr 23 21:32:00
> 2019 +0200
> @@ -272,4 +272,6 @@
>
>  void frame::deoptimize(JavaThread* thread) {
> +  assert(thread->frame_anchor()->has_last_Java_frame() &&
> +         thread->frame_anchor()->walkable(), "must be");
>    // Schedule deoptimization of an nmethod activation with this frame.
>    assert(_cb != NULL && _cb->is_compiled(), "must be");
>
> Passes t1-5.
>
> v2:
> http://cr.openjdk.java.net/~rehn/8222640/2/webrev/
> Inc:
> http://cr.openjdk.java.net/~rehn/8222640/2/inc/webrev/
>
> Thanks, Robbin
>
> On 2019-04-18 06:22, dean.long at oracle.com wrote:
>> In frame::deoptimize(), can we assert that we have an anchor frame
>> and that it is walkable?
>>
>> dl
>>
>> On 4/17/19 3:09 AM, Robbin Ehn wrote:
>>> Adding compiler.
>>>
>>> /Robbin
>>>
>>> On 4/17/19 10:35 AM, Robbin Ehn wrote:
>>>> Hi all, please consider this change.
>>>>
>>>> The code for deopt suspend is no longer needed since today the
>>>> register window
>>>> is always flushed when this code executes. Exactly when this code
>>>> was needed is not clear, entered via duke changeset 1. I did not
>>>> dig since we no longer have such use case.
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~rehn/8222640/webrev/
>>>> Issue:
>>>> https://bugs.openjdk.java.net/browse/JDK-8222640
>>>>
>>>> Passes t1-5.
>>>>
>>>> Thanks, Robbin
>>

From robbin.ehn at oracle.com Tue Apr 23 21:32:12 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Tue, 23 Apr 2019 23:32:12 +0200
Subject: RFR(s): 8222640: Remove deopt suspend
In-Reply-To: <7010d33e-13d1-77a4-2a36-0cbe03209bed@oracle.com>
References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com>
 <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com>
 <8d4cffa7-9980-dabd-b446-be8a07efca7f@oracle.com>
 <7010d33e-13d1-77a4-2a36-0cbe03209bed@oracle.com>
Message-ID: <544b8a83-e5bb-eead-7797-70e08ca15ee8@oracle.com>

Thanks Dean!

/Robbin

On 2019-04-23 23:17, dean.long at oracle.com wrote:
> Yes, looks good!
>
> dl
>
> On 4/23/19 12:38 PM, Robbin Ehn wrote:
>> Hi Dean,
>>
>> Is this what you had in mind:
>> diff -r 295029840379 src/hotspot/share/runtime/frame.cpp
>> --- a/src/hotspot/share/runtime/frame.cpp       Tue Apr 23 09:58:55 2019 +0200
>> +++ b/src/hotspot/share/runtime/frame.cpp       Tue Apr 23 21:32:00 2019 +0200
>> @@ -272,4 +272,6 @@
>>
>>  void frame::deoptimize(JavaThread* thread) {
>> +  assert(thread->frame_anchor()->has_last_Java_frame() &&
>> +         thread->frame_anchor()->walkable(), "must be");
>>    // Schedule deoptimization of an nmethod activation with this frame.
>>    assert(_cb != NULL && _cb->is_compiled(), "must be");
>>
>> Passes t1-5.
>>
>> v2:
>> http://cr.openjdk.java.net/~rehn/8222640/2/webrev/
>> Inc:
>> http://cr.openjdk.java.net/~rehn/8222640/2/inc/webrev/
>>
>> Thanks, Robbin
>>
>> On 2019-04-18 06:22, dean.long at oracle.com wrote:
>>> In frame::deoptimize(), can we assert that we have an anchor frame and that it is walkable?
>>>
>>> dl
>>>
>>> On 4/17/19 3:09 AM, Robbin Ehn wrote:
>>>> Adding compiler.
>>>> >>>> /Robbin >>>> >>>> On 4/17/19 10:35 AM, Robbin Ehn wrote: >>>>> Hi all, please consider this change. >>>>> >>>>> The code for deopt suspend is no longer needed since today the register window >>>>> is always flushed when this code executes. Exactly when this code was needed is not clear, entered via duke >>>>> changeset 1. I did not dig since we no longer have such use case. >>>>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~rehn/8222640/webrev/ >>>>> Issue: >>>>> https://bugs.openjdk.java.net/browse/JDK-8222640 >>>>> >>>>> Passes t1-5. >>>>> >>>>> Thanks, Robbin >>> > From aph at redhat.com Wed Apr 24 08:04:23 2019 From: aph at redhat.com (Andrew Haley) Date: Wed, 24 Apr 2019 09:04:23 +0100 Subject: [11u] RFR 8188133: C2: Static field accesses in clinit can trigger deoptimizations In-Reply-To: <56a46cfd-14c4-e1c6-c76b-542f6f0c8504@redhat.com> References: <56a46cfd-14c4-e1c6-c76b-542f6f0c8504@redhat.com> Message-ID: <75959e53-e8da-8f26-b48e-51eb99b833f3@redhat.com> On 4/16/19 12:38 PM, Aleksey Shipilev wrote: > Original bug: > https://bugs.openjdk.java.net/browse/JDK-8188133 > > Original fix: > http://hg.openjdk.java.net/jdk/jdk/rev/d620a4a1d5ed > > The patch does not apply to 11u cleanly due to a different patch context in bytecodeInfo.cpp. The > changed lines in the patch itself seem to be the same as the original. > > 11u webrev: > http://cr.openjdk.java.net/~shade/8188133/wevrev.11u.01/ OK, thanks. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Wed Apr 24 09:03:24 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 24 Apr 2019 11:03:24 +0200 Subject: RFR(S): 8222738: Shenandoah: assert(is_Proj()) failed when running cometd benchmarks In-Reply-To: <87zhonnwoq.fsf@redhat.com> References: <87zhonnwoq.fsf@redhat.com> Message-ID: <0f1a9600-2f2d-f360-9bc5-aa44f49d8990@redhat.com> Looks good to me, thanks! 
Roman > http://cr.openjdk.java.net/~roland/8222738/webrev.00/ > > The failure occurs because Shenandoah barrier expansion code tries to > expand a barrier whose control is a call node. That happens because one > of the uses of the barrier is a CastPP whose control dominates the call > while one its input is dependent on the return from the call. That, in > turn, occurs because the null check that the CastPP depends on is > optimized out by ConnectionGraph::optimize_ptr_compare() during EA. Then > the CastPP depends on another unrelated check and further optimization > of that check causes the CastPP control to change to something that > dominates the call. > > Barrier expansion already has logic to deal with a barrier on the > control projection of a call because it's shared between the exception > and fallthrough paths. The fix I propose is to simply piggy back on that > logic (by making clones for the exception and fallthrough paths). The > conditions under which this particular bug shows up seem rare enough > that going with the simplest fix is the wisest thing to do even though > it causes a barrier to be emitted in both the exception path and the > fallthrough path when it's possible only one of those paths really needs > a barrier. > > Roland. 
> From rahul.v.raghavan at oracle.com Wed Apr 24 12:22:55 2019 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Wed, 24 Apr 2019 17:52:55 +0530 Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change In-Reply-To: <18115aa8-edaa-31b9-02a6-06721d9fbfc9@oracle.com> References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com> <392c665f-869c-29af-4fc5-e6f844820846@oracle.com> <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com> <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com> <959abf54-d1da-95ee-9cf6-6c6d8ec5e4a1@oracle.com> <18115aa8-edaa-31b9-02a6-06721d9fbfc9@oracle.com> Message-ID: <939f3f5d-b8e7-939f-8953-d34a0f3ff6c9@oracle.com> Thank you Vladimir Kozlov, Vladimir Ivanov, Dean Long for the comments and suggestions. Please review the following points -- simple fix in InitializeNode::can_capture_store() as suggested by Vladimir Ivanov: >> "It seems the problem is due to mismatched unsafe store being captured as a initializing one. Why not check for it explicitly? if (st->is_unaligned_access() || st->is_mismatched_access()) { return FAIL; }" > "I don't think we can use is_mismatched_access(), because we seem to have the same problem even if int[] is used." Yes, reconfirmed assert failure for this fix. # assert((end_offset % BytesPerInt) == 0) failed: odd end offset -- New fix in InitializeNode::complete_stores() as suggested by Dean. https://bugs.openjdk.java.net/browse/JDK-8202414?focusedCommentId=14254276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14254276 Found following assert failure for this fix proposal, with the new Test8202414.java test (similar to the case with earlier initial fix proposal from Vladimir Kozlov) # assert(!do_zeroing || zeroes_done >= next_init_off) failed: don't miss any Could not find another correct working fix in InitializeNode::complete_stores(). 
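
The "odd end offset" assert can be reproduced with plain arithmetic. What follows is only a rough C++ sketch, assuming the offsets given in the submitter's analysis quoted further down this thread (a 4-byte Unsafe.putInt at offset 17, with zeroing still needed from offset 16). BytesPerInt mirrors the HotSpot constant; Range, zeroing_gap and gap_ends_int_aligned are hypothetical names for illustration, not HotSpot code.

```cpp
#include <cstdint>

constexpr std::intptr_t BytesPerInt = 4;  // same value as the HotSpot constant

// Half-open byte range [start, end) that still needs to be zeroed.
struct Range {
  std::intptr_t start;
  std::intptr_t end;
};

// Hypothetical model: the bytes between what has already been zeroed
// (zeroes_done) and the start of a captured store must be cleared.
constexpr Range zeroing_gap(std::intptr_t zeroes_done, std::intptr_t store_offset) {
  return Range{zeroes_done, store_offset};
}

// The condition the failing assert checks: the gap must end on an int boundary.
constexpr bool gap_ends_int_aligned(Range gap) {
  return gap.end % BytesPerInt == 0;
}

// A store captured at offset 17 leaves the gap [16, 17): end offset 17 is odd.
static_assert(!gap_ends_int_aligned(zeroing_gap(16, 17)), "odd end offset");
// An aligned store at offset 20 leaves the gap [16, 20), which is acceptable.
static_assert(gap_ends_int_aligned(zeroing_gap(16, 20)), "aligned end offset");
```

Under these assumptions the store itself is int-sized, so a check on the access type alone would not reject it; only the offset reveals the problem, which matches the observation that the failure also shows up with int[].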
-- Latest webrev with last working fix in InitializeNode::can_capture_store() with updated Test8202414:

- http://cr.openjdk.java.net/~rraghavan/8202414/webrev.03/

Could not find any failure issues with this fix.
So can we go ahead with this fix in InitializeNode::can_capture_store() itself?

Thanks,
Rahul

On 03/04/19 5:32 AM, Vladimir Ivanov wrote:
>
>> I agree that we need better regression tests if we go this route. Do
>> we have enough regression tests for the is_unaligned_access() case to
>> enable that optimization first?
>
> I haven't done any extensive research, but I believe existing tests
> provide poor coverage for initializing stores. The tests I encountered
> under test/hotspot/jtreg/compiler/unsafe/ don't look applicable here.
>
> Best regards,
> Vladimir Ivanov
>
>>> Forbidding mismatched accesses in InitializeNode::can_capture_store
>>> (both marked as such and based on actual offset) looks like a safer
>>> fix to me: it keeps InitializeNode::complete_stores() exposed only to
>>> well-behaved accessed.
>>>
>>> How much do we lose by not capturing mismatched/unaligned initialized
>>> stores? Does it worth optimizing for it?
>>>
>>
>> It does seem like it would be rare that optimizing it would make a
>> difference, unless we had a microbenchmark that focuses on it.
>>
>> dl
>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>>> On 3/27/19 10:12 AM, Vladimir Ivanov wrote:
>>>>> First, I'd like to note that it's a good practice to include
>>>>> problem & root cause descriptions in the request. Otherwise,
>>>>> reviewers have to find that information themselves which
>>>>> complicates review process.
>>>>>
>>>>> (In this particular case, I found some analysis from the submitter
>>>>> [1] in the bug only after carefully reading through it.)
>>>>>
>>>>> On 27/03/2019 06:44, Rahul Raghavan wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Thank you Vladimir.
>>>>>>
>>>>>> Yes, tried following fix.
>>>>>> (needed to add checks to avoid SIGFPE crash).
>>>>>>
>>>>>> +  int size_in_bytes = st->memory_size();
>>>>>> +  if ((size_in_bytes != 0) && (get_store_offset(st, phase) %
>>>>>> size_in_bytes) != 0) {
>>>>>> +    return FAIL;
>>>>>> +  }
>>>>>>
>>>>>>
>>>>>> -
>>>>>> http://cr.openjdk.java.net/~rraghavan/8202414/webrev.02/
>>>>>
>>>>> It seems the problem is due to mismatched unsafe store being
>>>>> captured as a initializing one. Why not check for it explicitly?
>>>>>
>>>>>    if (st->is_unaligned_access() || st->is_mismatched_access()) {
>>>>>      return FAIL;
>>>>>    }
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov
>>>>>
>>>>>
>>>>> [1]
>>>>>
>>>>> For your convenience, our analysis shows the problem may relate to
>>>>> array InitializeNode logic.
>>>>> It `capture_store` the memory write of Unsafe.putInt.
>>>>> Since the putInt occupied offset range [17, 21] from the array
>>>>> pointer,
>>>>> then it decided to `clear_memory` of offset range [16, 17] of the
>>>>> array pointer.
>>>>> This range actually cannot pass the assert "assert((end_offset %
>>>>> BytesPerInt) == 0, "odd end offset")".
>>>>> While in jvm product mode, without the assert, the compiler falsely
>>>>> calculated to clear range [13, 17],
>>>>> which will clear the three most significant bytes of the `length`
>>>>> of this array.
>>>>>
>>>>>
>>>>>>
>>>>>> Confirmed no issues with testing for this revised fix.
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul
>>>>>>
>>>>>> On 26/03/19 1:03 AM, Vladimir Kozlov wrote:
>>>>>>>
>>>>>>> Suggestion:
>>>>>>>
>>>>>>> if ((get_store_offset(st, phase) % st->memory_size()) != 0) {
>>>>>>>
>>>>>>> Vladimir
>>>>>>>
>>>>>>>
>>>>
>>

From dean.long at oracle.com Wed Apr 24 18:26:35 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Wed, 24 Apr 2019 11:26:35 -0700
Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change
In-Reply-To: <939f3f5d-b8e7-939f-8953-d34a0f3ff6c9@oracle.com>
References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com>
 <392c665f-869c-29af-4fc5-e6f844820846@oracle.com>
 <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com>
 <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com>
 <959abf54-d1da-95ee-9cf6-6c6d8ec5e4a1@oracle.com>
 <18115aa8-edaa-31b9-02a6-06721d9fbfc9@oracle.com>
 <939f3f5d-b8e7-939f-8953-d34a0f3ff6c9@oracle.com>
Message-ID: <33fb4ceb-b215-d8d2-3927-2ce3194803ca@oracle.com>

That seems like the safest fix for now.

dl

On 4/24/19 5:22 AM, Rahul Raghavan wrote:
> Thank you Vladimir Kozlov, Vladimir Ivanov, Dean Long for the comments
> and suggestions.
>
> Please review the following points
>
> -- simple fix in InitializeNode::can_capture_store() as suggested by
> Vladimir Ivanov:
>   >> "It seems the problem is due to mismatched unsafe store being
> captured as a initializing one. Why not check for it explicitly?
>   if (st->is_unaligned_access() || st->is_mismatched_access()) {
>     return FAIL;
>   }"
>   > "I don't think we can use is_mismatched_access(), because we seem
> to have the same problem even if int[] is used."
>
> Yes, reconfirmed assert failure for this fix.
> #  assert((end_offset % BytesPerInt) == 0) failed: odd end offset
>
>
>
> -- New fix in InitializeNode::complete_stores() as suggested by Dean.
> https://bugs.openjdk.java.net/browse/JDK-8202414?focusedCommentId=14254276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14254276
>
>
> Found following assert failure for this fix proposal, with the new
> Test8202414.java test (similar to the case with earlier initial fix
> proposal from Vladimir Kozlov)
> #  assert(!do_zeroing || zeroes_done >= next_init_off) failed: don't
> miss any
>
> Could not find another correct working fix in
> InitializeNode::complete_stores().
>
>
>
> -- Latest webrev with last working fix in
> InitializeNode::can_capture_store() with updated Test8202414:
>
> - http://cr.openjdk.java.net/~rraghavan/8202414/webrev.03/
>
> Could not find any failure issues with this fix.
> So can we go ahead with this fix in
> InitializeNode::can_capture_store() itself?
>
>
> Thanks,
> Rahul
>
>
> On 03/04/19 5:32 AM, Vladimir Ivanov wrote:
>>
>>> I agree that we need better regression tests if we go this route. Do
>>> we have enough regression tests for the is_unaligned_access() case
>>> to enable that optimization first?
>>
>> I haven't done any extensive research, but I believe existing tests
>> provide poor coverage for initializing stores. The tests I
>> encountered under test/hotspot/jtreg/compiler/unsafe/ don't look
>> applicable here.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>>> Forbidding mismatched accesses in InitializeNode::can_capture_store
>>>> (both marked as such and based on actual offset) looks like a safer
>>>> fix to me: it keeps InitializeNode::complete_stores() exposed only
>>>> to well-behaved accessed.
>>>>
>>>> How much do we lose by not capturing mismatched/unaligned
>>>> initialized stores? Does it worth optimizing for it?
>>>>
>>>
>>> It does seem like it would be rare that optimizing it would make a
>>> difference, unless we had a microbenchmark that focuses on it.
>>>
>>> dl
>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>>> On 3/27/19 10:12 AM, Vladimir Ivanov wrote:
>>>>>> First, I'd like to note that it's a good practice to include
>>>>>> problem & root cause descriptions in the request. Otherwise,
>>>>>> reviewers have to find that information themselves which
>>>>>> complicates review process.
>>>>>>
>>>>>> (In this particular case, I found some analysis from the
>>>>>> submitter [1] in the bug only after carefully reading through it.)
>>>>>>
>>>>>> On 27/03/2019 06:44, Rahul Raghavan wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thank you Vladimir.
>>>>>>>
>>>>>>> Yes, tried following fix.
>>>>>>> (needed to add checks to avoid SIGFPE crash).
>>>>>>>
>>>>>>> +  int size_in_bytes = st->memory_size();
>>>>>>> +  if ((size_in_bytes != 0) && (get_store_offset(st, phase) %
>>>>>>> size_in_bytes) != 0) {
>>>>>>> +    return FAIL;
>>>>>>> +  }
>>>>>>>
>>>>>>>
>>>>>>> -
>>>>>>> http://cr.openjdk.java.net/~rraghavan/8202414/webrev.02/
>>>>>>
>>>>>> It seems the problem is due to mismatched unsafe store being
>>>>>> captured as a initializing one. Why not check for it explicitly?
>>>>>>
>>>>>>    if (st->is_unaligned_access() || st->is_mismatched_access()) {
>>>>>>      return FAIL;
>>>>>>    }
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>>
>>>>>> For your convenience, our analysis shows the problem may relate
>>>>>> to array InitializeNode logic.
>>>>>> It `capture_store` the memory write of Unsafe.putInt.
>>>>>> Since the putInt occupied offset range [17, 21] from the array
>>>>>> pointer,
>>>>>> then it decided to `clear_memory` of offset range [16, 17] of the
>>>>>> array pointer.
>>>>>> This range actually cannot pass the assert "assert((end_offset %
>>>>>> BytesPerInt) == 0, "odd end offset")".
>>>>>> While in jvm product mode, without the assert, the compiler >>>>>> falsely calculated to clear range [13, 17], >>>>>> which will clear the three most significant bytes of the `length` >>>>>> of this array. >>>>>> >>>>>> >>>>>>> >>>>>>> Confirmed no issues with testing for this revised fix. >>>>>>> >>>>>>> Thanks, >>>>>>> Rahul >>>>>>> >>>>>>> On 26/03/19 1:03 AM, Vladimir Kozlov wrote: >>>>>>>> >>>>>>>> Suggestion: >>>>>>>> >>>>>>>> if ((get_store_offset(st, phase) % st->memory_size()) != 0) { >>>>>>>> >>>>>>>> Vladimir >>>>>>>> >>>>>>>> >>>>> >>> From vladimir.x.ivanov at oracle.com Wed Apr 24 19:06:18 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 24 Apr 2019 12:06:18 -0700 Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change In-Reply-To: <939f3f5d-b8e7-939f-8953-d34a0f3ff6c9@oracle.com> References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com> <392c665f-869c-29af-4fc5-e6f844820846@oracle.com> <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com> <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com> <959abf54-d1da-95ee-9cf6-6c6d8ec5e4a1@oracle.com> <18115aa8-edaa-31b9-02a6-06721d9fbfc9@oracle.com> <939f3f5d-b8e7-939f-8953-d34a0f3ff6c9@oracle.com> Message-ID: <259ef902-778b-7eef-46e2-d1927950d21c@oracle.com> > - http://cr.openjdk.java.net/~rraghavan/8202414/webrev.03/ A couple of suggestions: src/hotspot/share/opto/memnode.cpp: + int size_in_bytes = st->memory_size(); + if ((size_in_bytes != 0) && (get_store_offset(st, phase) % size_in_bytes) != 0) { + return FAIL; + } I'd just add "if (st->is_mismatched_access()) { return FAIL; }" in IN::can_capture_store and move the rest into IN::captured_store_insertion_point(). It looks like InitializeNode::captured_store_insertion_point() is a better place to inspect get_store_offset() and fail on mismatch. It also handles negative values returned by get_store_offset() and has a consistent handling of "size_in_bytes == 0" case. 
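
As a standalone illustration of the offset check in the webrev snippet above (not the actual memnode.cpp code), here is a minimal C++ sketch. CaptureResult, CAPTURE_FAIL, CAPTURE_OK and check_store_offset are hypothetical names, and rejecting negative offsets outright is an assumption made for the sketch; the thread only says that captured_store_insertion_point() already handles them.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration; these are not HotSpot's types.
enum CaptureResult { CAPTURE_FAIL, CAPTURE_OK };

// Reject a store whose offset is not a multiple of its own size.
// A size of zero (the vector-store case mentioned in the discussion) skips
// the modulo, because dividing by zero is what produced the SIGFPE.
// Assumption: a negative offset is treated as a failure here.
constexpr CaptureResult check_store_offset(std::intptr_t offset, int size_in_bytes) {
  return (offset < 0)
             ? CAPTURE_FAIL
             : ((size_in_bytes != 0 && offset % size_in_bytes != 0)
                    ? CAPTURE_FAIL
                    : CAPTURE_OK);
}

static_assert(check_store_offset(17, 4) == CAPTURE_FAIL, "unaligned int store");
static_assert(check_store_offset(16, 4) == CAPTURE_OK, "aligned int store");
static_assert(check_store_offset(17, 0) == CAPTURE_OK, "size 0 skips the check");
static_assert(check_store_offset(-1, 4) == CAPTURE_FAIL, "negative offset");
```

The third case is the one the PS below worries about: with a size of zero nothing is checked at all, so a vector store passes regardless of its offset.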
PS: Frankly speaking, I'm not comfortable with current handling of "size_in_bytes == 0" case: // If size_in_bytes is zero, do not bother with overlap checks. int InitializeNode::captured_store_insertion_point(intptr_t start, int size_in_bytes, PhaseTransform* phase) { "size_in_bytes == 0" is the case for StoreVectorNodes. I believe the logic is there to accommodate bulk initialization with vector stores (haven't found the proofs in the code though), but it looks fragile (especially when vector operations become more common). It would be nice to enable overlap checks for vector operations. But I'm perfectly fine with handling that separately. Best regards, Vladimir Ivanov > On 03/04/19 5:32 AM, Vladimir Ivanov wrote: >> >>> I agree that we need better regression tests if we go this route. Do >>> we have enough regression tests for the is_unaligned_access() case to >>> enable that optimization first? >> >> I haven't done any extensive research, but I believe existing tests >> provide poor coverage for initializing stores. The tests I encountered >> under test/hotspot/jtreg/compiler/unsafe/ don't look applicable here. >> >> Best regards, >> Vladimir Ivanov >> >>>> Forbidding mismatched accesses in InitializeNode::can_capture_store >>>> (both marked as such and based on actual offset) looks like a safer >>>> fix to me: it keeps InitializeNode::complete_stores() exposed only >>>> to well-behaved accessed. >>>> >>>> How much do we lose by not capturing mismatched/unaligned >>>> initialized stores? Does it worth optimizing for it? >>>> >>> >>> It does seem like it would be rare that optimizing it would make a >>> difference, unless we had a microbenchmark that focuses on it. >>> >>> dl >>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> On 3/27/19 10:12 AM, Vladimir Ivanov wrote: >>>>>> First, I'd like to note that it's a good practice to include >>>>>> problem & root cause descriptions in the request. 
Otherwise,
>>>>>> reviewers have to find that information themselves which
>>>>>> complicates review process.
>>>>>>
>>>>>> (In this particular case, I found some analysis from the submitter
>>>>>> [1] in the bug only after carefully reading through it.)
>>>>>>
>>>>>> On 27/03/2019 06:44, Rahul Raghavan wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thank you Vladimir.
>>>>>>>
>>>>>>> Yes, tried following fix.
>>>>>>> (needed to add checks to avoid SIGFPE crash).
>>>>>>>
>>>>>>> +  int size_in_bytes = st->memory_size();
>>>>>>> +  if ((size_in_bytes != 0) && (get_store_offset(st, phase) %
>>>>>>> size_in_bytes) != 0) {
>>>>>>> +    return FAIL;
>>>>>>> +  }
>>>>>>>
>>>>>>>
>>>>>>> -
>>>>>>> http://cr.openjdk.java.net/~rraghavan/8202414/webrev.02/
>>>>>>
>>>>>> It seems the problem is due to mismatched unsafe store being
>>>>>> captured as a initializing one. Why not check for it explicitly?
>>>>>>
>>>>>>    if (st->is_unaligned_access() || st->is_mismatched_access()) {
>>>>>>      return FAIL;
>>>>>>    }
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>>
>>>>>> For your convenience, our analysis shows the problem may relate to
>>>>>> array InitializeNode logic.
>>>>>> It `capture_store` the memory write of Unsafe.putInt.
>>>>>> Since the putInt occupied offset range [17, 21] from the array
>>>>>> pointer,
>>>>>> then it decided to `clear_memory` of offset range [16, 17] of the
>>>>>> array pointer.
>>>>>> This range actually cannot pass the assert "assert((end_offset %
>>>>>> BytesPerInt) == 0, "odd end offset")".
>>>>>> While in jvm product mode, without the assert, the compiler
>>>>>> falsely calculated to clear range [13, 17],
>>>>>> which will clear the three most significant bytes of the `length`
>>>>>> of this array.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Confirmed no issues with testing for this revised fix.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rahul
>>>>>>>
>>>>>>> On 26/03/19 1:03 AM, Vladimir Kozlov wrote:
>>>>>>>>
>>>>>>>> Suggestion:
>>>>>>>>
>>>>>>>> if ((get_store_offset(st, phase) % st->memory_size()) != 0) {
>>>>>>>>
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>>
>>>>>
>>>

From robbin.ehn at oracle.com Thu Apr 25 08:53:57 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 25 Apr 2019 10:53:57 +0200
Subject: RFR(s): 8222637: Obsolete NeedsDeoptSuspend (was RFR(s): 8222640: Remove deopt suspend)
In-Reply-To: <45681477-9a04-a623-6686-c72b36be6c38@oracle.com>
References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com>
 <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com>
 <8d4cffa7-9980-dabd-b446-be8a07efca7f@oracle.com>
 <7010d33e-13d1-77a4-2a36-0cbe03209bed@oracle.com>
 <544b8a83-e5bb-eead-7797-70e08ca15ee8@oracle.com>
 <45681477-9a04-a623-6686-c72b36be6c38@oracle.com>
Message-ID:

Hi,

The same patch as in 8222640 but with obsoleting of the flag also.

Issue:
https://bugs.openjdk.java.net/browse/JDK-8222637
CSR:
https://bugs.openjdk.java.net/browse/JDK-8222639

The incremental change is thus:
http://cr.openjdk.java.net/~rehn/8222637/webrev/src/hotspot/share/runtime/globals.hpp.sdiff.html
http://cr.openjdk.java.net/~rehn/8222637/webrev/src/hotspot/share/runtime/arguments.cpp.sdiff.html

Full:
http://cr.openjdk.java.net/~rehn/8222637/webrev/

Dean and Coleen had previously reviewed 8222640, so if they can acknowledge this inc change.

Thanks, Robbin

On 4/24/19 1:49 AM, Robbin Ehn wrote:
> Thanks Coleen!
>
> /Robbin
>
> On 2019-04-24 00:47, coleen.phillimore at oracle.com wrote:
>> +1  This looks good!
>> Coleen
>>
>> On 4/23/19 5:32 PM, Robbin Ehn wrote:
>>> Thanks Dean!
>>>
>>> /Robbin
>>>
>>> On 2019-04-23 23:17, dean.long at oracle.com wrote:
>>>> Yes, looks good!
>>>>
>>>> dl
>>>>
>>>> On 4/23/19 12:38 PM, Robbin Ehn wrote:
>>>>> Hi Dean,
>>>>>
>>>>> Is this what you had in mind:
>>>>> diff -r 295029840379 src/hotspot/share/runtime/frame.cpp
>>>>> --- a/src/hotspot/share/runtime/frame.cpp       Tue Apr 23 09:58:55 2019 +0200
>>>>> +++ b/src/hotspot/share/runtime/frame.cpp       Tue Apr 23 21:32:00 2019 +0200
>>>>> @@ -272,4 +272,6 @@
>>>>>
>>>>>  void frame::deoptimize(JavaThread* thread) {
>>>>> +  assert(thread->frame_anchor()->has_last_Java_frame() &&
>>>>> +         thread->frame_anchor()->walkable(), "must be");
>>>>>    // Schedule deoptimization of an nmethod activation with this frame.
>>>>>    assert(_cb != NULL && _cb->is_compiled(), "must be");
>>>>>
>>>>> Passes t1-5.
>>>>>
>>>>> v2:
>>>>> http://cr.openjdk.java.net/~rehn/8222640/2/webrev/
>>>>> Inc:
>>>>> http://cr.openjdk.java.net/~rehn/8222640/2/inc/webrev/
>>>>>
>>>>> Thanks, Robbin
>>>>>
>>>>> On 2019-04-18 06:22, dean.long at oracle.com wrote:
>>>>>> In frame::deoptimize(), can we assert that we have an anchor frame and
>>>>>> that it is walkable?
>>>>>>
>>>>>> dl
>>>>>>
>>>>>> On 4/17/19 3:09 AM, Robbin Ehn wrote:
>>>>>>> Adding compiler.
>>>>>>>
>>>>>>> /Robbin
>>>>>>>
>>>>>>> On 4/17/19 10:35 AM, Robbin Ehn wrote:
>>>>>>>> Hi all, please consider this change.
>>>>>>>>
>>>>>>>> The code for deopt suspend is no longer needed since today the register
>>>>>>>> window
>>>>>>>> is always flushed when this code executes. Exactly when this code was
>>>>>>>> needed is not clear, entered via duke changeset 1. I did not dig since
>>>>>>>> we no longer have such use case.
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~rehn/8222640/webrev/
>>>>>>>> Issue:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8222640
>>>>>>>>
>>>>>>>> Passes t1-5.
>>>>>>>>
>>>>>>>> Thanks, Robbin
>>>>>>
>>>>
>>

From david.holmes at oracle.com Thu Apr 25 10:48:45 2019
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 25 Apr 2019 20:48:45 +1000
Subject: RFR(s): 8222637: Obsolete NeedsDeoptSuspend (was RFR(s): 8222640: Remove deopt suspend)
In-Reply-To:
References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com>
 <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com>
 <8d4cffa7-9980-dabd-b446-be8a07efca7f@oracle.com>
 <7010d33e-13d1-77a4-2a36-0cbe03209bed@oracle.com>
 <544b8a83-e5bb-eead-7797-70e08ca15ee8@oracle.com>
 <45681477-9a04-a623-6686-c72b36be6c38@oracle.com>
Message-ID: <9d09a876-a520-6218-a781-d711cdf58bc1@oracle.com>

Looks good Robbin!

Nice to see things simplified.

Thanks,
David

From robbin.ehn at oracle.com Thu Apr 25 12:05:28 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 25 Apr 2019 14:05:28 +0200
Subject: RFR(m): 8221734: Deoptimize with handshakes
Message-ID: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com>

Hi all, please review.

Let's deopt with handshakes.
Removed VM op Deoptimize, instead we handshake.
Locks need to be inflated since we are not in a safepoint.

Goes on top of:
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-April/033491.html

Code:
http://cr.openjdk.java.net/~rehn/8221734/v1/webrev/index.html
Issue:
https://bugs.openjdk.java.net/browse/JDK-8221734

Passes t1-7 and multiple t1-5 runs.

A few startup benchmarks see a small speedup.

Thanks, Robbin

From robbin.ehn at oracle.com Thu Apr 25 12:07:24 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 25 Apr 2019 14:07:24 +0200
Subject: RFR(s): 8222637: Obsolete NeedsDeoptSuspend (was RFR(s): 8222640: Remove deopt suspend)
In-Reply-To: <9d09a876-a520-6218-a781-d711cdf58bc1@oracle.com>
References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com>
 <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com>
 <8d4cffa7-9980-dabd-b446-be8a07efca7f@oracle.com>
 <7010d33e-13d1-77a4-2a36-0cbe03209bed@oracle.com>
 <544b8a83-e5bb-eead-7797-70e08ca15ee8@oracle.com>
 <45681477-9a04-a623-6686-c72b36be6c38@oracle.com>
 <9d09a876-a520-6218-a781-d711cdf58bc1@oracle.com>
Message-ID: <2b64144a-62bb-bdb6-77d2-196bb2936668@oracle.com>

Thanks Coleen!

/Robbin

Oops, s/Dead/Dean/ , sorry :)

From coleen.phillimore at oracle.com Thu Apr 25 12:10:45 2019
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Thu, 25 Apr 2019 08:10:45 -0400
Subject: RFR(s): 8222637: Obsolete NeedsDeoptSuspend (was RFR(s): 8222640: Remove deopt suspend)
In-Reply-To: <2b64144a-62bb-bdb6-77d2-196bb2936668@oracle.com>
References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com>
 <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com>
 <8d4cffa7-9980-dabd-b446-be8a07efca7f@oracle.com>
 <7010d33e-13d1-77a4-2a36-0cbe03209bed@oracle.com>
 <544b8a83-e5bb-eead-7797-70e08ca15ee8@oracle.com>
 <45681477-9a04-a623-6686-c72b36be6c38@oracle.com>
 <9d09a876-a520-6218-a781-d711cdf58bc1@oracle.com>
 <2b64144a-62bb-bdb6-77d2-196bb2936668@oracle.com>
Message-ID:

:) Looks awesome, Robbin!
Thanks for fixing this!
Coleen, not Dean or David

From robbin.ehn at oracle.com Thu Apr 25 12:13:04 2019
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 25 Apr 2019 14:13:04 +0200
Subject: RFR(s): 8222637: Obsolete NeedsDeoptSuspend (was RFR(s): 8222640: Remove deopt suspend)
In-Reply-To:
References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com>
 <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com>
 <8d4cffa7-9980-dabd-b446-be8a07efca7f@oracle.com>
 <7010d33e-13d1-77a4-2a36-0cbe03209bed@oracle.com>
 <544b8a83-e5bb-eead-7797-70e08ca15ee8@oracle.com>
 <45681477-9a04-a623-6686-c72b36be6c38@oracle.com>
 <9d09a876-a520-6218-a781-d711cdf58bc1@oracle.com>
 <2b64144a-62bb-bdb6-77d2-196bb2936668@oracle.com>
Message-ID: <2f3b9d8c-e45d-9e14-c83f-ae70b72e597f@oracle.com>

Thanks Coleen!

On 4/25/19 2:10 PM, coleen.phillimore at oracle.com wrote:
>
> :) Looks awesome, Robbin!
> Thanks for fixing this!
> Coleen, not Dean or David

Ah, not my day...

>
> On 4/25/19 8:07 AM, Robbin Ehn wrote:
>> Thanks Coleen!

s/Coleen/David :)

Thanks for helping with CSR David!

/Robbin

From dean.long at oracle.com Thu Apr 25 14:49:57 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 25 Apr 2019 07:49:57 -0700
Subject: RFR(s): 8222637: Obsolete NeedsDeoptSuspend (was RFR(s): 8222640: Remove deopt suspend)
In-Reply-To:
References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com>
 <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com>
 <8d4cffa7-9980-dabd-b446-be8a07efca7f@oracle.com>
 <7010d33e-13d1-77a4-2a36-0cbe03209bed@oracle.com>
 <544b8a83-e5bb-eead-7797-70e08ca15ee8@oracle.com>
 <45681477-9a04-a623-6686-c72b36be6c38@oracle.com>
Message-ID: <4296a391-9bae-daa7-2190-4d28acaa1074@oracle.com>

Looks good.

dl

From dean.long at oracle.com Thu Apr 25 18:53:28 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 25 Apr 2019 11:53:28 -0700
Subject: RFR(M) 8219403: JVMCIRuntime::adjust_comp_level should be replaced
Message-ID: <3f0271ca-b8e6-dfcb-8787-8c36f49265fe@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8219403
http://cr.openjdk.java.net/~dlong/8219403/webrev.2/

This change removes the problematic JVMCIRuntime::adjust_comp_level. It
is based on previous work in Graal, graal-jvmci-8, and Metropolis by Tom,
Doug, and Vladimir.

I also problem-listed several tests that were causing noise in the test results.

dl

From dean.long at oracle.com Thu Apr 25 19:39:39 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Thu, 25 Apr 2019 12:39:39 -0700
Subject: RFR(m): 8221734: Deoptimize with handshakes
In-Reply-To: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com>
References: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com>
Message-ID:

I don't pretend to understand the biased locking changes :-) Could you
explain the need for CompiledMethod_lock, and the other lock rank changes?

Replacing

1089   if (_method->code() == this) {
1090     _method->clear_code(); // Break a cycle
1091   }

with

1087   Method::unlink_code(_method, this); // Break a cycle

doesn't look equivalent. The new unlink_code has a stronger check, also
checking the entry point. I'm not sure that's OK here.

In Method::Method(), don't we want to be able to initialize these fields
without grabbing CompiledMethod_lock?

dl

On 4/25/19 5:05 AM, Robbin Ehn wrote:
> Hi all, please review.
>
> Let's deopt with handshakes.
> Removed VM op Deoptimize, instead we handshake.
> Locks need to be inflated since we are not in a safepoint.
>
> Goes on top of:
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-April/033491.html
>
>
> Code:
> http://cr.openjdk.java.net/~rehn/8221734/v1/webrev/index.html
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8221734
>
> Passes t1-7 and multiple t1-5 runs.
>
> A few startup benchmarks see a small speedup.
>
> Thanks, Robbin

From vladimir.kozlov at oracle.com Thu Apr 25 20:09:47 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 25 Apr 2019 13:09:47 -0700
Subject: RFR(M) 8219403: JVMCIRuntime::adjust_comp_level should be replaced
In-Reply-To: <3f0271ca-b8e6-dfcb-8787-8c36f49265fe@oracle.com>
References: <3f0271ca-b8e6-dfcb-8787-8c36f49265fe@oracle.com>
Message-ID: <91f3d9be-5612-2556-ae77-a60102fc07ae@oracle.com>

In general looks good. Only minor issue:

in jvmciJavaClasses.hpp the indent of '\' is not adjusted in the changed line.

Thanks,
Vladimir

From daniel.daugherty at oracle.com Thu Apr 25 20:43:22 2019
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 25 Apr 2019 16:43:22 -0400
Subject: RFR(m): 8221734: Deoptimize with handshakes
In-Reply-To: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com>
References: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com>
Message-ID: <2536572e-158f-0913-889b-ef76d6122c79@oracle.com>

On 4/25/19 8:05 AM, Robbin Ehn wrote:
> Hi all, please review.
>
> Let's deopt with handshakes.
> Removed VM op Deoptimize, instead we handshake.
> Locks need to be inflated since we are not in a safepoint.
>
> Goes on top of:
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-April/033491.html
>
>
> Code:
> http://cr.openjdk.java.net/~rehn/8221734/v1/webrev/index.html

Robbin, you'll have to merge with Coleen's recent MutexLocker fix (8222518).

src/hotspot/share/aot/aotCodeHeap.cpp
    No comments.

src/hotspot/share/aot/aotCompiledMethod.cpp
    L163: bool AOTCompiledMethod::make_not_entrant_helper(int new_state) {
    L207: bool AOTCompiledMethod::make_entrant() {
        So the Compiler team is on board with switching from the
        Patching_lock to the CompiledMethod_lock?

src/hotspot/share/code/codeCache.cpp
    No comments.

src/hotspot/share/code/nmethod.cpp
    L1180: bool nmethod::make_not_entrant_or_zombie(int state) {
    L2853: void nmethod::clear_jvmci_installed_code() {
    L2861: void nmethod::clear_speculation_log() {
    L2869: void nmethod::maybe_invalidate_installed_code() {
    L2904: void nmethod::invalidate_installed_code(Handle installedCode, TRAPS) {
        So the Compiler team is on board with switching from the
        Patching_lock to the CompiledMethod_lock?

src/hotspot/share/code/nmethod.hpp
    No comments.

src/hotspot/share/gc/z/zBarrierSetNMethod.cpp
    No comments (copyright year needs update).

src/hotspot/share/gc/z/zNMethod.cpp
    No comments (copyright year needs update).

src/hotspot/share/oops/markOop.hpp
    L180:   bool biased_locker_is(JavaThread* thread) const {
        Is this for a different project?

    L181:     if (!has_bias_pattern()) {
    L182:       return false;
    L183:     }
    L184:     // If current thread is not the owner it can be unbiased at anytime.
    L185:     JavaThread* jt = (JavaThread*) ((intptr_t) (mask_bits(value(), ~(biased_lock_mask_in_place | age_mask_in_place | epoch_mask_in_place))));
        So you don't want to use this:

          JavaThread* biased_locker() const {
            assert(has_bias_pattern(), "should not call this otherwise");
            return (JavaThread*) ((intptr_t) (mask_bits(value(), ~(biased_lock_mask_in_place | age_mask_in_place | epoch_mask_in_place))));
          }

        because of the assert().

        I think biased_locker() and biased_locker_is() both need to work
        with a copy of the markOop so that it can't change dynamically.
        Something like:

          JavaThread* biased_locker() const {
            markOop copy = this;
            assert(copy.has_bias_pattern(), "should not call this otherwise");
            return (JavaThread*) ((intptr_t) (copy.mask_bits(value(), ~(biased_lock_mask_in_place | age_mask_in_place | epoch_mask_in_place))));
          }

          bool biased_locker_is(JavaThread* thread) const {
            markOop copy = this;
            if (!copy.has_bias_pattern()) {
              return false;
            }
            return copy.biased_locker();
          }

src/hotspot/share/oops/method.cpp
    old L104:   clear_code(false /* don't need a lock */); // from_c/from_i get set to c2i/i2i
        Is the comment after '//' not useful?

    L946:     MutexLockerEx ml(CompiledMethod_lock->owned_by_self() ? NULL : CompiledMethod_lock, Mutex::_no_safepoint_check_flag);
        So the Compiler team is on board with switching from the
        Patching_lock to the CompiledMethod_lock?

src/hotspot/share/oops/method.hpp
    No comments.

src/hotspot/share/prims/jvmtiEventController.cpp
    No comments (copyright year needs update).

src/hotspot/share/prims/methodHandles.cpp
    No comments.

src/hotspot/share/prims/whitebox.cpp
    No comments.

src/hotspot/share/runtime/biasedLocking.cpp
src/hotspot/share/runtime/biasedLocking.hpp
    Hmmm... More Biased Locking changes. I didn't take a close
    look at these.

src/hotspot/share/runtime/deoptimization.cpp
    No comments (but some overlap with Biased Locking, ouch)

src/hotspot/share/runtime/deoptimization.hpp
    No comments.

src/hotspot/share/runtime/mutex.hpp
    old L65:        special        = tty            +   1,
    new L65:        special        = tty            +   2,
        Why?

src/hotspot/share/runtime/mutexLocker.cpp
    No comments.

src/hotspot/share/runtime/mutexLocker.hpp
    L34: extern Mutex*   Patching_lock;                   // a lock used to guard code patching of compiled code
    L35: extern Mutex*   CompiledMethod_lock;
        A comment is traditional here...

src/hotspot/share/runtime/synchronizer.cpp
    old L1317:          !SafepointSynchronize::is_at_safepoint(), "invariant");
    new L1317:          !Universe::heap()->is_gc_active(), "invariant");
        Why?

    L1446:         ResourceMark rm;
    L1496:       ResourceMark rm;
        Why drop 'Self'? That makes the ResourceMark more expensive..

src/hotspot/share/runtime/thread.cpp
    No comments.

src/hotspot/share/runtime/thread.hpp
    No comments.

src/hotspot/share/runtime/vmOperations.cpp
    No comments.

src/hotspot/share/runtime/vmOperations.hpp
    No comments.

src/hotspot/share/services/dtraceAttacher.cpp
    No comments (copyright year needs update).

I don't think I've found anything that's "must fix". Please let me
know if I should re-review:

    src/hotspot/share/runtime/biasedLocking.cpp
    src/hotspot/share/runtime/biasedLocking.hpp
    src/hotspot/share/runtime/deoptimization.cpp

because the Biased Locking changes are critical for this project.

Dan

> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8221734
>
> Passes t1-7 and multiple t1-5 runs.
>
> A few startup benchmarks see a small speedup.
> > Thanks, Robbin From dean.long at oracle.com Thu Apr 25 22:00:38 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 25 Apr 2019 15:00:38 -0700 Subject: RFR(M) 8219403: JVMCIRuntime::adjust_comp_level should be replaced In-Reply-To: <91f3d9be-5612-2556-ae77-a60102fc07ae@oracle.com> References: <3f0271ca-b8e6-dfcb-8787-8c36f49265fe@oracle.com> <91f3d9be-5612-2556-ae77-a60102fc07ae@oracle.com> Message-ID: <0c7df5d0-7638-7b44-65fd-18fe1279a0b3@oracle.com> Fixed.? Thanks for the review. dl On 4/25/19 1:09 PM, Vladimir Kozlov wrote: > In general looks good. Only minor issue: > > in jvmciJavaClasses.hpp indent of '\' is not adjusted in changes line. > > Thanks, > Vladimir > > On 4/25/19 11:53 AM, dean.long at oracle.com wrote: >> https://bugs.openjdk.java.net/browse/JDK-8219403 >> http://cr.openjdk.java.net/~dlong/8219403/webrev.2/ >> >> This change removes the problematic JVMCIRuntime::adjust_comp_level.? It >> is based on previous work in Graal, graal-jvmci-8, and Metropolis by >> Tom, >> Doug, and Vladimir. >> >> I also problem-listed several tests that were causing noise in the >> test results. >> >> dl From robbin.ehn at oracle.com Fri Apr 26 08:16:15 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Fri, 26 Apr 2019 10:16:15 +0200 Subject: RFR(s): 8222637: Obsolete NeedsDeoptSuspend (was RFR(s): 8222640: Remove deopt suspend) In-Reply-To: <4296a391-9bae-daa7-2190-4d28acaa1074@oracle.com> References: <2f4eeddd-1480-4dab-8c64-187282586348@oracle.com> <70e28808-14a2-d063-9dcf-cf49ca3fa7f6@oracle.com> <8d4cffa7-9980-dabd-b446-be8a07efca7f@oracle.com> <7010d33e-13d1-77a4-2a36-0cbe03209bed@oracle.com> <544b8a83-e5bb-eead-7797-70e08ca15ee8@oracle.com> <45681477-9a04-a623-6686-c72b36be6c38@oracle.com> <4296a391-9bae-daa7-2190-4d28acaa1074@oracle.com> Message-ID: <1fd15fbd-88b1-d044-169a-40c61d526002@oracle.com> Thanks Dean! /Robbin On 4/25/19 4:49 PM, dean.long at oracle.com wrote: > Looks good. 
> > dl > > On 4/25/19 1:53 AM, Robbin Ehn wrote: >> Hi, >> >> The same patch as in 8222640 but with obsoleting of the flag also. >> >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8222637 >> CSR: >> https://bugs.openjdk.java.net/browse/JDK-8222639 >> >> The incremental change is thus: >> http://cr.openjdk.java.net/~rehn/8222637/webrev/src/hotspot/share/runtime/globals.hpp.sdiff.html >> >> http://cr.openjdk.java.net/~rehn/8222637/webrev/src/hotspot/share/runtime/arguments.cpp.sdiff.html >> >> >> Full: >> http://cr.openjdk.java.net/~rehn/8222637/webrev/ >> >> Dead and Coleen had previously review 8222640, so if they can acknowledge this >> inc change. >> >> Thanks, Robbin >> >> On 4/24/19 1:49 AM, Robbin Ehn wrote: >>> Thanks Coleen! >>> >>> /Robbin >>> >>> On 2019-04-24 00:47, coleen.phillimore at oracle.com wrote: >>>> +1? This looks good! >>>> Coleen >>>> >>>> On 4/23/19 5:32 PM, Robbin Ehn wrote: >>>>> Thanks Dean! >>>>> >>>>> /Robbin >>>>> >>>>> On 2019-04-23 23:17, dean.long at oracle.com wrote: >>>>>> Yes, looks good! >>>>>> >>>>>> dl >>>>>> >>>>>> On 4/23/19 12:38 PM, Robbin Ehn wrote: >>>>>>> Hi Dean, >>>>>>> >>>>>>> Is this what you had in mind: >>>>>>> diff -r 295029840379 src/hotspot/share/runtime/frame.cpp >>>>>>> --- a/src/hotspot/share/runtime/frame.cpp?????? Tue Apr 23 09:58:55 2019 >>>>>>> +0200 >>>>>>> +++ b/src/hotspot/share/runtime/frame.cpp?????? Tue Apr 23 21:32:00 2019 >>>>>>> +0200 >>>>>>> @@ -272,4 +272,6 @@ >>>>>>> >>>>>>> ?void frame::deoptimize(JavaThread* thread) { >>>>>>> + assert(thread->frame_anchor()->has_last_Java_frame() && >>>>>>> +???????? thread->frame_anchor()->walkable(), "must be"); >>>>>>> ?? // Schedule deoptimization of an nmethod activation with this frame. >>>>>>> ?? assert(_cb != NULL && _cb->is_compiled(), "must be"); >>>>>>> >>>>>>> Passes t1-5. 
>>>>>>> >>>>>>> v2: >>>>>>> http://cr.openjdk.java.net/~rehn/8222640/2/webrev/ >>>>>>> Inc: >>>>>>> http://cr.openjdk.java.net/~rehn/8222640/2/inc/webrev/ >>>>>>> >>>>>>> Thanks, Robbin >>>>>>> >>>>>>> On 2019-04-18 06:22, dean.long at oracle.com wrote: >>>>>>>> In frame::deoptimize(), can we assert that we have an anchor frame and >>>>>>>> that it is walkable? >>>>>>>> >>>>>>>> dl >>>>>>>> >>>>>>>> On 4/17/19 3:09 AM, Robbin Ehn wrote: >>>>>>>>> Adding compiler. >>>>>>>>> >>>>>>>>> /Robbin >>>>>>>>> >>>>>>>>> On 4/17/19 10:35 AM, Robbin Ehn wrote: >>>>>>>>>> Hi all, please consider this change. >>>>>>>>>> >>>>>>>>>> The code for deopt suspend is no longer needed since today the >>>>>>>>>> register window >>>>>>>>>> is always flushed when this code executes. Exactly when this code was >>>>>>>>>> needed is not clear, entered via duke changeset 1. I did not dig since >>>>>>>>>> we no longer have such use case. >>>>>>>>>> >>>>>>>>>> Webrev: >>>>>>>>>> http://cr.openjdk.java.net/~rehn/8222640/webrev/ >>>>>>>>>> Issue: >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8222640 >>>>>>>>>> >>>>>>>>>> Passes t1-5. 
>>>>>>>>>> >>>>>>>>>> Thanks, Robbin >>>>>>>> >>>>>> >>>> > From rahul.v.raghavan at oracle.com Fri Apr 26 09:20:49 2019 From: rahul.v.raghavan at oracle.com (Rahul Raghavan) Date: Fri, 26 Apr 2019 14:50:49 +0530 Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change In-Reply-To: <259ef902-778b-7eef-46e2-d1927950d21c@oracle.com> References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com> <392c665f-869c-29af-4fc5-e6f844820846@oracle.com> <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com> <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com> <959abf54-d1da-95ee-9cf6-6c6d8ec5e4a1@oracle.com> <18115aa8-edaa-31b9-02a6-06721d9fbfc9@oracle.com> <939f3f5d-b8e7-939f-8953-d34a0f3ff6c9@oracle.com> <259ef902-778b-7eef-46e2-d1927950d21c@oracle.com> Message-ID: <73f7c647-3194-2a65-6cc6-a15cbf6c82be@oracle.com> Thank you Vladimir Ivanov for suggestions. Request help to review the notes inline below to finalize the fix. On 25/04/19 12:36 AM, Vladimir Ivanov wrote: > > A couple of suggestions: > > src/hotspot/share/opto/memnode.cpp: > > + int size_in_bytes = st->memory_size(); > + if ((size_in_bytes != 0) && (get_store_offset(st, phase) % > size_in_bytes) != 0) { > + return FAIL; > + } > > I'd just add "if (st->is_mismatched_access()) { return FAIL; }" in > IN::can_capture_store and move the rest into > IN::captured_store_insertion_point(). > Yes, I will add following change also along with any final fix. intptr_t InitializeNode::can_capture_store(StoreNode* st, PhaseTransform* phase, bool can_reshape) { const int FAIL = 0; - if (st->is_unaligned_access()) { + if (st->is_unaligned_access() || st->is_mismatched_access()) { return FAIL; } > It looks like InitializeNode::captured_store_insertion_point() is a > better place to inspect get_store_offset() and fail on mismatch. It also > handles negative values returned by get_store_offset() and has a > consistent handling of "size_in_bytes == 0" case.
> (i) At first tried following *wrong* change. (confirmed this is not a fix for the issue) int InitializeNode::captured_store_insertion_point(intptr_t start, int size_in_bytes, PhaseTransform* phase) { ........... for (uint i = InitializeNode::RawStores, limit = req(); ; ) { if (i >= limit) return -(int)i; // not found; here is where to put it Node* st = in(i); intptr_t st_off = get_store_offset(st, phase); - if (st_off < 0) { + if ((size_in_bytes != 0) && (st_off % size_in_bytes) != 0) { + return FAIL; + } else if (st_off < 0) { if (st != zero_memory()) { ............ (ii) When working for correct fix in captured_store_insertion_point(), (was looking to get required captured store / get_store_offset value in) found that following additions in captured_store_insertion_point() actually fixes reported issue; also could not find any other issue with testing. ............ if (start >= ti_limit) return FAIL; + if ((size_in_bytes != 0) && (start % size_in_bytes) != 0) { + return FAIL; + } + for (uint i = InitializeNode::RawStores, limit = req(); ; ) { ........... (iii) So if above (ii) is a legal/valid fix, then again can the best location of the fix be as following in can_capture_store() itself? intptr_t InitializeNode::can_capture_store(StoreNode* st, PhaseTransform* phase, bool can_reshape) { ................... AllocateNode* alloc = AllocateNode::Ideal_allocation(adr, phase, offset); if (alloc == NULL) return FAIL; // inscrutable address if (alloc != allocation()) return FAIL; // wrong allocation! (store needs to float up) + int size_in_bytes = st->memory_size(); + if ((size_in_bytes != 0) && (offset % size_in_bytes) != 0) { + return FAIL; + } Node* val = st->in(MemNode::ValueIn); ........... return offset; // success } (iv) OR something like following in capture_store() ? 
Node* InitializeNode::capture_store(StoreNode* st, intptr_t start, PhaseTransform* phase, bool can_reshape) { assert(stores_are_sane(phase), ""); if (start < 0) return NULL; assert(can_capture_store(st, phase, can_reshape) == start, "sanity"); + int size_in_bytes = st->memory_size(); + if ((size_in_bytes != 0) && (start % size_in_bytes) != 0) { + return NULL; + } Compile* C = phase->C; - int size_in_bytes = st->memory_size(); int i = captured_store_insertion_point(start, size_in_bytes, phase); if (i == 0) return NULL; // bail out .................. (v) Please tell me if I am missing something and none of the above is a valid fix. In that case I will work on a correct fix in captured_store_insertion_point(). > > PS: Frankly speaking, I'm not comfortable with current handling of > "size_in_bytes == 0" case: > > // If size_in_bytes is zero, do not bother with overlap checks. > int InitializeNode::captured_store_insertion_point(intptr_t start, > int size_in_bytes, > PhaseTransform* > phase) { > > "size_in_bytes == 0" is the case for StoreVectorNodes. I believe the > logic is there to accommodate bulk initialization with vector stores > (haven't found the proofs in the code though), but it looks fragile > (especially when vector operations become more common). It would be nice > to enable overlap checks for vector operations. > > But I'm perfectly fine with handling that separately. > Yes, I will open a new JBS task to address this.
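The offset test proposed in options (ii) to (iv) above is plain modular arithmetic, and can be illustrated outside of HotSpot. The following is only a minimal sketch in Java, not the C2 code: the class name StoreCaptureCheck, the method isMisaligned, and the sample offsets are hypothetical and chosen purely for illustration. It mirrors the condition "(size_in_bytes != 0) && (offset % size_in_bytes) != 0" from the patch fragments.

```java
/** Hypothetical model of the proposed offset check; this is not HotSpot code. */
final class StoreCaptureCheck {

    /**
     * Mirrors the condition from the patch fragments:
     * (size_in_bytes != 0) && (offset % size_in_bytes) != 0.
     * A size of 0 stands for the vector-store case, which the check skips.
     */
    static boolean isMisaligned(long offset, int sizeInBytes) {
        return sizeInBytes != 0 && (offset % sizeInBytes) != 0;
    }

    public static void main(String[] args) {
        // Illustrative offsets only; real object layouts depend on the VM configuration.
        System.out.println(isMisaligned(16, 4)); // 4-byte store at a multiple of 4
        System.out.println(isMisaligned(14, 4)); // 4-byte store straddling two 4-byte slots
        System.out.println(isMisaligned(13, 0)); // size 0: the check is skipped
    }
}
```

In this model a store whose offset is not a multiple of its own size is the one that would be refused capture, which is the situation an Unsafe write with a mismatched width can create.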
Thanks, Rahul > Best regards, > Vladimir Ivanov From tobias.hartmann at oracle.com Fri Apr 26 12:14:31 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 26 Apr 2019 14:14:31 +0200 Subject: [13] 8221592: C2 compilation failed with assert(!q->is_MergeMem()) Message-ID: <9a052954-90ea-5d45-37d0-cc6f95c1bc84@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8221592 http://cr.openjdk.java.net/~thartmann/8221592/webrev.00/ We hit an assert during parsing with incremental inlining when merging memory edges into a target block because of a MergeMem that has another MergeMem as input. I've traced the MergeMem back to this code: http://hg.openjdk.java.net/jdk/jdk/file/47a8fdf84424/src/hotspot/share/opto/parse1.cpp#l1028 The memory input of the _exit map is a MergeMem with a useless Phi input that has another MergeMem as input (see bug comments for details). After calling transform on this slice, the Phi is removed and we end up with a MergeMem that is then set as input to the original MergeMem. This later triggers the assert. The proposed fix is to transform the original MergeMem after transforming the slices to get rid of MergeMem inputs. The problem only shows up with the fix for JDK-8059241 [1] which reduced the number of times PhaseRemoveUseless is executed with incremental inlining. I don't think it's related though but just triggers the bug. Tested with multiple runs of the microbenchmark that triggered the assert and hs-tier1,hs-tier2,hs-tier3,hs-precheckin-comp,jdk-tier1,jdk-tier2,jdk-tier3 (running). 
Thanks, Tobias [1] https://bugs.openjdk.java.net/browse/JDK-8059241 From robbin.ehn at oracle.com Fri Apr 26 12:50:25 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Fri, 26 Apr 2019 14:50:25 +0200 Subject: RFR(m): 8221734: Deoptimize with handshakes In-Reply-To: <2536572e-158f-0913-889b-ef76d6122c79@oracle.com> References: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com> <2536572e-158f-0913-889b-ef76d6122c79@oracle.com> Message-ID: <64233d26-fc9f-9eb4-2c83-34186a669832@oracle.com> Hi Dan, thanks for looking at this! On 4/25/19 10:43 PM, Daniel D. Daugherty wrote: >> Code: >> http://cr.openjdk.java.net/~rehn/8221734/v1/webrev/index.html > > Robbin, you'll have to merge with Coleen's recent MutexLocker fix (8222518). Yes, done! > src/hotspot/share/aot/aotCompiledMethod.cpp > L163: bool AOTCompiledMethod::make_not_entrant_helper(int new_state) { > L207: bool AOTCompiledMethod::make_entrant() { > So the Compiler team is on board with switching from the > Patching_lock to the CompiledMethod_lock? Patching_lock, originally intended to protect code patching AFAIK, is now used for a lot of different unrelated cases. To iterate the compiled methods we need to hold CodeCache_lock, which is a leaf lock. Sometimes we take CodeCache_lock while holding Patching_lock, so we can't take Patching_lock while iterating. By having a new leaf lock for the compiled methods we can update them while iterating over them. > src/hotspot/share/gc/z/zBarrierSetNMethod.cpp > No comments (copyright year needs update). Fixed. > > src/hotspot/share/gc/z/zNMethod.cpp > No comments (copyright year needs update). Fixed. > > src/hotspot/share/oops/markOop.hpp > L180: bool biased_locker_is(JavaThread* thread) const { > Is this for a different project? > > L181: if (!has_bias_pattern()) { > L182: return false; > L183: } > L184: // If current thread is not the owner it can be unbiased at anytime. > L185:
JavaThread* jt = (JavaThread*) ((intptr_t) (mask_bits(value(), > ~(biased_lock_mask_in_place | age_mask_in_place | epoch_mask_in_place)))); > So you don't want to use this: > > JavaThread* biased_locker() const { > assert(has_bias_pattern(), "should not call this otherwise"); > return (JavaThread*) ((intptr_t) (mask_bits(value(), > ~(biased_lock_mask_in_place | age_mask_in_place | epoch_mask_in_place)))); > } > > because of the assert(). > > I think biased_locker() and biased_locker_is() both need to work > with a copy of the markOop so that it can't change dynamically. > Something like: > > JavaThread* biased_locker() const { > markOop copy = this; > assert(copy.has_bias_pattern(), "should not call this otherwise"); > return (JavaThread*) ((intptr_t) (copy.mask_bits(value(), > ~(biased_lock_mask_in_place | age_mask_in_place | epoch_mask_in_place)))); > } > > bool biased_locker_is(JavaThread* thread) const { > markOop copy = this; > if (!copy.has_bias_pattern()) { > return false; > } > return copy.biased_locker(); > } Yes, thanks. I did a slightly different fix, since we already have a copy. > > src/hotspot/share/oops/method.cpp > old L104: clear_code(false /* don't need a lock */); // from_c/from_i get > set to c2i/i2i > Is the comment after '//' not useful? Added it back. > > src/hotspot/share/runtime/biasedLocking.cpp > src/hotspot/share/runtime/biasedLocking.hpp > Hmmm... More Biased Locking changes. I didn't take a close > look at these. Yes, please. > > src/hotspot/share/runtime/mutex.hpp > old L65: special = tty + 1, > new L65: special = tty + 2, > Why? CompiledMethod_lock must be under CodeCache_lock.
There is no easy way to push locks up other than hoping that testing triggers the rank asserts. This way I only need to look at the new lock. Also, the compiler locks are very coarse grained; there should be more locks in here :) > > src/hotspot/share/runtime/mutexLocker.cpp > No comments. > > src/hotspot/share/runtime/mutexLocker.hpp > L34: extern Mutex* Patching_lock; // a lock used to > guard code patching of compiled code > L35: extern Mutex* CompiledMethod_lock; > A comment is traditional here... > Fixed. > src/hotspot/share/runtime/synchronizer.cpp > old L1317: !SafepointSynchronize::is_at_safepoint(), "invariant"); > new L1317: !Universe::heap()->is_gc_active(), "invariant"); > Why? If we use the handshake fallback path (obsolete in JDK 14) we execute the handshake inside a safepoint. Thus when inflating we can be at a safepoint. > > L1446: ResourceMark rm; > L1496: ResourceMark rm; > Why drop 'Self'? That makes the ResourceMark more expensive.. During the handshake the VM thread may execute the handshake on behalf of the JavaThread, thus inflating the monitor for that thread, meaning Self is not Thread::current(). > src/hotspot/share/services/dtraceAttacher.cpp > No comments (copyright year needs update). Fixed. > > > I don't think I've found anything that's "must fix". Please let me know > if I should re-review: > > src/hotspot/share/runtime/biasedLocking.cpp > src/hotspot/share/runtime/biasedLocking.hpp > src/hotspot/share/runtime/deoptimization.cpp > > because the Biased Locking changes are critical for this project. Yes, the inflation part is the trickiest. More eyes are better! I have some small changes to biasedLocking.[h|c]pp which I'll send through some tiers of testing. Please hold off on re-examining that until I send out the updated and tested code. Thanks!
/Robbin > > Dan > > >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8221734 >> >> Passes t1-7 and multiple t1-5 runs. >> >> A few startup benchmarks see a small speedup. >> >> Thanks, Robbin > From robbin.ehn at oracle.com Fri Apr 26 13:11:26 2019 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Fri, 26 Apr 2019 15:11:26 +0200 Subject: RFR(m): 8221734: Deoptimize with handshakes In-Reply-To: References: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com> Message-ID: Hi Dean, On 4/25/19 9:39 PM, dean.long at oracle.com wrote: > I don't pretend to understand the biased locking changes :-) Could you explain > the need for CompiledMethod_lock, and the other lock rank changes? To iterate the compiled methods we need to hold CodeCache_lock (since we do this outside of a safepoint). To be able to make a compiled method not entrant it must be protected by a lower ranked lock. So we protect the compiled method with this new lock instead of Patching_lock (higher ranked). > > Replacing > > 1089 if (_method->code() == this) { > 1090 _method->clear_code(); // Break a cycle > 1091 } > > > with > > > 1087 Method::unlink_code(_method, this); // Break a cycle > > doesn't look equivalent. The new unlink_code has a stronger check, also > checking the entry point. I'm not sure that's OK here. Thanks, yes. I had a slightly different version before. I'll go over this again. > > In Method::Method(), don't we want to be able to initialize these fields without > grabbing CompiledMethod_lock? This should just be an uncontended cas. So I went with simplicity, disregarding this small overhead. Let me know what you feel about that. I'll send out a new updated and tested version next week. Thanks! /Robbin > > dl > > On 4/25/19 5:05 AM, Robbin Ehn wrote: >> Hi all, please review. >> >> Let's deopt with handshakes. >> Removed VM op Deoptimize, instead we handshake. >> Locks need to be inflated since we are not in a safepoint.
>> >> Goes on top of: >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-April/033491.html >> >> >> Code: >> http://cr.openjdk.java.net/~rehn/8221734/v1/webrev/index.html >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8221734 >> >> Passes t1-7 and multiple t1-5 runs. >> >> A few startup benchmarks see a small speedup. >> >> Thanks, Robbin > From xxinliu at amazon.com Fri Apr 26 07:36:43 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Fri, 26 Apr 2019 07:36:43 +0000 Subject: 8222670 patch review: prevent downgraded tasks from recompiling In-Reply-To: References: <99aae03d0315482c723abda2f2cb530b4b52f82d.camel@redhat.com> Message-ID: <427BC0A9-DAB2-43A3-AF93-F96414EC1E7E@amazon.com> Gentle ping. Bug: https://bugs.openjdk.java.net/browse/JDK-8222670 I have a new revision: https://cr.openjdk.java.net/~xliu/8222670/webrev.03/ I finished the test Level2RecompilationTest.java. If you want to start an OSR compilation, you have to specify a bci that points to the beginning of a basic block (BB). Passing bci = 0 is good enough for general cases. Thanks, --lx On 4/19/19, 11:19 PM, "Liu, Xin" wrote: Hi Severin, Thanks for reviewing. Yes, it's irrelevant. I have reverted it. Please check it out. https://cr.openjdk.java.net/~xliu/8222670/webrev.02/ Please note that I added an assertion to InstanceKlass::add_osr_nmethod(nmethod* n) in this webrev. In my understanding, a duplicated OSR compilation is a potential memory leak in the code cache: if there is no higher-level OSR compilation, those duplicates will stay in the code cache forever. Further, it doesn't make sense to recompile with the same level and same bci. With this assertion, the following tests in tier1-test failed.
test/hotspot/jtreg/compiler/intrinsics/unsafe/DirectByteBufferTest.java test/hotspot/jtreg/compiler/intrinsics/unsafe/HeapByteBufferTest.java test/jdk/java/util/stream/test/org/openjdk/tests/java/util/stream/ToArrayOpTest.java test/jdk/tools/pack200/Pack200Test.java test/jdk/java/util/Arrays/SortingNearlySortedPrimitive.java All crashes happen as I described in JDK-8222670. Eg. duplicated OSR compilations occur for level2. Program received signal SIGSEGV, Segmentation fault. # To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=/instanceKlass.cpp:2972 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/src/src/hotspot/share/oops/instanceKlass.cpp:2972), pid=8347, tid=8361 # assert(prev == __null || !prev->is_in_use()) failed: redundunt OSR recompilation detected. memory leak in CodeCache! # # JRE version: OpenJDK Runtime Environment (13.0) (slowdebug build 13-internal+0-adhoc..src) # Java VM: OpenJDK 64-Bit Server VM (slowdebug 13-internal+0-adhoc..src, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0xb3dbb4] InstanceKlass::add_osr_nmethod(nmethod*)+0xc4 # # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again # # An error report file with more information is saved as: # /build/JTwork/scratch/hs_err_pid8347.log Program received signal SIGSEGV, Segmentation fault. 
Compiled method (c1) 19032 752 % 2 ByteBufferTest::stepUsingAccessors @ 382 (633 bytes) total in heap [0x00007fffd8f9ff90,0x00007fffd8fac628] = 50840 relocation [0x00007fffd8fa0110,0x00007fffd8fa1388] = 4728 main code [0x00007fffd8fa13a0,0x00007fffd8fa7f80] = 27616 stub code [0x00007fffd8fa7f80,0x00007fffd8fa86c0] = 1856 oops [0x00007fffd8fa86c0,0x00007fffd8fa86c8] = 8 metadata [0x00007fffd8fa86c8,0x00007fffd8fa8800] = 312 scopes data [0x00007fffd8fa8800,0x00007fffd8fa9ff8] = 6136 scopes pcs [0x00007fffd8fa9ff8,0x00007fffd8fac408] = 9232 dependencies [0x00007fffd8fac408,0x00007fffd8fac418] = 16 nul chk table [0x00007fffd8fac418,0x00007fffd8fac628] = 528 Compiled method (c1) 19032 752 % 2 ByteBufferTest::stepUsingAccessors @ 382 (633 bytes) total in heap [0x00007fffd8f9ff90,0x00007fffd8fac628] = 50840 relocation [0x00007fffd8fa0110,0x00007fffd8fa1388] = 4728 main code [0x00007fffd8fa13a0,0x00007fffd8fa7f80] = 27616 stub code [0x00007fffd8fa7f80,0x00007fffd8fa86c0] = 1856 oops [0x00007fffd8fa86c0,0x00007fffd8fa86c8] = 8 metadata [0x00007fffd8fa86c8,0x00007fffd8fa8800] = 312 scopes data [0x00007fffd8fa8800,0x00007fffd8fa9ff8] = 6136 scopes pcs [0x00007fffd8fa9ff8,0x00007fffd8fac408] = 9232 dependencies [0x00007fffd8fac408,0x00007fffd8fac418] = 16 nul chk table [0x00007fffd8fac418,0x00007fffd8fac628] = 528 Thanks, --lx On 4/19/19, 9:31 AM, "Severin Gehwolf" wrote: On Thu, 2019-04-18 at 19:46 +0000, Liu, Xin wrote: > Hi, hotspot-compiler group, > > Could you review this webrev for JDK-8222670? > https://cr.openjdk.java.net/~xliu/8222670/webrev.01/ +++ new/test/hotspot/jtreg/compiler/tiered/TieredLevelsTest.java 2019-04-18 12:18:38.000000000 -0700 @@ -89,7 +89,7 @@ && actual == COMP_LEVEL_LIMITED_PROFILE) { // for simple method full_profile may be replaced by limited_profile if (IS_VERBOSE) { - System.out.printf("Level check: full profiling was replaced " + System.out.println("Level check: full profiling was replaced " + "by limited profiling. 
Expected: %d, actual:%d", expected, actual); This seems an unintended change, is it? Thanks, Severin From vladimir.kozlov at oracle.com Fri Apr 26 16:31:29 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 26 Apr 2019 09:31:29 -0700 Subject: [13] 8221592: C2 compilation failed with assert(!q->is_MergeMem()) In-Reply-To: <9a052954-90ea-5d45-37d0-cc6f95c1bc84@oracle.com> References: <9a052954-90ea-5d45-37d0-cc6f95c1bc84@oracle.com> Message-ID: Looks good. Thanks, Vladimir On 4/26/19 5:14 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8221592 > http://cr.openjdk.java.net/~thartmann/8221592/webrev.00/ > > We hit an assert during parsing with incremental inlining when merging memory edges into a target > block because of a MergeMem that has another MergeMem as input. I've traced the MergeMem back to > this code: > http://hg.openjdk.java.net/jdk/jdk/file/47a8fdf84424/src/hotspot/share/opto/parse1.cpp#l1028 > > The memory input of the _exit map is a MergeMem with a useless Phi input that has another MergeMem > as input (see bug comments for details). After calling transform on this slice, the Phi is removed > and we end up with a MergeMem that is then set as input to the original MergeMem. This later > triggers the assert. > > The proposed fix is to transform the original MergeMem after transforming the slices to get rid of > MergeMem inputs. > > The problem only shows up with the fix for JDK-8059241 [1] which reduced the number of times > PhaseRemoveUseless is executed with incremental inlining. I don't think it's related though but just > triggers the bug. > > Tested with multiple runs of the microbenchmark that triggered the assert and > hs-tier1,hs-tier2,hs-tier3,hs-precheckin-comp,jdk-tier1,jdk-tier2,jdk-tier3 (running). 
> > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8059241 > From dean.long at oracle.com Fri Apr 26 17:17:43 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 26 Apr 2019 10:17:43 -0700 Subject: RFR(m): 8221734: Deoptimize with handshakes In-Reply-To: References: <89b00912-1f84-3458-d53b-fbe6d372affe@oracle.com> Message-ID: On 4/26/19 6:11 AM, Robbin Ehn wrote: > Hi Dean, > > On 4/25/19 9:39 PM, dean.long at oracle.com wrote: >> I don't pretend to understand the biased locking changes :-) Could >> you explain the need for CompiledMethod_lock, and the other lock rank >> changes? > > To iterate the compiled methods we need to hold CodeCache_lock. > (since we do this outside of a safepoint) > To be able to make a compiled method not entrant it must be > protected by a > lower ranked lock. So we protect the compiled method with this new > lock instead > of Patching_lock (higher ranked). > OK. >> >> Replacing >> >> 1089 if (_method->code() == this) { >> 1090 _method->clear_code(); // Break a cycle >> 1091 } >> >> >> with >> >> >> 1087 Method::unlink_code(_method, this); // Break a cycle >> >> doesn't look equivalent. The new unlink_code has a stronger check, >> also checking the entry point. I'm not sure that's OK here. > > Thanks, yes. I had a slightly different version before. > I'll go over this again. > >> >> In Method::Method(), don't we want to be able to initialize these >> fields without grabbing CompiledMethod_lock? > > This should just be an uncontended cas. I don't see why the cas on a global lock would be uncontended. > So I went with simplicity, disregarding this small overhead. > Let me know what you feel about that. > I'd rather there be no locking here, like the original code. > I'll send out a new updated and tested version next week. > Sounds good. dl > Thanks! > > /Robbin > >> >> dl >> >> On 4/25/19 5:05 AM, Robbin Ehn wrote: >>> Hi all, please review. >>> >>> Let's deopt with handshakes.
>>> Removed VM op Deoptimize, instead we handshake. >>> Locks need to be inflated since we are not in a safepoint. >>> >>> Goes on top of: >>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-April/033491.html >>> >>> >>> Code: >>> http://cr.openjdk.java.net/~rehn/8221734/v1/webrev/index.html >>> Issue: >>> https://bugs.openjdk.java.net/browse/JDK-8221734 >>> >>> Passes t1-7 and multiple t1-5 runs. >>> >>> A few startup benchmarks see a small speedup. >>> >>> Thanks, Robbin >> From dean.long at oracle.com Fri Apr 26 19:09:38 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 26 Apr 2019 12:09:38 -0700 Subject: RFR(S) 8218700: infinite loop in HotSpotJVMCIMetaAccessContext.fromClass after OutOfMemoryError Message-ID: <53bcf718-e543-d40c-5486-58b98f66bcee@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8218700 http://cr.openjdk.java.net/~dlong/8218700/webrev.2/ If we throw an OutOfMemoryError in the right place (see JDK-8222941), HotSpotJVMCIMetaAccessContext.fromClass can go into an infinite loop calling ClassValue.remove. To work around the problem, reset the value in a mutable cell instead of calling remove.
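The "mutable cell" workaround described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the webrev: WeakClassCache, Cell, fromClass, and reset are hypothetical names, and the only real API relied on is java.lang.ClassValue (computeValue, get) and java.lang.ref.WeakReference. The idea shown is that the ClassValue permanently maps each class to a small holder object, and a stale value is replaced inside that holder, so ClassValue.remove never needs to be called and there is no remove-and-retry loop.

```java
import java.lang.ref.WeakReference;
import java.util.function.Function;

/** Hypothetical sketch of a ClassValue-backed cache that never calls remove. */
final class WeakClassCache<T> {

    /** Mutable cell stored in the ClassValue; only its contents are replaced. */
    private static final class Cell<V> {
        private WeakReference<V> ref; // null until first use or after reset()

        synchronized V getOrCreate(Class<?> type, Function<Class<?>, V> factory) {
            V value = (ref == null) ? null : ref.get();
            if (value == null) {
                // Value missing or collected: recreate it and reset the cell in place.
                value = factory.apply(type);
                ref = new WeakReference<>(value);
            }
            return value;
        }

        synchronized void reset() {
            ref = null;
        }
    }

    private final Function<Class<?>, T> factory;

    private final ClassValue<Cell<T>> cells = new ClassValue<Cell<T>>() {
        @Override
        protected Cell<T> computeValue(Class<?> type) {
            return new Cell<>(); // computed once per class, never removed
        }
    };

    WeakClassCache(Function<Class<?>, T> factory) {
        this.factory = factory;
    }

    /** Returns the cached value for the class, recreating it if it was dropped. */
    T fromClass(Class<?> type) {
        return cells.get(type).getOrCreate(type, factory);
    }

    /** Drops the cached value without touching the ClassValue mapping. */
    void reset(Class<?> type) {
        cells.get(type).reset();
    }
}
```

A caller would construct it with a factory, for example new WeakClassCache<Object>(c -> new Object()), and use fromClass(String.class); the synchronized cell is a simplification chosen for the sketch rather than a statement about how the actual fix handles concurrency.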
dl From vladimir.x.ivanov at oracle.com Sat Apr 27 02:30:57 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 26 Apr 2019 19:30:57 -0700 Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change In-Reply-To: <73f7c647-3194-2a65-6cc6-a15cbf6c82be@oracle.com> References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com> <392c665f-869c-29af-4fc5-e6f844820846@oracle.com> <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com> <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com> <959abf54-d1da-95ee-9cf6-6c6d8ec5e4a1@oracle.com> <18115aa8-edaa-31b9-02a6-06721d9fbfc9@oracle.com> <939f3f5d-b8e7-939f-8953-d34a0f3ff6c9@oracle.com> <259ef902-778b-7eef-46e2-d1927950d21c@oracle.com> <73f7c647-3194-2a65-6cc6-a15cbf6c82be@oracle.com> Message-ID: <37837126-c9d5-1bb1-fc9a-6fb9b848efbe@oracle.com> >> I'd just add "if (st->is_mismatched_access()) { return FAIL; }" in >> IN::can_capture_store and move the rest into >> IN::captured_store_insertion_point(). >> > Yes, I will add following change also along with any final fix. > > intptr_t InitializeNode::can_capture_store(StoreNode* st, > PhaseTransform* phase, bool can_reshape) { > const int FAIL = 0; > - if (st->is_unaligned_access()) { > + if (st->is_unaligned_access() || st->is_mismatched_access()) { > return FAIL; > } Looks good. >> It looks like InitializeNode::captured_store_insertion_point() is a >> better place to inspect get_store_offset() and fail on mismatch. It >> also handles negative values returned by get_store_offset() and has a >> consistent handling of "size_in_bytes == 0" case. >> > (i) At first tried following *wrong* change. > (confirmed this is not a fix for the issue) > > int InitializeNode::captured_store_insertion_point(intptr_t start, > int size_in_bytes, > PhaseTransform* > phase) { > ........... >
for (uint i = InitializeNode::RawStores, limit = req(); ; ) { > if (i >= limit) return -(int)i; // not found; here is where to > put it > > Node* st = in(i); > intptr_t st_off = get_store_offset(st, phase); > - if (st_off < 0) { > + if ((size_in_bytes != 0) && (st_off % size_in_bytes) != 0) { > + return FAIL; > + } else if (st_off < 0) { > if (st != zero_memory()) { > ............ > > > > (ii) When working for correct fix in captured_store_insertion_point(), > (was looking to get required captured store / get_store_offset value in) > found that following additions in captured_store_insertion_point() > actually fixes reported issue; also could not find any other issue with > testing. > > ............ > if (start >= ti_limit) return FAIL; > > + if ((size_in_bytes != 0) && (start % size_in_bytes) != 0) { > + return FAIL; > + } > + > for (uint i = InitializeNode::RawStores, limit = req(); ; ) { > ........... > > > > (iii) So if above (ii) is a legal/valid fix, then again can the best > location of the fix be as following in can_capture_store() itself? > > intptr_t InitializeNode::can_capture_store(StoreNode* st, > PhaseTransform* phase, bool can_reshape) { > ................... > AllocateNode* alloc = AllocateNode::Ideal_allocation(adr, phase, > offset); > if (alloc == NULL) > return FAIL; // inscrutable address > if (alloc != allocation()) > return FAIL; // wrong allocation! (store needs to > float up) > + int size_in_bytes = st->memory_size(); > + if ((size_in_bytes != 0) && (offset % size_in_bytes) != 0) { > + return FAIL; > + } > Node* val = st->in(MemNode::ValueIn); > ........... > return offset; // success > } > > > > (iv) OR something like following in capture_store() ? > Node* InitializeNode::capture_store(StoreNode* st, intptr_t start, >
PhaseTransform* phase, bool > can_reshape) { > assert(stores_are_sane(phase), ""); > > if (start < 0) return NULL; > assert(can_capture_store(st, phase, can_reshape) == start, "sanity"); > > + int size_in_bytes = st->memory_size(); > + if ((size_in_bytes != 0) && (start % size_in_bytes) != 0) { > + return NULL; > + } > Compile* C = phase->C; > - int size_in_bytes = st->memory_size(); > int i = captured_store_insertion_point(start, size_in_bytes, phase); > if (i == 0) return NULL; // bail out > .................. > > > > (v) Please tell me if I am missing something and none above is valid > fix. Then should work for a correct fix in > captured_store_insertion_point(). You are right, I missed that IN::captured_store_insertion_point() inspects other stores that are already captured. Sorry for the confusion. I agree that IN::can_capture_store() is the right place to put the fix in and I like (iii). (Just add a comment, "// mismatched access" is enough) Best regards, Vladimir Ivanov >> >> PS: Frankly speaking, I'm not comfortable with current handling of >> "size_in_bytes == 0" case: >> >> // If size_in_bytes is zero, do not bother with overlap checks. >> int InitializeNode::captured_store_insertion_point(intptr_t start, >> int size_in_bytes, >> PhaseTransform* >> phase) { >> >> "size_in_bytes == 0" is the case for StoreVectorNodes. I believe the >> logic is there to accommodate bulk initialization with vector stores >> (haven't found the proofs in the code though), but it looks fragile >> (especially when vector operations become more common). It would be >> nice to enable overlap checks for vector operations. >> >> But I'm perfectly fine with handling that separately. >> > Yes, I will open a new JBS task to address this.
> > > Thanks, > Rahul > >> Best regards, >> Vladimir Ivanov From vladimir.x.ivanov at oracle.com Sat Apr 27 06:18:49 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 26 Apr 2019 23:18:49 -0700 Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change In-Reply-To: <37837126-c9d5-1bb1-fc9a-6fb9b848efbe@oracle.com> References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com> <392c665f-869c-29af-4fc5-e6f844820846@oracle.com> <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com> <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com> <959abf54-d1da-95ee-9cf6-6c6d8ec5e4a1@oracle.com> <18115aa8-edaa-31b9-02a6-06721d9fbfc9@oracle.com> <939f3f5d-b8e7-939f-8953-d34a0f3ff6c9@oracle.com> <259ef902-778b-7eef-46e2-d1927950d21c@oracle.com> <73f7c647-3194-2a65-6cc6-a15cbf6c82be@oracle.com> <37837126-c9d5-1bb1-fc9a-6fb9b848efbe@oracle.com> Message-ID: <28955bc6-020a-29e1-953c-e9f48932cd56@oracle.com> On 26/04/2019 19:30, Vladimir Ivanov wrote: > >>> I'd just add "if (st->is_mismatched_access()) { return FAIL; }" in >>> IN::can_capture_store and move the rest into >>> IN::captured_store_insertion_point(). >>> >> Yes, I will add following change also along with any final fix. >> >> intptr_t InitializeNode::can_capture_store(StoreNode* st, >> PhaseTransform* phase, bool can_reshape) { >> const int FAIL = 0; >> - if (st->is_unaligned_access()) { >> + if (st->is_unaligned_access() || st->is_mismatched_access()) { >> return FAIL; >> } > > Looks good. After thinking more about it, I believe the new offset alignment check supersedes is_unaligned_access(). And is_mismatched_access() is too conservative here: what is_mismatched_access() adds here (in addition to the existing alignment & size checks) is a check of whether the types of the location and the stored value match, but what matters for IN are sizes and offsets only.

Type mismatches (e.g., byte vs boolean, char vs short) may cause
problems when consequent loads are replaced with values from
initializing stores, but it should be already handled in
MemNode::can_see_stored_value() and Load?Node::Ideal().

So, it seems both checks (is_unaligned_access() &
is_mismatched_access()) can be safely omitted.

Best regards,
Vladimir Ivanov

>>> It looks like InitializeNode::captured_store_insertion_point() is a
>>> better place to inspect get_store_offset() and fail on mismatch. It
>>> also handles negative values returned by get_store_offset() and has a
>>> consistent handling of "size_in_bytes == 0" case.
>>>
>> (i) At first tried following *wrong* change.
>> (confirmed this is not a fix for the issue)
>>
>>   int InitializeNode::captured_store_insertion_point(intptr_t start,
>>                                                      int size_in_bytes,
>>                                                      PhaseTransform*
>> phase) {
>>    ...........
>>     for (uint i = InitializeNode::RawStores, limit = req(); ; ) {
>>       if (i >= limit)  return -(int)i; // not found; here is where to
>> put it
>>
>>       Node*    st     = in(i);
>>       intptr_t st_off = get_store_offset(st, phase);
>> -    if (st_off < 0) {
>> +    if ((size_in_bytes != 0) && (st_off % size_in_bytes) != 0) {
>> +      return FAIL;
>> +    } else if (st_off < 0) {
>>         if (st != zero_memory()) {
>>     ............
>>
>>
>>
>> (ii) When working for correct fix in captured_store_insertion_point(),
>> (was looking to get required captured store / get_store_offset value in)
>> found that following additions in captured_store_insertion_point()
>> actually fixes reported issue; also could not find any other issue
>> with testing.
>>
>> ............
>>     if (start >= ti_limit)  return FAIL;
>>
>> +  if ((size_in_bytes != 0) && (start % size_in_bytes) != 0) {
>> +    return FAIL;
>> +  }
>> +
>>     for (uint i = InitializeNode::RawStores, limit = req(); ; ) {
>> ...........
>>
>>
>>
>> (iii) So if above (ii) is a legal/valid fix, then again can the best
>> location of the fix be as following in can_capture_store() itself?
>>
>> intptr_t InitializeNode::can_capture_store(StoreNode* st,
>> PhaseTransform* phase, bool can_reshape) {
>>     ...................
>>     AllocateNode* alloc = AllocateNode::Ideal_allocation(adr, phase,
>> offset);
>>     if (alloc == NULL)
>>       return FAIL;                // inscrutable address
>>     if (alloc != allocation())
>>       return FAIL;                // wrong allocation!  (store needs
>> to float up)
>> +  int size_in_bytes = st->memory_size();
>> +  if ((size_in_bytes != 0) && (offset % size_in_bytes) != 0) {
>> +    return FAIL;
>> +  }
>>     Node* val = st->in(MemNode::ValueIn);
>>     ...........
>>     return offset;                // success
>>   }
>>
>>
>>
>> (iv) OR  something like following in capture_store() ?
>>   Node* InitializeNode::capture_store(StoreNode* st, intptr_t start,
>>                                       PhaseTransform* phase, bool
>> can_reshape) {
>>     assert(stores_are_sane(phase), "");
>>
>>     if (start < 0)  return NULL;
>>     assert(can_capture_store(st, phase, can_reshape) == start, "sanity");
>>
>> +  int size_in_bytes = st->memory_size();
>> +  if ((size_in_bytes != 0) && (start % size_in_bytes) != 0) {
>> +    return NULL;
>> +  }
>>     Compile* C = phase->C;
>> -  int size_in_bytes = st->memory_size();
>>     int i = captured_store_insertion_point(start, size_in_bytes, phase);
>>     if (i == 0)  return NULL;     // bail out
>>     ..................
>>
>>
>>
>> (v) Please tell me if I am missing something and none above is valid
>> fix. Then should work for a correct fix in
>> captured_store_insertion_point().
>
> You are right, I missed that IN::captured_store_insertion_point()
> inspects already other stores which are already captured. Sorry for the
> confusion.
>
> I agree that IN::can_capture_store() is the right place to put the fix
> in and I like (iii). (Just add a comment, "// mismatched access" is enough)
>
> Best regards,
> Vladimir Ivanov
>
>>>
>>> PS: Frankly speaking, I'm not comfortable with current handling of
>>> "size_in_bytes == 0" case:
>>>
>>> // If size_in_bytes is zero, do not bother with overlap checks.
>>> int InitializeNode::captured_store_insertion_point(intptr_t start,
>>>                                                    int size_in_bytes,
>>>                                                    PhaseTransform*
>>> phase) {
>>>
>>> "size_in_bytes == 0" is the case for StoreVectorNodes. I believe the
>>> logic is there to accommodate bulk initialization with vector stores
>>> (haven't found the proofs in the code though), but it looks fragile
>>> (especially when vector operations become more common). It would be
>>> nice to enable overlap checks for vector operations.
>>>
>>> But I'm perfectly fine with handling that separately.
>>>
>> Yes, I will open a new JBS task to address this.
>>
>>
>> Thanks,
>> Rahul
>>
>>> Best regards,
>>> Vladimir Ivanov

From felix.yang at huawei.com Sun Apr 28 01:44:30 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Sun, 28 Apr 2019 01:44:30 +0000
Subject: RFR: 8223020: aarch64: expand minI_rReg and maxI_rReg patterns into separate instructions
Message-ID:

Hi,

Please review:
JBS: https://bugs.openjdk.java.net/browse/JDK-8223020
Webrev: http://cr.openjdk.java.net/~fyang/8223020/webrev.00

Currently, two instructions will be emitted for minI_rReg/maxI_rReg patterns: cmpw + cselw.
As these two instructions are always emitted together, the GCM (Global Code Motion) phase will
not be able to schedule them independently. Patch expands minI_rReg and maxI_rReg patterns
into separate instructions. For the small test case on the JBS, GCM can do a better schedule with
this change.

Jtreg tested with a fastdebug aarch64 build.
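
To make the two-step shape concrete, the min/max computation can be viewed as a compare that produces a condition, followed by a conditional select that consumes it. The C++ sketch below is only an analogy for cmpw + cselw; the function names are hypothetical and this is not the aarch64.ad code from the webrev.

```cpp
// Hypothetical illustration, not aarch64.ad code.

// Step 1 (analogous to cmpw): compute the condition.
inline bool less_than(int a, int b) { return a < b; }

// Step 2 (analogous to cselw): pick one operand based on the condition.
inline int cond_select(bool cond, int if_true, int if_false) {
  return cond ? if_true : if_false;
}

// min: select a when a < b, otherwise b.
int min_int(int a, int b) { return cond_select(less_than(a, b), a, b); }

// max: select a when a > b (written as b < a), otherwise b.
int max_int(int a, int b) { return cond_select(less_than(b, a), a, b); }
```

Written this way, the compare and the select are separate units with a data dependency between them, which is the property that lets a scheduler place them independently.
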
The test also reveals another issue: minI_rReg and maxI_rReg patterns are not taking advantage of the aarch64 zero register, so for maxI_rReg pattern we see code like this: 0x0000ffffa34148f0: mov w12, wzr <======== 0x0000ffffa34148f4: cmp w10, w12 <======== 0x0000ffffa34148f8: csel w10, w10, w12, gt <======== which can be further simplified into: 0x0000ffffa34148f4: cmp w10, wzr <======== 0x0000ffffa34148f8: csel w10, w10, wzr, gt <======== But looks like it's hard to find a test case to trigger this issue for minI_rReg pattern. I can fix this for maxI_rReg pattern with another patch if necessary. Thanks, Felix From nils.eliasson at oracle.com Sun Apr 28 09:11:01 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Sun, 28 Apr 2019 11:11:01 +0200 Subject: [13] 8221592: C2 compilation failed with assert(!q->is_MergeMem()) In-Reply-To: References: <9a052954-90ea-5d45-37d0-cc6f95c1bc84@oracle.com> Message-ID: +1 // Nils > > Thanks, > Vladimir > > On 4/26/19 5:14 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8221592 >> http://cr.openjdk.java.net/~thartmann/8221592/webrev.00/ >> >> We hit an assert during parsing with incremental inlining when >> merging memory edges into a target >> block because of a MergeMem that has another MergeMem as input. I've >> traced the MergeMem back to >> this code: >> http://hg.openjdk.java.net/jdk/jdk/file/47a8fdf84424/src/hotspot/share/opto/parse1.cpp#l1028 >> >> >> The memory input of the _exit map is a MergeMem with a useless Phi >> input that has another MergeMem >> as input (see bug comments for details). After calling transform on >> this slice, the Phi is removed >> and we end up with a MergeMem that is then set as input to the >> original MergeMem. This later >> triggers the assert. >> >> The proposed fix is to transform the original MergeMem after >> transforming the slices to get rid of >> MergeMem inputs. 
>> >> The problem only shows up with the fix for JDK-8059241 [1] which >> reduced the number of times >> PhaseRemoveUseless is executed with incremental inlining. I don't >> think it's related though but just >> triggers the bug. >> >> Tested with multiple runs of the microbenchmark that triggered the >> assert and >> hs-tier1,hs-tier2,hs-tier3,hs-precheckin-comp,jdk-tier1,jdk-tier2,jdk-tier3 >> (running). >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8059241 >> From tobias.hartmann at oracle.com Mon Apr 29 07:01:24 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 29 Apr 2019 09:01:24 +0200 Subject: [13] 8221592: C2 compilation failed with assert(!q->is_MergeMem()) In-Reply-To: References: <9a052954-90ea-5d45-37d0-cc6f95c1bc84@oracle.com> Message-ID: Thanks Vladimir. Best regards, Tobias On 26.04.19 18:31, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 4/26/19 5:14 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8221592 >> http://cr.openjdk.java.net/~thartmann/8221592/webrev.00/ >> >> We hit an assert during parsing with incremental inlining when merging memory edges into a target >> block because of a MergeMem that has another MergeMem as input. I've traced the MergeMem back to >> this code: >> http://hg.openjdk.java.net/jdk/jdk/file/47a8fdf84424/src/hotspot/share/opto/parse1.cpp#l1028 >> >> The memory input of the _exit map is a MergeMem with a useless Phi input that has another MergeMem >> as input (see bug comments for details). After calling transform on this slice, the Phi is removed >> and we end up with a MergeMem that is then set as input to the original MergeMem. This later >> triggers the assert. >> >> The proposed fix is to transform the original MergeMem after transforming the slices to get rid of >> MergeMem inputs. 
>> >> The problem only shows up with the fix for JDK-8059241 [1] which reduced the number of times >> PhaseRemoveUseless is executed with incremental inlining. I don't think it's related though but just >> triggers the bug. >> >> Tested with multiple runs of the microbenchmark that triggered the assert and >> hs-tier1,hs-tier2,hs-tier3,hs-precheckin-comp,jdk-tier1,jdk-tier2,jdk-tier3 (running). >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8059241 >> From tobias.hartmann at oracle.com Mon Apr 29 07:01:47 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 29 Apr 2019 09:01:47 +0200 Subject: [13] 8221592: C2 compilation failed with assert(!q->is_MergeMem()) In-Reply-To: References: <9a052954-90ea-5d45-37d0-cc6f95c1bc84@oracle.com> Message-ID: <9a2d52c8-5291-c5e1-0999-9c03c717ab8f@oracle.com> Thanks Nils. Best regards, Tobias On 28.04.19 11:11, Nils Eliasson wrote: > +1 > > // Nils > >> >> Thanks, >> Vladimir >> >> On 4/26/19 5:14 AM, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8221592 >>> http://cr.openjdk.java.net/~thartmann/8221592/webrev.00/ >>> >>> We hit an assert during parsing with incremental inlining when merging memory edges into a target >>> block because of a MergeMem that has another MergeMem as input. I've traced the MergeMem back to >>> this code: >>> http://hg.openjdk.java.net/jdk/jdk/file/47a8fdf84424/src/hotspot/share/opto/parse1.cpp#l1028 >>> >>> The memory input of the _exit map is a MergeMem with a useless Phi input that has another MergeMem >>> as input (see bug comments for details). After calling transform on this slice, the Phi is removed >>> and we end up with a MergeMem that is then set as input to the original MergeMem. This later >>> triggers the assert. >>> >>> The proposed fix is to transform the original MergeMem after transforming the slices to get rid of >>> MergeMem inputs. 
>>> >>> The problem only shows up with the fix for JDK-8059241 [1] which reduced the number of times >>> PhaseRemoveUseless is executed with incremental inlining. I don't think it's related though but just >>> triggers the bug. >>> >>> Tested with multiple runs of the microbenchmark that triggered the assert and >>> hs-tier1,hs-tier2,hs-tier3,hs-precheckin-comp,jdk-tier1,jdk-tier2,jdk-tier3 (running). >>> >>> Thanks, >>> Tobias >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8059241 >>> From vladimir.x.ivanov at oracle.com Mon Apr 29 07:15:59 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 29 Apr 2019 00:15:59 -0700 Subject: [13] 8221592: C2 compilation failed with assert(!q->is_MergeMem()) In-Reply-To: <9a052954-90ea-5d45-37d0-cc6f95c1bc84@oracle.com> References: <9a052954-90ea-5d45-37d0-cc6f95c1bc84@oracle.com> Message-ID: src/hotspot/share/opto/parse1.cpp: + _gvn.transform(_exits.merged_memory()); I'm curious why the node isn't put on worklist when it is created/modified. It seems it should have solved the problem. Also, the code around usually does it a bit differently. For example, Parse::do_all_blocks(): Node* result = _gvn.transform_no_reclaim(control()); ... if (result != top()) { record_for_igvn(result); Best regards, Vladimir Ivanov On 26/04/2019 05:14, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8221592 > http://cr.openjdk.java.net/~thartmann/8221592/webrev.00/ > > We hit an assert during parsing with incremental inlining when merging memory edges into a target > block because of a MergeMem that has another MergeMem as input. I've traced the MergeMem back to > this code: > http://hg.openjdk.java.net/jdk/jdk/file/47a8fdf84424/src/hotspot/share/opto/parse1.cpp#l1028 > > The memory input of the _exit map is a MergeMem with a useless Phi input that has another MergeMem > as input (see bug comments for details). 
After calling transform on this slice, the Phi is removed > and we end up with a MergeMem that is then set as input to the original MergeMem. This later > triggers the assert. > > The proposed fix is to transform the original MergeMem after transforming the slices to get rid of > MergeMem inputs. > > The problem only shows up with the fix for JDK-8059241 [1] which reduced the number of times > PhaseRemoveUseless is executed with incremental inlining. I don't think it's related though but just > triggers the bug. > > Tested with multiple runs of the microbenchmark that triggered the assert and > hs-tier1,hs-tier2,hs-tier3,hs-precheckin-comp,jdk-tier1,jdk-tier2,jdk-tier3 (running). > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8059241 > From tobias.hartmann at oracle.com Mon Apr 29 09:08:15 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 29 Apr 2019 11:08:15 +0200 Subject: [13] 8221592: C2 compilation failed with assert(!q->is_MergeMem()) In-Reply-To: References: <9a052954-90ea-5d45-37d0-cc6f95c1bc84@oracle.com> Message-ID: Hi Vladimir, Sorry, I've already pushed this patch before you've sent the email (but for some reason, the hgupdater didn't pick it up yet - maybe it's still down for maintenance). On 29.04.19 09:15, Vladimir Ivanov wrote: > src/hotspot/share/opto/parse1.cpp: > > +? _gvn.transform(_exits.merged_memory()); > > I'm curious why the node isn't put on worklist when it is created/modified. It seems it should have > solved the problem. 
The 41744 Phi (see IR below) is created in Parse::build_exits() and intentionally not transformed
until do_exits() as this comment suggests:
http://hg.openjdk.java.net/jdk/jdk/file/2f4393ec54d4/src/hotspot/share/opto/parse1.cpp#l771

The MergeMem input is then added in return_current():
http://hg.openjdk.java.net/jdk/jdk/file/2f4393ec54d4/src/hotspot/share/opto/parse1.cpp#l2196

Putting the _exits.merged_memory() node on the igvn worklist during creation wouldn't help because
we don't execute a round of igvn between parsing of the inlined callee (where build_exits() creates
the node) and parsing of the caller (where the assert happens).

> Also, the code around usually does it a bit differently. For example, Parse::do_all_blocks():
>
>         Node* result = _gvn.transform_no_reclaim(control());
>     ...
>         if (result != top()) {
>           record_for_igvn(result);

Do you mean that I should record the transformed _exits.merged_memory() for igvn in do_exits()? I
don't think that's necessary.

Thanks,
Tobias

37057 MergeMem === _ 1 37093 37094 1 1 37095 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 37096 1 1 37097 1 1 1 1 1 37098 37099 37100 37101 37102 37103 37104 37105 [[ 37058 37064 37070 37077 37083 19776 19759 19760 4968 4968 18862 41708 41716 41718 41726 41730 41738 41740 41748 41744 ]] ...
41744 Phi === 41742 37057 [[ 41745 ]] #memory Memory: @BotPTR *+bot, idx=Bot; ...
41745 MergeMem === _ 1 41744 41751 1 1 41752 1 1 1 1 1 1 1 41753 1 1 1 1 1 1 1 1 1 1 1 41754 [[ 41741 ]] ...
From aph at redhat.com Mon Apr 29 09:20:01 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 29 Apr 2019 10:20:01 +0100 Subject: [aarch64-port-dev ] RFR: 8223020: aarch64: expand minI_rReg and maxI_rReg patterns into separate instructions In-Reply-To: References: Message-ID: <3ce0d4a5-a1df-d03d-3224-21b638f50f45@redhat.com> On 4/28/19 2:44 AM, Yangfei (Felix) wrote: > JBS: https://bugs.openjdk.java.net/browse/JDK-8223020 > Webrev: http://cr.openjdk.java.net/~fyang/8223020/webrev.00 > > Currently, two instructions will be emitted for minI_rReg/maxI_rReg patterns: cmpw + cselw. > As these two instructions are always emitted together, the GCM (Global Code Motion) phase will > not be able to schedule them independently. Patch expands minI_rReg and maxI_rReg patterns > into separate instructions. For the small test case on the JBS, GCM can do a better schedule with > this change. OK. I imagine that there are other patterns which might benefit from doing something similar. I don't know if you are intending to do more of this kind of change, but if you are please group them together. It might be worth also producing a list beforehand and publishing that list here. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Mon Apr 29 13:24:47 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 29 Apr 2019 15:24:47 +0200 Subject: [13] RFR(S): 8219807: C2 crash in IfNode::up_one_dom(Node*, bool) Message-ID: <116a47d4-7e49-0d20-0ae3-fba85ea1f513@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8219807 http://cr.openjdk.java.net/~thartmann/8219807/webrev.00/ We crash in IfNode::up_one_dom() while walking up the dominator tree because two regions (din3 and din4) are degraded to copies (for details see the bug comments). 
The control inputs of these regions were set to NULL in RegionNode::Ideal() but the nodes were not removed because we were still parsing. Both regions are on the igvn worklist and will be replaced but the IfNode is processed first and therefore encounters the NULL control inputs. I think the code should guard against that. The bug is very old and reproduces since JDK 8 with the regression test. I will run extended testing as soon as Mach5 is up again. Thanks, Tobias From fujie at loongson.cn Mon Apr 29 13:43:06 2019 From: fujie at loongson.cn (Jie Fu) Date: Mon, 29 Apr 2019 21:43:06 +0800 Subject: RFR: 8221542: ~15% performance degradation due to less optimized inline decision In-Reply-To: <1a398a1f-ed52-2197-5886-d9d5fd872974@loongson.cn> References: <6aebd883-0be7-0b05-5364-262e138a1fbc@loongson.cn> <182d87da-0d99-3f33-fbe7-ef5818be0422@loongson.cn> <0936427d-f4d2-299a-87ce-860dce5e57e1@loongson.cn> <574d59f5-3437-738f-e10c-796dcb02b42e@oracle.com> <5275854c-ab35-f160-f6f0-6ab9ac86e3d0@loongson.cn> <8bc507fe-b6db-d697-8821-0547860de232@oracle.com> <1a398a1f-ed52-2197-5886-d9d5fd872974@loongson.cn> Message-ID: <5607f7ca-57b9-b409-3bce-efc1688f0678@loongson.cn> Hi all, May I have another review for this change [1] to finalize the fix? Thanks a lot. Best regards, Jie [1] http://cr.openjdk.java.net/~vlivanov/jiefu/8221542/webrev.02/ On 2019?04?20? 11:35, Jie Fu wrote: > Ah, I got it. > I like your patch and benefit a lot from you. > Thank you so much, Vladimir. > > Any comments from other reviewers? > Thanks. > > Best regards, > Jie > > On 2019/4/20 ??11:18, Vladimir Ivanov wrote: >> >>>> After some explorations I decided to keep original behavior for >>>> immature profiles (profile.count == -1). >>> >>> I agree. >>> >>> I have two questions here. >>> >>> 1. What's the difference of the following two if statements? 
>>> ------------------------------------------------- >>> + if (!callee_method->was_executed_more_than(0)) return true; // >>> callee was never executed >>> + >>> + if (caller_method->is_not_reached(caller_bci)) return true; // >>> call site not resolved >>> ------------------------------------------------- >>> I think only one of them is needed. >> >> The checks are complimentary: one inspects callee and the other looks >> at call site. >> >> "!callee_method->was_executed_more_than(0)" ensures that callee was >> executed at least once. >> >> "caller_method->is_not_reached(caller_bci)" inspects the state of the >> call site. If corresponding CP entry is not resolved, then the call >> site isn't reached. If is_not_reached() returns false, it's not a >> definitive answer: there's still a chance the site is not reached - >> consider the case of virtual calls where callee_method may differ for >> the same resolved method. >> >>> 2. Does the assert in InlineTree::is_not_reached(...) make sense? >>> Since we have >>> ------------------------------------------------- >>> if (profile.count() > 0) return false; // reachable according to >>> profile >>> ------------------------------------------------- >>> and >>> ------------------------------------------------- >>> if (profile.count() == -1) {...} >>> ------------------------------------------------- >>> before >>> ------------------------------------------------- >>> assert(profile.count() == 0, "sanity"); >>> ------------------------------------------------- >>> is the assert redundant? >> >> Asserts are intended to be redundant :-) But still catch bugs from >> time to time. >> >> This one, in particular, checks invariant on profile.count() >= -1 >> (which is not very useful by itself), but also stresses that >> "profile.count() == 0" case is being processed. 
>> >> Best regards, >> Vladimir Ivanov > From vladimir.kozlov at oracle.com Mon Apr 29 15:47:56 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 29 Apr 2019 08:47:56 -0700 Subject: [13] RFR(S): 8219807: C2 crash in IfNode::up_one_dom(Node*, bool) In-Reply-To: <116a47d4-7e49-0d20-0ae3-fba85ea1f513@oracle.com> References: <116a47d4-7e49-0d20-0ae3-fba85ea1f513@oracle.com> Message-ID: <293f4357-0c51-7d62-19ea-815802beb6b1@oracle.com> Please, add comment. Otherwise it is good. Thanks, Vladimir On 4/29/19 6:24 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8219807 > http://cr.openjdk.java.net/~thartmann/8219807/webrev.00/ > > We crash in IfNode::up_one_dom() while walking up the dominator tree because two regions (din3 and > din4) are degraded to copies (for details see the bug comments). The control inputs of these regions > were set to NULL in RegionNode::Ideal() but the nodes were not removed because we were still parsing. > > Both regions are on the igvn worklist and will be replaced but the IfNode is processed first and > therefore encounters the NULL control inputs. I think the code should guard against that. The bug is > very old and reproduces since JDK 8 with the regression test. > > I will run extended testing as soon as Mach5 is up again. > > Thanks, > Tobias > From doug.simon at oracle.com Mon Apr 29 17:57:15 2019 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 29 Apr 2019 19:57:15 +0200 Subject: RFR(M) 8219403: JVMCIRuntime::adjust_comp_level should be replaced In-Reply-To: <0c7df5d0-7638-7b44-65fd-18fe1279a0b3@oracle.com> References: <3f0271ca-b8e6-dfcb-8787-8c36f49265fe@oracle.com> <91f3d9be-5612-2556-ae77-a60102fc07ae@oracle.com> <0c7df5d0-7638-7b44-65fd-18fe1279a0b3@oracle.com> Message-ID: <88442FF7-4E94-4F34-9FE6-8507672C2FAE@oracle.com> Looks good to me. -Doug > On 26 Apr 2019, at 00:00, dean.long at oracle.com wrote: > > Fixed. Thanks for the review. 
>
> dl
>
> On 4/25/19 1:09 PM, Vladimir Kozlov wrote:
>> In general looks good. Only minor issue:
>>
>> in jvmciJavaClasses.hpp indent of '\' is not adjusted in changes line.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/25/19 11:53 AM, dean.long at oracle.com wrote:
>>> https://bugs.openjdk.java.net/browse/JDK-8219403
>>> http://cr.openjdk.java.net/~dlong/8219403/webrev.2/
>>>
>>> This change removes the problematic JVMCIRuntime::adjust_comp_level. It
>>> is based on previous work in Graal, graal-jvmci-8, and Metropolis by Tom,
>>> Doug, and Vladimir.
>>>
>>> I also problem-listed several tests that were causing noise in the test results.
>>>
>>> dl
>

From tom.rodriguez at oracle.com Mon Apr 29 20:21:52 2019
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 29 Apr 2019 13:21:52 -0700
Subject: RFR(M) 8219403: JVMCIRuntime::adjust_comp_level should be replaced
In-Reply-To: <3f0271ca-b8e6-dfcb-8787-8c36f49265fe@oracle.com>
References: <3f0271ca-b8e6-dfcb-8787-8c36f49265fe@oracle.com>
Message-ID:

Looks good.

tom

dean.long at oracle.com wrote on 4/25/19 11:53 AM:
> https://bugs.openjdk.java.net/browse/JDK-8219403
> http://cr.openjdk.java.net/~dlong/8219403/webrev.2/
>
> This change removes the problematic JVMCIRuntime::adjust_comp_level.  It
> is based on previous work in Graal, graal-jvmci-8, and Metropolis by Tom,
> Doug, and Vladimir.
>
> I also problem-listed several tests that were causing noise in the test
> results.
>
> dl

From dean.long at oracle.com Tue Apr 30 03:29:19 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Mon, 29 Apr 2019 20:29:19 -0700
Subject: RFR(M) 8219403: JVMCIRuntime::adjust_comp_level should be replaced
In-Reply-To:
References: <3f0271ca-b8e6-dfcb-8787-8c36f49265fe@oracle.com>
Message-ID:

Thanks Tom and Doug for the reviews.

dl

On 4/29/19 1:21 PM, Tom Rodriguez wrote:
> Looks good.
>
> tom
>
> dean.long at oracle.com wrote on 4/25/19 11:53 AM:
>> https://bugs.openjdk.java.net/browse/JDK-8219403
>> http://cr.openjdk.java.net/~dlong/8219403/webrev.2/
>>
>> This change removes the problematic JVMCIRuntime::adjust_comp_level.  It
>> is based on previous work in Graal, graal-jvmci-8, and Metropolis by
>> Tom,
>> Doug, and Vladimir.
>>
>> I also problem-listed several tests that were causing noise in the
>> test results.
>>
>> dl

From rahul.v.raghavan at oracle.com Tue Apr 30 07:04:44 2019
From: rahul.v.raghavan at oracle.com (Rahul Raghavan)
Date: Tue, 30 Apr 2019 12:34:44 +0530
Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change
In-Reply-To: <28955bc6-020a-29e1-953c-e9f48932cd56@oracle.com>
References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com> <392c665f-869c-29af-4fc5-e6f844820846@oracle.com> <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com> <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com> <959abf54-d1da-95ee-9cf6-6c6d8ec5e4a1@oracle.com> <18115aa8-edaa-31b9-02a6-06721d9fbfc9@oracle.com> <939f3f5d-b8e7-939f-8953-d34a0f3ff6c9@oracle.com> <259ef902-778b-7eef-46e2-d1927950d21c@oracle.com> <73f7c647-3194-2a65-6cc6-a15cbf6c82be@oracle.com> <37837126-c9d5-1bb1-fc9a-6fb9b848efbe@oracle.com> <28955bc6-020a-29e1-953c-e9f48932cd56@oracle.com>
Message-ID:

Thank you Vladimir Ivanov for suggestions.

Please note following latest changes tried.
- http://cr.openjdk.java.net/~rraghavan/8202414/webrev.04/

Hope did not miss any points.
Confirmed no failures with the reported test cases.
Also hs-tier1 to tier4, hs-precheckin-comp testing in progress.

Thanks,
Rahul

On 27/04/19 11:48 AM, Vladimir Ivanov wrote:
> On 26/04/2019 19:30, Vladimir Ivanov wrote:
>
> After thinking more about it, I believe new offset alignment check
> supersedes is_unaligned_access(). And is_mismatched_access() is too
And is_mismatched_access() is too > conservative here: what is_mismatched_access() adds here (in addition to > existing alignment & size checks) is whether type match between location > and stored value, but what matters for IN are sizes and offsets only. > > Type mismatches (e.g., byte vs boolean, char vs short) may cause > problems when consequent loads are replaced with values from > initializing stores, but it should be already handled in > MemNode::can_see_stored_value() and Load?Node::Ideal(). > > So, it seems both checks (is_unaligned_access() & > is_mismatched_access()) can be safely omitted. > > > > You are right, I missed that IN::captured_store_insertion_point() > inspects already other stores which are already captured. Sorry for the > confusion. > > I agree that IN::can_capture_store() is the right place to put the fix > in and I like (iii). (Just add a comment, "// mismatched access" is enough) > From felix.yang at huawei.com Tue Apr 30 08:17:42 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Tue, 30 Apr 2019 08:17:42 +0000 Subject: [aarch64-port-dev ] RFR: 8223020: aarch64: expand minI_rReg and maxI_rReg patterns into separate instructions In-Reply-To: <3ce0d4a5-a1df-d03d-3224-21b638f50f45@redhat.com> References: <3ce0d4a5-a1df-d03d-3224-21b638f50f45@redhat.com> Message-ID: Pushed. Thanks for reviewing. A rough search shows more patterns which are possible candidates for expanding. These were missed when I was doing a "size(8)" search. Some of these patterns use the reserved scratch registers when emitting code, which means we may use extra registers if we expand them in an elegant way. Should we expand them all? Any suggestions? 
bytes_reverse_short countTrailingZerosI countTrailingZerosL popCountI popCountI_mem popCountL popCountL_mem cmpL3_reg_reg cmpL3_reg_imm convI2B convP2B overflowMulI_reg overflowMulI_reg_branch overflowMulL_reg overflowMulL_reg_branch compF3_reg_reg compD3_reg_reg compF3_reg_immF0 compD3_reg_immD0 cmpLTMask_reg_reg cmpLTMask_reg_zero compareAndSwapB compareAndSwapS compareAndSwapI compareAndSwapL compareAndSwapP compareAndSwapN compareAndSwapBAcq compareAndSwapSAcq compareAndSwapIAcq compareAndSwapLAcq compareAndSwapPAcq compareAndSwapNAcq compareAndExchangeB compareAndExchangeS compareAndExchangeBAcq compareAndExchangeSAcq weakCompareAndSwapB weakCompareAndSwapS weakCompareAndSwapI weakCompareAndSwapL weakCompareAndSwapN weakCompareAndSwapP weakCompareAndSwapBAcq weakCompareAndSwapSAcq weakCompareAndSwapIAcq weakCompareAndSwapLAcq weakCompareAndSwapNAcq weakCompareAndSwapPAcq reduce_add2I reduce_add4I reduce_mul2I reduce_mul4I reduce_add2F reduce_add4F reduce_mul2F reduce_mul4F reduce_add2D reduce_mul2D reduce_max2F reduce_max4F reduce_max2D reduce_min2F reduce_min4F reduce_min2D vsra8B vsra16B vsrl8B vsrl16B vsra4S vsra8S vsrl4S vsrl8S vsra2I vsra4I vsrl2I vsrl4I vsra2L vsrl2L modI modL rolL_rReg rolI_rReg storePConditional storeLConditional storeIConditional aarch64_enc_cmp_imm aarch64_enc_cmpw_imm aarch64_enc_cmp_imm_addsub aarch64_enc_cmpw_imm_addsub > > On 4/28/19 2:44 AM, Yangfei (Felix) wrote: > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8223020 > > Webrev: http://cr.openjdk.java.net/~fyang/8223020/webrev.00 > > > > Currently, two instructions will be emitted for minI_rReg/maxI_rReg > patterns: cmpw + cselw. > > As these two instructions are always emitted together, the GCM (Global > Code Motion) phase will > > not be able to schedule them independently. Patch expands minI_rReg > and maxI_rReg patterns > > into separate instructions. For the small test case on the JBS, GCM can > do a better schedule with > > this change. > > OK. 
> > I imagine that there are other patterns which might benefit from doing > something similar. > > I don't know if you are intending to do more of this kind of change, > but if you are please group them together. It might be worth also > producing a list beforehand and publishing that list here. From tobias.hartmann at oracle.com Tue Apr 30 08:22:01 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 30 Apr 2019 10:22:01 +0200 Subject: [13] RFR(S): 8219807: C2 crash in IfNode::up_one_dom(Node*, bool) In-Reply-To: <293f4357-0c51-7d62-19ea-815802beb6b1@oracle.com> References: <116a47d4-7e49-0d20-0ae3-fba85ea1f513@oracle.com> <293f4357-0c51-7d62-19ea-815802beb6b1@oracle.com> Message-ID: <47a7e2b9-6b12-80ed-f87d-095a9c33424b@oracle.com> Thanks Vladimir, I'll add a comment before pushing. Best regards, Tobias On 29.04.19 17:47, Vladimir Kozlov wrote: > Please, add comment. Otherwise it is good. > > Thanks, > Vladimir > > On 4/29/19 6:24 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8219807 >> http://cr.openjdk.java.net/~thartmann/8219807/webrev.00/ >> >> We crash in IfNode::up_one_dom() while walking up the dominator tree because two regions (din3 and >> din4) are degraded to copies (for details see the bug comments). The control inputs of these regions >> were set to NULL in RegionNode::Ideal() but the nodes were not removed because we were still parsing. >> >> Both regions are on the igvn worklist and will be replaced but the IfNode is processed first and >> therefore encounters the NULL control inputs. I think the code should guard against that. The bug is >> very old and reproduces since JDK 8 with the regression test. >> >> I will run extended testing as soon as Mach5 is up again. 
>> >> Thanks, >> Tobias >> From aph at redhat.com Tue Apr 30 08:38:10 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 30 Apr 2019 09:38:10 +0100 Subject: [aarch64-port-dev ] RFR: 8223020: aarch64: expand minI_rReg and maxI_rReg patterns into separate instructions In-Reply-To: References: <3ce0d4a5-a1df-d03d-3224-21b638f50f45@redhat.com> Message-ID: <784286fb-8644-03fc-14d1-bf57adb5b8cc@redhat.com> On 4/30/19 9:17 AM, Yangfei (Felix) wrote: > Some of these patterns use the reserved scratch registers when > emitting code, which means we may use extra registers if we expand > them in an elegant way. > Should we expand them all? Any suggestions? I don't want to see them all expanded, no. I think it's probably too much churn for too little value. I'd be interested to see if any of these changes make any performance difference. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From patric.hedlin at oracle.com Tue Apr 30 14:08:59 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 16:08:59 +0200 Subject: RFR(T): 8223137: Rename predicate 'do_unroll_only()' to 'is_unroll_only()'. Message-ID: <68ae08c3-6b64-09cb-19dd-25369e30a55c@oracle.com> Dear all, I would like to ask for help to review the following change/update: Issue: https://bugs.openjdk.java.net/browse/JDK-8223137 Webrev: http://cr.openjdk.java.net/~phedlin/tr8223137/ 8223137: Rename predicate 'do_unroll_only()' to 'is_unroll_only()'. The "do" prefix is used as a command ditto elsewhere, while predicates typically use "is". Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) Best regards, Patric From patric.hedlin at oracle.com Tue Apr 30 14:09:37 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 16:09:37 +0200 Subject: RFR(S): 8223138: Small clean-up in loop-tree support. Message-ID: Dear all, I would like to ask for help to review the following change/update: Issue:
https://bugs.openjdk.java.net/browse/JDK-8223138 Webrev: http://cr.openjdk.java.net/~phedlin/tr8223138/ 8223138: Small clean-up in loop-tree support. Rename predicate 'is_inner()' to 'is_innermost()' to be accurate. Add 'is_root()' predicate for root parent test in loop-tree. Change definition of 'is_loop()' to always lazy-read the tail, since it should never be NULL. Clean-up of 'tail()' definition. Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) Best regards, Patric From patric.hedlin at oracle.com Tue Apr 30 14:10:47 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 16:10:47 +0200 Subject: RFR(T): 8223139: Rename mandatory policy-do routines. Message-ID: Dear all, I would like to ask for help to review the following change/update: Issue: https://bugs.openjdk.java.net/browse/JDK-8223139 Webrev: http://cr.openjdk.java.net/~phedlin/tr8223139/ 8223139: Rename mandatory policy-do routines. These routines do not implement any policy. The policy is to always attempt these transforms if applicable. 'policy_do_remove_empty_loop' -> 'do_remove_empty_loop'. 'policy_do_one_iteration_loop' -> 'do_one_iteration_loop'. Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) Best regards, Patric From patric.hedlin at oracle.com Tue Apr 30 14:11:24 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 16:11:24 +0200 Subject: RFR(S): 8223140: Clean-up in 'ok_to_convert()'. Message-ID: Dear all, I would like to ask for help to review the following change/update: Issue: https://bugs.openjdk.java.net/browse/JDK-8223140 Webrev: http://cr.openjdk.java.net/~phedlin/tr8223140/ 8223140: Clean-up in 'ok_to_convert()' Simplify logic in 'ok_to_convert()'. Rename 'is_loop_iv()' to 'is_cloop_ind_var()'. Adding precond/postcond macros.
Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) Best regards, Patric From patric.hedlin at oracle.com Tue Apr 30 14:12:23 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 16:12:23 +0200 Subject: RFR(T): 8223141: Change (count) suffix _ct into _cnt. Message-ID: Dear all, I would like to ask for help to review the following change/update: Issue: https://bugs.openjdk.java.net/browse/JDK-8223141 Webrev: http://cr.openjdk.java.net/~phedlin/tr8223141/ 8223141: Change (count) suffix _ct into _cnt. Align abbreviation with "common" usage. Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) Best regards, Patric From patric.hedlin at oracle.com Tue Apr 30 14:13:12 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 16:13:12 +0200 Subject: RFR(T): 8223142: Clean-up WS and CB. Message-ID: <3c6f9217-b2f5-8e6d-8595-5e99e440c7b3@oracle.com> ** Trivial but DISRUPTIVE. ** Dear all, I would like to ask for help to review the following change/update: Issue: https://bugs.openjdk.java.net/browse/JDK-8223142 Webrev: http://cr.openjdk.java.net/~phedlin/tr8223142/ 8223142: Clean-up WS and CB. Rid of all the different spacing/WS styles in "loopTransform.cpp". Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) Best regards, Patric From patric.hedlin at oracle.com Tue Apr 30 14:23:44 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 16:23:44 +0200 Subject: RFR(T): 8223143: Restructure/clean-up for 'loopexit_or_null()'. Message-ID: <50c0d3ab-75a7-8aea-7bf0-68b88a2a8e7c@oracle.com> Dear all, I would like to ask for help to review the following change/update: Issue: https://bugs.openjdk.java.net/browse/JDK-8223143 Webrev: http://cr.openjdk.java.net/~phedlin/tr8223143/ 8223143: Restructure/clean-up for 'loopexit_or_null()'. Minor restructure and clean-up for 'loopexit_or_null()' and its use.
Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) Best regards, Patric From patric.hedlin at oracle.com Tue Apr 30 14:28:02 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 16:28:02 +0200 Subject: RFR(M): 8216137: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: Live Node limit exceeded limit Message-ID: <0553d83e-9d77-0295-acff-9fd3e8a44043@oracle.com> Dear all, I would like to ask for help to review the following change/update: Issue: https://bugs.openjdk.java.net/browse/JDK-8216137 Webrev: http://cr.openjdk.java.net/~phedlin/tr8216137/ 8216137: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: Live Node limit exceeded limit Also addressed: 8219520: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: Live Node limit exceeded limit Approach: Adding a simplistic (ad-hoc) node budget mechanism, applied during loop transforms. Testing: hs-tier1..4, hs-precheckin-comp, Kitchensink24h Caveat: Testing and benchmarking need to be rerun but are currently experiencing issues. Best regards, Patric From vladimir.x.ivanov at oracle.com Tue Apr 30 16:45:51 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 30 Apr 2019 09:45:51 -0700 Subject: RFR(S): 8223138: Small clean-up in loop-tree support. In-Reply-To: References: Message-ID: <4883fb85-23ab-acf0-4687-3da50b070a4d@oracle.com> Looks good. Small nit: I find original version of IdealLoopTree::tail() easier to read. What do you think about the following? inline Node* IdealLoopTree::tail() { // Handle lazy update of _tail field if (_tail->in(0) == NULL) { _tail = _phase->get_ctrl(_tail); } return _tail; } Best regards, Vladimir Ivanov On 30/04/2019 07:09, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue:
https://bugs.openjdk.java.net/browse/JDK-8223138 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8223138/ > > 8223138: Small clean-up in loop-tree support. > > Rename predicate 'is_inner()' to 'is_innermost()' to be accurate. > Add 'is_root()' predicate for root parent test in loop-tree. > Change definition of 'is_loop()' to always lazy-read the tail, > since it should never be NULL. Clean-up of 'tail()' definition. > > > Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) > > > Best regards, > Patric > From vladimir.x.ivanov at oracle.com Tue Apr 30 16:46:33 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 30 Apr 2019 09:46:33 -0700 Subject: RFR(T): 8223139: Rename mandatory policy-do routines. In-Reply-To: References: Message-ID: Looks good. Best regards, Vladimir Ivanov On 30/04/2019 07:10, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8223139 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8223139/ > > 8223139: Rename mandatory policy-do routines. > > These routines do not implement any policy. The policy is to always > attempt these transforms if applicable. > 'policy_do_remove_empty_loop' -> 'do_remove_empty_loop'. > 'policy_do_one_iteration_loop' -> 'do_one_iteration_loop'. > > > Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) > > > Best regards, > Patric > From vladimir.x.ivanov at oracle.com Tue Apr 30 16:54:36 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 30 Apr 2019 09:54:36 -0700 Subject: RFR(S): 8223140: Clean-up in 'ok_to_convert()'. In-Reply-To: References: Message-ID: Looks good. I like precond/postcond macros.
+static bool is_cloop_increment(Node* inc) { + precond(inc->Opcode() == Op_AddI || inc->Opcode() == Op_AddL); Best regards, Vladimir Ivanov On 30/04/2019 07:11, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8223140 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8223140/ > > 8223140: Clean-up in 'ok_to_convert()' > > Simplify logic in 'ok_to_convert()'. > Rename 'is_loop_iv()' to 'is_cloop_ind_var()'. > Adding precond/postcond macros. > > > Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) > > > Best regards, > Patric > From vladimir.x.ivanov at oracle.com Tue Apr 30 16:55:17 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 30 Apr 2019 09:55:17 -0700 Subject: RFR(T): 8223141: Change (count) suffix _ct into _cnt. In-Reply-To: References: Message-ID: <32e91c7e-795a-9130-15bb-808f6fdf0f30@oracle.com> Looks good. Best regards, Vladimir Ivanov On 30/04/2019 07:12, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8223141 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8223141/ > > 8223141: Change (count) suffix _ct into _cnt. > > Align abbreviation with "common" usage. > > > Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) > > > Best regards, > Patric > From vladimir.x.ivanov at oracle.com Tue Apr 30 16:58:33 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 30 Apr 2019 09:58:33 -0700 Subject: RFR(T): 8223142: Clean-up WS and CB. In-Reply-To: <3c6f9217-b2f5-8e6d-8595-5e99e440c7b3@oracle.com> References: <3c6f9217-b2f5-8e6d-8595-5e99e440c7b3@oracle.com> Message-ID: <73b555c9-fb76-5289-7a3d-e8d9246ff40b@oracle.com> Looks good. Would the following code benefit from more curly braces as well?
:-) - for( uint i = 0; i < _body.size(); i++ ) - if( _body[i]->is_Mem() ) + for (uint i = 0; i < _body.size(); i++) + if (_body[i]->is_Mem()) return false; Best regards, Vladimir Ivanov On 30/04/2019 07:13, Patric Hedlin wrote: > ** Trivial but DISRUPTIVE. ** > > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8223142 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8223142/ > > 8223142: Clean-up WS and CB. > > Rid of all the different spacing/WS styles in "loopTransform.cpp". > > > Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) > > > Best regards, > Patric > From vladimir.x.ivanov at oracle.com Tue Apr 30 17:04:34 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 30 Apr 2019 10:04:34 -0700 Subject: RFR(T): 8223143: Restructure/clean-up for 'loopexit_or_null()'. In-Reply-To: <50c0d3ab-75a7-8aea-7bf0-68b88a2a8e7c@oracle.com> References: <50c0d3ab-75a7-8aea-7bf0-68b88a2a8e7c@oracle.com> Message-ID: Looks good. Best regards, Vladimir Ivanov On 30/04/2019 07:23, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8223143 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8223143/ > > 8223143: Restructure/clean-up for 'loopexit_or_null()'. > > Minor restructure and clean-up for 'loopexit_or_null()' and its use. > > > Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) > > > Best regards, > Patric > From vladimir.x.ivanov at oracle.com Tue Apr 30 17:05:52 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 30 Apr 2019 10:05:52 -0700 Subject: RFR(T): 8223137: Rename predicate 'do_unroll_only()' to 'is_unroll_only()'. In-Reply-To: <68ae08c3-6b64-09cb-19dd-25369e30a55c@oracle.com> References: <68ae08c3-6b64-09cb-19dd-25369e30a55c@oracle.com> Message-ID: Looks good.
Best regards, Vladimir Ivanov On 30/04/2019 07:08, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8223137 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8223137/ > > 8223137: Rename predicate 'do_unroll_only()' to 'is_unroll_only()'. > > The "do" prefix is used as a command ditto elsewhere, while > predicates typically use "is". > > > Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, Kitchensink24h) > > > Best regards, > Patric > From vladimir.x.ivanov at oracle.com Tue Apr 30 17:11:57 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 30 Apr 2019 10:11:57 -0700 Subject: RFR(M): 8216137: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: <0553d83e-9d77-0295-acff-9fd3e8a44043@oracle.com> References: <0553d83e-9d77-0295-acff-9fd3e8a44043@oracle.com> Message-ID: <7fac99fd-c8f7-1eb4-7ffa-412dbdb61d3b@oracle.com> Looks good. Best regards, Vladimir Ivanov On 30/04/2019 07:28, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8216137 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8216137/ > > 8216137: assert(Compile::current()->live_nodes() < > Compile::current()->max_node_limit()) failed: > Live Node limit exceeded limit > > Also addressed: > > 8219520: assert(Compile::current()->live_nodes() < > Compile::current()->max_node_limit()) failed: > Live Node limit exceeded limit > > Approach: > > Adding a simplistic (ad-hoc) node budget mechanism, applied during > loop transforms. > > > Testing: hs-tier1..4, hs-precheckin-comp, Kitchensink24h > > > Caveat: Testing and benchmarking need to be rerun but are currently > experiencing issues.
> > > Best regards, > Patric > From vladimir.x.ivanov at oracle.com Tue Apr 30 17:15:19 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 30 Apr 2019 10:15:19 -0700 Subject: [13] RFR: 8202414: Unsafe write after primitive array creation may result in array length change In-Reply-To: References: <7e900022-4e16-2ab9-1f4d-89e1510e2646@oracle.com> <392c665f-869c-29af-4fc5-e6f844820846@oracle.com> <3db5d7ab-ad99-310b-e891-fc36d25da338@oracle.com> <7b03a213-7fee-a87f-b48d-250662e730ef@oracle.com> <959abf54-d1da-95ee-9cf6-6c6d8ec5e4a1@oracle.com> <18115aa8-edaa-31b9-02a6-06721d9fbfc9@oracle.com> <939f3f5d-b8e7-939f-8953-d34a0f3ff6c9@oracle.com> <259ef902-778b-7eef-46e2-d1927950d21c@oracle.com> <73f7c647-3194-2a65-6cc6-a15cbf6c82be@oracle.com> <37837126-c9d5-1bb1-fc9a-6fb9b848efbe@oracle.com> <28955bc6-020a-29e1-953c-e9f48932cd56@oracle.com> Message-ID: Looks good! Best regards, Vladimir Ivanov On 30/04/2019 00:04, Rahul Raghavan wrote: > Thank you Vladimir Ivanov for suggestions. > > Please note following latest changes tried. > - http://cr.openjdk.java.net/~rraghavan/8202414/webrev.04/ > > Hope did not miss any points. > Confirmed no failures with the reported test cases. > Also hs-tier1 to tier4, hs-precheckin-comp testing in progress. > > Thanks, > Rahul > > On 27/04/19 11:48 AM, Vladimir Ivanov wrote: >> On 26/04/2019 19:30, Vladimir Ivanov wrote: >> >> After thinking more about it, I believe new offset alignment check >> supersedes is_unaligned_access(). And is_mismatched_access() is too >> conservative here: what is_mismatched_access() adds here (in addition >> to existing alignment & size checks) is whether type match between >> location and stored value, but what matters for IN are sizes and >> offsets only. 
>> >> Type mismatches (e.g., byte vs boolean, char vs short) may cause >> problems when consequent loads are replaced with values from >> initializing stores, but it should be already handled in >> MemNode::can_see_stored_value() and Load?Node::Ideal(). >> >> So, it seems both checks (is_unaligned_access() & >> is_mismatched_access()) can be safely omitted. >> >> >> >> You are right, I missed that IN::captured_store_insertion_point() >> inspects already other stores which are already captured. Sorry for >> the confusion. >> >> I agree that IN::can_capture_store() is the right place to put the fix >> in and I like (iii). (Just add a comment, "// mismatched access" is >> enough) >> From patric.hedlin at oracle.com Tue Apr 30 17:25:19 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 19:25:19 +0200 Subject: RFR(S): 8223138: Small clean-up in loop-tree support. In-Reply-To: <4883fb85-23ab-acf0-4687-3da50b070a4d@oracle.com> References: <4883fb85-23ab-acf0-4687-3da50b070a4d@oracle.com> Message-ID: <757ff96c-ac8b-7d25-9222-dcd0830b1fed@oracle.com> Thanks Vladimir. On 2019-04-30 18:45, Vladimir Ivanov wrote: > Looks good. > > Small nit: I find original version of IdealLoopTree::tail() easier to > read. What do you think about the following? > > inline Node* IdealLoopTree::tail() { > // Handle lazy update of _tail field > if (_tail->in(0) == NULL) { > _tail = _phase->get_ctrl(_tail); > } > return _tail; > } > Sure, I'm fine with a revised "old" version as well. /Patric > Best regards, > Vladimir Ivanov > > On 30/04/2019 07:09, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8223138 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8223138/ >> >> 8223138: Small clean-up in loop-tree support. >> >> Rename predicate 'is_inner()' to 'is_innermost()' to be accurate. >>
Add 'is_root()' predicate for root parent test in loop-tree. >> Change definition of 'is_loop()' to always lazy-read the tail, >> since it should never be NULL. Clean-up of 'tail()' definition. >> >> >> Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, >> Kitchensink24h) >> >> >> Best regards, >> Patric >> From patric.hedlin at oracle.com Tue Apr 30 17:26:00 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 19:26:00 +0200 Subject: RFR(T): 8223139: Rename mandatory policy-do routines. In-Reply-To: References: Message-ID: <91a58abd-cb35-d6dc-cfc7-e8bd0f7f0ec7@oracle.com> Thanks Vladimir. /Patric On 2019-04-30 18:46, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 30/04/2019 07:10, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8223139 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8223139/ >> >> 8223139: Rename mandatory policy-do routines. >> >> These routines do not implement any policy. The policy is to always >> attempt these transforms if applicable. >> 'policy_do_remove_empty_loop' -> 'do_remove_empty_loop'. >> 'policy_do_one_iteration_loop' -> 'do_one_iteration_loop'. >> >> >> Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, >> Kitchensink24h) >> >> >> Best regards, >> Patric >> From patric.hedlin at oracle.com Tue Apr 30 17:27:05 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 19:27:05 +0200 Subject: RFR(S): 8223140: Clean-up in 'ok_to_convert()'. In-Reply-To: References: Message-ID: <96e46b80-d309-df0b-72ae-666a3499ce6c@oracle.com> Thanks Vladimir. /Patric On 2019-04-30 18:54, Vladimir Ivanov wrote: > Looks good. > > I like precond/postcond macros. > > +static bool is_cloop_increment(Node* inc) { > +
precond(inc->Opcode() == Op_AddI || inc->Opcode() == Op_AddL); > > Best regards, > Vladimir Ivanov > > On 30/04/2019 07:11, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8223140 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8223140/ >> >> 8223140: Clean-up in 'ok_to_convert()' >> >> Simplify logic in 'ok_to_convert()'. >> Rename 'is_loop_iv()' to 'is_cloop_ind_var()'. >> Adding precond/postcond macros. >> >> >> Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, >> Kitchensink24h) >> >> >> Best regards, >> Patric >> From patric.hedlin at oracle.com Tue Apr 30 17:27:44 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 19:27:44 +0200 Subject: RFR(T): 8223141: Change (count) suffix _ct into _cnt. In-Reply-To: <32e91c7e-795a-9130-15bb-808f6fdf0f30@oracle.com> References: <32e91c7e-795a-9130-15bb-808f6fdf0f30@oracle.com> Message-ID: Thanks Vladimir. /Patric On 2019-04-30 18:55, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 30/04/2019 07:12, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8223141 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8223141/ >> >> 8223141: Change (count) suffix _ct into _cnt. >> >> Align abbreviation with "common" usage. >> >> >> Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, >> Kitchensink24h) >> >> >> Best regards, >> Patric >> From patric.hedlin at oracle.com Tue Apr 30 17:29:08 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 19:29:08 +0200 Subject: RFR(T): 8223142: Clean-up WS and CB.
In-Reply-To: <73b555c9-fb76-5289-7a3d-e8d9246ff40b@oracle.com> References: <3c6f9217-b2f5-8e6d-8595-5e99e440c7b3@oracle.com> <73b555c9-fb76-5289-7a3d-e8d9246ff40b@oracle.com> Message-ID: Thanks Vladimir. On 2019-04-30 18:58, Vladimir Ivanov wrote: > Looks good. > > Would the following code benefit from more curly braces as well? :-) > > - for( uint i = 0; i < _body.size(); i++ ) > - if( _body[i]->is_Mem() ) > + for (uint i = 0; i < _body.size(); i++) > + if (_body[i]->is_Mem()) > return false; > We can always add more... /Patric > Best regards, > Vladimir Ivanov > > On 30/04/2019 07:13, Patric Hedlin wrote: >> ** Trivial but DISRUPTIVE. ** >> >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8223142 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8223142/ >> >> 8223142: Clean-up WS and CB. >> >> Rid of all the different spacing/WS styles in "loopTransform.cpp". >> >> >> Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, >> Kitchensink24h) >> >> >> Best regards, >> Patric >> From patric.hedlin at oracle.com Tue Apr 30 17:31:01 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 19:31:01 +0200 Subject: RFR(T): 8223137: Rename predicate 'do_unroll_only()' to 'is_unroll_only()'. In-Reply-To: References: <68ae08c3-6b64-09cb-19dd-25369e30a55c@oracle.com> Message-ID: <192df248-7bd4-336c-27cd-786088d6d0db@oracle.com> Thanks Vladimir. /Patric On 2019-04-30 19:05, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 30/04/2019 07:08, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8223137 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8223137/ >> >> 8223137: Rename predicate 'do_unroll_only()' to 'is_unroll_only()'. >>
The "do" prefix is used as a command ditto elsewhere, while >> predicates typically use "is". >> >> >> Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, >> Kitchensink24h) >> >> >> Best regards, >> Patric >> From patric.hedlin at oracle.com Tue Apr 30 17:30:08 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 19:30:08 +0200 Subject: RFR(T): 8223143: Restructure/clean-up for 'loopexit_or_null()'. In-Reply-To: References: <50c0d3ab-75a7-8aea-7bf0-68b88a2a8e7c@oracle.com> Message-ID: Thanks Vladimir. /Patric On 2019-04-30 19:04, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 30/04/2019 07:23, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8223143 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8223143/ >> >> 8223143: Restructure/clean-up for 'loopexit_or_null()'. >> >> Minor restructure and clean-up for 'loopexit_or_null()' and its >> use. >> >> >> Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, >> Kitchensink24h) >> >> >> Best regards, >> Patric >> From patric.hedlin at oracle.com Tue Apr 30 17:33:49 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 30 Apr 2019 19:33:49 +0200 Subject: RFR(M): 8216137: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: <7fac99fd-c8f7-1eb4-7ffa-412dbdb61d3b@oracle.com> References: <0553d83e-9d77-0295-acff-9fd3e8a44043@oracle.com> <7fac99fd-c8f7-1eb4-7ffa-412dbdb61d3b@oracle.com> Message-ID: Thanks Vladimir. /Patric On 2019-04-30 19:11, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 30/04/2019 07:28, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue:
https://bugs.openjdk.java.net/browse/JDK-8216137 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8216137/ >> >> 8216137: assert(Compile::current()->live_nodes() < >> Compile::current()->max_node_limit()) failed: >> Live Node limit exceeded limit >> >> Also addressed: >> >> 8219520: assert(Compile::current()->live_nodes() < >> Compile::current()->max_node_limit()) failed: >> Live Node limit exceeded limit >> >> Approach: >> >> Adding a simplistic (ad-hoc) node budget mechanism, applied >> during loop transforms. >> >> >> Testing: hs-tier1..4, hs-precheckin-comp, Kitchensink24h >> >> >> Caveat: Testing and benchmarking need to be rerun but are currently >> experiencing issues. >> >> >> Best regards, >> Patric >> From vladimir.x.ivanov at oracle.com Tue Apr 30 19:47:00 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 30 Apr 2019 12:47:00 -0700 Subject: [13] RFR (S): 8219902: C2: MemNode::can_see_stored_value() ignores casts which carry control dependency Message-ID: <8ab7b14b-d42d-37ea-e6d7-151d068c57f0@oracle.com> http://cr.openjdk.java.net/~vlivanov/8219902/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8219902 JDK-8161334 [1] enhanced MemNode::can_see_stored_value to ignore casts when access base addresses are compared. It turned out to be too aggressive since casts may carry control dependency. Proposed fix is to keep casts with control dependency. Testing: failing test case, tier1-3 Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8161334 From vladimir.x.ivanov at oracle.com Tue Apr 30 19:59:23 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 30 Apr 2019 12:59:23 -0700 Subject: [13] RFR (S): 8223171: Redundant nmethod dependencies for effectively final methods Message-ID: http://cr.openjdk.java.net/~vlivanov/8223171/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8223171 Both C1 & C2 may register redundant nmethod dependencies (which always hold).
For example, for instance methods on final classes. Moreover, C2 does add dependencies for private methods. The patch enhances the checks and unifies them between C1 & C2. Testing: tier1-4 Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Tue Apr 30 20:02:12 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 30 Apr 2019 13:02:12 -0700 Subject: [13] RFR (S): 8219902: C2: MemNode::can_see_stored_value() ignores casts which carry control dependency In-Reply-To: <8ab7b14b-d42d-37ea-e6d7-151d068c57f0@oracle.com> References: <8ab7b14b-d42d-37ea-e6d7-151d068c57f0@oracle.com> Message-ID: <147e1906-381d-4a3d-0a24-23e9c149581d@oracle.com> Looks good. Thanks, Vladimir K On 4/30/19 12:47 PM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8219902/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8219902 > > JDK-8161334 [1] enhanced MemNode::can_see_stored_value to ignore casts when access base addresses are compared. It > turned out to be too aggressive since casts may carry control dependency. > > Proposed fix is to keep casts with control dependency. > > Testing: failing test case, tier1-3 > > Best regards, > Vladimir Ivanov > > [1] https://bugs.openjdk.java.net/browse/JDK-8161334 From dean.long at oracle.com Tue Apr 30 22:05:08 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 30 Apr 2019 15:05:08 -0700 Subject: RFR(T): 8223139: Rename mandatory policy-do routines. In-Reply-To: References: Message-ID: <216752da-5cbd-b14a-8ea2-96959a889759@oracle.com> Looks good, though "remove_empty_loop" also sounds good. dl On 4/30/19 7:10 AM, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8223139 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8223139/ > > 8223139: Rename mandatory policy-do routines. > > These routines do not implement any policy. The policy is to always >
attempt these transforms if applicable. > 'policy_do_remove_empty_loop' -> 'do_remove_empty_loop'. > 'policy_do_one_iteration_loop' -> 'do_one_iteration_loop'. > > > Testing: Part of 8216137 (hs-tier1..4, hs-precheckin-comp, > Kitchensink24h) > > > Best regards, > Patric >