From rwestrel at redhat.com Wed Jan 2 08:25:16 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 02 Jan 2019 09:25:16 +0100
Subject: [aarch64-port-dev ] RFR(S): 8214922: Add vectorization support for fmin/fmax
In-Reply-To:
References: <87d0pv2iow.fsf@redhat.com> <877eg32bzq.fsf@redhat.com> <871s6a3map.fsf@redhat.com>
Message-ID: <87va371n6b.fsf@redhat.com>

> http://cr.openjdk.java.net/~pli/rfr/8214922/webrev.01/

That looks good to me.

Roland.

From erik.joelsson at oracle.com Wed Jan 2 08:52:03 2019
From: erik.joelsson at oracle.com (Erik Joelsson)
Date: Wed, 2 Jan 2019 09:52:03 +0100
Subject: RFR(M): 8215902: Add support for SoftFloat-3e library
In-Reply-To: <4497ca084b9f48dbb8f6de1aa35c83653fd7acfb.camel@gmail.com>
References: <4497ca084b9f48dbb8f6de1aa35c83653fd7acfb.camel@gmail.com>
Message-ID: <22c3e7fe-b092-bae5-39d2-2a28c96d5412@oracle.com>

From a build perspective, this looks very good. I think adding a link to the github project in the doc makes sense if you want to do that.

/Erik

On 2018-12-25 16:19, Jakub Vaněk wrote:
> Hi,
>
> please review this webrev. It is a successor of the softfloat-3 [patch]
> thread (first email
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2018-November/031311.html
> )
>
> Changes since the last patch (v6):
>
> - renamed --with-softloat* to --with-sflt* (it is more compact and it
> corresponds to the old --with-sflt-lib=... option)
>
> - license is now obtained via --with-sflt-license switch (so it is not
> included in OpenJDK source tree)
>
> - updated documentation (slight rewording, added the license option)
>
> - checks for default --with/--without behavior are in place again
> (I forgot them when I changed the way the library is detected)
>
> - added a simple testcase - I found a discrepancy between softfloat and
> system function behavior. When a float with bits 0x003FFFFF is
> added to 0x00000001, the correct result is 0x00400000, but the
> default software floating point implementation returns 0x00000000.
> However I'm not sure where to put this test - now it is in
> test/hotspot/jtreg/compiler/floatingpoint.
>
> - comments in code refer to CR 6757269 and newly JDK-8215902 too.
>
> I have created a repository with SoftFloat-3e with build configuration
> specifically for OpenJDK on armel:
> https://github.com/ev3dev-lang-java/softfloat-openjdk
>
> I can add a link to it to the documentation.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8215902
> Webrev: http://cr.openjdk.java.net/~jakvanek/8215902/webrev.02/
> CI build: https://ci.adoptopenjdk.net/view/ev3dev/job/openjdk12_build_ev3_linux/67/
>
> Cheers,
>
> Jakub
>

From eric.caspole at oracle.com Wed Jan 2 22:08:24 2019
From: eric.caspole at oracle.com (Eric Caspole)
Date: Wed, 2 Jan 2019 17:08:24 -0500
Subject: RFR 13 (S): 8196347: LogCompilation: generate log file on the fly for input to junits
Message-ID: <6060b883-65e7-d20a-fa1c-2cd4977f5e37@oracle.com>

Hi everybody,
Could I have reviews on this change to add setup methods to produce the test +LogCompilation files on the fly, so we can test the output of the JDK in the repo by adding it into the PATH etc, instead of reading static files. Also, this changeset removes several large static input files that did not have any special significance. Tested and builds with JDK 8 and 13. Nothing prevents adding back useful static log test files later, because it is hard to reproduce some constructs on the fly.

Thanks,
Eric

JBS:
https://bugs.openjdk.java.net/browse/JDK-8196347

webrev:
http://cr.openjdk.java.net/~ecaspole/JDK-8196347/01/webrev/

From vladimir.kozlov at oracle.com Wed Jan 2 22:28:23 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 2 Jan 2019 14:28:23 -0800
Subject: RFR 13 (S): 8196347: LogCompilation: generate log file on the fly for input to junits
In-Reply-To: <6060b883-65e7-d20a-fa1c-2cd4977f5e37@oracle.com>
References: <6060b883-65e7-d20a-fa1c-2cd4977f5e37@oracle.com>
Message-ID:

Hi Eric,

Maybe add -Xbatch (-XX:-BackgroundCompilation) to make sure compilation and the log are complete before execution finishes. Also, I think far fewer methods are compiled with -version than without it, when the 'java' help output is printed by running without arguments.

Thanks,
Vladimir

On 1/2/19 2:08 PM, Eric Caspole wrote:
> Hi everybody,
> Could I have reviews on this change to add setup methods to produce the test +LogCompilation files on the fly, so we can
> test the output of the JDK in the repo by adding it into the PATH etc, instead of reading static files. Also, this
> changeset removes several large static input files that did not have any special significance. Tested and builds with
> JDK 8 and 13. Nothing prevents adding back useful static log test files later, because it is hard to reproduce some
> constructs on the fly.
>
> Thanks,
> Eric
>
>
> JBS:
> https://bugs.openjdk.java.net/browse/JDK-8196347
>
> webrev:
> http://cr.openjdk.java.net/~ecaspole/JDK-8196347/01/webrev/

From eric.caspole at oracle.com Wed Jan 2 22:36:23 2019
From: eric.caspole at oracle.com (Eric Caspole)
Date: Wed, 2 Jan 2019 17:36:23 -0500
Subject: RFR 13 (S): 8196347: LogCompilation: generate log file on the fly for input to junits
In-Reply-To:
References: <6060b883-65e7-d20a-fa1c-2cd4977f5e37@oracle.com>
Message-ID:

Hi Vladimir,
OK I will experiment with those and let you know.
Thanks, Eric On 1/2/19 17:28, Vladimir Kozlov wrote: > Hi Eric, > > May be add -Xbatch (-XX:-BackgroundCompilation) to make sure compilation > and log is complete before execution is finished. Also I think with > -version much less methods are compiled than without it when 'java' help > output is printed by running without arguments. > > Thanks, > Vladimir > > On 1/2/19 2:08 PM, Eric Caspole wrote: >> Hi everybody, >> Could I have reviews on this change to add setup methods to produce >> the test +LogCompilation files on the fly, so we can test the output >> of the JDK in the repo by adding it into the PATH etc, instead of >> reading static files. Also, this changeset removes several large >> static input files that did not have any special significance. Tested >> and builds with JDK 8 and 13. Nothing prevents adding back useful >> static log test files later, because it is hard to reproduce some >> constructs on the fly. >> >> Thanks, >> Eric >> >> >> JBS: >> https://bugs.openjdk.java.net/browse/JDK-8196347 >> >> webrev: >> http://cr.openjdk.java.net/~ecaspole/JDK-8196347/01/webrev/ From eric.caspole at oracle.com Wed Jan 2 23:27:48 2019 From: eric.caspole at oracle.com (Eric Caspole) Date: Wed, 2 Jan 2019 18:27:48 -0500 Subject: RFR 13 (S): 8196347: LogCompilation: generate log file on the fly for input to junits In-Reply-To: References: <6060b883-65e7-d20a-fa1c-2cd4977f5e37@oracle.com> Message-ID: On 1/2/19 17:36, Eric Caspole wrote: > Hi Vladimir, > OK I will experiment with those and let you know. > Thanks, > Eric You are right, running with no -version produces more compilations, the log file is about 40% bigger. -Xbatch was sort of a wash. So I added combos of each since these are short running and will give more coverage. 
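
The flag combinations discussed above can be sketched roughly as follows. This is only an illustration and not the code in the webrev: the class name, method names and log file names are hypothetical. The flags themselves (-XX:+UnlockDiagnosticVMOptions, -XX:+LogCompilation, -XX:LogFile=, -Xbatch) are standard HotSpot options.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the java command lines that would produce
// LogCompilation files on the fly for each flag combination.
class LogCompilationCombos {
    // One command line writing a compilation log to logFile.
    public static List<String> command(String javaBin, String logFile,
                                       boolean batch, boolean version) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-XX:+UnlockDiagnosticVMOptions");
        cmd.add("-XX:+LogCompilation");
        cmd.add("-XX:LogFile=" + logFile);
        if (batch) {
            cmd.add("-Xbatch"); // compile in the foreground
        }
        if (version) {
            cmd.add("-version"); // without it, 'java' prints its help output
        }
        return cmd;
    }

    // All four combinations of -Xbatch and -version.
    public static List<List<String>> allCombos(String javaBin) {
        List<List<String>> out = new ArrayList<>();
        for (boolean batch : new boolean[] {false, true}) {
            for (boolean version : new boolean[] {false, true}) {
                String log = "hotspot" + (batch ? "_batch" : "")
                        + (version ? "_version" : "_help") + ".log";
                out.add(command(javaBin, log, batch, version));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (List<String> cmd : allCombos("java")) {
            // A test setup method could run each one with
            // new ProcessBuilder(cmd).start().waitFor().
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Keeping the command construction separate from process launching makes it easy to point the first element at the JDK under test instead of whatever 'java' is on the PATH.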
New one: http://cr.openjdk.java.net/~ecaspole/JDK-8196347/02/webrev/ > > > On 1/2/19 17:28, Vladimir Kozlov wrote: >> Hi Eric, >> >> May be add -Xbatch (-XX:-BackgroundCompilation) to make sure >> compilation and log is complete before execution is finished. Also I >> think with -version much less methods are compiled than without it >> when 'java' help output is printed by running without arguments. >> >> Thanks, >> Vladimir >> >> On 1/2/19 2:08 PM, Eric Caspole wrote: >>> Hi everybody, >>> Could I have reviews on this change to add setup methods to produce >>> the test +LogCompilation files on the fly, so we can test the output >>> of the JDK in the repo by adding it into the PATH etc, instead of >>> reading static files. Also, this changeset removes several large >>> static input files that did not have any special significance. Tested >>> and builds with JDK 8 and 13. Nothing prevents adding back useful >>> static log test files later, because it is hard to reproduce some >>> constructs on the fly. >>> >>> Thanks, >>> Eric >>> >>> >>> JBS: >>> https://bugs.openjdk.java.net/browse/JDK-8196347 >>> >>> webrev: >>> http://cr.openjdk.java.net/~ecaspole/JDK-8196347/01/webrev/ From vladimir.kozlov at oracle.com Thu Jan 3 00:41:07 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 2 Jan 2019 16:41:07 -0800 Subject: RFR 13 (S): 8196347: LogCompilation: generate log file on the fly for input to junits In-Reply-To: References: <6060b883-65e7-d20a-fa1c-2cd4977f5e37@oracle.com> Message-ID: Good. Thanks, Vladimir > On Jan 2, 2019, at 3:27 PM, Eric Caspole wrote: > > >> On 1/2/19 17:36, Eric Caspole wrote: >> Hi Vladimir, >> OK I will experiment with those and let you know. >> Thanks, >> Eric > > You are right, running with no -version produces more compilations, the log file is about 40% bigger. -Xbatch was sort of a wash. So I added combos of each since these are short running and will give more coverage. 
> > New one: > http://cr.openjdk.java.net/~ecaspole/JDK-8196347/02/webrev/ > > > >>> On 1/2/19 17:28, Vladimir Kozlov wrote: >>> Hi Eric, >>> >>> May be add -Xbatch (-XX:-BackgroundCompilation) to make sure compilation and log is complete before execution is finished. Also I think with -version much less methods are compiled than without it when 'java' help output is printed by running without arguments. >>> >>> Thanks, >>> Vladimir >>> >>>> On 1/2/19 2:08 PM, Eric Caspole wrote: >>>> Hi everybody, >>>> Could I have reviews on this change to add setup methods to produce the test +LogCompilation files on the fly, so we can test the output of the JDK in the repo by adding it into the PATH etc, instead of reading static files. Also, this changeset removes several large static input files that did not have any special significance. Tested and builds with JDK 8 and 13. Nothing prevents adding back useful static log test files later, because it is hard to reproduce some constructs on the fly. >>>> >>>> Thanks, >>>> Eric >>>> >>>> >>>> JBS: >>>> https://bugs.openjdk.java.net/browse/JDK-8196347 >>>> >>>> webrev: >>>> http://cr.openjdk.java.net/~ecaspole/JDK-8196347/01/webrev/ From tobias.hartmann at oracle.com Thu Jan 3 08:44:53 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 3 Jan 2019 09:44:53 +0100 Subject: RFR 13 (S): 8196347: LogCompilation: generate log file on the fly for input to junits In-Reply-To: References: <6060b883-65e7-d20a-fa1c-2cd4977f5e37@oracle.com> Message-ID: <7cbb403b-2898-fb42-a8b7-a1e9769a2ec2@oracle.com> Hi Eric, this looks good to me too. Best regards, Tobias On 03.01.19 00:27, Eric Caspole wrote: > > On 1/2/19 17:36, Eric Caspole wrote: >> Hi Vladimir, >> OK I will experiment with those and let you know. >> Thanks, >> Eric > > You are right, running with no -version produces more compilations, the log file is about 40% > bigger. -Xbatch was sort of a wash. 
So I added combos of each since these are short running and will > give more coverage. > > New one: > http://cr.openjdk.java.net/~ecaspole/JDK-8196347/02/webrev/ > > > >> >> >> On 1/2/19 17:28, Vladimir Kozlov wrote: >>> Hi Eric, >>> >>> May be add -Xbatch (-XX:-BackgroundCompilation) to make sure compilation and log is complete >>> before execution is finished. Also I think with -version much less methods are compiled than >>> without it when 'java' help output is printed by running without arguments. >>> >>> Thanks, >>> Vladimir >>> >>> On 1/2/19 2:08 PM, Eric Caspole wrote: >>>> Hi everybody, >>>> Could I have reviews on this change to add setup methods to produce the test +LogCompilation >>>> files on the fly, so we can test the output of the JDK in the repo by adding it into the PATH >>>> etc, instead of reading static files. Also, this changeset removes several large static input >>>> files that did not have any special significance. Tested and builds with JDK 8 and 13. Nothing >>>> prevents adding back useful static log test files later, because it is hard to reproduce some >>>> constructs on the fly. 
>>>> >>>> Thanks, >>>> Eric >>>> >>>> >>>> JBS: >>>> https://bugs.openjdk.java.net/browse/JDK-8196347 >>>> >>>> webrev: >>>> http://cr.openjdk.java.net/~ecaspole/JDK-8196347/01/webrev/ From tobias.hartmann at oracle.com Thu Jan 3 09:01:16 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 3 Jan 2019 10:01:16 +0100 Subject: RFR (XS): 8215888: Register to register spill may use AVX 512 move instruction on unsupported platform In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A365AB@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A363D9@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A36514@FMSMSX126.amr.corp.intel.com> <7b19baa8-c48f-d0aa-02c0-aacaac3e984b@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A365AB@FMSMSX126.amr.corp.intel.com> Message-ID: <78a0415c-9cce-dea0-dd37-3fa22692ac7b@oracle.com> Hi Sandhya, all webrevs look good to me (and the tests submitted by Vladimir passed). You don't need to push to JDK 13 because patches pushed to JDK 12 will be synced with mainline automatically: https://mail.openjdk.java.net/pipermail/jdk-dev/2018-December/002376.html So I would suggest to push the patch to jdk/jdk12 and request a backport to JDK 11u after some iterations of nightly testing have passed. Best regards, Tobias On 22.12.18 01:55, Viswanathan, Sandhya wrote: > Thanks a lot! I have also created backport patches for JDK 12 and JDK 11.0.2 as this bug affects those versions too. The below are for your consideration: > > JDK 12: > http://cr.openjdk.java.net/~sviswanathan/8215888/jdk12/webrev.01/ > JDK11u: > http://cr.openjdk.java.net/~sviswanathan/8215888/jdk11u/webrev.01/ > > The compiler jtreg testing passes for these as well. 
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
> Sent: Friday, December 21, 2018 4:27 PM
> To: Viswanathan, Sandhya ; hotspot compiler ; vladimir.kozlov at oracle.com
> Subject: Re: RFR (XS): 8215888: Register to register spill may use AVX 512 move instruction on unsupported platform
>
>
>> Please find the updated webrev with your comments incorporated at:
>>
>> http://cr.openjdk.java.net/~sviswanathan/8215888/webrev.01/
>
> Thanks, submitted for testing.
>
> Best regards,
> Vladimir Ivanov
>
>> -----Original Message-----
>> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
>> Sent: Friday, December 21, 2018 12:00 PM
>> To: Viswanathan, Sandhya ; hotspot compiler ; vladimir.kozlov at oracle.com
>> Subject: Re: RFR (XS): 8215888: Register to register spill may use AVX 512 move instruction on unsupported platform
>>
>> Sandhya,
>>
>> I'd prefer to see the check inverted:
>>
>> if (UseAVX > 2 && !VM_Version::supports_avx512vl()) {
>>   int vector_len = 2;
>>   __ evmovdquq($dst$$XMMRegister, $src$$XMMRegister, vector_len);
>> } else {
>>   __ movdqu($dst$$XMMRegister, $src$$XMMRegister);
>> }
>>
>> It looks easier to read considering the code around is full of "UseAVX > 2" checks.
>>
>> By coincidence I was debugging the very same bug today and at first I didn't notice the problem with "UseAVX < 2" misreading it as "UseAVX > 2".
>>
>> Otherwise, looks good.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 21/12/2018 11:44, Viswanathan, Sandhya wrote:
>>> Hi All,
>>>
>>> We noticed that the register to register moves in x86.ad file attempt
>>> to generate evmovdquq when UseAVX==2.
>>>
>>> The instruction evmovdquq is only supported on platforms where UseAVX >
>>> 2 (AVX 512).
>>>
>>> The following rules in x86.ad file need to be corrected:
>>>
>>> MoveVecX2Leg
>>>
>>> MoveLeg2VecX
>>>
>>> MoveVecY2Leg
>>>
>>> MoveLeg2VecY
>>>
>>> The above move rules when activated through register allocator could
>>> result in illegal instruction exception.
>>>
>>> Bug:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8215888
>>>
>>> This bug affects versions 11.0.2, 12 and the mainline.
>>>
>>> Webrev for jdk mainline:
>>>
>>> http://cr.openjdk.java.net/~sviswanathan/8215888/webrev.00/
>>>
>>> This webrev passes jtreg compiler tests on Haswell and SKX.
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>

From Pengfei.Li at arm.com Thu Jan 3 09:42:30 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Thu, 3 Jan 2019 09:42:30 +0000
Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result
Message-ID:

Hi,

This is a patch to fix an AArch64 string intrinsics issue. It can be reproduced with the code and JVM options below.

public class Test {
    public static void main(String[] args) {
        StringBuilder str = new StringBuilder("ABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890123456789");
        str.setLength(str.length() - 10);
        System.out.println(str.indexOf("01234567890123456789"));
    }
}

$ java Test
-1
$ java -Xcomp -XX:-Inline Test
26

In the case above, we first have a long string "ABC...Z012...9012...9" (hereinafter called "the main string") and then truncate it by removing its last 10 characters. After doing this, we can incorrectly find the pattern string ("012...9012...9") inside the main string.

This bug is caused by the boundary of the main string not being checked during matching in the AArch64 String.indexOf() intrinsics. In the intrinsic implementation, we first find the indexes of the first character of the pattern string (0x30 in this case) inside the main string. Each of these indexes could be a potential return value of the String.indexOf() method. Then, for each index value, we compare the remaining characters of the two strings. In this step, as Java strings in memory do not necessarily end with '\0' like C strings, we should explicitly check whether the remaining part of the main string is shorter than the pattern string.

In my fix, the length of the remaining part of the main string is calculated after we find a first-character match. The length value is put into the ch2 register (as it can be used as a temp according to the code context) and then compared to the length of the pattern string (in cnt1). The compare and branch code is as below.

  __ cmp(ch2, cnt1);
  __ br(__ LT, NOMATCH);

Here we branch directly to the NOMATCH label, since if the remaining part of the main string has fewer characters than the pattern, there cannot be any other pattern string match after the current first-character-match index.

The length calculation and compare code is added at two positions in my patch, as there are two different first-character-match exits (L_HAS_ZERO and L_SMALL_HAS_ZERO) in the original intrinsic code.

I also fixed the cnt2 value (which is used to count the number of bytes not processed in the main string) as well as some branch conditions in my patch, because cnt2 always counted one more byte than the actual length. Fixing that makes the number of remaining bytes in the main string easier to calculate.

JBS: https://bugs.openjdk.java.net/browse/JDK-8215792
webrev: http://cr.openjdk.java.net/~pli/rfr/8215792/webrev.00/

Could anyone help review this fix?
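
The boundary check described above can be illustrated in plain Java. This is only a sketch of the idea, not the AArch64 stub code: the class and method names are hypothetical, and srcLen stands in for the logical length of the main string, whose backing array may still hold stale characters beyond that length.

```java
// Hypothetical illustration of the missing bounds check.
class BoundedIndexOf {
    // Searches for pattern within the first srcLen chars of src.
    // Characters at index >= srcLen are treated as stale and must not match.
    public static int indexOf(char[] src, int srcLen, char[] pattern) {
        if (pattern.length == 0) {
            return 0;
        }
        for (int i = 0; i < srcLen; i++) {
            if (src[i] != pattern[0]) {
                continue; // not a first-character match
            }
            int remaining = srcLen - i;
            if (remaining < pattern.length) {
                // Corresponds to the branch to NOMATCH: no later index
                // can leave enough room for the pattern either.
                return -1;
            }
            int j = 1;
            while (j < pattern.length && src[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        char[] backing = "ABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890123456789".toCharArray();
        char[] pattern = "01234567890123456789".toCharArray();
        int logicalLen = backing.length - 10; // mimics setLength(length() - 10)
        System.out.println(indexOf(backing, logicalLen, pattern));     // prints -1
        System.out.println(indexOf(backing, backing.length, pattern)); // prints 26
    }
}
```

Without the remaining-length check, the inner comparison would read into the stale tail of the array and report index 26, which is the incorrect result seen with -Xcomp.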
-- Thanks, Pengfei From aph at redhat.com Thu Jan 3 12:12:58 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 3 Jan 2019 12:12:58 +0000 Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result In-Reply-To: References: Message-ID: <6959eea4-be01-5302-9ecb-1631066adb9e@redhat.com> On 1/3/19 9:42 AM, Pengfei Li (Arm Technology China) wrote: > JBS: https://bugs.openjdk.java.net/browse/JDK-8215792 > webrev: http://cr.openjdk.java.net/~pli/rfr/8215792/webrev.00/ > > Could anyone help review this fix? I'm looking now. Thanks. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitrij.pochepko at bell-sw.com Thu Jan 3 13:19:26 2019 From: dmitrij.pochepko at bell-sw.com (dmitrij.pochepko at bell-sw.com) Date: Thu, 03 Jan 2019 16:19:26 +0300 Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result In-Reply-To: References: Message-ID: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net> An HTML attachment was scrubbed... URL: From martin.doerr at sap.com Thu Jan 3 14:17:01 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 3 Jan 2019 14:17:01 +0000 Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays Message-ID: <1c4646d554954551b73c077fa40f983d@sap.com> Hi, the JVM on PPC64 currently misses usage of the fast vector implementation in the interpreter code. In addition, performance is not good for short arrays (unaligned 512 byte arrays or shorter arrays) because the current vector implementation needs at least 512 bytes. Bug: https://bugs.openjdk.java.net/browse/JDK-8216060 I have addressed these 2 issues + some cleanup with the following webrev: http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/ Please review. Best regards, Martin -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From gromero at linux.vnet.ibm.com Thu Jan 3 16:13:23 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Thu, 3 Jan 2019 14:13:23 -0200
Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
In-Reply-To: <1c4646d554954551b73c077fa40f983d@sap.com>
References: <1c4646d554954551b73c077fa40f983d@sap.com>
Message-ID: <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com>

Hi Martin,

oh that's nice. You removed the 512-byte block constraint and also wired it up to the Interpreter :)

For the worst case, unaligned 512 byte array, I see the gap to aligned 512 byte array reduced by about 5.7x.

On the Interpreter I see an improvement of at least 50% for 1024 bytes.

This is all for the CRC32 class.

On CRC32C I'm getting a SIGSEGV that can be reproduced running against ./test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java.

I've uploaded a full log to http://cr.openjdk.java.net/~gromero/logs/crc32c_sigsegv/

I'm leaving for lunch and I'll take a closer look when back. But probably you will figure it out before I sit to appreciate the meal :)

Finally, since the change does some cleanup, I wonder if it would be worth fixing the following typos:

I think it's Barrett const., not Barret. Probably 'barret' is used in the code as a short version
for Barrett but it should be changed in

+ // Point to Barret constants
+ add_const_optimized(cur_const, constants, outer_consts_size + inner_consts_size);
+

?

s/not/note/ in:
cpu/ppc/macroAssembler_ppc.cpp:3977:// A not on the lookup table address(es):

d/lives/ in:
cpu/ppc/macroAssembler_ppc.cpp:4265: mtvrwz(VCRC, crc); // crc lives lives in VCRC, now

Best regards,
Gustavo

On 01/03/2019 12:17 PM, Doerr, Martin wrote:
> Hi,
>
> the JVM on PPC64 currently misses usage of the fast vector implementation in the interpreter code.
> > In addition, performance is not good for short arrays (unaligned 512 byte arrays or shorter arrays) because the current vector implementation needs at least 512 bytes. > > Bug: > > https://bugs.openjdk.java.net/browse/JDK-8216060 > > I have addressed these 2 issues + some cleanup with the following webrev: > > http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/ > > Please review. > > Best regards, > > Martin > From martin.doerr at sap.com Thu Jan 3 17:34:57 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 3 Jan 2019 17:34:57 +0000 Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays In-Reply-To: <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> Message-ID: <9863276de30643338249ead2a6ac7fe9@sap.com> Hi Gustavo, thanks for testing and your feedback. I just fixed the comment typos in place. Unfortunately, I can't reproduce the crash. TestCRC32C works stable on our machine (with fastdbg build). I guess that the frameless spills mess up the stack. Can you check if the patch below helps? Best regards, Martin diff -r a33f49d5998c src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 17:30:03 2019 +0100 +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 18:33:16 2019 +0100 @@ -1924,6 +1924,9 @@ __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE)); } + // Restore caller sp for c2i case. + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. + StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table); if (!VM_Version::has_vpmsumb()) { @@ -1933,8 +1936,6 @@ __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, true); } - // Restore caller sp for c2i case and return. 
- __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. __ blr(); // Generate a vanilla native entry as the slow path. @@ -2014,6 +2015,9 @@ __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE)); } + // Restore caller sp for c2i case. + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. + StubRoutines::ppc64::generate_load_crc32c_table_addr(_masm, table); if (!VM_Version::has_vpmsumb()) { @@ -2023,8 +2027,6 @@ __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, false); } - // Restore caller sp for c2i case and return. - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. __ blr(); BLOCK_COMMENT("} CRC32C_update{Bytes|DirectByteBuffer}"); -----Original Message----- From: Gustavo Romero Sent: Donnerstag, 3. Januar 2019 17:13 To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays Hi Martin, oh that's nice. You removed the 512-byte block constraint and also wired it up to the Interpreter :) For the worst case, unaligned 512 byte array, I see the gap to aligned 512 byte array reduced by about ~5.7x. On the Interpreter I see an improvement of at least 50% for 1024 bytes. This is all for the CRC32 class. On CRC32C I'm getting a SIGSEV that can be reproduced running against ./test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java. I've upload a full log into http://cr.openjdk.java.net/~gromero/logs/crc32c_sigsegv/ I'm leaving for the lunch and I'll take a closer look when back. But probably you will figure it out before I sit to appreciate the meal :) Finally, since the change does some cleanup, I wonder if it would be worth fixing the following typos: I think it's Barrett const., not Barret. 
Probably 'barret' is used in the code as a short version for Barrett but it should be changed in + // Point to Barret constants + add_const_optimized(cur_const, constants, outer_consts_size + inner_consts_size); + ? s/not/note/ in: cpu/ppc/macroAssembler_ppc.cpp:3977:// A not on the lookup table address(es): d/lives/ in: cpu/ppc/macroAssembler_ppc.cpp:4265: mtvrwz(VCRC, crc); // crc lives lives in VCRC, now Best regards, Gustavo On 01/03/2019 12:17 PM, Doerr, Martin wrote: > Hi, > > the JVM on PPC64 currently misses usage of the fast vector implementation in the interpreter code. > > In addition, performance is not good for short arrays (unaligned 512 byte arrays or shorter arrays) because the current vector implementation needs at least 512 bytes. > > Bug: > > https://bugs.openjdk.java.net/browse/JDK-8216060 > > I have addressed these 2 issues + some cleanup with the following webrev: > > http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/ > > Please review. > > Best regards, > > Martin > From gromero at linux.vnet.ibm.com Thu Jan 3 18:36:16 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Thu, 3 Jan 2019 16:36:16 -0200 Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays In-Reply-To: <9863276de30643338249ead2a6ac7fe9@sap.com> References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> <9863276de30643338249ead2a6ac7fe9@sap.com> Message-ID: <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com> Hi Martin, On 01/03/2019 03:34 PM, Doerr, Martin wrote: > Unfortunately, I can't reproduce the crash. TestCRC32C works stable on our machine (with fastdbg build). > I guess that the frameless spills mess up the stack. Can you check if the patch below helps? Thanks for providing a fix so I can try it. Yes, I confirm the patch below indeed fixes the sigsegv crash when CRC32C update() method is used. 
I also confirm that I don't observe the crash on the fastdebug build, only on the release build. It also only affects the Interpreter mode, so passing -Xcomp avoids the crash on the release build.

Just as reference, I can reproduce it on the release build with the following trivial code:

import java.util.zip.CRC32C;

class CRC32C_v1 {
    public static void main(String[] arg) {
        byte[] b = new byte[1024];
        CRC32C crc32c = new CRC32C();
        crc32c.update(b, 0, b.length);
        System.out.println(crc32c.getValue());
    }
}

Thanks for fixing the typos.

Best regards,
Gustavo

> Best regards,
> Martin
>
>
> diff -r a33f49d5998c src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp
> --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 17:30:03 2019 +0100
> +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 18:33:16 2019 +0100
> @@ -1924,6 +1924,9 @@
>      __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE));
>    }
>
> +  // Restore caller sp for c2i case.
> +  __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
> +
>    StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table);
>
>    if (!VM_Version::has_vpmsumb()) {
> @@ -1933,8 +1936,6 @@
>      __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, true);
>    }
>
> -  // Restore caller sp for c2i case and return.
> -  __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
>    __ blr();
>
>    // Generate a vanilla native entry as the slow path.
> @@ -2014,6 +2015,9 @@
>      __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE));
>    }
>
> +  // Restore caller sp for c2i case.
> +  __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
> + > StubRoutines::ppc64::generate_load_crc32c_table_addr(_masm, table); > > if (!VM_Version::has_vpmsumb()) { > @@ -2023,8 +2027,6 @@ > __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, false); > } > > - // Restore caller sp for c2i case and return. > - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. > __ blr(); > > BLOCK_COMMENT("} CRC32C_update{Bytes|DirectByteBuffer}"); > > > -----Original Message----- > From: Gustavo Romero > Sent: Donnerstag, 3. Januar 2019 17:13 > To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays > > Hi Martin, > > oh that's nice. You removed the 512-byte block constraint and also wired it up to the Interpreter :) > > For the worst case, unaligned 512 byte array, I see the gap to aligned 512 byte array reduced by about ~5.7x. > > On the Interpreter I see an improvement of at least 50% for 1024 bytes. > > This is all for the CRC32 class. > > On CRC32C I'm getting a SIGSEV that can be reproduced running against ./test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java. > > I've upload a full log into http://cr.openjdk.java.net/~gromero/logs/crc32c_sigsegv/ > > I'm leaving for the lunch and I'll take a closer look when back. But probably you will figure it out before I sit to appreciate the meal :) > > Finally, since the change does some cleanup, I wonder if it would be worth fixing the following typos: > > I think it's Barrett const., not Barret. Probably 'barret' is used in the code as a short version > for Barrett but it should be changed in > > + // Point to Barret constants > + add_const_optimized(cur_const, constants, outer_consts_size + inner_consts_size); > + > > ? 
> > s/not/note/ in: > cpu/ppc/macroAssembler_ppc.cpp:3977:// A not on the lookup table address(es): > > d/lives/ in: > cpu/ppc/macroAssembler_ppc.cpp:4265: mtvrwz(VCRC, crc); // crc lives lives in VCRC, now > > Best regards, > Gustavo > > On 01/03/2019 12:17 PM, Doerr, Martin wrote: >> Hi, >> >> the JVM on PPC64 currently misses usage of the fast vector implementation in the interpreter code. >> >> In addition, performance is not good for short arrays (unaligned 512 byte arrays or shorter arrays) because the current vector implementation needs at least 512 bytes. >> >> Bug: >> >> https://bugs.openjdk.java.net/browse/JDK-8216060 >> >> I have addressed these 2 issues + some cleanup with the following webrev: >> >> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/ >> >> Please review. >> >> Best regards, >> >> Martin >> > From sandhya.viswanathan at intel.com Thu Jan 3 18:35:25 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Thu, 3 Jan 2019 18:35:25 +0000 Subject: RFR (XS): 8215888: Register to register spill may use AVX 512 move instruction on unsupported platform In-Reply-To: <78a0415c-9cce-dea0-dd37-3fa22692ac7b@oracle.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A363D9@FMSMSX126.amr.corp.intel.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A36514@FMSMSX126.amr.corp.intel.com> <7b19baa8-c48f-d0aa-02c0-aacaac3e984b@oracle.com> <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A365AB@FMSMSX126.amr.corp.intel.com> <78a0415c-9cce-dea0-dd37-3fa22692ac7b@oracle.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A444E2@FMSMSX126.amr.corp.intel.com> Thanks a lot Tobias! I will work with Vivek to push the patch to jdk/jdk12. 

Best Regards,
Sandhya

-----Original Message-----
From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com]
Sent: Thursday, January 03, 2019 1:01 AM
To: Viswanathan, Sandhya ; Vladimir Ivanov ; hotspot compiler ; vladimir.kozlov at oracle.com
Subject: Re: RFR (XS): 8215888: Register to register spill may use AVX 512 move instruction on unsupported platform

Hi Sandhya,

all webrevs look good to me (and the tests submitted by Vladimir passed).

You don't need to push to JDK 13 because patches pushed to JDK 12 will be synced with mainline automatically:
https://mail.openjdk.java.net/pipermail/jdk-dev/2018-December/002376.html

So I would suggest pushing the patch to jdk/jdk12 and requesting a backport to JDK 11u after some iterations of nightly testing have passed.

Best regards,
Tobias

On 22.12.18 01:55, Viswanathan, Sandhya wrote:
> Thanks a lot! I have also created backport patches for JDK 12 and JDK 11.0.2 as this bug affects those versions too. The patches below are for your consideration:
>
> JDK 12:
> http://cr.openjdk.java.net/~sviswanathan/8215888/jdk12/webrev.01/
> JDK11u:
> http://cr.openjdk.java.net/~sviswanathan/8215888/jdk11u/webrev.01/
>
> The compiler jtreg testing passes for these as well.
>
> Best Regards,
> Sandhya
>
>
> -----Original Message-----
> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
> Sent: Friday, December 21, 2018 4:27 PM
> To: Viswanathan, Sandhya ; hotspot
> compiler ;
> vladimir.kozlov at oracle.com
> Subject: Re: RFR (XS): 8215888: Register to register spill may use AVX
> 512 move instruction on unsupported platform
>
>
>> Please find the updated webrev with your comments incorporated at:
>>
>> http://cr.openjdk.java.net/~sviswanathan/8215888/webrev.01/
>
> Thanks, submitted for testing.
>
> Best regards,
> Vladimir Ivanov
>
>> -----Original Message-----
>> From: Vladimir Ivanov [mailto:vladimir.x.ivanov at oracle.com]
>> Sent: Friday, December 21, 2018 12:00 PM
>> To: Viswanathan, Sandhya ; hotspot
>> compiler ;
>> vladimir.kozlov at oracle.com
>> Subject: Re: RFR (XS): 8215888: Register to register spill may use
>> AVX 512 move instruction on unsupported platform
>>
>> Sandhya,
>>
>> I'd prefer to see the check inverted:
>>
>> if (UseAVX > 2 && !VM_Version::supports_avx512vl()) {
>>   int vector_len = 2;
>>   __ evmovdquq($dst$$XMMRegister, $src$$XMMRegister, vector_len);
>> } else {
>>   __ movdqu($dst$$XMMRegister, $src$$XMMRegister);
>> }
>>
>> It looks easier to read considering the code around is full of "UseAVX > 2" checks.
>>
>> By coincidence I was debugging the very same bug today and at first I didn't notice the problem with "UseAVX < 2", misreading it as "UseAVX > 2".
>>
>> Otherwise, looks good.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 21/12/2018 11:44, Viswanathan, Sandhya wrote:
>>> Hi All,
>>>
>>> We noticed that the register to register moves in the x86.ad file
>>> attempt to generate evmovdquq when UseAVX==2.
>>>
>>> The instruction evmovdquq is only supported on platforms where UseAVX > 2 (AVX 512).
>>>
>>> The following rules in the x86.ad file need to be corrected:
>>>
>>> MoveVecX2Leg
>>>
>>> MoveLeg2VecX
>>>
>>> MoveVecY2Leg
>>>
>>> MoveLeg2VecY
>>>
>>> These move rules, when activated through the register allocator, could
>>> result in an illegal instruction exception.
>>>
>>> Bug:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8215888
>>>
>>> This bug affects versions 11.0.2, 12 and the mainline.
>>>
>>> Webrev for jdk mainline:
>>>
>>> http://cr.openjdk.java.net/~sviswanathan/8215888/webrev.00/
>>>
>>> This webrev passes jtreg compiler tests on Haswell and SKX.
>>>
>>> Best Regards,
>>>
>>> Sandhya
>>>

From Pengfei.Li at arm.com Fri Jan 4 08:52:17 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Fri, 4 Jan 2019 08:52:17 +0000
Subject: [aarch64-port-dev ] RFR(S): 8214922: Add vectorization support for fmin/fmax
In-Reply-To: <87va371n6b.fsf@redhat.com>
References: <87d0pv2iow.fsf@redhat.com> <877eg32bzq.fsf@redhat.com> <871s6a3map.fsf@redhat.com> <87va371n6b.fsf@redhat.com>
Message-ID:

Hi,

> > http://cr.openjdk.java.net/~pli/rfr/8214922/webrev.01/
>
> That looks good to me.
>

Thanks Roland. May I have other review comments for this 2nd webrev?

--
Thanks,
Pengfei

From martin.doerr at sap.com Fri Jan 4 09:30:43 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 4 Jan 2019 09:30:43 +0000
Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
In-Reply-To: <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com>
References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> <9863276de30643338249ead2a6ac7fe9@sap.com> <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com>
Message-ID: <37d9e6f3b2b4400d8963f54d2fe7767f@sap.com>

Hi Gustavo,

thank you very much for confirming. This makes sense. We use different frame headers depending on whether the frame is the top Java frame or not (and on whether it's a debug build or not). Setting R1_SP to sender_SP is a shortcut for leaf calls which relies on having an unmodified stack until this point. So the patch fixes the issue.

New webrev:
http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/

Best regards,
Martin


-----Original Message-----
From: Gustavo Romero
Sent: Donnerstag, 3.
Januar 2019 19:36 To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays Hi Martin, On 01/03/2019 03:34 PM, Doerr, Martin wrote: > Unfortunately, I can't reproduce the crash. TestCRC32C works stable on our machine (with fastdbg build). > I guess that the frameless spills mess up the stack. Can you check if the patch below helps? Thanks for providing a fix so I can try it. Yes, I confirm the patch below indeed fixes the sigsegv crash when CRC32C update() method is used. I also confirm that I don't observe the crash on the fastdebug build, only on the release build. It also only affects the Interpreter mode, so passing -Xcomp avoids the crash on the release build. Just as reference, I can reproduce it on the release build with the following trivial code: import java.util.zip.CRC32C; class CRC32C_v1 { public static void main(String[] arg) { byte[] b = new byte[1024]; CRC32C crc32c = new CRC32C(); crc32c.update(b, 0, b.length); System.out.println(crc32c.getValue()); } } Thanks for fixing the typos. Best regards, Gustavo > Best regards, > Martin > > > diff -r a33f49d5998c src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp > --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 17:30:03 2019 +0100 > +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 18:33:16 2019 +0100 > @@ -1924,6 +1924,9 @@ > __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE)); > } > > + // Restore caller sp for c2i case. > + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. > + > StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table); > > if (!VM_Version::has_vpmsumb()) { > @@ -1933,8 +1936,6 @@ > __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, true); > } > > - // Restore caller sp for c2i case and return. 
> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. > __ blr(); > > // Generate a vanilla native entry as the slow path. > @@ -2014,6 +2015,9 @@ > __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE)); > } > > + // Restore caller sp for c2i case. > + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. > + > StubRoutines::ppc64::generate_load_crc32c_table_addr(_masm, table); > > if (!VM_Version::has_vpmsumb()) { > @@ -2023,8 +2027,6 @@ > __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, false); > } > > - // Restore caller sp for c2i case and return. > - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. > __ blr(); > > BLOCK_COMMENT("} CRC32C_update{Bytes|DirectByteBuffer}"); > > > -----Original Message----- > From: Gustavo Romero > Sent: Donnerstag, 3. Januar 2019 17:13 > To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays > > Hi Martin, > > oh that's nice. You removed the 512-byte block constraint and also wired it up to the Interpreter :) > > For the worst case, unaligned 512 byte array, I see the gap to aligned 512 byte array reduced by about ~5.7x. > > On the Interpreter I see an improvement of at least 50% for 1024 bytes. > > This is all for the CRC32 class. > > On CRC32C I'm getting a SIGSEV that can be reproduced running against ./test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java. > > I've upload a full log into http://cr.openjdk.java.net/~gromero/logs/crc32c_sigsegv/ > > I'm leaving for the lunch and I'll take a closer look when back. But probably you will figure it out before I sit to appreciate the meal :) > > Finally, since the change does some cleanup, I wonder if it would be worth fixing the following typos: > > I think it's Barrett const., not Barret. 
Probably 'barret' is used in the code as a short version
> for Barrett but it should be changed in
>
> +  // Point to Barret constants
> +  add_const_optimized(cur_const, constants, outer_consts_size + inner_consts_size);
> +
>
> ?
>
> s/not/note/ in:
> cpu/ppc/macroAssembler_ppc.cpp:3977:// A not on the lookup table address(es):
>
> d/lives/ in:
> cpu/ppc/macroAssembler_ppc.cpp:4265: mtvrwz(VCRC, crc); // crc lives lives in VCRC, now
>
> Best regards,
> Gustavo
>
> On 01/03/2019 12:17 PM, Doerr, Martin wrote:
>> Hi,
>>
>> the JVM on PPC64 currently misses usage of the fast vector implementation in the interpreter code.
>>
>> In addition, performance is not good for short arrays (unaligned 512 byte arrays or shorter arrays) because the current vector implementation needs at least 512 bytes.
>>
>> Bug:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8216060
>>
>> I have addressed these 2 issues + some cleanup with the following webrev:
>>
>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/
>>
>> Please review.
>>
>> Best regards,
>>
>> Martin
>>
>

From Pengfei.Li at arm.com Fri Jan 4 11:04:40 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Fri, 4 Jan 2019 11:04:40 +0000
Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result
In-Reply-To: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net>
References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net>
Message-ID:

Hi Dmitrij,

Thanks a lot for your reply.

> since cnt2 is used as counter, wouldn't it be easier and shorter just to subtract cnt1 from cnt2 at the beginning of this code. Total (cnt2 - cnt1 + 1) combinations must be checked. That is why first subtraction is by (wordSize/str2_chr_size - 1).
> Then whole fix will be probably just 1 line at the beginning: sub(cnt2, cnt2, cnt1);

I don't think the whole fix could be as easy as "sub(cnt2, cnt2, cnt1)" because cnt2 is the counter which counts the number of bytes not yet processed. It could be different from the number of bytes after the current first-character-match index.

But this is just my thought. Perhaps I didn't understand your idea and code thoroughly. So could you post your shorter fix and let's test if it's right?

--
Thanks,
Pengfei

From aph at redhat.com Fri Jan 4 12:13:26 2019
From: aph at redhat.com (Andrew Haley)
Date: Fri, 4 Jan 2019 12:13:26 +0000
Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result
In-Reply-To:
References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net>
Message-ID:

On 1/4/19 11:04 AM, Pengfei Li (Arm Technology China) wrote:
> But this is just my thought. Perhaps I didn't understand your idea
> and code thoroughly. So could you post your shorter fix and let's
> test if it's right?

I agree, that's the best way to proceed.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From dmitrij.pochepko at bell-sw.com Fri Jan 4 12:52:03 2019
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Fri, 4 Jan 2019 15:52:03 +0300
Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result
In-Reply-To:
References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net>
Message-ID:

Sure. I may have missed something, so I need to try it. I'll send a webrev with the patch once it's done.

Thanks,
Dmitrij

On 04.01.2019 14:04, Pengfei Li (Arm Technology China) wrote:
> Hi Dmitrij,
>
> Thanks a lot for your reply.
>
>> since cnt2 is used as counter, wouldn't it be easier and shorter just to subtract cnt1 from cnt2 at the beginning of this code. Total (cnt2 - cnt1 + 1) combinations must be checked. That is why first subtraction is by (wordSize/str2_chr_size - 1).
>> Then whole fix will be probably just 1 line at the beginning: sub(cnt2, cnt2, cnt1);
> I don't think the whole fix could be as easy as "sub(cnt2, cnt2, cnt1)" because cnt2 is the counter which counts number of bytes not processed.
It could be different from the number of bytes after current first-character-match index.
>
> But this is just my thought. Perhaps I didn't understand your idea and code thoroughly. So could you post your shorter fix and let's test if it's right?
>
> --
> Thanks,
> Pengfei
>

From gromero at linux.vnet.ibm.com Fri Jan 4 13:44:27 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 4 Jan 2019 11:44:27 -0200
Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
In-Reply-To: <37d9e6f3b2b4400d8963f54d2fe7767f@sap.com>
References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> <9863276de30643338249ead2a6ac7fe9@sap.com> <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com> <37d9e6f3b2b4400d8963f54d2fe7767f@sap.com>
Message-ID: <406db3e3-2ac3-dc16-f384-99a314e62a42@linux.vnet.ibm.com>

Hi Martin,

On 01/04/2019 07:30 AM, Doerr, Martin wrote:
> thank you very much for confirming. This makes sense. We use different frame headers depending on whether the frame is the top Java frame or not (and on whether it's a debug build or not). Setting R1_SP to sender_SP is a shortcut for leaf calls which relies on having an unmodified stack until this point. So the patch fixes the issue.

Glad to help! Thanks for the additional information, I was not aware that the
selection of different frame headers could be done at compile time. One last
question, only for my education: what exactly advanced (incremented) R1_SP so that it
has to be cut back using the sender_SP value? That is, which function's frame does
sender_SP track, or "who" exactly is the caller here?

Thank you.

Best regards,
Gustavo

> New webrev:
> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero
> Sent: Donnerstag, 3.
Januar 2019 19:36 > To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays > > Hi Martin, > > On 01/03/2019 03:34 PM, Doerr, Martin wrote: >> Unfortunately, I can't reproduce the crash. TestCRC32C works stable on our machine (with fastdbg build). >> I guess that the frameless spills mess up the stack. Can you check if the patch below helps? > > Thanks for providing a fix so I can try it. > Yes, I confirm the patch below indeed fixes the sigsegv crash when CRC32C update() method is used. > I also confirm that I don't observe the crash on the fastdebug build, only on the release build. > It also only affects the Interpreter mode, so passing -Xcomp avoids the crash on the release build. > > Just as reference, I can reproduce it on the release build with the following trivial code: > > import java.util.zip.CRC32C; > > class CRC32C_v1 { > public static void main(String[] arg) { > byte[] b = new byte[1024]; > > CRC32C crc32c = new CRC32C(); > crc32c.update(b, 0, b.length); > > System.out.println(crc32c.getValue()); > } > } > > Thanks for fixing the typos. > > > Best regards, > Gustavo > >> Best regards, >> Martin >> >> >> diff -r a33f49d5998c src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp >> --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 17:30:03 2019 +0100 >> +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 18:33:16 2019 +0100 >> @@ -1924,6 +1924,9 @@ >> __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE)); >> } >> >> + // Restore caller sp for c2i case. >> + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. 
>> + >> StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table); >> >> if (!VM_Version::has_vpmsumb()) { >> @@ -1933,8 +1936,6 @@ >> __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, true); >> } >> >> - // Restore caller sp for c2i case and return. >> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >> __ blr(); >> >> // Generate a vanilla native entry as the slow path. >> @@ -2014,6 +2015,9 @@ >> __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE)); >> } >> >> + // Restore caller sp for c2i case. >> + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >> + >> StubRoutines::ppc64::generate_load_crc32c_table_addr(_masm, table); >> >> if (!VM_Version::has_vpmsumb()) { >> @@ -2023,8 +2027,6 @@ >> __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, false); >> } >> >> - // Restore caller sp for c2i case and return. >> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >> __ blr(); >> >> BLOCK_COMMENT("} CRC32C_update{Bytes|DirectByteBuffer}"); >> >> >> -----Original Message----- >> From: Gustavo Romero >> Sent: Donnerstag, 3. Januar 2019 17:13 >> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays >> >> Hi Martin, >> >> oh that's nice. You removed the 512-byte block constraint and also wired it up to the Interpreter :) >> >> For the worst case, unaligned 512 byte array, I see the gap to aligned 512 byte array reduced by about ~5.7x. >> >> On the Interpreter I see an improvement of at least 50% for 1024 bytes. >> >> This is all for the CRC32 class. >> >> On CRC32C I'm getting a SIGSEV that can be reproduced running against ./test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java. 
>> >> I've upload a full log into http://cr.openjdk.java.net/~gromero/logs/crc32c_sigsegv/ >> >> I'm leaving for the lunch and I'll take a closer look when back. But probably you will figure it out before I sit to appreciate the meal :) >> >> Finally, since the change does some cleanup, I wonder if it would be worth fixing the following typos: >> >> I think it's Barrett const., not Barret. Probably 'barret' is used in the code as a short version >> for Barrett but it should be changed in >> >> + // Point to Barret constants >> + add_const_optimized(cur_const, constants, outer_consts_size + inner_consts_size); >> + >> >> ? >> >> s/not/note/ in: >> cpu/ppc/macroAssembler_ppc.cpp:3977:// A not on the lookup table address(es): >> >> d/lives/ in: >> cpu/ppc/macroAssembler_ppc.cpp:4265: mtvrwz(VCRC, crc); // crc lives lives in VCRC, now >> >> Best regards, >> Gustavo >> >> On 01/03/2019 12:17 PM, Doerr, Martin wrote: >>> Hi, >>> >>> the JVM on PPC64 currently misses usage of the fast vector implementation in the interpreter code. >>> >>> In addition, performance is not good for short arrays (unaligned 512 byte arrays or shorter arrays) because the current vector implementation needs at least 512 bytes. >>> >>> Bug: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8216060 >>> >>> I have addressed these 2 issues + some cleanup with the following webrev: >>> >>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/ >>> >>> Please review. 
>>>
>>> Best regards,
>>>
>>> Martin
>>>
>>
>

From martin.doerr at sap.com Fri Jan 4 16:13:56 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 4 Jan 2019 16:13:56 +0000
Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
In-Reply-To: <406db3e3-2ac3-dc16-f384-99a314e62a42@linux.vnet.ibm.com>
References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> <9863276de30643338249ead2a6ac7fe9@sap.com> <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com> <37d9e6f3b2b4400d8963f54d2fe7767f@sap.com> <406db3e3-2ac3-dc16-f384-99a314e62a42@linux.vnet.ibm.com>
Message-ID:

Hi Gustavo,

when called from the interpreter (the scenario you observed), R21 is set before resizing the frame to avoid wasted stack space (InterpreterMacroAssembler::call_from_interpreter).
When called from compiled methods, R21 is set by a c2i adapter which extends the compiled frame by space for arguments (gen_c2i_adapter).

"mr(R1_SP, R21_sender_SP)" is more error-prone than "resize_frame_absolute", so I think the latter would be better (though it takes more registers and instructions), but I don't want to replace that as part of this CRC change.

Best regards,
Martin


-----Original Message-----
From: Gustavo Romero
Sent: Friday, 4 January 2019 14:44
To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net'
Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays

Hi Martin,

On 01/04/2019 07:30 AM, Doerr, Martin wrote:
> thank you very much for confirming. This makes sense. We use different frame headers depending on whether the frame is the top Java frame or not (and on whether it's a debug build or not). Setting R1_SP to sender_SP is a shortcut for leaf calls which relies on having an unmodified stack until this point. So the patch fixes the issue.

Glad to help!
>>>
>>> Bug:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8216060
>>>
>>> I have addressed these 2 issues + some cleanup with the following webrev:
>>>
>>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/
>>>
>>> Please review.
>>>
>>> Best regards,
>>>
>>> Martin
>>>
>>
>

From gromero at linux.vnet.ibm.com Fri Jan 4 18:54:32 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 4 Jan 2019 16:54:32 -0200
Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
In-Reply-To:
References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> <9863276de30643338249ead2a6ac7fe9@sap.com> <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com> <37d9e6f3b2b4400d8963f54d2fe7767f@sap.com> <406db3e3-2ac3-dc16-f384-99a314e62a42@linux.vnet.ibm.com>
Message-ID: <180a6c0b-7abe-9d5c-51e6-dffbb23570d3@linux.vnet.ibm.com>

Hi Martin,

On 01/04/2019 02:13 PM, Doerr, Martin wrote:
> Hi Gustavo,
>
> when called from the interpreter (the scenario you observed), R21 is set before resizing the frame to avoid wasted stack space (InterpreterMacroAssembler::call_from_interpreter).

Got it. Thanks a lot for the explanations.

I think it doesn't currently matter in practice, but I'm wondering whether, to be
consistent, we should also cut the stack back earlier in
TemplateInterpreterGenerator::generate_CRC32_update_entry()?

diff -r a35f8c35d8c9 src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp
--- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Fri Jan 04 10:09:00 2019 +0100
+++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Fri Jan 04 13:44:37 2019 -0500
@@ -1840,11 +1840,12 @@
 #endif
     __ lwz(crc, 2*wordSize, argP); // Current crc state, zero extend to 64 bit to have a clean register.

+    // Restore caller sp for c2i case and return.
+    __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
+
     StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table);
     __ kernel_crc32_singleByte(crc, data, dataLen, table, tmp, true);

-    // Restore caller sp for c2i case and return.
-    __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
     __ blr();

     // Generate a vanilla native entry as the slow path.

There is currently no issue, probably because the generated code is simpler and does not spill.

Best regards,
Gustavo

> When called from compiled methods, R21 is set by a c2i adapter which extends the compiled frame by space for arguments (gen_c2i_adapter).
>
> "mr(R1_SP, R21_sender_SP)" is more error-prone than "resize_frame_absolute" so I think the latter would be better (though it takes more registers and instructions), but I don't want to replace that as part of this CRC change.
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero
> Sent: Friday, 4 January 2019 14:44
> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net'
> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
>
> Hi Martin,
>
> On 01/04/2019 07:30 AM, Doerr, Martin wrote:
>> thank you very much for confirming. This makes sense. We use different frame headers depending on whether the frame is the top Java frame or not (and on whether it's a debug build or not). Setting R1_SP to sender_SP is a shortcut for leaf calls which relies on having an unmodified stack until this point. So the patch fixes the issue.
>
> Glad to help! Thanks for the additional information, I was not aware that the
> selection of different frame headers could be done at compile time. One last
> question only for my education: what exactly advanced (incremented) R1_SP so it
> has to be cut back using sender_SP value, i.e. sender_SP tracks the frame for
> which function exactly or "who" is the caller exactly here?
>
> Thank you.
> > Best regards, > Gustavo > >> New webrev: >> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/ >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Gustavo Romero >> Sent: Donnerstag, 3. Januar 2019 19:36 >> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays >> >> Hi Martin, >> >> On 01/03/2019 03:34 PM, Doerr, Martin wrote: >>> Unfortunately, I can't reproduce the crash. TestCRC32C works stable on our machine (with fastdbg build). >>> I guess that the frameless spills mess up the stack. Can you check if the patch below helps? >> >> Thanks for providing a fix so I can try it. >> Yes, I confirm the patch below indeed fixes the sigsegv crash when CRC32C update() method is used. >> I also confirm that I don't observe the crash on the fastdebug build, only on the release build. >> It also only affects the Interpreter mode, so passing -Xcomp avoids the crash on the release build. >> >> Just as reference, I can reproduce it on the release build with the following trivial code: >> >> import java.util.zip.CRC32C; >> >> class CRC32C_v1 { >> public static void main(String[] arg) { >> byte[] b = new byte[1024]; >> >> CRC32C crc32c = new CRC32C(); >> crc32c.update(b, 0, b.length); >> >> System.out.println(crc32c.getValue()); >> } >> } >> >> Thanks for fixing the typos. >> >> >> Best regards, >> Gustavo >> >>> Best regards, >>> Martin >>> >>> >>> diff -r a33f49d5998c src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp >>> --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 17:30:03 2019 +0100 >>> +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 18:33:16 2019 +0100 >>> @@ -1924,6 +1924,9 @@ >>> __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE)); >>> } >>> >>> + // Restore caller sp for c2i case. 
>>> + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >>> + >>> StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table); >>> >>> if (!VM_Version::has_vpmsumb()) { >>> @@ -1933,8 +1936,6 @@ >>> __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, true); >>> } >>> >>> - // Restore caller sp for c2i case and return. >>> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >>> __ blr(); >>> >>> // Generate a vanilla native entry as the slow path. >>> @@ -2014,6 +2015,9 @@ >>> __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE)); >>> } >>> >>> + // Restore caller sp for c2i case. >>> + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >>> + >>> StubRoutines::ppc64::generate_load_crc32c_table_addr(_masm, table); >>> >>> if (!VM_Version::has_vpmsumb()) { >>> @@ -2023,8 +2027,6 @@ >>> __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, false); >>> } >>> >>> - // Restore caller sp for c2i case and return. >>> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >>> __ blr(); >>> >>> BLOCK_COMMENT("} CRC32C_update{Bytes|DirectByteBuffer}"); >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero >>> Sent: Donnerstag, 3. Januar 2019 17:13 >>> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays >>> >>> Hi Martin, >>> >>> oh that's nice. You removed the 512-byte block constraint and also wired it up to the Interpreter :) >>> >>> For the worst case, unaligned 512 byte array, I see the gap to aligned 512 byte array reduced by about ~5.7x. >>> >>> On the Interpreter I see an improvement of at least 50% for 1024 bytes. >>> >>> This is all for the CRC32 class. 
>>> >>> On CRC32C I'm getting a SIGSEV that can be reproduced running against ./test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java. >>> >>> I've upload a full log into http://cr.openjdk.java.net/~gromero/logs/crc32c_sigsegv/ >>> >>> I'm leaving for the lunch and I'll take a closer look when back. But probably you will figure it out before I sit to appreciate the meal :) >>> >>> Finally, since the change does some cleanup, I wonder if it would be worth fixing the following typos: >>> >>> I think it's Barrett const., not Barret. Probably 'barret' is used in the code as a short version >>> for Barrett but it should be changed in >>> >>> + // Point to Barret constants >>> + add_const_optimized(cur_const, constants, outer_consts_size + inner_consts_size); >>> + >>> >>> ? >>> >>> s/not/note/ in: >>> cpu/ppc/macroAssembler_ppc.cpp:3977:// A not on the lookup table address(es): >>> >>> d/lives/ in: >>> cpu/ppc/macroAssembler_ppc.cpp:4265: mtvrwz(VCRC, crc); // crc lives lives in VCRC, now >>> >>> Best regards, >>> Gustavo >>> >>> On 01/03/2019 12:17 PM, Doerr, Martin wrote: >>>> Hi, >>>> >>>> the JVM on PPC64 currently misses usage of the fast vector implementation in the interpreter code. >>>> >>>> In addition, performance is not good for short arrays (unaligned 512 byte arrays or shorter arrays) because the current vector implementation needs at least 512 bytes. >>>> >>>> Bug: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8216060 >>>> >>>> I have addressed these 2 issues + some cleanup with the following webrev: >>>> >>>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/ >>>> >>>> Please review. 
>>>> >>>> Best regards, >>>> >>>> Martin >>>> >>> >> > From yasuenag at gmail.com Sat Jan 5 01:33:42 2019 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Sat, 5 Jan 2019 10:33:42 +0900 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows Message-ID: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> Hi all, Please review this change: JBS: https://bugs.openjdk.java.net/browse/JDK-8216154 webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.00/ Discussion on build-dev: https://mail.openjdk.java.net/pipermail/build-dev/2019-January/024581.html I tried to build OpenJDK on WSL (Windows 10 1809 + VS2017 (15.9.4) + Ubuntu 18.04 LTS). However, I saw some C4819 warnings as below: ``` c:/OpenJDK/jdk/src/hotspot/share/compiler/methodMatcher.cpp(258): warning C4819: ???????????? ??? (0) ??????????????????????????????????? Unicode ???????????? ``` * The locale of my laptop is set to Japanese (CP932) I saw this warning at 2 files as below: - hotspot/share/code/codeHeapState.cpp - hotspot/share/compiler/methodMatcher.cpp We can see the problem with iconv: $ iconv -f US-ASCII -t UTF8 This change passed submit repo tests. Thanks, Yasumasa From kim.barrett at oracle.com Sun Jan 6 07:14:22 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 6 Jan 2019 02:14:22 -0500 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows In-Reply-To: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> References: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> Message-ID: > On Jan 4, 2019, at 8:33 PM, Yasumasa Suenaga wrote: > > Hi all, > > Please review this change: > > JBS: https://bugs.openjdk.java.net/browse/JDK-8216154 > webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.00/ > Discussion on build-dev: https://mail.openjdk.java.net/pipermail/build-dev/2019-January/024581.html The preferred idiom to disable a warning over some scope is to use #pragma warning(push) #pragma warning(disable : 4819) ? 
#pragma warning(pop) From yasuenag at gmail.com Sun Jan 6 12:53:21 2019 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Sun, 6 Jan 2019 21:53:21 +0900 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows In-Reply-To: References: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> Message-ID: Hi Kim, Thank you for your comment. I uploaded new webrev to use pragma warning push/pop: http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.01/ Please review again. Yasumasa On 2019/01/06 16:14, Kim Barrett wrote: >> On Jan 4, 2019, at 8:33 PM, Yasumasa Suenaga wrote: >> >> Hi all, >> >> Please review this change: >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8216154 >> webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.00/ >> Discussion on build-dev: https://mail.openjdk.java.net/pipermail/build-dev/2019-January/024581.html > > The preferred idiom to disable a warning over some scope is to use > > #pragma warning(push) > #pragma warning(disable : 4819) > ? > #pragma warning(pop) > From kim.barrett at oracle.com Sun Jan 6 17:54:48 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 6 Jan 2019 12:54:48 -0500 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows In-Reply-To: References: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> Message-ID: <0E131429-5B2F-4A2E-980A-8966354AB7F4@oracle.com> > On Jan 6, 2019, at 7:53 AM, Yasumasa Suenaga wrote: > > Hi Kim, > > Thank you for your comment. > I uploaded new webrev to use pragma warning push/pop: > > http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.01/ > > > Please review again. Looks good. 
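For readers following this thread, the scoped-suppression idiom discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the webrev: the function name region_with_suppressed_warning is hypothetical, and the _MSC_VER guard is an assumption added here so that compilers other than Visual C++ skip the pragmas.

```cpp
// Illustrative sketch only; not taken from the webrev under review.
// Scoped suppression of Visual C++ warning C4819 around a region of code.

#ifdef _MSC_VER                   // assumption: only Visual C++ sees these pragmas
#pragma warning(push)             // save the current warning state
#pragma warning(disable : 4819)   // C4819: character not representable in the current code page
#endif

// Hypothetical function standing in for the code affected by the warning.
inline int region_with_suppressed_warning() {
  return 4819;
}

#ifdef _MSC_VER
#pragma warning(pop)              // restore the warning state saved by push
#endif
```

The push/pop pair keeps the suppression local: warnings outside the guarded region are reported as before.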
From kim.barrett at oracle.com Sun Jan 6 22:18:55 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 6 Jan 2019 17:18:55 -0500 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows In-Reply-To: <0E131429-5B2F-4A2E-980A-8966354AB7F4@oracle.com> References: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> <0E131429-5B2F-4A2E-980A-8966354AB7F4@oracle.com> Message-ID: > On Jan 6, 2019, at 12:54 PM, Kim Barrett wrote: > >> On Jan 6, 2019, at 7:53 AM, Yasumasa Suenaga wrote: >> >> Hi Kim, >> >> Thank you for your comment. >> I uploaded new webrev to use pragma warning push/pop: >> >> http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.01/ >> >> >> Please review again. > > Looks good. It later occurred to me to wonder whether _WINDOWS was the right macro to conditionalize on. All other uses of #pragma warning push/pop (there are 5 in HotSpot) use _MSC_VER. I also wonder why we don't have a Visual Studio definition for PRAGMA_DIAG_PUSH/POP, but that's a different issue altogether. From OGATAK at jp.ibm.com Mon Jan 7 05:13:31 2019 From: OGATAK at jp.ibm.com (Kazunori Ogata) Date: Mon, 7 Jan 2019 14:13:31 +0900 Subject: [8u] RFR for backport of 8154156: PPC64: improve array copy stubs by using vector instructions In-Reply-To: References: Message-ID: Hi, Ping. Can anyone review this enhancement backport request? Regards, Ogata Kazunori Ogata/Japan/IBM wrote on 2018/12/18 23:41:16: > From: Kazunori Ogata/Japan/IBM > To: hotspot-compiler-dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net > Date: 2018/12/18 23:41 > Subject: [8u] RFR for backport of 8154156: PPC64: improve array copy stubs > by using vector instructions > > Hi, > > May I get a review for the enhancement backport of 8154156: PPC64: improve array > copy stubs by using vector instructions? > > To make this patch buildable (and usable by other planned backports listed > in [1]), I cherry-picked config_dscr() and its dependent code from [2,3] > and has_mfdscr() from [4].
> > Original patch: http://hg.openjdk.java.net/jdk/jdk/rev/c9d756fa846e > Webrev: http://cr.openjdk.java.net/~horii/jdk8u_aes_be/8154156/webrev.01/ > > I confirmed it was buildable for both release and fastdebug builds, and > JTREG caused no degradation. > > Refs: > [1] http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-December/003818.html > [2] 8149655: PPC64: Implement CompactString intrinsics > http://hg.openjdk.java.net/jdk/jdk/rev/6241574f5982 > [3] 8080684: PPC64: Fix little-endian build after "8077838: Recent > developments for ppc" > http://hg.openjdk.java.net/jdk/jdk/rev/12ccf8b26eb0 > [4] 8077838: Recent developments for ppc. > http://hg.openjdk.java.net/jdk/jdk/rev/c703c89fddbf > > Regards, > Ogata From claes.redestad at oracle.com Mon Jan 7 12:36:45 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 7 Jan 2019 13:36:45 +0100 Subject: RFR: 8216262: Remove develop flag DelayCompilationDuringStartup Message-ID: <64522ad1-1473-8012-b63a-5309bc72c5cc@oracle.com> Hi, DelayCompilationDuringStartup doesn't delay any compilations. Webrev: http://cr.openjdk.java.net/~redestad/8216262/open.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8216262 Testing: tier1 Thanks! /Claes From yasuenag at gmail.com Mon Jan 7 12:36:24 2019 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Mon, 7 Jan 2019 21:36:24 +0900 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows In-Reply-To: References: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> <0E131429-5B2F-4A2E-980A-8966354AB7F4@oracle.com> Message-ID: Hi Kim, On 2019/01/07 7:18, Kim Barrett wrote: >> On Jan 6, 2019, at 12:54 PM, Kim Barrett wrote: >> >>> On Jan 6, 2019, at 7:53 AM, Yasumasa Suenaga wrote: >>> >>> Hi Kim, >>> >>> Thank you for your comment. >>> I uploaded new webrev to use pragma warning push/pop: >>> >>> http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.01/ >>> >>> >>> Please review again. >> >> Looks good.
> > It later occurred to me to wonder whether _WINDOWS was the right macro to conditionalize > on. All other uses of #pragma warning push/pop (there are 5 in HotSpot) use _MSC_VER. I updated webrev to use _MSC_VER. Is it ok? http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.02/ Thanks, Yasumasa > I also wonder why we don't have a Visual Studio definition for PRAGMA_DIAG_PUSH/POP, > but that's a different issue altogether. > From david.holmes at oracle.com Mon Jan 7 12:54:17 2019 From: david.holmes at oracle.com (David Holmes) Date: Mon, 7 Jan 2019 22:54:17 +1000 Subject: RFR: 8216262: Remove develop flag DelayCompilationDuringStartup In-Reply-To: <64522ad1-1473-8012-b63a-5309bc72c5cc@oracle.com> References: <64522ad1-1473-8012-b63a-5309bc72c5cc@oracle.com> Message-ID: <9a9f466b-5ffc-06de-9fde-6a8ef78622ee@oracle.com> Hi Claes, On 7/01/2019 10:36 pm, Claes Redestad wrote: > Hi, > > DelayCompilationDuringStartup doesn't delay any compilations. > > Webrev: http://cr.openjdk.java.net/~redestad/8216262/open.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8216262 Normally we would follow a staged removal process: deprecate, obsolete, then expire - see arguments.cpp and special_jvm_flags table. In this case we can probably start at obsoletion, but that would leave expiration for JDK 14. Or compiler folk can argue for / justify immediate full expiration/removal. Cheers, David > Testing: tier1 > > Thanks!
> > /Claes From claes.redestad at oracle.com Mon Jan 7 13:01:51 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 7 Jan 2019 14:01:51 +0100 Subject: RFR: 8216262: Remove develop flag DelayCompilationDuringStartup In-Reply-To: <9a9f466b-5ffc-06de-9fde-6a8ef78622ee@oracle.com> References: <64522ad1-1473-8012-b63a-5309bc72c5cc@oracle.com> <9a9f466b-5ffc-06de-9fde-6a8ef78622ee@oracle.com> Message-ID: <2708906c-c27e-4514-3066-0f2a86fbae9e@oracle.com> On 2019-01-07 13:54, David Holmes wrote: > Hi Claes, > > On 7/01/2019 10:36 pm, Claes Redestad wrote: >> Hi, >> >> DelayCompilationDuringStartup doesn't delay any compilations. >> >> Webrev: http://cr.openjdk.java.net/~redestad/8216262/open.00/ >> Bug: https://bugs.openjdk.java.net/browse/JDK-8216262 > > Normally we would follow a staged removal process: deprecate, obsolete, > then expire - see arguments.cpp and special_jvm_flags table. In this > case we can probably start at obsoletion, but that would leave > expiration for JDK 14. Or compiler folk can argue for / justify > immediate full expiration/removal. I'm under the impression this process does not apply to develop flags (which are not visible in anything but debug builds)? /Claes From martin.doerr at sap.com Mon Jan 7 13:08:57 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 7 Jan 2019 13:08:57 +0000 Subject: [8u] RFR for backport of 8154156: PPC64: improve array copy stubs by using vector instructions In-Reply-To: References: Message-ID: <6adf0a283eda47b29df02a3a2d8550ee@sap.com> Hi Ogata, looks good to me. However, I'm not a jdk8u reviewer. Best regards, Martin -----Original Message----- From: ppc-aix-port-dev On Behalf Of Kazunori Ogata Sent: Montag, 7. Januar 2019 06:14 To: hotspot-compiler-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net Subject: Re: [8u] RFR for backport of 8154156: PPC64: improve array copy stubs by using vector instructions Hi, Ping. Can anyone review this enhancement backport request?
Regards, Ogata Kazunori Ogata/Japan/IBM wrote on 2018/12/18 23:41:16: > From: Kazunori Ogata/Japan/IBM > To: hotspot-compiler-dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net > Date: 2018/12/18 23:41 > Subject: [8u] RFR for backport of 8154156: PPC64: improve array copy stubs > by using vector instructions > > Hi, > > May I get a review for the enhancement backport of 8154156: PPC64: improve array > copy stubs by using vector instructions? > > To make this patch buildable (and usable by other planned backports listed > in [1]), I cherry-picked config_dscr() and its dependent code from [2,3] > and has_mfdscr() from [4]. > > Original patch: http://hg.openjdk.java.net/jdk/jdk/rev/c9d756fa846e > Webrev: http://cr.openjdk.java.net/~horii/jdk8u_aes_be/8154156/webrev.01/ > > I confirmed it was buildable for both release and fastdebug builds, and > JTREG caused no degradation. > > Refs: > [1] http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-December/003818.html > [2] 8149655: PPC64: Implement CompactString intrinsics > http://hg.openjdk.java.net/jdk/jdk/rev/6241574f5982 > [3] 8080684: PPC64: Fix little-endian build after "8077838: Recent > developments for ppc" > http://hg.openjdk.java.net/jdk/jdk/rev/12ccf8b26eb0 > [4] 8077838: Recent developments for ppc.
> http://hg.openjdk.java.net/jdk/jdk/rev/c703c89fddbf > > Regards, > Ogata From thomas.schatzl at oracle.com Mon Jan 7 13:20:43 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 07 Jan 2019 14:20:43 +0100 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows In-Reply-To: References: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> <0E131429-5B2F-4A2E-980A-8966354AB7F4@oracle.com> Message-ID: <3a1836446f59ae6ac89a00dc09867c1d1e97e157.camel@oracle.com> Hi, On Mon, 2019-01-07 at 21:36 +0900, Yasumasa Suenaga wrote: > Hi Kim, > > On 2019/01/07 7:18, Kim Barrett wrote: > > > On Jan 6, 2019, at 12:54 PM, Kim Barrett > > > wrote: > > > > > > > On Jan 6, 2019, at 7:53 AM, Yasumasa Suenaga < > > > > yasuenag at gmail.com> wrote: > > > > > > > > Hi Kim, > > > > > > > > Thank you for your comment. > > > > I uploaded new webrev to use pragma warning push/pop: > > > > > > > > http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.01/ > > > > > > > > > > > > Please review again. > > > > > > Looks good. I tried to verify these problems on these two files as suggested with "iconv -f US-ASCII -t UTF8 ", which errored out on codeHeapState.cpp as expected, but there was no error with methodMatcher.cpp. Am I doing something wrong? I am fine with that change if it is really needed for successful compilation :) I just can't find the non-US-ASCII character used in the line indicated by the error message. > > > > It later occurred to me to wonder whether _WINDOWS was the right > > macro to conditionalize on. All other uses of #pragma warning > > push/pop (there are 5 in HotSpot) use _MSC_VER. > > I updated webrev to use _MSC_VER. Is it ok? > > http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.02/ > Please add a "// warning C4819: The file contains a character that cannot be represented in the current code page" comment above or next to the pragma warning(disable) declaration. Not many people know the VC warning numbers by default...
Looks good otherwise, I do not need a re-review for this comment change. Thanks, Thomas From claes.redestad at oracle.com Mon Jan 7 13:31:42 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Mon, 7 Jan 2019 14:31:42 +0100 Subject: RFR: 8216262: Remove develop flag DelayCompilationDuringStartup In-Reply-To: <2708906c-c27e-4514-3066-0f2a86fbae9e@oracle.com> References: <64522ad1-1473-8012-b63a-5309bc72c5cc@oracle.com> <9a9f466b-5ffc-06de-9fde-6a8ef78622ee@oracle.com> <2708906c-c27e-4514-3066-0f2a86fbae9e@oracle.com> Message-ID: <818084dc-3e98-97da-20f4-aa00f3f6545e@oracle.com> On 2019-01-07 14:01, Claes Redestad wrote: >> >> Normally we would follow a staged removal process: deprecate, >> obsolete, then expire - see arguments.cpp and special_jvm_flags table. >> In this case we can probably start at obsoletion, but that would leave >> expiration for JDK 14. Or compiler folk can argue for / justify >> immediate full expiration/removal. > > I'm under the impression this process does not apply to develop flags > (which are not visible an anything but debug builds)? We've removed develop flags without obsoletion + expiry many times in the past[1], and while this goes against the written down expiration in arguments.cpp, I believe it to be a misguided recommendation for develop flags. 
/Claes [1] https://bugs.openjdk.java.net/browse/JDK-8191870 https://bugs.openjdk.java.net/browse/JDK-8132318 https://bugs.openjdk.java.net/browse/JDK-8186042 https://bugs.openjdk.java.net/browse/JDK-8180423 https://bugs.openjdk.java.net/browse/JDK-8058259 From martin.doerr at sap.com Mon Jan 7 13:49:34 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 7 Jan 2019 13:49:34 +0000 Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays In-Reply-To: <180a6c0b-7abe-9d5c-51e6-dffbb23570d3@linux.vnet.ibm.com> References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> <9863276de30643338249ead2a6ac7fe9@sap.com> <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com> <37d9e6f3b2b4400d8963f54d2fe7767f@sap.com> <406db3e3-2ac3-dc16-f384-99a314e62a42@linux.vnet.ibm.com> <180a6c0b-7abe-9d5c-51e6-dffbb23570d3@linux.vnet.ibm.com> Message-ID: Hi Gustavo, I want to check all places where we use "mr(R1_SP, R21_sender_SP)". There may be more issues with that. I'll probably handle that in a separate change and push this CRC change afterwards. Best regards, Martin -----Original Message----- From: Gustavo Romero Sent: Freitag, 4. Januar 2019 19:55 To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays Hi Martin, On 01/04/2019 02:13 PM, Doerr, Martin wrote: > Hi Gustavo, > > when called from the interpreter (the scenario you observed), R21 is set before resizing the frame to avoid wasted stack space (InterpreterMacroAssembler::call_from_interpreter). Got it. Thanks a lot for the explanations. I think it doesn't currently matter in practice, but I'm wondering if to be consistent we should cut back the stack back earlier also in TemplateInterpreterGenerator::generate_CRC32_update_entry()? 
diff -r a35f8c35d8c9 src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Fri Jan 04 10:09:00 2019 +0100 +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Fri Jan 04 13:44:37 2019 -0500 @@ -1840,11 +1840,12 @@ #endif __ lwz(crc, 2*wordSize, argP); // Current crc state, zero extend to 64 bit to have a clean register. + // Restore caller sp for c2i case and return. + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. + StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table); __ kernel_crc32_singleByte(crc, data, dataLen, table, tmp, true); - // Restore caller sp for c2i case and return. - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. __ blr(); // Generate a vanilla native entry as the slow path. Currently there is no issue probably because generated code is simpler and does no spills. Best regards, Gustavo > When called from compiled methods, R21 is set by a c2i adapter which extends the compiled frame by space for arguments (gen_c2i_adapter). > > "mr(R1_SP, R21_sender_SP)" is more error-prone than "resize_frame_absolute" so I think the latter would be better (though it takes more registers and instructions), but I don't want to replace that as part of this CRC change. > > Best regards, > Martin > > > -----Original Message----- > From: Gustavo Romero > Sent: Freitag, 4. Januar 2019 14:44 > To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays > > Hi Martin, > > On 01/04/2019 07:30 AM, Doerr, Martin wrote: >> thank you very much for confirming. This makes sense. We use different frame headers depending on whether the frame is the top Java frame or not (and on whether it's a debug build or not). 
Setting R1_SP to sender_SP is a shortcut for leaf calls which relies on having an unmodified stack until this point. So the patch fixes the issue. > > Glad to help! Thanks for the additional information, I was not aware that the > selection of different frame headers could be done at compile time. One last > question only for my education: what exactly advanced (incremented) R1_SP so it > has to be cut back using sender_SP value, i.e. sender_SP tracks the frame for > which function exactly or "who" is the caller exactly here? > > Thank you. > > Best regards, > Gustavo > >> New webrev: >> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/ >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Gustavo Romero >> Sent: Donnerstag, 3. Januar 2019 19:36 >> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays >> >> Hi Martin, >> >> On 01/03/2019 03:34 PM, Doerr, Martin wrote: >>> Unfortunately, I can't reproduce the crash. TestCRC32C works stable on our machine (with fastdbg build). >>> I guess that the frameless spills mess up the stack. Can you check if the patch below helps? >> >> Thanks for providing a fix so I can try it. >> Yes, I confirm the patch below indeed fixes the sigsegv crash when CRC32C update() method is used. >> I also confirm that I don't observe the crash on the fastdebug build, only on the release build. >> It also only affects the Interpreter mode, so passing -Xcomp avoids the crash on the release build. 
>> >> Just as reference, I can reproduce it on the release build with the following trivial code: >> >> import java.util.zip.CRC32C; >> >> class CRC32C_v1 { >> public static void main(String[] arg) { >> byte[] b = new byte[1024]; >> >> CRC32C crc32c = new CRC32C(); >> crc32c.update(b, 0, b.length); >> >> System.out.println(crc32c.getValue()); >> } >> } >> >> Thanks for fixing the typos. >> >> >> Best regards, >> Gustavo >> >>> Best regards, >>> Martin >>> >>> >>> diff -r a33f49d5998c src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp >>> --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 17:30:03 2019 +0100 >>> +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 18:33:16 2019 +0100 >>> @@ -1924,6 +1924,9 @@ >>> __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE)); >>> } >>> >>> + // Restore caller sp for c2i case. >>> + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >>> + >>> StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table); >>> >>> if (!VM_Version::has_vpmsumb()) { >>> @@ -1933,8 +1936,6 @@ >>> __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, true); >>> } >>> >>> - // Restore caller sp for c2i case and return. >>> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >>> __ blr(); >>> >>> // Generate a vanilla native entry as the slow path. >>> @@ -2014,6 +2015,9 @@ >>> __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE)); >>> } >>> >>> + // Restore caller sp for c2i case. >>> + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >>> + >>> StubRoutines::ppc64::generate_load_crc32c_table_addr(_masm, table); >>> >>> if (!VM_Version::has_vpmsumb()) { >>> @@ -2023,8 +2027,6 @@ >>> __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, false); >>> } >>> >>> - // Restore caller sp for c2i case and return. 
>>> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >>> __ blr(); >>> >>> BLOCK_COMMENT("} CRC32C_update{Bytes|DirectByteBuffer}"); >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero >>> Sent: Donnerstag, 3. Januar 2019 17:13 >>> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays >>> >>> Hi Martin, >>> >>> oh that's nice. You removed the 512-byte block constraint and also wired it up to the Interpreter :) >>> >>> For the worst case, unaligned 512 byte array, I see the gap to aligned 512 byte array reduced by about ~5.7x. >>> >>> On the Interpreter I see an improvement of at least 50% for 1024 bytes. >>> >>> This is all for the CRC32 class. >>> >>> On CRC32C I'm getting a SIGSEV that can be reproduced running against ./test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java. >>> >>> I've upload a full log into http://cr.openjdk.java.net/~gromero/logs/crc32c_sigsegv/ >>> >>> I'm leaving for the lunch and I'll take a closer look when back. But probably you will figure it out before I sit to appreciate the meal :) >>> >>> Finally, since the change does some cleanup, I wonder if it would be worth fixing the following typos: >>> >>> I think it's Barrett const., not Barret. Probably 'barret' is used in the code as a short version >>> for Barrett but it should be changed in >>> >>> + // Point to Barret constants >>> + add_const_optimized(cur_const, constants, outer_consts_size + inner_consts_size); >>> + >>> >>> ? 
>>> >>> s/not/note/ in: >>> cpu/ppc/macroAssembler_ppc.cpp:3977:// A not on the lookup table address(es): >>> >>> d/lives/ in: >>> cpu/ppc/macroAssembler_ppc.cpp:4265: mtvrwz(VCRC, crc); // crc lives lives in VCRC, now >>> >>> Best regards, >>> Gustavo >>> >>> On 01/03/2019 12:17 PM, Doerr, Martin wrote: >>>> Hi, >>>> >>>> the JVM on PPC64 currently misses usage of the fast vector implementation in the interpreter code. >>>> >>>> In addition, performance is not good for short arrays (unaligned 512 byte arrays or shorter arrays) because the current vector implementation needs at least 512 bytes. >>>> >>>> Bug: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8216060 >>>> >>>> I have addressed these 2 issues + some cleanup with the following webrev: >>>> >>>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/ >>>> >>>> Please review. >>>> >>>> Best regards, >>>> >>>> Martin >>>> >>> >> > From gromero at linux.vnet.ibm.com Mon Jan 7 13:52:19 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 7 Jan 2019 11:52:19 -0200 Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays In-Reply-To: References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> <9863276de30643338249ead2a6ac7fe9@sap.com> <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com> <37d9e6f3b2b4400d8963f54d2fe7767f@sap.com> <406db3e3-2ac3-dc16-f384-99a314e62a42@linux.vnet.ibm.com> <180a6c0b-7abe-9d5c-51e6-dffbb23570d3@linux.vnet.ibm.com> Message-ID: <301fd43a-e5b5-d970-7a1a-2458dbaeec36@linux.vnet.ibm.com> Hi Martin, On 01/07/2019 11:49 AM, Doerr, Martin wrote: > I want to check all places where we use "mr(R1_SP, R21_sender_SP)". There may be more issues with that. I'll probably handle that in a separate change and push this CRC change afterwards. I see. Thanks for letting me know. 
Best regards, Gustavo > Best regards, > Martin > > > -----Original Message----- > From: Gustavo Romero > Sent: Freitag, 4. Januar 2019 19:55 > To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays > > Hi Martin, > > On 01/04/2019 02:13 PM, Doerr, Martin wrote: >> Hi Gustavo, >> >> when called from the interpreter (the scenario you observed), R21 is set before resizing the frame to avoid wasted stack space (InterpreterMacroAssembler::call_from_interpreter). > > Got it. Thanks a lot for the explanations. > > I think it doesn't currently matter in practice, but I'm wondering if to be > consistent we should cut back the stack back earlier also in > TemplateInterpreterGenerator::generate_CRC32_update_entry()? > > diff -r a35f8c35d8c9 src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp > --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Fri Jan 04 10:09:00 2019 +0100 > +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Fri Jan 04 13:44:37 2019 -0500 > @@ -1840,11 +1840,12 @@ > #endif > __ lwz(crc, 2*wordSize, argP); // Current crc state, zero extend to 64 bit to have a clean register. > > + // Restore caller sp for c2i case and return. > + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. > + > StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table); > __ kernel_crc32_singleByte(crc, data, dataLen, table, tmp, true); > > - // Restore caller sp for c2i case and return. > - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. > __ blr(); > > // Generate a vanilla native entry as the slow path. > > Currently there is no issue probably because generated code is simpler and does > no spills. 
> > Best regards, > Gustavo > >> When called from compiled methods, R21 is set by a c2i adapter which extends the compiled frame by space for arguments (gen_c2i_adapter). >> >> "mr(R1_SP, R21_sender_SP)" is more error-prone than "resize_frame_absolute" so I think the latter would be better (though it takes more registers and instructions), but I don't want to replace that as part of this CRC change. >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Gustavo Romero >> Sent: Freitag, 4. Januar 2019 14:44 >> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays >> >> Hi Martin, >> >> On 01/04/2019 07:30 AM, Doerr, Martin wrote: >>> thank you very much for confirming. This makes sense. We use different frame headers depending on whether the frame is the top Java frame or not (and on whether it's a debug build or not). Setting R1_SP to sender_SP is a shortcut for leaf calls which relies on having an unmodified stack until this point. So the patch fixes the issue. >> >> Glad to help! Thanks for the additional information, I was not aware that the >> selection of different frame headers could be done at compile time. One last >> question only for my education: what exactly advanced (incremented) R1_SP so it >> has to be cut back using sender_SP value, i.e. sender_SP tracks the frame for >> which function exactly or "who" is the caller exactly here? >> >> Thank you. >> >> Best regards, >> Gustavo >> >>> New webrev: >>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/ >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero >>> Sent: Donnerstag, 3. 
Januar 2019 19:36 >>> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays >>> >>> Hi Martin, >>> >>> On 01/03/2019 03:34 PM, Doerr, Martin wrote: >>>> Unfortunately, I can't reproduce the crash. TestCRC32C works stable on our machine (with fastdbg build). >>>> I guess that the frameless spills mess up the stack. Can you check if the patch below helps? >>> >>> Thanks for providing a fix so I can try it. >>> Yes, I confirm the patch below indeed fixes the sigsegv crash when CRC32C update() method is used. >>> I also confirm that I don't observe the crash on the fastdebug build, only on the release build. >>> It also only affects the Interpreter mode, so passing -Xcomp avoids the crash on the release build. >>> >>> Just as reference, I can reproduce it on the release build with the following trivial code: >>> >>> import java.util.zip.CRC32C; >>> >>> class CRC32C_v1 { >>> public static void main(String[] arg) { >>> byte[] b = new byte[1024]; >>> >>> CRC32C crc32c = new CRC32C(); >>> crc32c.update(b, 0, b.length); >>> >>> System.out.println(crc32c.getValue()); >>> } >>> } >>> >>> Thanks for fixing the typos. >>> >>> >>> Best regards, >>> Gustavo >>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> diff -r a33f49d5998c src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp >>>> --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 17:30:03 2019 +0100 >>>> +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 18:33:16 2019 +0100 >>>> @@ -1924,6 +1924,9 @@ >>>> __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE)); >>>> } >>>> >>>> + // Restore caller sp for c2i case. >>>> + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. 
>>>> + >>>> StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table); >>>> >>>> if (!VM_Version::has_vpmsumb()) { >>>> @@ -1933,8 +1936,6 @@ >>>> __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, true); >>>> } >>>> >>>> - // Restore caller sp for c2i case and return. >>>> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >>>> __ blr(); >>>> >>>> // Generate a vanilla native entry as the slow path. >>>> @@ -2014,6 +2015,9 @@ >>>> __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE)); >>>> } >>>> >>>> + // Restore caller sp for c2i case. >>>> + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >>>> + >>>> StubRoutines::ppc64::generate_load_crc32c_table_addr(_masm, table); >>>> >>>> if (!VM_Version::has_vpmsumb()) { >>>> @@ -2023,8 +2027,6 @@ >>>> __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, false); >>>> } >>>> >>>> - // Restore caller sp for c2i case and return. >>>> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. >>>> __ blr(); >>>> >>>> BLOCK_COMMENT("} CRC32C_update{Bytes|DirectByteBuffer}"); >>>> >>>> >>>> -----Original Message----- >>>> From: Gustavo Romero >>>> Sent: Donnerstag, 3. Januar 2019 17:13 >>>> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' >>>> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays >>>> >>>> Hi Martin, >>>> >>>> oh that's nice. You removed the 512-byte block constraint and also wired it up to the Interpreter :) >>>> >>>> For the worst case, unaligned 512 byte array, I see the gap to aligned 512 byte array reduced by about ~5.7x. >>>> >>>> On the Interpreter I see an improvement of at least 50% for 1024 bytes. >>>> >>>> This is all for the CRC32 class. 
>>>> >>>> On CRC32C I'm getting a SIGSEV that can be reproduced running against ./test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java. >>>> >>>> I've upload a full log into http://cr.openjdk.java.net/~gromero/logs/crc32c_sigsegv/ >>>> >>>> I'm leaving for the lunch and I'll take a closer look when back. But probably you will figure it out before I sit to appreciate the meal :) >>>> >>>> Finally, since the change does some cleanup, I wonder if it would be worth fixing the following typos: >>>> >>>> I think it's Barrett const., not Barret. Probably 'barret' is used in the code as a short version >>>> for Barrett but it should be changed in >>>> >>>> + // Point to Barret constants >>>> + add_const_optimized(cur_const, constants, outer_consts_size + inner_consts_size); >>>> + >>>> >>>> ? >>>> >>>> s/not/note/ in: >>>> cpu/ppc/macroAssembler_ppc.cpp:3977:// A not on the lookup table address(es): >>>> >>>> d/lives/ in: >>>> cpu/ppc/macroAssembler_ppc.cpp:4265: mtvrwz(VCRC, crc); // crc lives lives in VCRC, now >>>> >>>> Best regards, >>>> Gustavo >>>> >>>> On 01/03/2019 12:17 PM, Doerr, Martin wrote: >>>>> Hi, >>>>> >>>>> the JVM on PPC64 currently misses usage of the fast vector implementation in the interpreter code. >>>>> >>>>> In addition, performance is not good for short arrays (unaligned 512 byte arrays or shorter arrays) because the current vector implementation needs at least 512 bytes. >>>>> >>>>> Bug: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8216060 >>>>> >>>>> I have addressed these 2 issues + some cleanup with the following webrev: >>>>> >>>>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/ >>>>> >>>>> Please review. 
>>>>> >>>>> Best regards, >>>>> >>>>> Martin >>>>> >>>> >>> >> > From leo.korinth at oracle.com Mon Jan 7 14:32:06 2019 From: leo.korinth at oracle.com (Leo Korinth) Date: Mon, 7 Jan 2019 15:32:06 +0100 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows In-Reply-To: <3a1836446f59ae6ac89a00dc09867c1d1e97e157.camel@oracle.com> References: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> <0E131429-5B2F-4A2E-980A-8966354AB7F4@oracle.com> <3a1836446f59ae6ac89a00dc09867c1d1e97e157.camel@oracle.com> Message-ID: Hi! Running: find -name "*.[ch]pp" | xargs file | grep -v ASCII ./src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: C source, UTF-8 Unicode text ./src/hotspot/cpu/aarch64/macroAssembler_aarch64_trig.cpp: C source, UTF-8 Unicode text ./src/hotspot/share/gc/parallel/gcTaskManager.hpp: data ./src/hotspot/share/code/codeHeapState.cpp: C source, UTF-8 Unicode text ./src/hotspot/share/oops/method.cpp: C source, UTF-8 Unicode text ./test/hotspot/gtest/utilities/test_json.cpp: C source, UTF-8 Unicode text The single hpp file seems fine though (just file not understanding that it is a source file). Some questions, as it seems like I am missing something. 1) Should not all of those files be fixed? 2) Why remove warning (in one file, methodMatcher.cpp) instead of changing encoding? 3) methodMatcher.cpp seems to be pure ASCII, why the change in that file at all? $ grep --color -P -n "[^[:ascii:]]" is a good way to find the problematic line. Thanks, Leo On 07/01/2019 14:20, Thomas Schatzl wrote: > Hi, > > On Mon, 2019-01-07 at 21:36 +0900, Yasumasa Suenaga wrote: >> Hi Kim, >> >> On 2019/01/07 7:18, Kim Barrett wrote: >>>> On Jan 6, 2019, at 12:54 PM, Kim Barrett >>>> wrote: >>>> >>>>> On Jan 6, 2019, at 7:53 AM, Yasumasa Suenaga < >>>>> yasuenag at gmail.com> wrote: >>>>> >>>>> Hi Kim, >>>>> >>>>> Thank you for your comment. 
>>>>> I uploaded new webrev to use pragma warning push/pop: >>>>> >>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.01/ >>>>> >>>>> >>>>> Please review again. >>>> >>>> Looks good. > > I tried to verify these problems on these two files as suggested with > "iconv -f US-ASCII -t UTF8 " which errored out on > codeHeapState.cpp as expected but there has been no error with > methodMatcher.cpp. Am I doing something wrong? > > I am fine with that change if it is really needed for successful > compilation :) I just can't find the non-US-ASCII character used in the > line indicated by the error message. > >>> >>> It later occurred to me to wonder whether _WINDOWS was the right >>> macro to conditionalize on. All other uses of #pragma warning >>> push/pop (there are 5 in HotSpot) use _MSC_VER. >> >> I updated webrev to use _MSC_VER. Is it ok? >> >> http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.02/ >> > > Please add a "// warning C4819: The file contains a character that > cannot be represented in the current code page" comment above or next > to the pragma warning(disable) declaration. > > Not many people know the VC warning numbers by default... > > Looks good otherwise, I do not need a re-review for this comment > change. 
> > Thanks, > Thomas > > From yasuenag at gmail.com Mon Jan 7 14:38:42 2019 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Mon, 7 Jan 2019 23:38:42 +0900 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows In-Reply-To: <3a1836446f59ae6ac89a00dc09867c1d1e97e157.camel@oracle.com> References: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> <0E131429-5B2F-4A2E-980A-8966354AB7F4@oracle.com> <3a1836446f59ae6ac89a00dc09867c1d1e97e157.camel@oracle.com> Message-ID: <97640e53-a344-636d-1005-b62bb364aaa7@gmail.com> Hi Thomas, On 2019/01/07 22:20, Thomas Schatzl wrote: > Hi, > > On Mon, 2019-01-07 at 21:36 +0900, Yasumasa Suenaga wrote: >> Hi Kim, >> >> On 2019/01/07 7:18, Kim Barrett wrote: >>>> On Jan 6, 2019, at 12:54 PM, Kim Barrett >>>> wrote: >>>> >>>>> On Jan 6, 2019, at 7:53 AM, Yasumasa Suenaga < >>>>> yasuenag at gmail.com> wrote: >>>>> >>>>> Hi Kim, >>>>> >>>>> Thank you for your comment. >>>>> I uploaded new webrev to use pragma warning push/pop: >>>>> >>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.01/ >>>>> >>>>> >>>>> Please review again. >>>> >>>> Looks good. > > I tried to verify these problems on these two files as suggested with > "iconv -f US-ASCII -t UTF8 " which errored out on > codeHeapState.cpp as expected but there has been no error with > methodMatcher.cpp. Am I doing something wrong? Sorry, that was my mistake. But the error does occur in that file, because `RANGE0`, which contains non-ASCII characters, is passed to sscanf(). > I am fine with that change if it is really needed for successful > compilation :) I just can't find the non-US-ASCII character used in the > line indicated by the error message. > >>> >>> It later occurred to me to wonder whether _WINDOWS was the right >>> macro to conditionalize on. All other uses of #pragma warning >>> push/pop (there are 5 in HotSpot) use _MSC_VER. >> >> I updated webrev to use _MSC_VER. Is it ok? 
>> >> http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.02/ >> > > Please add a "// warning C4819: The file contains a character that > cannot be represented in the current code page" comment above or next > to the pragma warning(disable) declaration. > > Not many people know the VC warning numbers by default... > > Looks good otherwise, I do not need a re-review for this comment > change. Ok, I will add the comment. Thanks, Yasumasa > Thanks, > Thomas > > From yasuenag at gmail.com Mon Jan 7 14:53:31 2019 From: yasuenag at gmail.com (Yasumasa Suenaga) Date: Mon, 7 Jan 2019 23:53:31 +0900 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows In-Reply-To: References: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> <0E131429-5B2F-4A2E-980A-8966354AB7F4@oracle.com> <3a1836446f59ae6ac89a00dc09867c1d1e97e157.camel@oracle.com> Message-ID: Hi Leo, > 1) Should not all of those files be fixed? > ./src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: C source, UTF-8 Unicode text > ./src/hotspot/share/oops/method.cpp: C source, UTF-8 Unicode text Non-ASCII character(s) in comment line. > ./src/hotspot/cpu/aarch64/macroAssembler_aarch64_trig.cpp: C source, UTF-8 Unicode text It's not used for Windows (AArch64). > ./src/hotspot/share/gc/parallel/gcTaskManager.hpp: data I couldn't find why `file` detects it as "data". So I have no idea for it. > ./src/hotspot/share/code/codeHeapState.cpp: C source, UTF-8 Unicode text It's an ASCII file on my laptop :-) > ./test/hotspot/gtest/utilities/test_json.cpp: C source, UTF-8 Unicode text It's test code. > 2) Why remove warning (in one file, methodMatcher.cpp) instead of changing encoding? > 3) methodMatcher.cpp seems to be pure ASCII, why the change in that file at all? The error occurs about `RANGE0`. It has binary data, so it might not be able to change encoding. Thanks, Yasumasa On 2019/01/07 23:32, Leo Korinth wrote: > Hi! 
> > Running: find -name "*.[ch]pp" | xargs file | grep -v ASCII > ./src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: C source, UTF-8 Unicode text > ./src/hotspot/cpu/aarch64/macroAssembler_aarch64_trig.cpp: C source, UTF-8 Unicode text > ./src/hotspot/share/gc/parallel/gcTaskManager.hpp: data > ./src/hotspot/share/code/codeHeapState.cpp: C source, UTF-8 Unicode text > ./src/hotspot/share/oops/method.cpp: C source, UTF-8 Unicode text > ./test/hotspot/gtest/utilities/test_json.cpp: C source, UTF-8 Unicode text > > > The single hpp file seems fine though (just file not understanding that it is a source file). > > Some questions, as it seems like I am missing something. > 1) Should not all of those files be fixed? > 2) Why remove warning (in one file, methodMatcher.cpp) instead of changing encoding? > 3) methodMatcher.cpp seems to be pure ASCII, why the change in that file at all? > > $ grep --color -P -n "[^[:ascii:]]" is a good way to find the problematic line. > > Thanks, Leo > > On 07/01/2019 14:20, Thomas Schatzl wrote: >> Hi, >> >> On Mon, 2019-01-07 at 21:36 +0900, Yasumasa Suenaga wrote: >>> Hi Kim, >>> >>> On 2019/01/07 7:18, Kim Barrett wrote: >>>>> On Jan 6, 2019, at 12:54 PM, Kim Barrett >>>>> wrote: >>>>> >>>>>> On Jan 6, 2019, at 7:53 AM, Yasumasa Suenaga < >>>>>> yasuenag at gmail.com> wrote: >>>>>> >>>>>> Hi Kim, >>>>>> >>>>>> Thank you for your comment. >>>>>> I uploaded new webrev to use pragma warning push/pop: >>>>>> >>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.01/ >>>>>> >>>>>> >>>>>> Please review again. >>>>> >>>>> Looks good. 
>> >> I tried to verify these problems on these two files as suggested with >> "iconv -f US-ASCII -t UTF8 " which errored out on >> codeHeapState.cpp as expected but there has been no error with >> methodMatcher.cpp. Am I doing something wrong? >> >> I am fine with that change if it is really needed for successful >> compilation :) I just can't find the non-US-ASCII character used in the >> line indicated by the error message. >> >>>> >>>> It later occurred to me to wonder whether _WINDOWS was the right >>>> macro to conditionalize on. All other uses of #pragma warning >>>> push/pop (there are 5 in HotSpot) use _MSC_VER. >>> >>> I updated webrev to use _MSC_VER. Is it ok? >>> >>> http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.02/ >>> >> >> Please add a "// warning C4819: The file contains a character that >> cannot be represented in the current code page" comment above or next >> to the pragma warning(disable) declaration. >> >> Not many people know the VC warning numbers by default... >> >> Looks good otherwise, I do not need a re-review for this comment >> change. >> >> Thanks, >> Thomas >> >> From zgu at redhat.com Mon Jan 7 15:38:12 2019 From: zgu at redhat.com (zgu at redhat.com) Date: Mon, 07 Jan 2019 10:38:12 -0500 Subject: RFR(T) 8216199: Local variable arg defined but never used in BCEscapeAnalyzer::compute_escape_for_intrinsic() Message-ID: <1546875492.3477.36.camel@redhat.com> Please review this trivial change to remove unused local variable. 
Bug: https://bugs.openjdk.java.net/browse/JDK-8216199 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8216199/webrev.00/ Test: hotspot_compiler on Linux 64 (fastdebug and release) Thanks, -Zhengyu From tobias.hartmann at oracle.com Mon Jan 7 15:45:06 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 7 Jan 2019 16:45:06 +0100 Subject: RFR(T) 8216199: Local variable arg defined but never used in BCEscapeAnalyzer::compute_escape_for_intrinsic() In-Reply-To: <1546875492.3477.36.camel@redhat.com> References: <1546875492.3477.36.camel@redhat.com> Message-ID: Hi Zhengyu, looks good and trivial to me. Thanks, Tobias On 07.01.19 16:38, zgu at redhat.com wrote: > Please review this trivial change to remove unused local variable. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8216199 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8216199/webrev.00/ > > Test: > > hotspot_compiler on Linux 64 (fastdebug and release) > > Thanks, > > -Zhengyu > From zgu at redhat.com Mon Jan 7 15:59:23 2019 From: zgu at redhat.com (zgu at redhat.com) Date: Mon, 07 Jan 2019 10:59:23 -0500 Subject: RFR(T) 8216200: BCEscapeAnalyzer::ArgumentMap::set_intersect() is incorrect Message-ID: <1546876763.3477.43.camel@redhat.com> Please review this trivial change that removes unused/incorrect method. BCEscapeAnalyzer::ArgumentMap::set_intersect()'s implementation is wrong. The reason that it did not blowup anything, is that it does not have users. We can fix it or remove it: based on Tobias' comment in bug, let's simply remove it. Bug: https://bugs.openjdk.java.net/browse/JDK-8216200 Webrev: http://cr.openjdk.java.net/~zgu/JDK-8216200/webrev.00/ Test: hotspot_compiler on Linux 64. 
Thanks, -Zhengyu From tobias.hartmann at oracle.com Mon Jan 7 16:01:19 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 7 Jan 2019 17:01:19 +0100 Subject: RFR(T) 8216200: BCEscapeAnalyzer::ArgumentMap::set_intersect() is incorrect In-Reply-To: <1546876763.3477.43.camel@redhat.com> References: <1546876763.3477.43.camel@redhat.com> Message-ID: Hi Zhengyu, looks good and trivial to me. Best regards, Tobias On 07.01.19 16:59, zgu at redhat.com wrote: > Please review this trivial change that removes unused/incorrect method. > > BCEscapeAnalyzer::ArgumentMap::set_intersect()'s implementation is > wrong. The reason that it did not blowup anything, is that it does not > have users. > > We can fix it or remove it: based on Tobias' comment in bug, let's > simply remove it. > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8216200 > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8216200/webrev.00/ > > Test: > > hotspot_compiler on Linux 64. > > Thanks, > > -Zhengyu > From zgu at redhat.com Mon Jan 7 16:03:46 2019 From: zgu at redhat.com (zgu at redhat.com) Date: Mon, 07 Jan 2019 11:03:46 -0500 Subject: RFR(T) 8216200: BCEscapeAnalyzer::ArgumentMap::set_intersect() is incorrect In-Reply-To: References: <1546876763.3477.43.camel@redhat.com> Message-ID: <1546877026.3477.44.camel@redhat.com> Thanks for the quick review, Tobias. -Zhengyu On Mon, 2019-01-07 at 17:01 +0100, Tobias Hartmann wrote: > Hi Zhengyu, > > looks good and trivial to me. > > Best regards, > Tobias > > On 07.01.19 16:59, zgu at redhat.com wrote: > > Please review this trivial change that removes unused/incorrect > > method. > > > > BCEscapeAnalyzer::ArgumentMap::set_intersect()'s implementation is > > wrong. The reason that it did not blowup anything, is that it does > > not > > have users. > > > > We can fix it or remove it: based on Tobias' comment in bug, let's > > simply remove it. 
> > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8216200 > > Webrev: http://cr.openjdk.java.net/~zgu/JDK-8216200/webrev.00/ > > > > Test: > > > > hotspot_compiler on Linux 64. > > > > Thanks, > > > > -Zhengyu > > From vladimir.kozlov at oracle.com Mon Jan 7 16:02:55 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 7 Jan 2019 08:02:55 -0800 Subject: RFR(S): 8214862: assert(proj != __null) at compile.cpp:3251 In-Reply-To: <878t0thi36.fsf@redhat.com> References: <87d0qfhtyo.fsf@redhat.com> <3167fa2c-a9bd-ae8d-a084-7a09275b35e1@oracle.com> <874lbpish9.fsf@redhat.com> <19387ab7-2dbc-61ed-5722-ff5ecbcc3b51@oracle.com> <878t0thi36.fsf@redhat.com> Message-ID: <1632cec4-c3a1-239a-9f95-3b991c68ff97@oracle.com> On 12/13/18 1:49 AM, Roland Westrelin wrote: > >> Do you hit next bailout (with fix)?: >> >> http://hg.openjdk.java.net/jdk/jdk/file/24525070d934/src/hotspot/share/opto/compile.cpp#l3669 > > yes. > >> Is fall-through path eliminated because it is not reachable from Root because of infinite loop? > > yes. > >> I think we should detect infinite loop very early, after first PhaseRemoveUseless. Or may be just before or during >> PhaseRemoveUseless when we still have path. > > Isn't there a chance that the path that leads to the infinite loop can > be optimized out during optimizations so bailing out early could cause a > valid method to never be compiled? It could be. It seems your solution is simplest one. I agree with it. Thanks, vladimir > >> What happens if a method has *only* infinite loop? In which phase we detect it and bailout? > > This: > > private static void test() { > while (true); > } > > bails out in: > > bool Compile::final_graph_reshaping() { > // an infinite loop may have been eliminated by the optimizer, > // in which case the graph will be empty. > if (root()->req() == 1) { > record_method_not_compilable("trivial infinite loop"); > return true; > } > > > Roland. 
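For readers following the thread, the trivial case quoted above can be wrapped in a small standalone class. This is only a sketch under stated assumptions: the class name TrivialInfiniteLoop, the helper spinnerStillAlive and the 100 ms wait are made up for illustration and are not part of the webrev or of any existing test. The loop runs on a daemon thread so that the JVM can still exit. The program does not show the bailout by itself; that would have to be observed through compiler logging on the build being examined.

```java
class TrivialInfiniteLoop {
    // Same shape as the method quoted above: the loop has no exit path.
    private static void test() {
        while (true);
    }

    // Hypothetical helper: start the loop on a daemon thread, wait a little,
    // and report whether the thread is still running.
    static boolean spinnerStillAlive(long waitMillis) throws InterruptedException {
        Thread spinner = new Thread(TrivialInfiniteLoop::test, "spinner");
        spinner.setDaemon(true); // daemon, so the JVM may exit while it spins
        spinner.start();
        Thread.sleep(waitMillis);
        return spinner.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("spinner alive: " + spinnerStillAlive(100));
    }
}
```

The daemon thread is a convenience for the sketch only; calling test() directly from main would never return.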
> From leo.korinth at oracle.com Mon Jan 7 16:46:26 2019 From: leo.korinth at oracle.com (Leo Korinth) Date: Mon, 7 Jan 2019 17:46:26 +0100 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows In-Reply-To: References: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> <0E131429-5B2F-4A2E-980A-8966354AB7F4@oracle.com> <3a1836446f59ae6ac89a00dc09867c1d1e97e157.camel@oracle.com> Message-ID: Hi! On 07/01/2019 15:53, Yasumasa Suenaga wrote: > Hi Leo, > >> 1) Should not all of those files be fixed? > > >> ./src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: >> C source, UTF-8 Unicode text >> ./src/hotspot/share/oops/method.cpp: >> C source, UTF-8 Unicode text > > Non-ASCII character(s) in comment line. > > >> ./src/hotspot/cpu/aarch64/macroAssembler_aarch64_trig.cpp: >> C source, UTF-8 Unicode text > > It's not used for Windows (AArch64). > > >> ./src/hotspot/share/gc/parallel/gcTaskManager.hpp: >> data > > I couldn't find why `file` detects it as "data". So I have no idea for it. > > >> ./src/hotspot/share/code/codeHeapState.cpp: C >> source, UTF-8 Unicode text > > It's an ASCII file on my laptop :-) Check line 1979: 00015f20: 7468 6579 2068 6176 6520 6e6f 2063 6f6d they have no com 00015f30: 7069 6c61 7469 6f6e c2a0 4944 2061 7373 pilation..ID ass c2a0 is not ASCII but the UTF-8 encoding of U+00A0 (NO-BREAK SPACE, from the Latin-1 Supplement block), and it is not in a comment. I do not have a Windows environment up and running. The compiler seems to warn in strange places, and _not_ warn in others. I cannot find confirmation in the style guide that we are to use only ASCII, so please ignore my questions as I do not know what to recommend. Thanks, Leo > > >> ./test/hotspot/gtest/utilities/test_json.cpp: >> C source, UTF-8 Unicode text > > It's test code. > > >> 2) Why remove warning (in one file, methodMatcher.cpp) instead of >> changing encoding? >> 3) methodMatcher.cpp seems to be pure ASCII, why the change in that >> file at all? > > The error occurs about `RANGE0`. 
It has binary data, so it might not be > able to change encoding. > > Thanks, > > Yasumasa > > > On 2019/01/07 23:32, Leo Korinth wrote: >> Hi! >> >> Running: find -name "*.[ch]pp" | xargs file | grep -v ASCII >> ./src/hotspot/cpu/x86/macroAssembler_x86_sha.cpp: >> C source, UTF-8 Unicode text >> ./src/hotspot/cpu/aarch64/macroAssembler_aarch64_trig.cpp: >> C source, UTF-8 Unicode text >> ./src/hotspot/share/gc/parallel/gcTaskManager.hpp: >> data >> ./src/hotspot/share/code/codeHeapState.cpp: C >> source, UTF-8 Unicode text >> ./src/hotspot/share/oops/method.cpp: >> C >> source, UTF-8 Unicode text >> ./test/hotspot/gtest/utilities/test_json.cpp: >> C >> source, UTF-8 Unicode text >> >> >> The single hpp file seems fine though (just file not understanding >> that it is a source file). >> >> Some questions, as it seems like I am missing something. >> 1) Should not all of those files be fixed? >> 2) Why remove warning (in one file, methodMatcher.cpp) instead of >> changing encoding? >> 3) methodMatcher.cpp seems to be pure ASCII, why the change in that >> file at all? >> >> $ grep --color -P -n "[^[:ascii:]]" is a good way to find the >> problematic line. >> >> Thanks, Leo >> >> On 07/01/2019 14:20, Thomas Schatzl wrote: >>> Hi, >>> >>> On Mon, 2019-01-07 at 21:36 +0900, Yasumasa Suenaga wrote: >>>> Hi Kim, >>>> >>>> On 2019/01/07 7:18, Kim Barrett wrote: >>>>>> On Jan 6, 2019, at 12:54 PM, Kim Barrett >>>>>> wrote: >>>>>> >>>>>>> On Jan 6, 2019, at 7:53 AM, Yasumasa Suenaga < >>>>>>> yasuenag at gmail.com> wrote: >>>>>>> >>>>>>> Hi Kim, >>>>>>> >>>>>>> Thank you for your comment. 
>>>>>>> I uploaded new webrev to use pragma warning push/pop: >>>>>>> >>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.01/ >>>>>>> >>>>>>> >>>>>>> Please review again. >>>>>> >>>>>> Looks good. >>> >>> I tried to verify these problems on these two files as suggested with >>> "iconv -f US-ASCII -t UTF8 " which errored out on >>> codeHeapState.cpp as expected but there has been no error with >>> methodMatcher.cpp. Am I doing something wrong? >>> >>> I am fine with that change if it is really needed for successful >>> compilation :) I just can't find the non-US-ASCII character used in the >>> line indicated by the error message. >>> >>>>> >>>>> It later occurred to me to wonder whether _WINDOWS was the right >>>>> macro to conditionalize on. All other uses of #pragma warning >>>>> push/pop (there are 5 in HotSpot) use _MSC_VER. >>>> >>>> I updated webrev to use _MSC_VER. Is it ok? >>>> >>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.02/ >>>> >>> >>> Please add a "// warning C4819: The file contains a character that >>> cannot be represented in the current code page" comment above or next >>> to the pragma warning(disable) declaration. >>> >>> Not many people know the VC warning numbers by default... >>> >>> Looks good otherwise, I do not need a re-review for this comment >>> change. >>> >>> Thanks, >>> Thomas >>> >>> From sandhya.viswanathan at intel.com Mon Jan 7 20:44:17 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Mon, 7 Jan 2019 20:44:17 +0000 Subject: RFR (XS): 8216290: Backport of 8215888 to JDK11u (Register to register spill may use AVX 512 move instruction on unsupported platform) Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A4AC0F@FMSMSX126.amr.corp.intel.com> This is a request to backport the 8215888 fix to JDK11u. The fix has been in the JDK 12 branch for a couple of days now and has passed nightly testing. 
The backport bug request is at: https://bugs.openjdk.java.net/browse/JDK-8216290 The backport webrev is at: http://cr.openjdk.java.net/~sviswanathan/8216290/webrev.00/ Best Regards, Sandhya From kim.barrett at oracle.com Mon Jan 7 20:50:54 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 7 Jan 2019 15:50:54 -0500 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows In-Reply-To: <3a1836446f59ae6ac89a00dc09867c1d1e97e157.camel@oracle.com> References: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> <0E131429-5B2F-4A2E-980A-8966354AB7F4@oracle.com> <3a1836446f59ae6ac89a00dc09867c1d1e97e157.camel@oracle.com> Message-ID: <884D1538-A96A-41D3-8039-103C99BA6A30@oracle.com> > On Jan 7, 2019, at 8:20 AM, Thomas Schatzl wrote: > > Hi, > > On Mon, 2019-01-07 at 21:36 +0900, Yasumasa Suenaga wrote: >> Hi Kim, >> >> On 2019/01/07 7:18, Kim Barrett wrote: >>>> On Jan 6, 2019, at 12:54 PM, Kim Barrett >>>> wrote: >>>> >>>>> On Jan 6, 2019, at 7:53 AM, Yasumasa Suenaga < >>>>> yasuenag at gmail.com> wrote: >>>>> >>>>> Hi Kim, >>>>> >>>>> Thank you for your comment. >>>>> I uploaded new webrev to use pragma warning push/pop: >>>>> >>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.01/ >>>>> >>>>> >>>>> Please review again. >>>> >>>> Looks good. > > I tried to verify these problems on these two files as suggested with > "iconv -f US-ASCII -t UTF8 " which errored out on > codeHeapState.cpp as expected but there has been no error with > methodMatcher.cpp. Am I doing something wrong? > > I am fine with that change if it is really needed for successful > compilation :) I just can't find the non-US-ASCII character used in the > line indicated by the error message. The problem is in RANGEBASE, which is referenced directly and is also concatenated with other strings to form RANGE0 and RANGESLASH, all of which are only referenced in parse_method_pattern. 
RANGEBASE is strings of '\xXX' encoded characters. At the source level this is all fine. Even after preprocessing it should all be fine, as the string/char encoding reduction doesn't happen until translation phase 5, e.g. after preprocessing. But during that encoding reduction the compiler is noticing that some of the encodings (or sequence thereof?) don't map to valid characters in the currently selected code page for the OS (Japanese in Yasumasa's case). So it complains. It's kind of an annoying complaint, for several reasons, but oh well. It is because the warning is triggered by that encoding reduction during compilation, rather than by literal characters in the source code, that nothing problematic shows up with iconv/grep/&etc. From kim.barrett at oracle.com Mon Jan 7 20:54:42 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 7 Jan 2019 15:54:42 -0500 Subject: RFR: 8216154: C4819 warnings at HotSpot sources on Windows In-Reply-To: References: <9b1bd147-a261-95df-18f8-a5476415d0a4@gmail.com> <0E131429-5B2F-4A2E-980A-8966354AB7F4@oracle.com> Message-ID: <34070646-9304-4EBE-BB9D-0FE5E6BFF2FF@oracle.com> > On Jan 7, 2019, at 7:36 AM, Yasumasa Suenaga wrote: > > Hi Kim, > > On 2019/01/07 7:18, Kim Barrett wrote: >>> On Jan 6, 2019, at 12:54 PM, Kim Barrett wrote: >>> >>>> On Jan 6, 2019, at 7:53 AM, Yasumasa Suenaga wrote: >>>> >>>> Hi Kim, >>>> >>>> Thank you for your comment. >>>> I uploaded new webrev to use pragma warning push/pop: >>>> >>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.01/ >>>> >>>> >>>> Please review again. >>> >>> Looks good. >> It later occurred to me to wonder whether _WINDOWS was the right macro to conditionalize >> on. All other uses of #pragma warning push/pop (there are 5 in HotSpot) use _MSC_VER. > > I updated webrev to use _MSC_VER. Is it ok? > > http://cr.openjdk.java.net/~ysuenaga/JDK-8216154/webrev.02/ Thanks for doing that. 
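For anyone who has not seen the construct being reviewed, the guarded push/pop looks roughly like the sketch below. It is an illustration only: kHighBytes is a hypothetical stand-in for RANGEBASE, and its byte values are invented rather than taken from methodMatcher.cpp. The source text is pure ASCII (backslash, x, hex digits), which is why iconv has nothing to complain about; the high bytes only exist once the compiler has processed the escapes.

```cpp
#include <cstddef>
#include <cstring>

#ifdef _MSC_VER
#pragma warning(push)
// warning C4819: The file contains a character that cannot be represented
// in the current code page
#pragma warning(disable : 4819)
#endif

// Hypothetical stand-in for RANGEBASE: every character in this source line
// is ASCII, but the resulting string holds four bytes above 0x7F.
static const char kHighBytes[] = "\x80\x81\xFE\xFF";

#ifdef _MSC_VER
#pragma warning(pop)
#endif

// Number of bytes in the literal, not counting the terminating NUL.
std::size_t high_byte_count() {
  return std::strlen(kHighBytes);
}

// Value of the byte at the given index, viewed as unsigned.
unsigned high_byte_at(std::size_t index) {
  return static_cast<unsigned char>(kHighBytes[index]);
}
```

On compilers other than MSVC the pragmas are skipped entirely, so the guard keeps the suppression local to the one toolchain that emits the warning.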
I don't know that _WINDOWS was actually wrong, or that _MSC_VER is actually better, but it seems better to be consistent about it. And sorry for not noticing earlier. Looks good. From david.holmes at oracle.com Mon Jan 7 21:21:18 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 8 Jan 2019 07:21:18 +1000 Subject: RFR: 8216262: Remove develop flag DelayCompilationDuringStartup In-Reply-To: <818084dc-3e98-97da-20f4-aa00f3f6545e@oracle.com> References: <64522ad1-1473-8012-b63a-5309bc72c5cc@oracle.com> <9a9f466b-5ffc-06de-9fde-6a8ef78622ee@oracle.com> <2708906c-c27e-4514-3066-0f2a86fbae9e@oracle.com> <818084dc-3e98-97da-20f4-aa00f3f6545e@oracle.com> Message-ID: <2c837971-967c-285c-f85d-a88cfe82aac3@oracle.com> On 7/01/2019 11:31 pm, Claes Redestad wrote: > > > On 2019-01-07 14:01, Claes Redestad wrote: >>> >>> Normally we would follow a staged removal process: deprecate, >>> obsolete, then expire - see arguments.cpp and special_jvm_flags >>> table. In this case we can probably start at obsoletion, but that >>> would leave expiration for JDK 14. Or compiler folk can argue for / >>> justify immediate full expiration/removal. >> >> I'm under the impression this process does not apply to develop flags >> (which are not visible in anything but debug builds)? > > We've removed develop flags without obsoletion + expiry many times in > the past[1], and while this goes against the written down expiration > in arguments.cpp, I believe it to be a misguided recommendation for > develop flags. There have been and still can be exceptions depending on the actual flag but the general guideline is: * To remove internal options (e.g. diagnostic, experimental, develop options), use * a 2-step model adding major release numbers to the obsolete and expire columns. Compiler folk can identify whether this flag can be expired immediately.
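As a rough illustration of the 2-step model in the guideline quoted above, the sketch below classifies a flag by release. It is not the special_jvm_flags table from arguments.cpp: the struct, the field names, the enum, and the release numbers 13 and 14 are all assumptions picked for the example, and "obsolete" is taken here to mean accepted but ignored while "expired" means no longer recognized.

```cpp
#include <string>

// Hypothetical, simplified model of one row in a flag-retirement table.
// A value of 0 means no release has been assigned for that stage.
struct RetiredFlag {
  std::string name;
  int obsolete_in;  // first major release in which the flag is accepted but ignored
  int expire_in;    // first major release in which the flag is no longer recognized
};

enum class FlagStatus { Active, Obsolete, Expired };

// Classifies the flag for a given major release, checking the later stage first.
FlagStatus status_in_release(const RetiredFlag& flag, int release) {
  if (flag.expire_in != 0 && release >= flag.expire_in) {
    return FlagStatus::Expired;
  }
  if (flag.obsolete_in != 0 && release >= flag.obsolete_in) {
    return FlagStatus::Obsolete;
  }
  return FlagStatus::Active;
}
```

With a row such as {"DelayCompilationDuringStartup", 13, 14}, the flag would still be active in 12, ignored in 13, and gone in 14. Immediate removal, as argued for elsewhere in this thread, simply skips the table.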
Thanks, David > /Claes > > [1] > https://bugs.openjdk.java.net/browse/JDK-8191870 > https://bugs.openjdk.java.net/browse/JDK-8132318 > https://bugs.openjdk.java.net/browse/JDK-8186042 > https://bugs.openjdk.java.net/browse/JDK-8180423 > https://bugs.openjdk.java.net/browse/JDK-8058259 From eric.caspole at oracle.com Mon Jan 7 22:47:00 2019 From: eric.caspole at oracle.com (Eric Caspole) Date: Mon, 7 Jan 2019 17:47:00 -0500 Subject: RFR: 8076988: reevaluate trivial method policy Message-ID: Hi everyone, Could I get reviews or comments for a fix/simplification of the trivial method policy. I have an internal benchmark where a very hot "trivial" method gets compiled at level 1 and it leads to a ~9% regression compared to getting compiled with C2 level 4. Others have expressed thoughts that this policy might now not as useful as originally intended. I have run performance testing of throughput and startup time with no noticeable regressions. This webrev passed regular tier1 and tier 2 testing. Thanks, Eric JBS: https://bugs.openjdk.java.net/browse/JDK-8076988 Webrev: http://cr.openjdk.java.net/~ecaspole/JDK-8076988/01/webrev/ From shade at redhat.com Mon Jan 7 22:51:14 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 7 Jan 2019 23:51:14 +0100 Subject: RFR: 8076988: reevaluate trivial method policy In-Reply-To: References: Message-ID: <245f8a33-56f4-fb63-a592-014db7d3db82@redhat.com> On 1/7/19 11:47 PM, Eric Caspole wrote: > JBS: > https://bugs.openjdk.java.net/browse/JDK-8076988 > > Webrev: > http://cr.openjdk.java.net/~ecaspole/JDK-8076988/01/webrev/ I like it. Accessors and constant getters may also compile better with C2, especially with advanced GCs, but that might not be as significant. -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From claes.redestad at oracle.com Mon Jan 7 23:13:26 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 8 Jan 2019 00:13:26 +0100 Subject: RFR: 8216262: Remove develop flag DelayCompilationDuringStartup In-Reply-To: <2c837971-967c-285c-f85d-a88cfe82aac3@oracle.com> References: <64522ad1-1473-8012-b63a-5309bc72c5cc@oracle.com> <9a9f466b-5ffc-06de-9fde-6a8ef78622ee@oracle.com> <2708906c-c27e-4514-3066-0f2a86fbae9e@oracle.com> <818084dc-3e98-97da-20f4-aa00f3f6545e@oracle.com> <2c837971-967c-285c-f85d-a88cfe82aac3@oracle.com> Message-ID: On 2019-01-07 22:21, David Holmes wrote: >>> >>> I'm under the impression this process does not apply to develop flags >>> (which are not visible in anything but debug builds)? >> >> We've removed develop flags without obsoletion + expiry many times in >> the past[1], and while this goes against the written down expiration >> in arguments.cpp, I believe it to be a misguided recommendation for >> develop flags. > > There have been and still can be exceptions depending on the actual > flag but the general guideline is: > > * To remove internal options (e.g. diagnostic, experimental, develop > options), use > * a 2-step model adding major release numbers to the obsolete and > expire columns. > > Compiler folk can identify whether this flag can be expired immediately. To me it seems to be the general rule rather than an exception lately, and see no point in sticking to that recommendation. I've filed https://bugs.openjdk.java.net/browse/JDK-8216311 to drop develop flags from that recommendation.
/Claes From david.holmes at oracle.com Tue Jan 8 01:27:47 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 8 Jan 2019 11:27:47 +1000 Subject: RFR: 8216262: Remove develop flag DelayCompilationDuringStartup In-Reply-To: References: <64522ad1-1473-8012-b63a-5309bc72c5cc@oracle.com> <9a9f466b-5ffc-06de-9fde-6a8ef78622ee@oracle.com> <2708906c-c27e-4514-3066-0f2a86fbae9e@oracle.com> <818084dc-3e98-97da-20f4-aa00f3f6545e@oracle.com> <2c837971-967c-285c-f85d-a88cfe82aac3@oracle.com> Message-ID: <3a286b71-c723-0cf3-47d5-e663963d9b11@oracle.com> On 8/01/2019 9:13 am, Claes Redestad wrote: > On 2019-01-07 22:21, David Holmes wrote: >>>> >>>> I'm under the impression this process does not apply to develop flags >>>> (which are not visible in anything but debug builds)? >>> >>> We've removed develop flags without obsoletion + expiry many times in >>> the past[1], and while this goes against the written down expiration >>> in arguments.cpp, I believe it to be a misguided recommendation for >>> develop flags. >> >> There have been and still can be exceptions depending on the actual >> flag but the general guideline is: >> >> * To remove internal options (e.g. diagnostic, experimental, develop >> options), use >> * a 2-step model adding major release numbers to the obsolete and >> expire columns. >> >> Compiler folk can identify whether this flag can be expired immediately. > > To me it seems to be the general rule rather than an exception lately, Perhaps ... I didn't do a complete census. I see the process being used: 8198635: Remove unused safepoint message functions and ShowSafepointMsgs and also not used: 6909265: assert(_OnDeck != Self->_MutexEvent,"invariant") with -XX:+PrintMallocFree but in this case using the flag led to an assertion failure so it was a reasonable assumption that the flag was not actually being used in practice and so could be immediately removed. > and see no point in > sticking to that recommendation.
I've filed > https://bugs.openjdk.java.net/browse/JDK-8216311 > to drop develop flags from that recommendation. Noted. David > /Claes > > From dean.long at oracle.com Tue Jan 8 04:09:22 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Mon, 7 Jan 2019 20:09:22 -0800 Subject: RFR: 8076988: reevaluate trivial method policy In-Reply-To: References: Message-ID: <66aa281f-ae60-5d89-7666-d33cafbb9b6c@oracle.com> Eric, you should be able to revert 8145579 at the same time. dl On 1/7/19 2:47 PM, Eric Caspole wrote: > Hi everyone, > Could I get reviews or comments for a fix/simplification of the > trivial method policy. I have an internal benchmark where a very hot > "trivial" method gets compiled at level 1 and it leads to a ~9% > regression compared to getting compiled with C2 level 4. Others have > expressed thoughts that this policy might now not as useful as > originally intended. I have run performance testing of throughput and > startup time with no noticeable regressions. > > This webrev passed regular tier1 and tier 2 testing. > Thanks, > Eric > > > JBS: > https://bugs.openjdk.java.net/browse/JDK-8076988 > > Webrev: > http://cr.openjdk.java.net/~ecaspole/JDK-8076988/01/webrev/ From OGATAK at jp.ibm.com Tue Jan 8 04:59:42 2019 From: OGATAK at jp.ibm.com (Kazunori Ogata) Date: Tue, 8 Jan 2019 13:59:42 +0900 Subject: [8u] RFR for backport of 8154156: PPC64: improve array copy stubs by using vector instructions In-Reply-To: <6adf0a283eda47b29df02a3a2d8550ee@sap.com> References: <6adf0a283eda47b29df02a3a2d8550ee@sap.com> Message-ID: Hi Martin, Thank you for reviewing the patch. I'll submit RFR to jdk8u-dev mailing list referring your reply. 
Regards, Ogata "Doerr, Martin" wrote on 2019/01/07 22:08:57: > From: "Doerr, Martin" > To: Kazunori Ogata , "hotspot-compiler- > dev at openjdk.java.net" , "ppc-aix- > port-dev at openjdk.java.net" > Date: 2019/01/07 22:09 > Subject: RE: [8u] RFR for backport of 8154156: PPC64: improve array copy > stubs by using vector instructions > > Hi Ogata, > > looks good to me. However, I'm not a jdk8u reviewer. > > Best regards, > Martin > > > -----Original Message----- > From: ppc-aix-port-dev On > Behalf Of Kazunori Ogata > Sent: Montag, 7. Januar 2019 06:14 > To: hotspot-compiler-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net > Subject: Re: [8u] RFR for backport of 8154156: PPC64: improve array copy > stubs by using vector instructions > > Hi, > > Ping. Can anyone review this enhancement backport request? > > Regards, > Ogata > > > Kazunori Ogata/Japan/IBM wrote on 2018/12/18 23:41:16: > > > From: Kazunori Ogata/Japan/IBM > > To: hotspot-compiler-dev at openjdk.java.net, > ppc-aix-port-dev at openjdk.java.net > > Date: 2018/12/18 23:41 > > Subject: [8u] RFR for backport of 8154156: PPC64: improve array copy > stubs > > by using vector instructions > > > > Hi, > > > > May I get review for enhancement backport of 8154156: PPC64: improve > array > > copy stubs by using vector instructions? > > > > To make this patch buildable (and usable by other planned backports > listed > > in [1]), I cherry picked config_dscr() and its dependent code from [2,3] > > > and has_mfdscr() from [4]. > > > > Original patch: http://hg.openjdk.java.net/jdk/jdk/rev/c9d756fa846e > > Weberv: http://cr.openjdk.java.net/~horii/jdk8u_aes_be/8154156/webrev.01/ > > > > I confirmed it was buildable for both relase and fastdebug builds, and > > JTREG caused no degradation. 
> > > > Refs: > > [1] http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2018-December/003818.html > > [2] 8149655: PPC64: Implement CompactString intrinsics > > http://hg.openjdk.java.net/jdk/jdk/rev/6241574f5982 > > [3] 8080684: PPC64: Fix little-endian build after "8077838: Recent > > developments for ppc" > > http://hg.openjdk.java.net/jdk/jdk/rev/12ccf8b26eb0 > > [4] 8077838: Recent developments for ppc. > > http://hg.openjdk.java.net/jdk/jdk/rev/c703c89fddbf > > > > Regards, > > Ogata > > From vladimir.kozlov at oracle.com Tue Jan 8 06:27:24 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 7 Jan 2019 22:27:24 -0800 Subject: RFR: 8216262: Remove develop flag DelayCompilationDuringStartup In-Reply-To: <64522ad1-1473-8012-b63a-5309bc72c5cc@oracle.com> References: <64522ad1-1473-8012-b63a-5309bc72c5cc@oracle.com> Message-ID: <13ef0777-a122-555d-4719-ff09c17e674e@oracle.com> This mechanism was added before I joined the team. I see that it is present from day one. I speculate it was added to avoid allocating CPU (very limited at that time) cycles to compilation during VM startup. I agree that currently it is not the case since we have enough cycles and need to compile MHs very eagerly. JVMCI has other mechanism which delay compilation until it is initialized. And I don't think we should delay usage of AOT code. In short - I agree with changes and removal this archaic feature. I would suggest immediate removal (or shortest available). Thanks, Vladimir On 1/7/19 4:36 AM, Claes Redestad wrote: > Hi, > > DelayCompilationAtStartup doesn't delay any compilations. > > Webrev: http://cr.openjdk.java.net/~redestad/8216262/open.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8216262 > > Testing: tier1 > > Thanks!
> > /Claes From Nick.Gasson at arm.com Tue Jan 8 08:03:43 2019 From: Nick.Gasson at arm.com (Nick Gasson (Arm Technology China)) Date: Tue, 8 Jan 2019 08:03:43 +0000 Subject: RFR: 8216350: AArch64: monitor unlock fast path not called Message-ID: <19afbcdf-6d69-88ca-1794-c03e8e81f171@arm.com> Hi, While looking at the profiling output of some micro-benchmarks for locking on AArch64, I noticed that the monitor unlock fast-path in aarch64_enc_fast_unlock in aarch64.ad (under label `object_has_monitor') is almost never executed, even though the lock in the test is inflated. In order to branch to this fast-path we check if bit #1 is set in the displaced header word on the stack: __ tbnz(disp_hdr, exact_log2(markOopDesc::monitor_value), object_has_monitor); But in the common case the value in the dhw is set by the monitor locking fast-path in aarch64_enc_fast_lock, where we use the pointer to the dhw as an arbitrary non-null value. But the lower three bits of this pointer will always be zero, and so won't trigger the unlock fast-path which is looking for bit #1 set, and we will fall through to call the runtime to unlock the monitor. // store a non-null value into the box. __ str(box, Address(box, BasicLock::displaced_header_offset_in_bytes())); It seems that the unlock fast-path will only be executed when the monitor was originally locked by the runtime (e.g. when the lock was first inflated), because ObjectSynchronizer::slow_enter will store markOopDesc::unused_mark into the dhw, and this value has bit #1 set. Can someone help me review this change to aarch64_enc_fast_lock to use markOopDesc::unused_mark as the arbitrary non-null value rather than `box' to match ObjectSynchronizer::slow_enter? Webrev: http://cr.openjdk.java.net/~njian/8216350/webrev.0/ Bug: https://bugs.openjdk.java.net/browse/JDK-8216350 Also removed an unnecessary double branch in the unlock code. Ran jtreg + jcstress. 
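The mismatch described above can be modeled in a few lines. This is a sketch, not HotSpot code: the two constants mirror the values discussed in this thread (bit #1 for monitor_value, the value 3 for unused_mark), while BasicLockModel and the helper names are hypothetical and exist only for the illustration.

```cpp
#include <cstdint>

// Values as discussed in the thread; the names are local to this sketch.
constexpr std::uintptr_t kMonitorValue = 0x2;  // bit #1, tested by the unlock fast path
constexpr std::uintptr_t kUnusedMark   = 0x3;  // stored by the runtime slow path

// Hypothetical stand-in for the on-stack box; 8-byte alignment guarantees
// that the low three bits of its address are zero.
struct alignas(8) BasicLockModel {
  std::uintptr_t displaced_header;
};

// The check made by the unlock code: is bit #1 set in the displaced header word?
inline bool takes_unlock_fast_path(std::uintptr_t dhw) {
  return (dhw & kMonitorValue) != 0;
}

// Before the patch: the box address itself is the arbitrary non-null value.
inline std::uintptr_t dhw_stored_before_patch(const BasicLockModel* box) {
  return reinterpret_cast<std::uintptr_t>(box);
}

// After the patch: unused_mark is stored, matching the runtime slow path.
inline std::uintptr_t dhw_stored_after_patch() {
  return kUnusedMark;
}
```

Because an aligned address can never have bit #1 set, the old store guarantees the bit test fails, which is consistent with the fast path almost never showing up in the profile.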
I also added a new micro-benchmark to test/micro/org/openjdk/bench/vm/lang/LockUnlock.java so you can see this behaviour: Without patch: Result "org.openjdk.bench.vm.lang.LockUnlock.testContendedLock": 597.855 ±(99.9%) 73.183 ns/op [Average] (min, avg, max) = (438.862, 597.855, 861.028), stdev = 97.697 CI (99.9%): [524.672, 671.038] (assumes normal distribution) With patch: Result "org.openjdk.bench.vm.lang.LockUnlock.testContendedLock": 219.067 ±(99.9%) 21.146 ns/op [Average] (min, avg, max) = (176.379, 219.067, 300.186), stdev = 28.229 CI (99.9%): [197.921, 240.212] (assumes normal distribution) This is with -XX:+UseLSE, -UseLSE has a similar improvement. Thanks, Nick From aph at redhat.com Tue Jan 8 08:49:21 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 8 Jan 2019 08:49:21 +0000 Subject: RFR: 8216350: AArch64: monitor unlock fast path not called In-Reply-To: <19afbcdf-6d69-88ca-1794-c03e8e81f171@arm.com> References: <19afbcdf-6d69-88ca-1794-c03e8e81f171@arm.com> Message-ID: <6890ad14-5af5-33ba-dcf1-e6e258633a29@redhat.com> On 1/8/19 8:03 AM, Nick Gasson (Arm Technology China) wrote: > It seems that the unlock fast-path will only be executed when the > monitor was originally locked by the runtime (e.g. when the lock was > first inflated), because ObjectSynchronizer::slow_enter will store > markOopDesc::unused_mark into the dhw, and this value has bit #1 set. > > Can someone help me review this change to aarch64_enc_fast_lock to use > markOopDesc::unused_mark as the arbitrary non-null value rather than > `box' to match ObjectSynchronizer::slow_enter? Thanks. How does this compare with the x86 code? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From Nick.Gasson at arm.com Tue Jan 8 09:00:25 2019 From: Nick.Gasson at arm.com (Nick Gasson (Arm Technology China)) Date: Tue, 8 Jan 2019 09:00:25 +0000 Subject: RFR: 8216350: AArch64: monitor unlock fast path not called In-Reply-To: <6890ad14-5af5-33ba-dcf1-e6e258633a29@redhat.com> References: <19afbcdf-6d69-88ca-1794-c03e8e81f171@arm.com> <6890ad14-5af5-33ba-dcf1-e6e258633a29@redhat.com> Message-ID: <02950bd9-254a-ee81-041e-13ef7143a17b@arm.com> Hi Andrew, On 08/01/2019 16:49, Andrew Haley wrote: > > Thanks. How does this compare with the x86 code? > In macroAssembler_x86.cpp MacroAssembler::fast_lock the _LP64 version also uses markOopDesc::unused_mark() // Unconditionally set box->_displaced_header = markOopDesc::unused_mark(). // Without cast to int32_t movptr will destroy r10 which is typically obj. movptr(Address(boxReg, 0), (int32_t)intptr_t(markOopDesc::unused_mark())); (The !_LP64 version uses the literal "3" which is just markOopDesc::unused_mark anyway.) And then in the x86 fast_unlock they are testing the same bit as AArch64: testptr(tmpReg, markOopDesc::monitor_value); // Inflated? Thanks, Nick From aph at redhat.com Tue Jan 8 09:02:48 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 8 Jan 2019 09:02:48 +0000 Subject: RFR: 8216350: AArch64: monitor unlock fast path not called In-Reply-To: <02950bd9-254a-ee81-041e-13ef7143a17b@arm.com> References: <19afbcdf-6d69-88ca-1794-c03e8e81f171@arm.com> <6890ad14-5af5-33ba-dcf1-e6e258633a29@redhat.com> <02950bd9-254a-ee81-041e-13ef7143a17b@arm.com> Message-ID: <0c1968bf-d9c6-cbc5-70d9-cd576938dfd2@redhat.com> On 1/8/19 9:00 AM, Nick Gasson (Arm Technology China) wrote: > Hi Andrew, > > On 08/01/2019 16:49, Andrew Haley wrote: >> >> Thanks. How does this compare with the x86 code? 
>> > > In macroAssembler_x86.cpp MacroAssembler::fast_lock the _LP64 version > also uses markOopDesc::unused_mark() > > // Unconditionally set box->_displaced_header = > markOopDesc::unused_mark(). > // Without cast to int32_t movptr will destroy r10 which is typically > obj. > movptr(Address(boxReg, 0), > (int32_t)intptr_t(markOopDesc::unused_mark())); > > (The !_LP64 version uses the literal "3" which is just > markOopDesc::unused_mark anyway.) > > And then in the x86 fast_unlock they are testing the same bit as AArch64: > > testptr(tmpReg, markOopDesc::monitor_value); // Inflated? OK, the patch is good. Thanks. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Tue Jan 8 09:42:19 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 8 Jan 2019 10:42:19 +0100 Subject: RFR(S): 8214862: assert(proj != __null) at compile.cpp:3251 In-Reply-To: <1632cec4-c3a1-239a-9f95-3b991c68ff97@oracle.com> References: <87d0qfhtyo.fsf@redhat.com> <3167fa2c-a9bd-ae8d-a084-7a09275b35e1@oracle.com> <874lbpish9.fsf@redhat.com> <19387ab7-2dbc-61ed-5722-ff5ecbcc3b51@oracle.com> <878t0thi36.fsf@redhat.com> <1632cec4-c3a1-239a-9f95-3b991c68ff97@oracle.com> Message-ID: <1b9a2e27-b61c-97ec-fc9b-c7460237684a@oracle.com> Hi Roland, > http://cr.openjdk.java.net/~roland/8214862/webrev.00/ This looks reasonable to me as well. Ship it! Best regards, Tobias From tobias.hartmann at oracle.com Tue Jan 8 09:45:55 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 8 Jan 2019 10:45:55 +0100 Subject: RFR: 8216262: Remove develop flag DelayCompilationDuringStartup In-Reply-To: <13ef0777-a122-555d-4719-ff09c17e674e@oracle.com> References: <64522ad1-1473-8012-b63a-5309bc72c5cc@oracle.com> <13ef0777-a122-555d-4719-ff09c17e674e@oracle.com> Message-ID: Hi Claes, On 08.01.19 07:27, Vladimir Kozlov wrote: > In short - I agree with changes and removal this archaic feature. 
> I would suggest immediate removal (or shortest available). +1 Best regards, Tobias From claes.redestad at oracle.com Tue Jan 8 09:53:35 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 8 Jan 2019 10:53:35 +0100 Subject: RFR: 8216262: Remove develop flag DelayCompilationDuringStartup In-Reply-To: References: <64522ad1-1473-8012-b63a-5309bc72c5cc@oracle.com> <13ef0777-a122-555d-4719-ff09c17e674e@oracle.com> Message-ID: Vladimir, Tobias, thanks for reviewing! /Claes On 2019-01-08 10:45, Tobias Hartmann wrote: > Hi Claes, > > On 08.01.19 07:27, Vladimir Kozlov wrote: >> In short - I agree with changes and removal this archaic feature. >> I would suggest immediate removal (or shortest available). > > +1 > > Best regards, > Tobias > From tobias.hartmann at oracle.com Tue Jan 8 09:52:15 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 8 Jan 2019 10:52:15 +0100 Subject: RFR: 8076988: reevaluate trivial method policy In-Reply-To: <66aa281f-ae60-5d89-7666-d33cafbb9b6c@oracle.com> References: <66aa281f-ae60-5d89-7666-d33cafbb9b6c@oracle.com> Message-ID: Hi Eric, looks good to me too. On 08.01.19 05:09, dean.long at oracle.com wrote: > Eric, you should be able to revert 8145579 at the same time. +1 Best regards, Tobias From tobias.hartmann at oracle.com Tue Jan 8 10:00:52 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 8 Jan 2019 11:00:52 +0100 Subject: RFR (XS): 8216290: Backport of 8215888 to JDK11u (Register to register spill may use AVX 512 move instruction on unsupported platform) In-Reply-To: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A4AC0F@FMSMSX126.amr.corp.intel.com> References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A4AC0F@FMSMSX126.amr.corp.intel.com> Message-ID: Hi Sandhya, this looks good to me. 
Please request approval according to: https://openjdk.java.net/projects/jdk-updates/approval.html Best regards, Tobias On 07.01.19 21:44, Viswanathan, Sandhya wrote: > This is a request to backport the 8215888 fix to > JDK11u. The fix has been in JDK 12 branch for a couple of days now and passed nightly testing. > > > > The backport bug request is at: > > https://bugs.openjdk.java.net/browse/JDK-8216290 > > > > The backport webrev is at: > > http://cr.openjdk.java.net/~sviswanathan/8216290/webrev.00/ > > > > Best Regards, > > Sandhya > > > From rwestrel at redhat.com Tue Jan 8 12:27:27 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 08 Jan 2019 13:27:27 +0100 Subject: RFR(S): 8214862: assert(proj != __null) at compile.cpp:3251 In-Reply-To: <1b9a2e27-b61c-97ec-fc9b-c7460237684a@oracle.com> References: <87d0qfhtyo.fsf@redhat.com> <3167fa2c-a9bd-ae8d-a084-7a09275b35e1@oracle.com> <874lbpish9.fsf@redhat.com> <19387ab7-2dbc-61ed-5722-ff5ecbcc3b51@oracle.com> <878t0thi36.fsf@redhat.com> <1632cec4-c3a1-239a-9f95-3b991c68ff97@oracle.com> <1b9a2e27-b61c-97ec-fc9b-c7460237684a@oracle.com> Message-ID: <87k1jf1gi8.fsf@redhat.com> Thanks for the reviews, Vladimir & Tobias. Roland. From claes.redestad at oracle.com Tue Jan 8 13:01:06 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 8 Jan 2019 14:01:06 +0100 Subject: RFR: 8216359: Remove develop flags TraceCompilationPolicy and TimeCompilationPolicy Message-ID: Hi, the develop flags Trace- and TimeCompilationPolicy are not implemented for any of the current default compilation policy implementations and should be removed. (They _are_ implemented for StackWalkCompPolicy which I'm proposing to be deprecated). (This also removes the declaration of _in_vm_startup that should have been removed by JDK-8216262) Webrev: http://cr.openjdk.java.net/~redestad/8216359/open.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8216359 Thanks!
/Claes From rwestrel at redhat.com Tue Jan 8 13:12:29 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 08 Jan 2019 14:12:29 +0100 Subject: RFR(S): 8214862: assert(proj != __null) at compile.cpp:3251 In-Reply-To: <3167fa2c-a9bd-ae8d-a084-7a09275b35e1@oracle.com> References: <87d0qfhtyo.fsf@redhat.com> <3167fa2c-a9bd-ae8d-a084-7a09275b35e1@oracle.com> Message-ID: <87h8ej1ef6.fsf@redhat.com> > I would suggest to create a small function (in same file) to call from Compile::Optimize() - we > don't do graph transformations in it but call other functions. > > You don't need to use 'C->' because it is already Compile's method. FTR, I'll push a fix that includes the suggestions above: http://cr.openjdk.java.net/~roland/8214862/webrev.01/ Roland. From nils.eliasson at oracle.com Tue Jan 8 13:50:19 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 8 Jan 2019 14:50:19 +0100 Subject: RFR: 8216359: Remove develop flags TraceCompilationPolicy and TimeCompilationPolicy In-Reply-To: References: Message-ID: <9eb59b35-1f96-18a6-252c-9776e115ac8c@oracle.com> Looks great! // Nils On 2019-01-08 14:01, Claes Redestad wrote: > Hi, > > the develop flags Trace- and TimeCompilationPolicy are not implemented > for any of the current default compilation policy implementations and > should be removed. (They _are_ implemented for StackWalkCompPolicy > which I'm proposing to be deprecated). > > (This also removes the declaration of _in_vm_startup that should have > been removed by JDK-8216262) > > Webrev: http://cr.openjdk.java.net/~redestad/8216359/open.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8216359 > > Thanks!
> > /Claes From tobias.hartmann at oracle.com Tue Jan 8 14:03:20 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 8 Jan 2019 15:03:20 +0100 Subject: RFR: 8216359: Remove develop flags TraceCompilationPolicy and TimeCompilationPolicy In-Reply-To: References: Message-ID: <061751cf-4ff7-34fd-89a9-b2d924ee7cf3@oracle.com> Hi Claes, looks good to me. Best regards, Tobias On 08.01.19 14:01, Claes Redestad wrote: > Hi, > > the develop flags Trace- and TimeCompilationPolicy are not implemented > for any of the current default compilation policy implementations and > should be removed. (They _are_ implemented for StackWalkCompPolicy > which I'm proposing to be deprecated). > > (This also removes the declaration of _in_vm_startup that should have > been removed by JDK-8216262) > > Webrev: http://cr.openjdk.java.net/~redestad/8216359/open.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8216359 > > Thanks! > > /Claes From per.liden at oracle.com Tue Jan 8 14:32:39 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 8 Jan 2019 15:32:39 +0100 Subject: RFR: 8215708: ZGC: Add missing LoadBarrierNode::size_of() Message-ID: LoadBarrierNode should implement size_of(). Otherwise cloning of such nodes is broken since only part of the object will be copied. This caused incorrect load barriers to be used in random places. For example, we could generate a weak barrier instead of a strong barrier, because the _weak member was not properly initialized when cloned. This patch also implements three other methods (cmp, adr_type and match_edge) with an immediate call to ShouldNotReachHere(). This is a pure safety net to catch any misuse of these. These should never be called, but if they are called today we might not notice and instead silently do the wrong thing.
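The cloning failure described above can be reproduced in miniature. The sketch below is a model under stated assumptions, not C2 code: NodeModel, LoadBarrierModel and the helper functions are hypothetical, a plain function pointer stands in for the virtual size_of(), and the destination is zero-filled so the outcome is deterministic (in the real bug the uncopied bytes would simply be whatever the allocator handed back).

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical base type; size_of plays the role of the virtual size_of().
struct NodeModel {
  std::size_t (*size_of)();
  int opcode;
};

// Hypothetical subclass with one extra field, mirroring the _weak member.
struct LoadBarrierModel {
  NodeModel base;
  bool weak;
};

// What a subclass inherits if it forgets to override size_of().
std::size_t node_size() { return sizeof(NodeModel); }

// What the subclass should report instead.
std::size_t load_barrier_size() { return sizeof(LoadBarrierModel); }

// Byte-wise clone that trusts size_of(): only that many bytes are copied
// into a zero-filled destination.
LoadBarrierModel clone_barrier(const LoadBarrierModel& src) {
  LoadBarrierModel dst;
  std::memset(&dst, 0, sizeof dst);
  std::memcpy(&dst, &src, src.base.size_of());
  return dst;
}
```

Cloning a weak barrier whose size_of is node_size yields a clone with weak == false, that is, a strong barrier, while the same clone with load_barrier_size keeps the flag intact.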
Bug: https://bugs.openjdk.java.net/browse/JDK-8215708 Webrev: http://cr.openjdk.java.net/~pliden/8215708/webrev.0 Testing: tier{1,6,7} /Per From erik.osterlund at oracle.com Tue Jan 8 14:43:46 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Tue, 8 Jan 2019 15:43:46 +0100 Subject: RFR: 8215708: ZGC: Add missing LoadBarrierNode::size_of() In-Reply-To: References: Message-ID: Hi Per, Looks good. Thanks, /Erik On 2019-01-08 15:32, Per Liden wrote: > LoadBarrierNode should implement size_of(). Otherwise cloning of such > nodes is broken since only part of the object will be copied. This > caused incorrect load barriers to be used in random places. For > example, we could generate a weak barrier instead of a strong barrier, > because the _weak member was not properly initialized when cloned. > > This patch also implements three other methods (cmp, adr_type and > match_edge) with an immediate call to ShouldNotReachHere(). This is a > pure safety net to catch any misuse of these. These should never be > called, but if they are called today we might not notice and instead > silently do the wrong thing. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8215708 > Webrev: http://cr.openjdk.java.net/~pliden/8215708/webrev.0 > > Testing: tier{1,6,7} > > /Per From per.liden at oracle.com Tue Jan 8 15:04:32 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 8 Jan 2019 16:04:32 +0100 Subject: RFR: 8215708: ZGC: Add missing LoadBarrierNode::size_of() In-Reply-To: References: Message-ID: <47472a6a-b852-afe7-a3e0-104fbbe449d8@oracle.com> Thanks Erik! /Per On 1/8/19 3:43 PM, Erik Österlund wrote: > Hi Per, > > Looks good. > > Thanks, > /Erik > > On 2019-01-08 15:32, Per Liden wrote: >> LoadBarrierNode should implement size_of(). Otherwise cloning of such >> nodes is broken since only part of the object will be copied. This >> caused incorrect load barriers to be used in random places.
For >> example, we could generate a weak barrier instead of a strong barrier, >> because the _weak member was not properly initialized when cloned. >> >> This patch also implements three other methods (cmp, adr_type and >> match_edge) with an immediate call to ShouldNotReachHere(). This is a >> pure safety net to catch any misuse of these. These should never be >> called, but if they are called today we might not notice and instead >> silently do the wrong thing. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8215708 >> Webrev: http://cr.openjdk.java.net/~pliden/8215708/webrev.0 >> >> Testing: tier{1,6,7} >> >> /Per > From claes.redestad at oracle.com Tue Jan 8 15:25:18 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 8 Jan 2019 16:25:18 +0100 Subject: RFR: 8216359: Remove develop flags TraceCompilationPolicy and TimeCompilationPolicy In-Reply-To: <061751cf-4ff7-34fd-89a9-b2d924ee7cf3@oracle.com> References: <061751cf-4ff7-34fd-89a9-b2d924ee7cf3@oracle.com> Message-ID: <59040f0d-d1d4-9dc9-0539-22da1fa34db1@oracle.com> Nils, Tobias, thanks for reviewing! /Claes On 2019-01-08 15:03, Tobias Hartmann wrote: > Hi Claes, > > looks good to me. > > Best regards, > Tobias > > On 08.01.19 14:01, Claes Redestad wrote: >> Hi, >> >> the develop flags Trace- and TimeCompilationPolicy are not implemented >> for any of the current default compilation policy implementations and >> should be removed. (They _are_ implemented for StackWalkCompPolicy >> which I'm proposing to be deprecated). >> >> (This also removes the declaration of _in_vm_startup that should have >> been removed by JDK-8216262) >> >> Webrev: http://cr.openjdk.java.net/~redestad/8216359/open.00/ >> Bug: https://bugs.openjdk.java.net/browse/JDK-8216359 >> >> Thanks!
>> >> /Claes From eric.caspole at oracle.com Tue Jan 8 15:22:05 2019 From: eric.caspole at oracle.com (Eric Caspole) Date: Tue, 8 Jan 2019 10:22:05 -0500 Subject: RFR: 8076988: reevaluate trivial method policy In-Reply-To: <66aa281f-ae60-5d89-7666-d33cafbb9b6c@oracle.com> References: <66aa281f-ae60-5d89-7666-d33cafbb9b6c@oracle.com> Message-ID: <5d86dac1-364d-ce68-feb6-f01f8cdfcc9d@oracle.com> Hi Dean, I will make a new CR to revert 8145579 and do that as a separate change next, ok? Eric On 1/7/19 23:09, dean.long at oracle.com wrote: > Eric, you should be able to revert 8145579 at the same time. > > dl > > On 1/7/19 2:47 PM, Eric Caspole wrote: >> Hi everyone, >> Could I get reviews or comments for a fix/simplification of the >> trivial method policy. I have an internal benchmark where a very hot >> "trivial" method gets compiled at level 1 and it leads to a ~9% >> regression compared to getting compiled with C2 level 4. Others have >> expressed thoughts that this policy might now not as useful as >> originally intended. I have run performance testing of throughput and >> startup time with no noticeable regressions. >> >> This webrev passed regular tier1 and tier 2 testing. >> Thanks, >> Eric >> >> >> JBS: >> https://bugs.openjdk.java.net/browse/JDK-8076988 >> >> Webrev: >> http://cr.openjdk.java.net/~ecaspole/JDK-8076988/01/webrev/ > From dean.long at oracle.com Tue Jan 8 16:11:22 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 8 Jan 2019 08:11:22 -0800 Subject: RFR: 8076988: reevaluate trivial method policy In-Reply-To: <5d86dac1-364d-ce68-feb6-f01f8cdfcc9d@oracle.com> References: <66aa281f-ae60-5d89-7666-d33cafbb9b6c@oracle.com> <5d86dac1-364d-ce68-feb6-f01f8cdfcc9d@oracle.com> Message-ID: OK. dl On 1/8/19 7:22 AM, Eric Caspole wrote: > Hi Dean, I will make a new CR to revert 8145579 and do that as a > separate change next, ok? 
> Eric > > > On 1/7/19 23:09, dean.long at oracle.com wrote: >> Eric, you should be able to revert 8145579 at the same time. >> >> dl >> >> On 1/7/19 2:47 PM, Eric Caspole wrote: >>> Hi everyone, >>> Could I get reviews or comments for a fix/simplification of the >>> trivial method policy. I have an internal benchmark where a very hot >>> "trivial" method gets compiled at level 1 and it leads to a ~9% >>> regression compared to getting compiled with C2 level 4. Others have >>> expressed thoughts that this policy might now not as useful as >>> originally intended. I have run performance testing of throughput >>> and startup time with no noticeable regressions. >>> >>> This webrev passed regular tier1 and tier 2 testing. >>> Thanks, >>> Eric >>> >>> >>> JBS: >>> https://bugs.openjdk.java.net/browse/JDK-8076988 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~ecaspole/JDK-8076988/01/webrev/ >> From nils.eliasson at oracle.com Tue Jan 8 17:04:29 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 8 Jan 2019 18:04:29 +0100 Subject: RFR: 8215708: ZGC: Add missing LoadBarrierNode::size_of() In-Reply-To: References: Message-ID: <030aede6-259b-242f-738a-5f39d0ea8144@oracle.com> Looks good! // Nils On 2019-01-08 15:32, Per Liden wrote: > LoadBarrierNode should implement size_of(). Otherwise cloning of such > nodes is broken since only part of the object will be copied. This > caused incorrect load barriers to be used in random places. For > example, we could generate a weak barrier instead of a strong barrier, > because the _weak member was not properly initialized when cloned. > > This patch also implements three other methods (cmp, adr_type and > match_edge) with an immediate call to ShouldNotReachHere(). This is a > pure safety net to catch any misuse of these. These should never be > called, but if they are called today we might not notice and instead > silently do the wrong thing. 
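The cloning problem described in the quoted text above can be modelled outside HotSpot. What follows is only a toy sketch, not the real Node or LoadBarrierNode classes: NodeSketch, LoadBarrierSketch and clone_bytes are hypothetical names, and the clone is assumed to be a plain byte copy of however many bytes the node reports as its size. It shows how a size that covers only the base part leaves the extra field uncopied.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Toy stand-in for the base part of a node (hypothetical, not HotSpot's Node).
struct NodeSketch {
  int idx;
};

// Toy stand-in for a load barrier node: the base part plus one extra field.
struct LoadBarrierSketch {
  NodeSketch base;  // base part comes first in memory
  bool weak;        // extra field that a too-small copy never reaches
};

// Byte-wise clone that trusts the size reported for the node.
// The buffer is zero-filled so that an uncopied field is deterministically
// false in this sketch; with an ordinary allocator it would hold whatever
// happened to be in memory.
LoadBarrierSketch* clone_bytes(const LoadBarrierSketch* n, std::size_t reported_size) {
  void* mem = std::calloc(1, sizeof(LoadBarrierSketch));
  if (mem == nullptr) {
    return nullptr;
  }
  std::memcpy(mem, n, reported_size);
  return static_cast<LoadBarrierSketch*>(mem);
}
```

Calling clone_bytes with sizeof(NodeSketch) mimics a node that inherits the base class's size and loses the weak flag; calling it with sizeof(LoadBarrierSketch) mimics the node after it reports its own size.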
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8215708
> Webrev: http://cr.openjdk.java.net/~pliden/8215708/webrev.0
>
> Testing: tier{1,6,7}
>
> /Per

From nils.eliasson at oracle.com Tue Jan 8 17:23:14 2019
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Tue, 8 Jan 2019 18:23:14 +0100
Subject: RFR (S): 8216372: Put C2 load barrier stub routines in separate codeblobs
Message-ID:

Hi,

Please review this small clean up of the load barrier stub routine
generation. The main improvement is having separate blobs for strong and
weak barriers. This gives us PrintAssembly output that is clearly
annotated with the barrier type:

  0x00007f8fb91b4b58: lea 0x10(%r9),%r11
  0x00007f8fb91b4b5c: callq 0x00007f8fb9009a60 ; {runtime_call zgc_load_barrier_weak_stubs}
  0x00007f8fb91b4b61: jmpq 0x00007f8fb91b4a23

Bug: https://bugs.openjdk.java.net/browse/JDK-8216372

Webrev: http://cr.openjdk.java.net/~neliasso/8216372/webrev.02/

Regards,

Nils

From erik.osterlund at oracle.com Tue Jan 8 17:39:16 2019
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Tue, 8 Jan 2019 18:39:16 +0100
Subject: RFR (S): 8216372: Put C2 load barrier stub routines in separate codeblobs
In-Reply-To:
References:
Message-ID: <3b8359a7-7038-92f7-7030-69f6fce69e2c@oracle.com>

Hi Nils,

Looks good.

Thanks,
/Erik

On 2019-01-08 18:23, Nils Eliasson wrote:
> Hi,
>
> Please review this small clean up of the load barrier stub routine
> generation. The main improvement is having separate blobs for strong and
> weak barriers.
> This gives us PrintAssembly output that is clearly
> annotated with the barrier type:
>
> 0x00007f8fb91b4b58: lea 0x10(%r9),%r11
> 0x00007f8fb91b4b5c: callq 0x00007f8fb9009a60 ; {runtime_call
> zgc_load_barrier_weak_stubs}
> 0x00007f8fb91b4b61: jmpq 0x00007f8fb91b4a23
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8216372
>
> Webrev: http://cr.openjdk.java.net/~neliasso/8216372/webrev.02/
>
> Regards,
>
> Nils
>

From sandhya.viswanathan at intel.com Tue Jan 8 18:42:45 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Tue, 8 Jan 2019 18:42:45 +0000
Subject: RFR (XS): 8216290: Backport of 8215888 to JDK11u (Register to register spill may use AVX 512 move instruction on unsupported platform)
In-Reply-To:
References: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A4AC0F@FMSMSX126.amr.corp.intel.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A4B182@FMSMSX126.amr.corp.intel.com>

Hi Tobias,

Thanks, I have updated the bug request as per the steps described in the link you gave.

Best Regards,
Sandhya

-----Original Message-----
From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com]
Sent: Tuesday, January 08, 2019 2:01 AM
To: Viswanathan, Sandhya ; hotspot compiler ; vladimir.kozlov at oracle.com
Subject: Re: RFR (XS): 8216290: Backport of 8215888 to JDK11u (Register to register spill may use AVX 512 move instruction on unsupported platform)

Hi Sandhya,

this looks good to me. Please request approval according to:
https://openjdk.java.net/projects/jdk-updates/approval.html

Best regards,
Tobias

On 07.01.19 21:44, Viswanathan, Sandhya wrote:
> This is a request to backport the 8215888
> fix to JDK11u. The fix has been in JDK 12 branch for a couple of days now and passed nightly testing.
>
> The backport bug request is at:
>
> https://bugs.openjdk.java.net/browse/JDK-8216290
>
> The backport webrev is at:
>
> http://cr.openjdk.java.net/~sviswanathan/8216290/webrev.00/
>
> Best Regards,
>
> Sandhya
>
> From eric.caspole at oracle.com Tue Jan 8 19:27:19 2019 From: eric.caspole at oracle.com (Eric Caspole) Date: Tue, 8 Jan 2019 14:27:19 -0500 Subject: RFR (S) 8216375: Revert JDK-8145579 after JDK-8076988 is resolved Message-ID: <916836c2-7a25-76ff-9fca-3ed0547a15c7@oracle.com> Hi everybody, Could I get reviews on this small change. As Dean suggested, now that JDK-8076988 to simplify the trivial method check is done, the change of JDK-8145579 is no longer needed, so this webrev reverts it. This passed tier1 and tier2 testing. Thanks, Eric JBS: https://bugs.openjdk.java.net/browse/JDK-8216375 Webrev: http://cr.openjdk.java.net/~ecaspole/JDK-8216375/01/webrev/ From per.liden at oracle.com Tue Jan 8 20:32:36 2019 From: per.liden at oracle.com (Per Liden) Date: Tue, 8 Jan 2019 21:32:36 +0100 Subject: RFR: 8215708: ZGC: Add missing LoadBarrierNode::size_of() In-Reply-To: <030aede6-259b-242f-738a-5f39d0ea8144@oracle.com> References: <030aede6-259b-242f-738a-5f39d0ea8144@oracle.com> Message-ID: <7c2351e1-e89f-f3dd-8693-b960c2325eae@oracle.com> Thanks Nils! /Per On 01/08/2019 06:04 PM, Nils Eliasson wrote: > Looks good! > > // Nils > > On 2019-01-08 15:32, Per Liden wrote: >> LoadBarrierNode should implement size_of(). Otherwise cloning of such >> nodes is broken since only part of the object will be copied. This >> caused incorrect load barriers to be used in random places. For >> example, we could generate a weak barrier instead of a strong barrier, >> because the _weak member was not properly initialized when cloned. >> >> This patch also implements three other methods (cmp, adr_type and >> match_edge) with an immediate call to ShouldNotReachHere(). This is a >> pure safety net to catch any misuse of these. These should never be >> called, but if they are called today we might not notice and instead >> silently do the wrong thing. 
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8215708
>> Webrev: http://cr.openjdk.java.net/~pliden/8215708/webrev.0
>>
>> Testing: tier{1,6,7}
>>
>> /Per

From per.liden at oracle.com Tue Jan 8 20:34:53 2019
From: per.liden at oracle.com (Per Liden)
Date: Tue, 8 Jan 2019 21:34:53 +0100
Subject: RFR (S): 8216372: Put C2 load barrier stub routines in separate codeblobs
In-Reply-To:
References:
Message-ID: <150d31c1-e59d-13e1-ecda-43d25e38dcce@oracle.com>

Looks good!

/Per

On 01/08/2019 06:23 PM, Nils Eliasson wrote:
> Hi,
>
> Please review this small clean up of the load barrier stub routine
> generation. The main improvement is having separate blobs for strong and
> weak barriers. This gives us PrintAssembly output that is clearly
> annotated with the barrier type:
>
> 0x00007f8fb91b4b58: lea 0x10(%r9),%r11
> 0x00007f8fb91b4b5c: callq 0x00007f8fb9009a60 ; {runtime_call
> zgc_load_barrier_weak_stubs}
> 0x00007f8fb91b4b61: jmpq 0x00007f8fb91b4a23
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8216372
>
> Webrev: http://cr.openjdk.java.net/~neliasso/8216372/webrev.02/
>
> Regards,
>
> Nils
>

From dean.long at oracle.com Wed Jan 9 02:00:02 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Tue, 8 Jan 2019 18:00:02 -0800
Subject: RFR (S) 8216375: Revert JDK-8145579 after JDK-8076988 is resolved
In-Reply-To: <916836c2-7a25-76ff-9fca-3ed0547a15c7@oracle.com>
References: <916836c2-7a25-76ff-9fca-3ed0547a15c7@oracle.com>
Message-ID: <27d62e29-aa4d-cdca-d0b8-9af1dd35bb9e@oracle.com>

Looks good to me. Don't forget to update the copyright year.

dl

On 1/8/19 11:27 AM, Eric Caspole wrote:
> Hi everybody,
> Could I get reviews on this small change. As Dean suggested, now that
> JDK-8076988 to simplify the trivial method check is done, the change
> of JDK-8145579 is no longer needed, so this webrev reverts it.
>
> This passed tier1 and tier2 testing.
> Thanks,
> Eric
>
> JBS:
> https://bugs.openjdk.java.net/browse/JDK-8216375
>
> Webrev:
> http://cr.openjdk.java.net/~ecaspole/JDK-8216375/01/webrev/

From derekw at marvell.com Wed Jan 9 02:19:19 2019
From: derekw at marvell.com (Derek White)
Date: Wed, 9 Jan 2019 02:19:19 +0000
Subject: [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock fast path not called
Message-ID:

Hi Nick,

Very nice find!

My only comments are to fix up some comments (pre-existing), and some trivial cleanups of pre-existing code. These are judgement calls, and it would be good to get the approval of at least one Andrew.

Comments:
1) The TODO comment before aarch64_enc_fast_unlock() has been done since 2014, so it can go away.

2) In aarch64_enc_fast_lock() and aarch64_enc_fast_unlock(), there are three comment blocks referring to old code using cmpxchgptr. At this point in time I find the new code clearer, and these comments don't add much?

Cleanup suggestions (untested!):
3) In aarch64_enc_fast_lock():
    // we can use AArch64's bit test and branch here but
    // markoopDesc does not define a bit index just the bit value
    // so assert in case the bit pos changes
    # define __monitor_value_log2 1
    assert(markOopDesc::monitor_value == (1 << __monitor_value_log2), "incorrect bit position");
    __ tbnz(disp_hdr, __monitor_value_log2, object_has_monitor);
    # undef __monitor_value_log2

  Can be replaced with:
    __ tbnz(disp_hdr, exact_log2(markOopDesc::monitor_value), object_has_monitor);
  It looks like this was fixed in several places a long time ago, but this one got missed.

4) Slightly better comment for last instruction of fast_unlock (and explicitly use zr).
    __ stlr(zr, tmp); // set unowned

 - Derek

--------------------- Patch on original code (not your patch, sorry!)
-----------------------------
--- src/hotspot/cpu/aarch64/aarch64.ad
+++ src/hotspot/cpu/aarch64/aarch64.ad
@@ -3418,13 +3418,7 @@
     }

     // Handle existing monitor
-    // we can use AArch64's bit test and branch here but
-    // markoopDesc does not define a bit index just the bit value
-    // so assert in case the bit pos changes
-# define __monitor_value_log2 1
-    assert(markOopDesc::monitor_value == (1 << __monitor_value_log2), "incorrect bit position");
-    __ tbnz(disp_hdr, __monitor_value_log2, object_has_monitor);
-# undef __monitor_value_log2
+    __ tbnz(disp_hdr, exact_log2(markOopDesc::monitor_value), object_has_monitor);

     // Set displaced_header to be (markOop of object | UNLOCK_VALUE).
     __ orr(disp_hdr, disp_hdr, markOopDesc::unlocked_value);
@@ -3455,14 +3449,6 @@
       __ b(retry_load);
     }

-    // Formerly:
-    // __ cmpxchgptr(/*oldv=*/disp_hdr,
-    //               /*newv=*/box,
-    //               /*addr=*/oop,
-    //               /*tmp=*/tmp,
-    //               cont,
-    //               /*fail*/NULL);
-
     assert(oopDesc::mark_offset_in_bytes() == 0, "offset of _mark is not 0");

     // If the compare-and-exchange succeeded, then we found an unlocked
@@ -3511,15 +3497,6 @@
       __ bind(fail);
     }

-    // Label next;
-    // __ cmpxchgptr(/*oldv=*/disp_hdr,
-    //               /*newv=*/rthread,
-    //               /*addr=*/tmp,
-    //               /*tmp=*/rscratch1,
-    //               /*succeed*/next,
-    //               /*fail*/NULL);
-    // __ bind(next);
-
     // store a non-null value into the box.
     __ str(box, Address(box, BasicLock::displaced_header_offset_in_bytes()));

@@ -3544,9 +3521,6 @@

   %}

-  // TODO
-  // reimplement this with custom cmpxchgptr code
-  // which avoids some of the unnecessary branching
   enc_class aarch64_enc_fast_unlock(iRegP object, iRegP box, iRegP tmp, iRegP tmp2) %{
     MacroAssembler _masm(&cbuf);
     Register oop = as_Register($object$$reg);
@@ -3597,12 +3571,6 @@
       __ b(retry_load);
     }

-    // __ cmpxchgptr(/*compare_value=*/box,
-    //               /*exchange_value=*/disp_hdr,
-    //               /*where=*/oop,
-    //               /*result=*/tmp,
-    //               cont,
-    //               /*cas_failed*/NULL);
     assert(oopDesc::mark_offset_in_bytes() == 0, "offset of _mark is not 0");

     __ bind(cas_failed);
@@ -3626,7 +3594,7 @@
     __ cbnz(rscratch1, cont);
     // need a release store here
     __ lea(tmp, Address(tmp, ObjectMonitor::owner_offset_in_bytes()));
-    __ stlr(rscratch1, tmp); // rscratch1 is zero
+    __ stlr(zr, tmp); // set unowned

     __ bind(cont);
     // flag == EQ indicates success

> -----Original Message-----
> From: aarch64-port-dev On
> Behalf Of Nick Gasson (Arm Technology China)
> Sent: Tuesday, January 08, 2019 3:04 AM
> To: hotspot-compiler-dev at openjdk.java.net compiler dev at openjdk.java.net>
> Cc: nd ; aarch64-port-dev at openjdk.java.net
> Subject: [EXT] [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock
> fast path not called
>
> ----------------------------------------------------------------------
> Hi,
>
> While looking at the profiling output of some micro-benchmarks for locking
> on AArch64, I noticed that the monitor unlock fast-path in
> aarch64_enc_fast_unlock in aarch64.ad (under label `object_has_monitor') is
> almost never executed, even though the lock in the test is inflated.
> > In order to branch to this fast-path we check if bit #1 is set in the displaced > header word on the stack: > > __ tbnz(disp_hdr, exact_log2(markOopDesc::monitor_value), > object_has_monitor); > > But in the common case the value in the dhw is set by the monitor locking > fast-path in aarch64_enc_fast_lock, where we use the pointer to the dhw as > an arbitrary non-null value. But the lower three bits of this pointer will > always be zero, and so won't trigger the unlock fast-path which is looking for > bit #1 set, and we will fall through to call the runtime to unlock the monitor. > > // store a non-null value into the box. > __ str(box, Address(box, BasicLock::displaced_header_offset_in_bytes())); > > It seems that the unlock fast-path will only be executed when the monitor > was originally locked by the runtime (e.g. when the lock was first inflated), > because ObjectSynchronizer::slow_enter will store > markOopDesc::unused_mark into the dhw, and this value has bit #1 set. > > Can someone help me review this change to aarch64_enc_fast_lock to use > markOopDesc::unused_mark as the arbitrary non-null value rather than `box' > to match ObjectSynchronizer::slow_enter? > > Webrev: http://cr.openjdk.java.net/~njian/8216350/webrev.0/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8216350 > > Also removed an unnecessary double branch in the unlock code. > > Ran jtreg + jcstress. 
>
> I also added a new micro-benchmark to
> test/micro/org/openjdk/bench/vm/lang/LockUnlock.java so you can see this
> behaviour:
>
> Without patch:
>
> Result "org.openjdk.bench.vm.lang.LockUnlock.testContendedLock":
> 597.855 ±(99.9%) 73.183 ns/op [Average]
> (min, avg, max) = (438.862, 597.855, 861.028), stdev = 97.697
> CI (99.9%): [524.672, 671.038] (assumes normal distribution)
>
> With patch:
>
> Result "org.openjdk.bench.vm.lang.LockUnlock.testContendedLock":
> 219.067 ±(99.9%) 21.146 ns/op [Average]
> (min, avg, max) = (176.379, 219.067, 300.186), stdev = 28.229
> CI (99.9%): [197.921, 240.212] (assumes normal distribution)
>
> This is with -XX:+UseLSE, -UseLSE has a similar improvement.
>
> Thanks,
> Nick

From Nick.Gasson at arm.com Wed Jan 9 02:50:53 2019
From: Nick.Gasson at arm.com (Nick Gasson (Arm Technology China))
Date: Wed, 9 Jan 2019 02:50:53 +0000
Subject: [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock fast path not called
In-Reply-To:
References:
Message-ID: <680089e7-ec26-a4cd-6143-4d36182e971a@arm.com>

Hi Derek

> My only comments are to fix up some comments (pre-existing), and some trivial cleanups of pre-existing code. These are judgement calls, and it would be good to get the approval of at least one Andrew.

I agree all of these are good, especially #3 which obscures the symmetry between the lock and unlock functions. But I think we ought to create a separate patch, to separate code cleanup with no functional change from this patch which is a bug fix / functional change?

Also two minor things:

* There is inconsistent (four space) indentation under "// Check if it is still a light weight lock ..." in aarch64_enc_fast_unlock.

* At the end of aarch64_enc_fast_lock there is a commented out block "// PPC port checks the following invariants": I guess we should either implement these if we think they're useful or delete this whole block. (FWIW x86 doesn't do any extra checking #ifdef ASSERT).
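For readers following item #3 and the original report, the bit test can be modelled in a few lines outside HotSpot. This is only a standalone sketch under stated assumptions: kMonitorValue is taken as 2 because the quoted assert ties monitor_value to (1 << 1); kUnusedMark is assumed to be 3, since the thread only says that unused_mark has bit #1 set; exact_log2_sketch and takes_monitor_path are hypothetical stand-ins, not the real exact_log2() or tbnz.

```cpp
#include <cstdint>

// Assumed values, see the note above: only "bit #1" is stated in the thread.
constexpr std::uintptr_t kMonitorValue = 2;  // bit #1, per the quoted assert
constexpr std::uintptr_t kUnusedMark = 3;    // assumption: some value with bit #1 set

// Hypothetical stand-in for exact_log2(); expects a power of two.
constexpr int exact_log2_sketch(std::uintptr_t x) {
  int n = 0;
  while (x > 1) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Models the test done by tbnz in the unlock path: branch to the
// monitor fast path only if the monitor bit of the displaced header is set.
constexpr bool takes_monitor_path(std::uintptr_t displaced_header) {
  return ((displaced_header >> exact_log2_sketch(kMonitorValue)) & 1u) != 0;
}
```

With these definitions an 8-byte-aligned address stored as the "arbitrary non-null value" never passes the test, because its low three bits are zero, whereas the assumed unused_mark value does.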
Finally we could also consider moving these two functions into macroAssembler_aarch64.cpp to match the other ports. Thanks, Nick On 09/01/2019 10:19, Derek White wrote: > Hi Nick, > > Very nice find! > > My only comments are to fix up some comments (pre-existing), and some trivial cleanups of pre-existing code. These are judgement calls, and it would be good to get the approval of at least one Andrew. > > Comments: > 1) The TODO comment before aarch64_enc_fast_unlock() has been done since 2014, so it can go away. > > 2) In aarch64_enc_fast_lock() and aarch64_enc_fast_unlock(), there are three comment blocks referring to old code using cmpxchgptr. At this point in time I find the new code clearer, and these comments don't add much? > > Cleanup suggestions (untested!): > 3) In aarch64_enc_fast_lock(): > // we can use AArch64's bit test and branch here but > // markoopDesc does not define a bit index just the bit value > // so assert in case the bit pos changes > # define __monitor_value_log2 1 > assert(markOopDesc::monitor_value == (1 << __monitor_value_log2), "incorrect bit position"); > __ tbnz(disp_hdr, __monitor_value_log2, object_has_monitor); > # undef __monitor_value_log2 > > Can be replaced with: > __ tbnz(disp_hdr, exact_log2(markOopDesc::monitor_value), object_has_monitor); > It looks like this was fixed in several places a long time ago, but this one got missed. > > 4) Slightly better comment for last instruction of fast_unlock (and explicitly use zr). > __ stlr(zr, tmp); // set unowned > > - Derek > > > --------------------- Patch on original code (not your patch, sorry!) 
----------------------------- > --- src/hotspot/cpu/aarch64/aarch64.ad > +++ src/hotspot/cpu/aarch64/aarch64.ad > @@ -3418,13 +3418,7 @@ > } > > // Handle existing monitor > - // we can use AArch64's bit test and branch here but > - // markoopDesc does not define a bit index just the bit value > - // so assert in case the bit pos changes > -# define __monitor_value_log2 1 > - assert(markOopDesc::monitor_value == (1 << __monitor_value_log2), "incorrect bit position"); > - __ tbnz(disp_hdr, __monitor_value_log2, object_has_monitor); > -# undef __monitor_value_log2 > + __ tbnz(disp_hdr, exact_log2(markOopDesc::monitor_value), object_has_monitor); > > // Set displaced_header to be (markOop of object | UNLOCK_VALUE). > __ orr(disp_hdr, disp_hdr, markOopDesc::unlocked_value); > @@ -3455,14 +3449,6 @@ > __ b(retry_load); > } > > - // Formerly: > - // __ cmpxchgptr(/*oldv=*/disp_hdr, > - // /*newv=*/box, > - // /*addr=*/oop, > - // /*tmp=*/tmp, > - // cont, > - // /*fail*/NULL); > - > assert(oopDesc::mark_offset_in_bytes() == 0, "offset of _mark is not 0"); > > // If the compare-and-exchange succeeded, then we found an unlocked > @@ -3511,15 +3497,6 @@ > __ bind(fail); > } > > - // Label next; > - // __ cmpxchgptr(/*oldv=*/disp_hdr, > - // /*newv=*/rthread, > - // /*addr=*/tmp, > - // /*tmp=*/rscratch1, > - // /*succeed*/next, > - // /*fail*/NULL); > - // __ bind(next); > - > // store a non-null value into the box. 
> __ str(box, Address(box, BasicLock::displaced_header_offset_in_bytes())); > > @@ -3544,9 +3521,6 @@ > > %} > > - // TODO > - // reimplement this with custom cmpxchgptr code > - // which avoids some of the unnecessary branching > enc_class aarch64_enc_fast_unlock(iRegP object, iRegP box, iRegP tmp, iRegP tmp2) %{ > MacroAssembler _masm(&cbuf); > Register oop = as_Register($object$$reg); > @@ -3597,12 +3571,6 @@ > __ b(retry_load); > } > > - // __ cmpxchgptr(/*compare_value=*/box, > - // /*exchange_value=*/disp_hdr, > - // /*where=*/oop, > - // /*result=*/tmp, > - // cont, > - // /*cas_failed*/NULL); > assert(oopDesc::mark_offset_in_bytes() == 0, "offset of _mark is not 0"); > > __ bind(cas_failed); > @@ -3626,7 +3594,7 @@ > __ cbnz(rscratch1, cont); > // need a release store here > __ lea(tmp, Address(tmp, ObjectMonitor::owner_offset_in_bytes())); > - __ stlr(rscratch1, tmp); // rscratch1 is zero > + __ stlr(zr, tmp); // set unowned > > __ bind(cont); > // flag == EQ indicates success > > >> -----Original Message----- >> From: aarch64-port-dev On >> Behalf Of Nick Gasson (Arm Technology China) >> Sent: Tuesday, January 08, 2019 3:04 AM >> To: hotspot-compiler-dev at openjdk.java.net compiler > dev at openjdk.java.net> >> Cc: nd ; aarch64-port-dev at openjdk.java.net >> Subject: [EXT] [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock >> fast path not called >> >> ---------------------------------------------------------------------- >> Hi, >> >> While looking at the profiling output of some micro-benchmarks for locking >> on AArch64, I noticed that the monitor unlock fast-path in >> aarch64_enc_fast_unlock in aarch64.ad (under label `object_has_monitor') is >> almost never executed, even though the lock in the test is inflated. 
>> >> In order to branch to this fast-path we check if bit #1 is set in the displaced >> header word on the stack: >> >> __ tbnz(disp_hdr, exact_log2(markOopDesc::monitor_value), >> object_has_monitor); >> >> But in the common case the value in the dhw is set by the monitor locking >> fast-path in aarch64_enc_fast_lock, where we use the pointer to the dhw as >> an arbitrary non-null value. But the lower three bits of this pointer will >> always be zero, and so won't trigger the unlock fast-path which is looking for >> bit #1 set, and we will fall through to call the runtime to unlock the monitor. >> >> // store a non-null value into the box. >> __ str(box, Address(box, BasicLock::displaced_header_offset_in_bytes())); >> >> It seems that the unlock fast-path will only be executed when the monitor >> was originally locked by the runtime (e.g. when the lock was first inflated), >> because ObjectSynchronizer::slow_enter will store >> markOopDesc::unused_mark into the dhw, and this value has bit #1 set. >> >> Can someone help me review this change to aarch64_enc_fast_lock to use >> markOopDesc::unused_mark as the arbitrary non-null value rather than `box' >> to match ObjectSynchronizer::slow_enter? >> >> Webrev: http://cr.openjdk.java.net/~njian/8216350/webrev.0/ >> Bug: https://bugs.openjdk.java.net/browse/JDK-8216350 >> >> Also removed an unnecessary double branch in the unlock code. >> >> Ran jtreg + jcstress. 
>>
>> I also added a new micro-benchmark to
>> test/micro/org/openjdk/bench/vm/lang/LockUnlock.java so you can see this
>> behaviour:
>>
>> Without patch:
>>
>> Result "org.openjdk.bench.vm.lang.LockUnlock.testContendedLock":
>> 597.855 ±(99.9%) 73.183 ns/op [Average]
>> (min, avg, max) = (438.862, 597.855, 861.028), stdev = 97.697
>> CI (99.9%): [524.672, 671.038] (assumes normal distribution)
>>
>> With patch:
>>
>> Result "org.openjdk.bench.vm.lang.LockUnlock.testContendedLock":
>> 219.067 ±(99.9%) 21.146 ns/op [Average]
>> (min, avg, max) = (176.379, 219.067, 300.186), stdev = 28.229
>> CI (99.9%): [197.921, 240.212] (assumes normal distribution)
>>
>> This is with -XX:+UseLSE, -UseLSE has a similar improvement.
>>
>> Thanks,
>> Nick

From Pengfei.Li at arm.com Wed Jan 9 06:50:35 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Wed, 9 Jan 2019 06:50:35 +0000
Subject: [aarch64-port-dev ] RFR(S): 8214922: Add vectorization support for fmin/fmax
In-Reply-To:
References: <87d0pv2iow.fsf@redhat.com> <877eg32bzq.fsf@redhat.com> <871s6a3map.fsf@redhat.com> <87va371n6b.fsf@redhat.com>
Message-ID:

Hi Andrew,

Do you have further comments on my 2nd min/max vectorization patch?

> > > http://cr.openjdk.java.net/~pli/rfr/8214922/webrev.01/
> >

--
Thanks,
Pengfei

From aph at redhat.com Wed Jan 9 09:23:32 2019
From: aph at redhat.com (Andrew Haley)
Date: Wed, 9 Jan 2019 09:23:32 +0000
Subject: [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock fast path not called
In-Reply-To: <680089e7-ec26-a4cd-6143-4d36182e971a@arm.com>
References: <680089e7-ec26-a4cd-6143-4d36182e971a@arm.com>
Message-ID: <51023960-0e8f-56aa-20a4-279017251585@redhat.com>

On 1/9/19 2:50 AM, Nick Gasson (Arm Technology China) wrote:
> I agree all of these are good, especially #3 which obscures the
> symmetry between the lock and unlock functions.
But I think we ought > to create a separate patch, to separate code cleanup with no > functional change from this patch which is a bug fix / functional > change? HotSpot policy is that we can do minor cleanups as we go along: experience has shown that unless you do so, cruft tends to accumulate. These cleanups are OK for this patch. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From felix.yang at huawei.com Wed Jan 9 09:29:00 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 9 Jan 2019 09:29:00 +0000 Subject: [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock fast path not called In-Reply-To: <19afbcdf-6d69-88ca-1794-c03e8e81f171@arm.com> References: <19afbcdf-6d69-88ca-1794-c03e8e81f171@arm.com> Message-ID: > Webrev: http://cr.openjdk.java.net/~njian/8216350/webrev.0/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8216350 > > Also removed an unnecessary double branch in the unlock code. > > Ran jtreg + jcstress. > Hi, I think the Copyright year for this file also needs to be updated as you changed it : src/hotspot/cpu/aarch64/aarch64.ad Otherwise, LGTM(Not a Reviewer) Thanks, Felix From tobias.hartmann at oracle.com Wed Jan 9 09:33:08 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 9 Jan 2019 10:33:08 +0100 Subject: RFR (S) 8216375: Revert JDK-8145579 after JDK-8076988 is resolved In-Reply-To: <916836c2-7a25-76ff-9fca-3ed0547a15c7@oracle.com> References: <916836c2-7a25-76ff-9fca-3ed0547a15c7@oracle.com> Message-ID: <4e6e41f0-7a8c-e42e-033c-4928f89ae79a@oracle.com> Hi Eric, looks good. Best regards, Tobias On 08.01.19 20:27, Eric Caspole wrote: > Hi everybody, > Could I get reviews on this small change. As Dean suggested, now that JDK-8076988 to simplify the > trivial method check is done, the change of JDK-8145579 is no longer needed, so this webrev reverts it. > > This passed tier1 and tier2 testing. 
> Thanks, > Eric > > JBS: > https://bugs.openjdk.java.net/browse/JDK-8216375 > > Webrev: > http://cr.openjdk.java.net/~ecaspole/JDK-8216375/01/webrev/ From Nick.Gasson at arm.com Wed Jan 9 09:40:56 2019 From: Nick.Gasson at arm.com (Nick Gasson (Arm Technology China)) Date: Wed, 9 Jan 2019 09:40:56 +0000 Subject: [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock fast path not called In-Reply-To: <51023960-0e8f-56aa-20a4-279017251585@redhat.com> References: <680089e7-ec26-a4cd-6143-4d36182e971a@arm.com> <51023960-0e8f-56aa-20a4-279017251585@redhat.com> Message-ID: <5edeae5f-972f-72a3-8589-b72180f67949@arm.com> Hi Andrew, On 09/01/2019 17:23, Andrew Haley wrote: > HotSpot policy is that we can do minor cleanups as we go along: > experience has shown that unless you do so, cruft tends to > accumulate. These cleanups are OK for this patch. > Sure. I'll test with the cleanups and send the updated webrev tomorrow. Thanks, Nick From rwestrel at redhat.com Wed Jan 9 09:59:51 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 09 Jan 2019 10:59:51 +0100 Subject: RFR(M): 8216135: C2 assert(!had_error) failed: bad dominance Message-ID: <87ef9m178o.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8216135/webrev.00/ Range check elimination is applied to a loop and then the loop is unrolled. After the loop is unrolled, the range of values for the induction variable conflicts with a range check CastII (the loop is over unrolled and the main loop would never be executed), the CastII's value becomes top, a data path dies but the corresponding control path is kept alive. This results in a broken graph. This scenario is supposed to be caught by the skeleton predicates added by 8193130 but it's not for 2 reasons: 1- With 8203915 & 8205033, Tobias extended skeleton predicates to cover not only the first value of the induction variable of the first loop iteration but also the last value of an unrolled loop. 
But his changes only apply to loop predicates, not range check
elimination.

2- With 8203915 & 8205033, Tobias used an Opaque1 node as a place holder
so on each unrolling, he could update the skeleton predicate with the
new stride. The problem is that the Opaque1 node blocks type propagation
and the skeleton predicate only has a chance to remove a dead main loop
after loop opts are over. In the case of this bug, the CastII becomes
dead before loop opts are finished.

The problem with 2- is that if the Opaque1 node is not added, on the
next unrolling there's no way to find what predicate and what part of
the predicate to update.

The fix I propose, is to keep 3 predicates after the first unrolling:

1 for the first value of the first iteration
1 for the last value of the last iteration, without an Opaque1 node
1 with an Opaque1 node that can be used as a template

On the next unrolling pass, the 1st and 2nd predicates above could have
been optimized out. Rather than try to locate and update the 2nd
predicate, the 1st and 2nd predicates are removed if they are found and,
once the code finds the 3rd predicate, it clones it once to produce the
check on the first value again and a second time to produce an updated
check on the new last value.

Roland.

From rkennke at redhat.com Wed Jan 9 12:13:22 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 9 Jan 2019 13:13:22 +0100
Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions
Message-ID: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com>

While poking around x86_64.ad's cmovP instructions (because I needed it
for an experiment in Shenandoah), I noticed that 2 of them are
disabled/commented-out: cmovP_mem and cmovP_memU.
This means that a cmovp with a 2nd argument that is a LoadP will generate two instructions: mov %r1, $mem cmov %r2, %r1 instead of just one: cmov %r2, $mem The comment there says that adlc doesn't compute the bottom-type correctly, and that implicit null-checking is broken, but I couldn't confirm either of those. I checked hg annotate, but the commented-out block stems from revision #1 and cannot be traced to a bug or so. I did notice a bug though: the two instructions would encode the cmov to a 32-bit register instead of a 64-bit register. I added the missing REX_reg_reg_wide(dst, src) and now everything seems to work fine and generated code looks better. I cannot say if this has a performance implication. I suspect not. If it has, it's probably a minuscule improvement. I can't see how it could be worse though. http://cr.openjdk.java.net/~rkennke/JDK-8216392/webrev.00/ Testing: tier1 (hotspot/jdk/langtools) passes on linux-x86 WDYT? Roman -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From nils.eliasson at oracle.com Wed Jan 9 12:31:57 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 9 Jan 2019 13:31:57 +0100 Subject: [12] RFR(XS): 8215755: ZGC: split_barrier_thru_phi: check number of inputs of phi Message-ID: Hi, This fix adds a check of number of inputs before a check of a specific input. Bug: https://bugs.openjdk.java.net/browse/JDK-8215755 Webrev: http://cr.openjdk.java.net/~neliasso/8215755 Please review, Nils From per.liden at oracle.com Wed Jan 9 13:40:13 2019 From: per.liden at oracle.com (Per Liden) Date: Wed, 9 Jan 2019 14:40:13 +0100 Subject: [12] RFR(XS): 8215755: ZGC: split_barrier_thru_phi: check number of inputs of phi In-Reply-To: References: Message-ID: <92147126-c7cf-bb23-d37d-063b2d2461aa@oracle.com> Looks good!
/Per On 2019-01-09 13:31, Nils Eliasson wrote: > Hi, > > This fix adds a check of number of inputs before a check of a specific > input. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8215755 > > Webrev: http://cr.openjdk.java.net/~neliasso/8215755 > > Please review, > > Nils > > From tobias.hartmann at oracle.com Wed Jan 9 13:59:00 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 9 Jan 2019 14:59:00 +0100 Subject: [12] RFR(XS): 8215755: ZGC: split_barrier_thru_phi: check number of inputs of phi In-Reply-To: References: Message-ID: <780f64cb-e0c9-516b-3d53-29b527718a97@oracle.com> Hi Nils, looks good and trivial. Best regards, Tobias On 09.01.19 13:31, Nils Eliasson wrote: > Hi, > > This fix adds a check of number of inputs before a check of a specific input. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8215755 > > Webrev: http://cr.openjdk.java.net/~neliasso/8215755 > > Please review, > > Nils > > From nils.eliasson at oracle.com Wed Jan 9 14:05:08 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 9 Jan 2019 15:05:08 +0100 Subject: [12] RFR(XS): 8215755: ZGC: split_barrier_thru_phi: check number of inputs of phi In-Reply-To: <780f64cb-e0c9-516b-3d53-29b527718a97@oracle.com> References: <780f64cb-e0c9-516b-3d53-29b527718a97@oracle.com> Message-ID: <7be32373-7275-6adc-b7f1-04e2c77f3341@oracle.com> Thanks Per and Tobias! // Nils On 2019-01-09 14:59, Tobias Hartmann wrote: > Hi Nils, > > looks good and trivial. > > Best regards, > Tobias > > On 09.01.19 13:31, Nils Eliasson wrote: >> Hi, >> >> This fix adds a check of number of inputs before a check of a specific input. 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8215755 >> >> Webrev: http://cr.openjdk.java.net/~neliasso/8215755 >> >> Please review, >> >> Nils >> >> From claes.redestad at oracle.com Wed Jan 9 14:20:25 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 9 Jan 2019 15:20:25 +0100 Subject: RFR (trivial): 8216423: Remove FillDelaySlots Message-ID: <40c7616a-9daf-975f-2dfd-b0ab9802f950@oracle.com> Hi, remove unused and unimplemented flag FillDelaySlots (not to be confused with LIRFillDelaySlots). Bug: https://bugs.openjdk.java.net/browse/JDK-8216423 Patch: diff -r 48d09a9f4d2b src/hotspot/share/runtime/globals.hpp --- a/src/hotspot/share/runtime/globals.hpp Tue Jan 08 10:29:02 2019 -0500 +++ b/src/hotspot/share/runtime/globals.hpp Wed Jan 09 15:09:26 2019 +0100 @@ -1330,9 +1330,6 @@ develop(bool, TypeProfileCasts, true, \ "treat casts like calls for purposes of type profiling") \ \ - develop(bool, FillDelaySlots, true, \ - "Fill delay slots (on SPARC only)") \ - \ develop(bool, TimeLivenessAnalysis, false, \ "Time computation of bytecode liveness analysis") \ \ Thanks! /Claes From tobias.hartmann at oracle.com Wed Jan 9 14:17:58 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 9 Jan 2019 15:17:58 +0100 Subject: RFR (trivial): 8216423: Remove FillDelaySlots In-Reply-To: <40c7616a-9daf-975f-2dfd-b0ab9802f950@oracle.com> References: <40c7616a-9daf-975f-2dfd-b0ab9802f950@oracle.com> Message-ID: <6ae8eeec-eceb-c2eb-42f7-d6d396362ec3@oracle.com> Hi Claes, looks good. Best regards, Tobias On 09.01.19 15:20, Claes Redestad wrote: > Hi, > > remove unused and unimplemented flag FillDelaySlots (not to be > confused with LIRFillDelaySlots). > > Bug: https://bugs.openjdk.java.net/browse/JDK-8216423 > Patch: > > diff -r 48d09a9f4d2b src/hotspot/share/runtime/globals.hpp > --- a/src/hotspot/share/runtime/globals.hpp Tue Jan 08 10:29:02 2019 -0500 > +++ b/src/hotspot/share/runtime/globals.hpp
Wed Jan 09 15:09:26 2019 +0100 > @@ -1330,9 +1330,6 @@ > develop(bool, TypeProfileCasts, true, \ > "treat casts like calls for purposes of type profiling") \ > \ > - develop(bool, FillDelaySlots, true, \ > - "Fill delay slots (on SPARC only)") \ > - \ > develop(bool, TimeLivenessAnalysis, false, \ > "Time computation of bytecode liveness analysis") \ > \ > > > Thanks! > > /Claes From claes.redestad at oracle.com Wed Jan 9 14:41:00 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 9 Jan 2019 15:41:00 +0100 Subject: RFR (trivial): 8216423: Remove FillDelaySlots In-Reply-To: <6ae8eeec-eceb-c2eb-42f7-d6d396362ec3@oracle.com> References: <40c7616a-9daf-975f-2dfd-b0ab9802f950@oracle.com> <6ae8eeec-eceb-c2eb-42f7-d6d396362ec3@oracle.com> Message-ID: Thanks, Tobias! /Claes On 2019-01-09 15:17, Tobias Hartmann wrote: > Hi Claes, > > looks good. > > Best regards, > Tobias > > On 09.01.19 15:20, Claes Redestad wrote: >> Hi, >> >> remove unused and unimplemented flag FillDelaySlots (not to be >> confused with LIRFillDelaySlots). >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8216423 >> Patch: >> >> diff -r 48d09a9f4d2b src/hotspot/share/runtime/globals.hpp >> --- a/src/hotspot/share/runtime/globals.hpp Tue Jan 08 10:29:02 2019 -0500 >> +++ b/src/hotspot/share/runtime/globals.hpp Wed Jan 09 15:09:26 2019 +0100 >> @@ -1330,9 +1330,6 @@ >> develop(bool, TypeProfileCasts, true, \ >> "treat casts like calls for purposes of type profiling") \ >> \ >> - develop(bool, FillDelaySlots, true, \ >> - "Fill delay slots (on SPARC only)") \ >> - \ >> develop(bool, TimeLivenessAnalysis, false, \ >> "Time computation of bytecode liveness analysis") \ >> \ >> >> >> Thanks!
>> >> /Claes From dmitrij.pochepko at bell-sw.com Wed Jan 9 14:50:54 2019 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Wed, 9 Jan 2019 17:50:54 +0300 Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result In-Reply-To: References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net> Message-ID: <07582b62-ccdf-97c8-5bd9-f441b488fa03@bell-sw.com> Hi all, here is my version of this patch consisting of single "sub" instruction (I haven't changed test): http://cr.openjdk.java.net/~dpochepk/8215792/webrev.01/ cnt2 is a counter for characters yet to be checked. So, instead of checking all characters in source string for first character match (which was initial reason for this bug), now it check (str2len - str1len + 1). Actually I think this "sub" instruction was initially lost while working on this instrinsic and moving this instruction between this block (generate_string_indexof_linear) and caller code. Regular tests couldn't catch this problem. I run some testing to ensure regular usecases are not affected and it seems fine. Affected testcase and your test pass as well. btw: now this code is even faster, because less characters will be loaded and checked Thanks, Dmitrij On 04/01/2019 3:52 PM, Dmitrij Pochepko wrote: > Sure. > > I could miss something, so, need to try it. I'll send webrev with > patch once it's done. > > > Thanks, > > Dmitrij > > > On 04.01.2019 14:04, Pengfei Li (Arm Technology China) wrote: >> Hi Dmitrij, >> >> Thanks a lot for your reply. >> >>> since cnt2 is used as counter, wouldn't it be easier and shorter >>> just to substract cnt1 from cnt2 at the beginning of this code. >>> Total (cnt2 - cnt1 +1) combinations must be checked. That is why >>> first sustraction is by (wordSize/str2_chr_size - 1). 
>>> Then whole fix will be probably just 1 line at the beginning: >>> sub(cnt2, cnt2, cnt1); >> I don't think the whole fix could be as easy as "sub(cnt2, cnt2, >> cnt1)" because cnt2 is the counter which counts number of bytes not >> processed. It could be different from the number of bytes after >> current first-character-match index. >> >> But this is just my thought. Perhaps I didn't understand your idea >> and code thoroughly. So could you post your shorter fix and let's >> test if it's right? >> >> -- >> Thanks, >> Pengfei >> > From claes.redestad at oracle.com Wed Jan 9 15:10:40 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 9 Jan 2019 16:10:40 +0100 Subject: RFR: 8216424: Remove or clean up TimeLivenessAnalysis Message-ID: <9ab53da1-d6d8-d5f8-10b6-da960444aa6c@oracle.com> Hi, implementation for the develop flag TimeLivenessAnalysis leaves a few breadcrumbs in product builds (in particular TraceTime constructors/destructors aren't being inlined, so the compiler doesn't realize these objects aren't actually doing anything) Bug: https://bugs.openjdk.java.net/browse/JDK-8216424 This should be either cleaned up: http://cr.openjdk.java.net/~redestad/8216424/cleanup.00/ .. or the flag should be removed altogether: http://cr.openjdk.java.net/~redestad/8216424/remove.00/ I favor removal since the statistics collected by this analysis does not seem very useful and any real performance effect could/should be estimated using real profiling tools on product builds, anyhow. Thanks! 
/Claes From adinn at redhat.com Wed Jan 9 15:55:59 2019 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 9 Jan 2019 15:55:59 +0000 Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result In-Reply-To: <07582b62-ccdf-97c8-5bd9-f441b488fa03@bell-sw.com> References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net> <07582b62-ccdf-97c8-5bd9-f441b488fa03@bell-sw.com> Message-ID: <79558b49-6375-f0d4-1278-f66a4f470b13@redhat.com> Hello Dmitrij, On 09/01/2019 14:50, Dmitrij Pochepko wrote: > here is my version of this patch consisting of single "sub" instruction > (I haven't changed test): > http://cr.openjdk.java.net/~dpochepk/8215792/webrev.01/ > > cnt2 is a counter for characters yet to be checked. So, instead of > checking all characters in source string for first character match > (which was initial reason for this bug), now it check (str2len - str1len > + 1). That looks like a simpler fix than Pengfei's although I think his is also correct. However, when I say 'correct' note that I can only make that judgement relative to this current bug. I have no confidence that there are no other bugs in your code. > Actually I think this "sub" instruction was initially lost while working > on this instrinsic and moving this instruction between this block > (generate_string_indexof_linear) and caller code. Regular tests couldn't > catch this problem. That's a somewhat contentious and, I would suggest, dubious statement. If you design code based on some algorithm -- especially a complex one like the one employed here -- then you need to put at least as much work into designing tests that check for problems in the encoding of that algorithm as you put into the code. 'Regular' is rather a weasel word to use at this point when it is clear that the test provision was not adequate. Having looked at your code I am at a loss to see how it is accurately described by the piece of C code -- i.e. 
the original Boyer-Moore algorithm -- that sits in macroAssembler_aarch64.cpp and purports to explain it. As happened with the trig/log code, your code actually follows an algorithm that is significantly more complex that that C original. Also, once again, it employs various coding tricks that are not explained at all. The latter can be understood with study but proper commenting would make maintenance and bug-fixing much easier and quicker. This is exactly the same problem and just as major a problem as it was with the trig/log code for *all* the same reasons. > I run some testing to ensure regular usecases are not affected and it > seems fine. Affected testcase and your test pass as well. 'some testing'? I'd really like to have full details of those tests. Ideally, they should be comprehensive. That really means they should come with a test plan that identifies all the different possible paths through the code and provides a measure of the coverage the tests actually provide that is high enough to instil some confidence in the testing. There are indeed quite a few such paths (not just in the stubs but also the intrinsics that cover the small cases) so I would expect the test plan and test suite to be fairly large. Do you have such a test plan and suite? Given your previous lack of success at testing your own code I'm not at all happy to accept your say so that 'oh, the code is fine'. I'm currently more inclined to ask you to revert your first patch and go back to the original Boyer-Moore code we had before you injected this bug (and who knows what others?). > btw: now this code is even faster, because less characters will be > loaded and checked Well, of course, you could make it even faster by deleting half the code. If you don't place too much priority on correctness you can achieve incredible performance. Unfortunately, speed has to be secondary to correctness. 
So, you need to stop concentrating on shaving cycles and concentrate on writing readable, maintainable code that clearly implements a well-defined algorithm. Can you provide any credible assurance that this code is worth keeping? If not then I'd personally recommend reversion of all your changes. Of course, I'll see what Andrew Haley has to say before pressing for that action. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From adinn at redhat.com Wed Jan 9 16:02:08 2019 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 9 Jan 2019 16:02:08 +0000 Subject: [aarch64-port-dev ] RFR(S): 8214922: Add vectorization support for fmin/fmax In-Reply-To: References: <87d0pv2iow.fsf@redhat.com> <877eg32bzq.fsf@redhat.com> <871s6a3map.fsf@redhat.com> <87va371n6b.fsf@redhat.com> Message-ID: <40d1a9a7-47f3-4e13-032d-70932b03d215@redhat.com> Hi Pengfei, On 09/01/2019 06:50, Pengfei Li (Arm Technology China) wrote: > Hi Andrew, > > Do you have further comments on my 2nd min/max vectorization patch? > >>>> http://cr.openjdk.java.net/~pli/rfr/8214922/webrev.01/ I am ok with this version of the patch. If the use of the max/min2F rules doesn't cause any regressions on all the architectures you tested then it is probably ok to push it. However, that said, I'm not clear what you mean by one comment: "BTW: I'm also struggling to find a simple JMH case which can trigger reduction auto-vectorization." Do you mean that you have not been able to exercise the reduction code at all? Or is it just that you cannot get it to work in a JMH test? Obviously, it would be better if we would provide a JMH test that does work. I'll see if I can provide a test. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From tobias.hartmann at oracle.com Wed Jan 9 16:54:17 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 9 Jan 2019 17:54:17 +0100 Subject: RFR: 8216424: Remove or clean up TimeLivenessAnalysis In-Reply-To: <9ab53da1-d6d8-d5f8-10b6-da960444aa6c@oracle.com> References: <9ab53da1-d6d8-d5f8-10b6-da960444aa6c@oracle.com> Message-ID: Hi Claes, Both webrevs look good to me but I would prefer removal as well. I haven't ever seen anyone using that flag but let's wait for more opinions. Best regards, Tobias On 09.01.19 16:10, Claes Redestad wrote: > Hi, > > implementation for the develop flag TimeLivenessAnalysis leaves a few > breadcrumbs in product builds (in particular TraceTime > constructors/destructors aren't being inlined, so the compiler doesn't > realize these objects aren't actually doing anything) > > Bug: https://bugs.openjdk.java.net/browse/JDK-8216424 > > This should be either cleaned up: > http://cr.openjdk.java.net/~redestad/8216424/cleanup.00/ > > .. or the flag should be removed altogether: > http://cr.openjdk.java.net/~redestad/8216424/remove.00/ > > I favor removal since the statistics collected by this analysis does > not seem very useful and any real performance effect could/should be > estimated using real profiling tools on product builds, anyhow. > > Thanks! > > /Claes From claes.redestad at oracle.com Wed Jan 9 17:33:17 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 9 Jan 2019 18:33:17 +0100 Subject: RFR: 8216424: Remove or clean up TimeLivenessAnalysis In-Reply-To: References: <9ab53da1-d6d8-d5f8-10b6-da960444aa6c@oracle.com> Message-ID: Hi Tobias, On 2019-01-09 17:54, Tobias Hartmann wrote: > Hi Claes, > > Both webrevs look good to me but I would prefer removal as well. I haven't ever seen anyone using > that flag but let's wait for more opinions. thanks, and your vote in favor of removal has been noted. 
I'll wait a day or two for other opinions. :-) /Claes From igor.ignatyev at oracle.com Wed Jan 9 21:48:15 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 9 Jan 2019 13:48:15 -0800 Subject: RFR(T) : 8216441 : problem list org.graalvm.compiler.hotspot.test.ExplicitExceptionTest Message-ID: http://cr.openjdk.java.net/~iignatyev//8216441/webrev.00/index.html > 2 lines changed: 2 ins; 0 del; 0 mod; Hi all, could you please review this tiny and trivial patch which problem list graal unit test ExplicitExceptionTest till 8213249 is fixed? webrev: http://cr.openjdk.java.net/~iignatyev//8216441/webrev.00/index.html JBS: https://bugs.openjdk.java.net/browse/JDK-8216441 Thanks, -- Igor From david.holmes at oracle.com Thu Jan 10 01:07:36 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 10 Jan 2019 11:07:36 +1000 Subject: RFR(T) : 8216441 : problem list org.graalvm.compiler.hotspot.test.ExplicitExceptionTest In-Reply-To: References: Message-ID: Hi Igor, + org.graalvm.compiler.hotspot.test.ExplicitExceptionTest 8216441 The bug id should be the bug that will fix the underlying problem (8213249), not the bug used to update the problem-list. Thanks, David > Hi all, > > could you please review this tiny and trivial patch which problem list graal unit test ExplicitExceptionTest till 8213249 is fixed? > > webrev: http://cr.openjdk.java.net/~iignatyev//8216441/webrev.00/index.html > JBS: https://bugs.openjdk.java.net/browse/JDK-8216441 From igor.ignatyev at oracle.com Thu Jan 10 01:13:02 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 9 Jan 2019 17:13:02 -0800 Subject: RFR(T) : 8216441 : problem list org.graalvm.compiler.hotspot.test.ExplicitExceptionTest In-Reply-To: References: Message-ID: <648BF524-C3AE-4E6B-95F4-DA9DB8FFEB4A@oracle.com> HI David, thanks for spotting that, apparently I copy-pasted the wrong id. corrected and pushed. 
-- Igor > On Jan 9, 2019, at 5:07 PM, David Holmes wrote: > > Hi Igor, > > + org.graalvm.compiler.hotspot.test.ExplicitExceptionTest 8216441 > > The bug id should be the bug that will fix the underlying problem (8213249), not the bug used to update the problem-list. > > Thanks, > David > >> Hi all, >> could you please review this tiny and trivial patch which problem list graal unit test ExplicitExceptionTest till 8213249 is fixed? >> webrev: http://cr.openjdk.java.net/~iignatyev//8216441/webrev.00/index.html >> JBS: https://bugs.openjdk.java.net/browse/JDK-8216441 From aph at redhat.com Thu Jan 10 09:18:10 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 10 Jan 2019 09:18:10 +0000 Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions In-Reply-To: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com> References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com> Message-ID: On 1/9/19 12:13 PM, Roman Kennke wrote: > I cannot say if if this has performance implication. I suspect not. If > it has, it's probably miniscule improvement. I can't see how it could be > worse though. I can. x86 can have some very weird performance characteristics. It'd be helpful to do some measurement. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Thu Jan 10 09:45:40 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 10 Jan 2019 10:45:40 +0100 Subject: RFR(M): 8216135: C2 assert(!had_error) failed: bad dominance In-Reply-To: <87ef9m178o.fsf@redhat.com> References: <87ef9m178o.fsf@redhat.com> Message-ID: <7a2054ff-8bc8-7a5f-3233-4e45a3a577f8@oracle.com> Hi Roland, Nice analysis! Still took me a while to review but the fix looks good (I've also executed some extended testing and all passed). Let's hope this skeleton predicate stuff is finally stable. A 2nd review would be good. 
Best regards, Tobias From Pengfei.Li at arm.com Thu Jan 10 10:53:48 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Thu, 10 Jan 2019 10:53:48 +0000 Subject: [aarch64-port-dev ] RFR(S): 8214922: Add vectorization support for fmin/fmax In-Reply-To: <40d1a9a7-47f3-4e13-032d-70932b03d215@redhat.com> References: <87d0pv2iow.fsf@redhat.com> <877eg32bzq.fsf@redhat.com> <871s6a3map.fsf@redhat.com> <87va371n6b.fsf@redhat.com> <40d1a9a7-47f3-4e13-032d-70932b03d215@redhat.com> Message-ID: Hi adinn, roland, Sorry that I uploaded a new webrev for this today because I found that I made a mistake hidden in vectornode.cpp. http://cr.openjdk.java.net/~pli/rfr/8214922/webrev.02/ The reduction min/max ops do not correspond to the original ones correctly in below part of code. + case Op_MinF: + assert(bt == T_FLOAT, "must be"); + vopc = Op_MinReductionV; + break; + case Op_MinD: + assert(bt == T_DOUBLE, "must be"); + vopc = Op_MaxReductionV; + break; + case Op_MaxF: + assert(bt == T_FLOAT, "must be"); + vopc = Op_MinReductionV; + break; + case Op_MaxD: + assert(bt == T_DOUBLE, "must be"); + vopc = Op_MaxReductionV; + break; I've fixed it in my 3rd webrev. So could you help review it again? And for Andrew Dinn's question: > Do you mean that you have not been able to exercise the reduction code at > all? Or is it just that you cannot get it to work in a JMH test? > > Obviously, it would be better if we would provide a JMH test that does work. > I'll see if I can provide a test. I mean that I tried it hard and finally find one that works. As Vladimir Ivanov said the simple reduction auto-vectorization is disabled in current JDK, so we have to construct that in a more complex code shape. Below code in my previous uploaded JMH case[1] could generate the min/max reduction instructions. for (int i = 0; i < LENGTH; i++) { min = Math.min(min, fa[i] + fb[i]); } Part of disassembly outputted by JMH perfasm is like below. 
0x0000ffff9cca1650: fminv s18, v19.4s 0x0000ffff9cca1654: fmin s18, s18, s16 0x0000ffff9cca1658: fminv s19, v20.4s 0x0000ffff9cca165c: fmin s19, s19, s18 0x0000ffff9cca1660: fminv s16, v22.4s 0x0000ffff9cca1664: fmin s16, s16, s19 0x0000ffff9cca1668: fminv s19, v21.4s 0x0000ffff9cca166c: fmin s19, s19, s16 [1] http://cr.openjdk.java.net/~pli/rfr/8214922/TestSIMDFpMinMax.java -- Thanks, Pengfei From rwestrel at redhat.com Thu Jan 10 11:27:25 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 10 Jan 2019 12:27:25 +0100 Subject: RFR(M): 8216135: C2 assert(!had_error) failed: bad dominance In-Reply-To: <7a2054ff-8bc8-7a5f-3233-4e45a3a577f8@oracle.com> References: <87ef9m178o.fsf@redhat.com> <7a2054ff-8bc8-7a5f-3233-4e45a3a577f8@oracle.com> Message-ID: <87tvigzraa.fsf@redhat.com> Hi Tobias, > Nice analysis! Still took me a while to review but the fix looks good (I've also executed some > extended testing and all passed). Let's hope this skeleton predicate stuff is finally stable. Thanks for the review. FTR, I will also need to undo: http://hg.openjdk.java.net/jdk/jdk12/rev/ea921dca7f33 when I push this. Roland. From tobias.hartmann at oracle.com Thu Jan 10 11:30:10 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 10 Jan 2019 12:30:10 +0100 Subject: RFR(M): 8216135: C2 assert(!had_error) failed: bad dominance In-Reply-To: <87tvigzraa.fsf@redhat.com> References: <87ef9m178o.fsf@redhat.com> <7a2054ff-8bc8-7a5f-3233-4e45a3a577f8@oracle.com> <87tvigzraa.fsf@redhat.com> Message-ID: <8a5b0525-3218-b6bf-a7ad-7442d98c1a2e@oracle.com> On 10.01.19 12:27, Roland Westrelin wrote: > FTR, I will also need to undo: > > http://hg.openjdk.java.net/jdk/jdk12/rev/ea921dca7f33 > > when I push this. Yes, good catch. 
Thanks, Tobias From tobias.hartmann at oracle.com Thu Jan 10 11:47:42 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 10 Jan 2019 12:47:42 +0100 Subject: [13] RFR(T): 8216480: Typo in test/hotspot/jtreg/compiler/graalunit/README.md Message-ID: Hi, please review the following trivial patch that fixes a typo in https://bugs.openjdk.java.net/browse/JDK-8216480 http://cr.openjdk.java.net/~thartmann/8216480/webrev.00/ Thanks, Tobias From rwestrel at redhat.com Thu Jan 10 12:59:07 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 10 Jan 2019 13:59:07 +0100 Subject: RFR(T): 8216482: Shenandoah: typo in ShenandoahBarrierSetC2::clone_barrier_at_expansion() causes failed compilations Message-ID: <87o98ozn1g.fsf@redhat.com> This was observed to sometimes hurt performance: http://cr.openjdk.java.net/~roland/8216482/webrev.00/ Roland. From rkennke at redhat.com Thu Jan 10 13:01:46 2019 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 10 Jan 2019 14:01:46 +0100 Subject: RFR(T): 8216482: Shenandoah: typo in ShenandoahBarrierSetC2::clone_barrier_at_expansion() causes failed compilations In-Reply-To: <87o98ozn1g.fsf@redhat.com> References: <87o98ozn1g.fsf@redhat.com> Message-ID: Looks good to me. Also, trivial. Thanks for spotting this! Roman > This was observed to sometimes hurt performance: > > http://cr.openjdk.java.net/~roland/8216482/webrev.00/ > > Roland. > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From shade at redhat.com Thu Jan 10 13:19:07 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 10 Jan 2019 14:19:07 +0100 Subject: RFR(T): 8216482: Shenandoah: typo in ShenandoahBarrierSetC2::clone_barrier_at_expansion() causes failed compilations In-Reply-To: References: <87o98ozn1g.fsf@redhat.com> Message-ID: <15790b53-877f-95f2-b94a-a5b17168f875@redhat.com> Looks good. 
This can be pushed to jdk/jdk12, I think. -Aleksey On 1/10/19 2:01 PM, Roman Kennke wrote: > Looks good to me. Also, trivial. > > Thanks for spotting this! > > Roman > >> This was observed to sometimes hurt performance: >> >> http://cr.openjdk.java.net/~roland/8216482/webrev.00/ >> >> Roland. >> > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From tobias.hartmann at oracle.com Thu Jan 10 13:17:10 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 10 Jan 2019 14:17:10 +0100 Subject: RFR(T): 8216482: Shenandoah: typo in ShenandoahBarrierSetC2::clone_barrier_at_expansion() causes failed compilations In-Reply-To: <87o98ozn1g.fsf@redhat.com> References: <87o98ozn1g.fsf@redhat.com> Message-ID: <9027c1f5-f562-5abf-6495-645284122da6@oracle.com> Hi Roland, looks good. Best regards, Tobias On 10.01.19 13:59, Roland Westrelin wrote: > > This was observed to sometimes hurt performance: > > http://cr.openjdk.java.net/~roland/8216482/webrev.00/ > > Roland. > From rwestrel at redhat.com Thu Jan 10 13:23:12 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 10 Jan 2019 14:23:12 +0100 Subject: RFR(T): 8216482: Shenandoah: typo in ShenandoahBarrierSetC2::clone_barrier_at_expansion() causes failed compilations In-Reply-To: <9027c1f5-f562-5abf-6495-645284122da6@oracle.com> References: <87o98ozn1g.fsf@redhat.com> <9027c1f5-f562-5abf-6495-645284122da6@oracle.com> Message-ID: <87lg3szlxb.fsf@redhat.com> Thanks for the review, Tobias. Oracle doesn't build Shenandoah, right? so I can push that straight to jdk 12, no need to go through the submit repo? Roland. 
From rwestrel at redhat.com Thu Jan 10 13:23:31 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 10 Jan 2019 14:23:31 +0100 Subject: RFR(T): 8216482: Shenandoah: typo in ShenandoahBarrierSetC2::clone_barrier_at_expansion() causes failed compilations In-Reply-To: <15790b53-877f-95f2-b94a-a5b17168f875@redhat.com> References: <87o98ozn1g.fsf@redhat.com> <15790b53-877f-95f2-b94a-a5b17168f875@redhat.com> Message-ID: <87imywzlws.fsf@redhat.com> Thank for the reviews, Roman & Aleksey. Roland. From tobias.hartmann at oracle.com Thu Jan 10 13:30:09 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 10 Jan 2019 14:30:09 +0100 Subject: RFR(T): 8216482: Shenandoah: typo in ShenandoahBarrierSetC2::clone_barrier_at_expansion() causes failed compilations In-Reply-To: <87lg3szlxb.fsf@redhat.com> References: <87o98ozn1g.fsf@redhat.com> <9027c1f5-f562-5abf-6495-645284122da6@oracle.com> <87lg3szlxb.fsf@redhat.com> Message-ID: <51c87854-36c0-903e-c647-0c612ac42c5d@oracle.com> On 10.01.19 14:23, Roland Westrelin wrote: > Thanks for the review, Tobias. Oracle doesn't build Shenandoah, right? > so I can push that straight to jdk 12, no need to go through the submit > repo? Shenandoah is build by default, right? And there are some tests that set -XX:+UseShenandoahGC, not sure though if they are executed with the submit repo. Best regards, Tobias From rwestrel at redhat.com Thu Jan 10 13:44:57 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 10 Jan 2019 14:44:57 +0100 Subject: RFR(T): 8216482: Shenandoah: typo in ShenandoahBarrierSetC2::clone_barrier_at_expansion() causes failed compilations In-Reply-To: <51c87854-36c0-903e-c647-0c612ac42c5d@oracle.com> References: <87o98ozn1g.fsf@redhat.com> <9027c1f5-f562-5abf-6495-645284122da6@oracle.com> <87lg3szlxb.fsf@redhat.com> <51c87854-36c0-903e-c647-0c612ac42c5d@oracle.com> Message-ID: <87ftu0zkx2.fsf@redhat.com> > Shenandoah is build by default, right? 
And there are some tests that set -XX:+UseShenandoahGC, not > sure though if they are executed with the submit repo. Right, but there is this: https://bugs.openjdk.java.net/browse/JDK-8215030 "Disable shenandoah in Oracle builds" Roland. From tobias.hartmann at oracle.com Thu Jan 10 13:48:56 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 10 Jan 2019 14:48:56 +0100 Subject: RFR(T): 8216482: Shenandoah: typo in ShenandoahBarrierSetC2::clone_barrier_at_expansion() causes failed compilations In-Reply-To: <87ftu0zkx2.fsf@redhat.com> References: <87o98ozn1g.fsf@redhat.com> <9027c1f5-f562-5abf-6495-645284122da6@oracle.com> <87lg3szlxb.fsf@redhat.com> <51c87854-36c0-903e-c647-0c612ac42c5d@oracle.com> <87ftu0zkx2.fsf@redhat.com> Message-ID: <76c0f61e-9c2f-8d76-2beb-52ba927ed14c@oracle.com> You are right, we don't build it in our CI. Feel free to push your fix then! Best regards, Tobias On 10.01.19 14:44, Roland Westrelin wrote: > >> Shenandoah is build by default, right? And there are some tests that set -XX:+UseShenandoahGC, not >> sure though if they are executed with the submit repo. > > Right, but there is this: > > https://bugs.openjdk.java.net/browse/JDK-8215030 > "Disable shenandoah in Oracle builds" > > Roland. > From rwestrel at redhat.com Thu Jan 10 13:54:41 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 10 Jan 2019 14:54:41 +0100 Subject: RFR(T): 8216482: Shenandoah: typo in ShenandoahBarrierSetC2::clone_barrier_at_expansion() causes failed compilations In-Reply-To: <76c0f61e-9c2f-8d76-2beb-52ba927ed14c@oracle.com> References: <87o98ozn1g.fsf@redhat.com> <9027c1f5-f562-5abf-6495-645284122da6@oracle.com> <87lg3szlxb.fsf@redhat.com> <51c87854-36c0-903e-c647-0c612ac42c5d@oracle.com> <87ftu0zkx2.fsf@redhat.com> <76c0f61e-9c2f-8d76-2beb-52ba927ed14c@oracle.com> Message-ID: <87d0p4zkgu.fsf@redhat.com> > You are right, we don't build it in our CI. Feel free to push your fix then! Thanks for confirming. Roland. 
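A minimal sketch relating to the 8215792 (String.indexOf) thread: a pattern of length patternLen can only start at indices 0 through sourceLen - patternLen of the source, so a search has (sourceLen - patternLen + 1) candidate positions to examine, which is the count given in the thread as (str2len - str1len + 1). The Java class below is illustrative only. `IndexOfBruteForceSketch`, `referenceIndexOf`, `buildPattern`, `buildSource`, and `countMismatches` are hypothetical names and are not the IndexOfTest/IndexOfSameTest sources mentioned in the thread. It compares `String.indexOf` against a naive reference over all small source/pattern/offset combinations, padding with the pattern's first character so that partial matches occur. It only uses Latin1 characters, and a short run like this may execute in the interpreter or in C1, so on its own it does not show that the AArch64 intrinsic stubs were exercised.

```java
public class IndexOfBruteForceSketch {

    // Naive reference search. Only start positions 0 .. sourceLen - patternLen
    // can match, i.e. (sourceLen - patternLen + 1) candidates in total.
    static int referenceIndexOf(String source, String pattern) {
        int candidates = source.length() - pattern.length() + 1;
        for (int start = 0; start < candidates; start++) {
            if (source.regionMatches(start, pattern, 0, pattern.length())) {
                return start;
            }
        }
        return -1;
    }

    // Pattern of the form "abbb...": only its first character equals the padding.
    static String buildPattern(int patternSize) {
        StringBuilder sb = new StringBuilder("a");
        while (sb.length() < patternSize) {
            sb.append('b');
        }
        return sb.toString();
    }

    // Source made of 'a' padding with the pattern embedded at expectedIndex,
    // so the search sees partial (first-character) matches before the real one.
    static String buildSource(String pattern, int sourceSize, int expectedIndex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < expectedIndex; i++) {
            sb.append('a');
        }
        sb.append(pattern);
        while (sb.length() < sourceSize) {
            sb.append('a');
        }
        return sb.toString();
    }

    // Counts the cases where String.indexOf disagrees with the reference.
    static int countMismatches(int maxSourceSize) {
        int mismatches = 0;
        for (int sourceSize = 1; sourceSize <= maxSourceSize; sourceSize++) {
            for (int patternSize = 1; patternSize <= sourceSize; patternSize++) {
                String pattern = buildPattern(patternSize);
                for (int idx = 0; idx <= sourceSize - patternSize; idx++) {
                    String source = buildSource(pattern, sourceSize, idx);
                    if (source.indexOf(pattern) != referenceIndexOf(source, pattern)) {
                        mismatches++;
                    }
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(64));
    }
}
```

To have a chance of reaching compiled code, such a check would need a much larger size bound or repeated runs; the 64 used in `main` is an arbitrary choice for a quick run.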
From dmitrij.pochepko at bell-sw.com Thu Jan 10 15:10:55 2019 From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko) Date: Thu, 10 Jan 2019 18:10:55 +0300 Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result In-Reply-To: <79558b49-6375-f0d4-1278-f66a4f470b13@redhat.com> References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net> <07582b62-ccdf-97c8-5bd9-f441b488fa03@bell-sw.com> <79558b49-6375-f0d4-1278-f66a4f470b13@redhat.com> Message-ID: <75d28ca7-9e80-4bd4-11a6-c858048e4380@bell-sw.com>

Hi Andrew,

I'll focus on addressing your technical questions about testing this patch and intrinsic first.

By 'Regular' in my previous email I meant all JCK and current jtreg tests, which were also run [1]. This was to highlight the difference with the IndexOfTest and IndexOfSameTest tests [2] developed for this intrinsic, which were run for the original webrev and this patch. These tests cover all combinations of string and substring lengths up to a specified length (IndexOfTest uses unique characters as padding and IndexOfSameTest uses the first character of the pattern, so that cases with a partial match are tested). These tests are parameterized and require the source_size parameter as first argument. Calling them with 30 and 300 results in testing all modified indexof algorithms by brute force:

A. pattern size = 1: covered if source_size >= 1
B. pattern size = 2: covered if source_size >= 2
C. pattern size = 3: covered if source_size >= 3
D. special case with pattern size = 4: algorithm wasn't changed within the original webrev
E. pattern_size in [8, 256) and pattern_size < source_size/4: Boyer-Moore implementation: covered if source_size > 32. This is the algorithm that has the comment you mentioned in your email.
F. "pattern_size in [5, 8)" or "pattern_size in [8, 15] and pattern_size >= source_size/4": Simple linear search, which loads and compares char-by-char: covered if source_size in [5, 32]
G. This is the one added by me. "pattern_size in [16, 256) and pattern_size >= source_size/4" or pattern_size >= 256. Block linear search (loads data by 8 byte chunks in search of first symbol): covered if source_size >= 16.

Below is a listing of all branches in algorithm G and the coverage test cases, with sample parameter values for which the branches are taken (test name, test parameter, and the pattern_size and expected_index inputs used in the test during iterations, which lead to each branch being taken/not taken (fallthrough) at least once. The suffix U/L/UL denotes the different encoding cases, where U = UTF-16 source string, L = Latin1 source string, UL = both cases).

line 4399: __ br(__ LE, L_SMALL); // IndexOfTest with parameter 300: taken(UL): pattern_size=298, expected_index=0. Not taken(UL): pattern_size=250, expected_index=0
line 4410: __ br(__ NE, L_HAS_ZERO); // IndexOfTest with parameter 300: taken(UL): pattern_size=250, expected_index=0. Not taken(UL): pattern_size=250, expected_index=-1
line 4414: __ br(__ LT, L_POST_LOOP); // IndexOfTest with parameter 300: taken(U): pattern_size=295, expected_index=5. Not taken(U): pattern_size=250, expected_index=8
           // IndexOfTest with parameter 300: taken(L): pattern_size=290, expected_index=8. Not taken(L): pattern_size=250, expected_index=8
line 4421: __ br(__ NE, L_HAS_ZERO); // IndexOfTest with parameter 300: taken(U): pattern_size=250, expected_index=5. Not taken(U): pattern_size=250, expected_index=8
           // IndexOfTest with parameter 300: taken(L): pattern_size=250, expected_index=8. Not taken(L): pattern_size=250, expected_index=16
line 4426: __ br(__ GE, L_LOOP); // IndexOfTest with parameter 300: taken(U): pattern_size=250, expected_index=20. Not taken(U): pattern_size=290, expected_index=8
           // IndexOfTest with parameter 300: taken(L): pattern_size=250, expected_index=30. Not taken(L): pattern_size=280, expected_index=16
line 4429: __ br(__ LE, NOMATCH); // IndexOfTest with parameter 300: taken(U): pattern_size=293, expected_index=-1. Not taken(U): pattern_size=290, expected_index=-1
           // IndexOfTest with parameter 300: taken(L): pattern_size=285, expected_index=-1. Not taken(L): pattern_size=280, expected_index=-1
line 4455: __ br(__ EQ, NOMATCH); // IndexOfTest with parameter 300: taken(UL): pattern_size=298, expected_index=-1. Not taken(UL): pattern_size=298, expected_index=0
line 4459: __ br(__ LE, L_SMALL_CMP_LOOP_LAST_CMP2); // this branch is not reached with current performance heuristic for algorithm selection (see MacroAssembler_aarch64.cpp:4599). It was also tested with heuristic disabled to keep algorithm generic and allow changes to heuristics
line 4478: __ br(__ NE, L_SMALL_CMP_LOOP_NOMATCH); // IndexOfSameTest with parameter 300: taken(UL): pattern_size=298, expected_index=-1. Not taken(UL): pattern_size=298, expected_index=0
line 4486: __ br(__ GE, L_SMALL_CMP_LOOP_LAST_CMP); // IndexOfTest with parameter 300: taken(UL): pattern_size=298, expected_index=0. Not taken(UL): pattern_size=298, expected_index=-1
line 4488: __ br(__ EQ, L_SMALL_CMP_LOOP); // IndexOfTest with parameter 300: taken(UL): pattern_size=298, expected_index=0. Not taken(UL): pattern_size=298, expected_index=-1
line 4490: __ cbz(tmp2, NOMATCH); // IndexOfSameTest with parameter 300: taken(UL): pattern_size=298, expected_index=-1. Not taken(UL): pattern_size=298, expected_index=0
line 4498: __ br(__ NE, L_SMALL_CMP_LOOP_NOMATCH); // IndexOfTest with parameter 300: taken(UL): pattern_size=298, expected_index=-1. Not taken(UL): pattern_size=298, expected_index=0
line 4519: __ br(__ NE, L_SMALL_CMP_LOOP_NOMATCH); // this branch is not reached with current performance heuristic for algorithm selection (see MacroAssembler_aarch64.cpp:4599). It was also tested with heuristic disabled to keep algorithm generic and allow changes to heuristics
line 4533: __ br(__ GE, L_CMP_LOOP_LAST_CMP2); // this branch is not taken with current performance heuristic for algorithm selection (see MacroAssembler_aarch64.cpp:4599). It was also tested with heuristic disabled to keep algorithm generic and allow changes to heuristics
line 4557: __ br(__ NE, L_CMP_LOOP_NOMATCH); // IndexOfTest with parameter 300: taken(UL): pattern_size=250, expected_index=1. Not taken(UL): pattern_size=250, expected_index=0
line 4565: __ br(__ GE, L_CMP_LOOP_LAST_CMP); // IndexOfTest with parameter 300: taken(UL): pattern_size=250, expected_index=0. Not taken(UL): pattern_size=250, expected_index=-1
line 4567: __ br(__ EQ, L_CMP_LOOP); // IndexOfTest with parameter 300: taken(UL): pattern_size=250, expected_index=0. Not taken(UL): pattern_size=250, expected_index=-1
line 4570: __ cbz(tmp2, L_HAS_ZERO_LOOP_NOMATCH); // IndexOfSameTest with parameter 300: taken(UL): pattern_size=250, expected_index=20. Not taken(UL): pattern_size=250, expected_index=0
line 4577: __ br(__ NE, L_CMP_LOOP_NOMATCH); // IndexOfSameTest with parameter 300: taken(UL): pattern_size=250, expected_index=-1. IndexOfTest with parameter 300: Not taken(UL): pattern_size=250, expected_index=0
line 4601: __ br(__ NE, L_CMP_LOOP_NOMATCH); // this branch is not reached with current performance heuristic for algorithm selection (see MacroAssembler_aarch64.cpp:4599). It was also tested with heuristic disabled to keep algorithm generic and allow changes to heuristics

source_size = 0 is covered by a pre-condition and the intrinsic is not called.

I referenced this test in the initial review request for this intrinsic. It takes a long time to run, so I did not include it in the webrev. I'm going to update the webrev to include a subset of this test as jtreg.

Even brute force tests with 100% code coverage don't guarantee 100% correctness. The search-garbage-after-string test case for "algorithm G" and StringBuilder::setLength usage is a good catch by Stefan and Pengfei, and the recent webrev addresses this case. I also tested a case symmetric to Pengfei's case, checking that no "garbage" is read before the specified source string [4]. I am also going to include it in the webrev.

Indeed it is hard to review complex algorithms. The Boyer-Moore comments you referenced were updated as part of the original webrev to describe changes in algorithm E, which is in macroAssembler_aarch64.cpp. I once asked to validate the level of comments with you during the pow function review [3]. If this is the level of comments you find reasonable, I'll be happy to improve it here and elsewhere to this level.

Once again, this is to address your question around testing for this intrinsic and patch. We are working on testing and reviewing complex intrinsics to handle the wider problem of ensuring better quality of AArch64 intrinsics. We'll follow up in a different email on that.
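For readers who do not have the tests from [2] at hand, the brute-force idea described in this mail can be sketched roughly as follows. This is a minimal illustration and not the actual IndexOfTest.java or IndexOfSameTest.java: the class name, the helper names and the way the strings are built are all assumptions made for the example. It compares String.indexOf against a naive reference search for every pattern size and every position within one source size, once with a padding character that does not occur in the pattern and once with padding equal to the pattern's first character.

```java
// Hypothetical sketch of a brute-force indexOf check. This is not the
// IndexOfTest.java referenced in [2]; all names here are made up.
final class BruteForceIndexOfSketch {

    // Straightforward reference search used as the oracle.
    static int naiveIndexOf(String source, String pattern) {
        for (int i = 0; i + pattern.length() <= source.length(); i++) {
            if (source.startsWith(pattern, i)) {
                return i;
            }
        }
        return -1;
    }

    // Pattern of the given size, cycling through 20 characters from 'first'.
    static String makePattern(int size, char first) {
        StringBuilder sb = new StringBuilder(size);
        for (int i = 0; i < size; i++) {
            sb.append((char) (first + i % 20));
        }
        return sb.toString();
    }

    // Source of sourceSize characters: 'pad' everywhere except for the
    // pattern placed at 'index'.
    static String embed(String pattern, int index, int sourceSize, char pad) {
        StringBuilder sb = new StringBuilder(sourceSize);
        for (int i = 0; i < index; i++) {
            sb.append(pad);
        }
        sb.append(pattern);
        while (sb.length() < sourceSize) {
            sb.append(pad);
        }
        return sb.toString();
    }

    // Tries every pattern size and every position for one source size and
    // counts disagreements between String.indexOf and the oracle.
    static int countMismatches(int sourceSize, char first, char pad) {
        int mismatches = 0;
        for (int patternSize = 1; patternSize <= sourceSize; patternSize++) {
            String pattern = makePattern(patternSize, first);
            for (int index = 0; index + patternSize <= sourceSize; index++) {
                String source = embed(pattern, index, sourceSize, pad);
                if (source.indexOf(pattern) != naiveIndexOf(source, pattern)) {
                    mismatches++;
                }
            }
            // Source that consists of padding only.
            String padding = embed("", 0, sourceSize, pad);
            if (padding.indexOf(pattern) != naiveIndexOf(padding, pattern)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int size = args.length > 0 ? Integer.parseInt(args[0]) : 30;
        // 'a'/'x' gives Latin1 strings; the Cyrillic range gives UTF-16 strings.
        int latin1 = countMismatches(size, 'a', 'x') + countMismatches(size, 'a', 'a');
        int utf16 = countMismatches(size, '\u0430', '\u044f')
                  + countMismatches(size, '\u0430', '\u0430');
        System.out.println("Latin1 mismatches: " + latin1 + ", UTF-16 mismatches: " + utf16);
    }
}
```

Note that a sketch like this only exercises the intrinsic once the calling code has been JIT-compiled, so a real jtreg test would probably need enough iterations or suitable VM flags; which ones is left open here.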
-Dmitrij [1] all JCK, hotspot jtreg and jdk tier1-tier3 jtreg tests, including http://hg.openjdk.java.net/jdk/jdk/file/tip/test/hotspot/jtreg/compiler/intrinsics and http://hg.openjdk.java.net/jdk/jdk/file/tip/test/jdk/java/lang/String/ [2] http://cr.openjdk.java.net/~dpochepk/8189103/IndexOfTest.java, http://cr.openjdk.java.net/~dpochepk/8189103/IndexOfSameTest.java [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-October/031092.html [4] http://cr.openjdk.java.net/~dpochepk/8215792/IndexOfBeforeTest.java On 09/01/2019 6:55 PM, Andrew Dinn wrote: > Hello Dmitrij, > > On 09/01/2019 14:50, Dmitrij Pochepko wrote: >> here is my version of this patch consisting of single "sub" instruction >> (I haven't changed test): >> http://cr.openjdk.java.net/~dpochepk/8215792/webrev.01/ >> >> cnt2 is a counter for characters yet to be checked. So, instead of >> checking all characters in source string for first character match >> (which was initial reason for this bug), now it check (str2len - str1len >> + 1). > That looks like a simpler fix than Pengfei's although I think his is > also correct. However, when I say 'correct' note that I can only make > that judgement relative to this current bug. I have no confidence that > there are no other bugs in your code. > >> Actually I think this "sub" instruction was initially lost while working >> on this instrinsic and moving this instruction between this block >> (generate_string_indexof_linear) and caller code. Regular tests couldn't >> catch this problem. > That's a somewhat contentious and, I would suggest, dubious statement. > If you design code based on some algorithm -- especially a complex one > like the one employed here -- then you need to put at least as much work > into designing tests that check for problems in the encoding of that > algorithm as you put into the code. 'Regular' is rather a weasel word to > use at this point when it is clear that the test provision was not adequate. 
> > Having looked at your code I am at a loss to see how it is accurately > described by the piece of C code -- i.e. the original Boyer-Moore > algorithm -- that sits in macroAssembler_aarch64.cpp and purports to > explain it. As happened with the trig/log code, your code actually > follows an algorithm that is significantly more complex that that C > original. Also, once again, it employs various coding tricks that are > not explained at all. The latter can be understood with study but proper > commenting would make maintenance and bug-fixing much easier and > quicker. This is exactly the same problem and just as major a problem as > it was with the trig/log code for *all* the same reasons. > >> I run some testing to ensure regular usecases are not affected and it >> seems fine. Affected testcase and your test pass as well. > 'some testing'? I'd really like to have full details of those tests. > Ideally, they should be comprehensive. That really means they should > come with a test plan that identifies all the different possible paths > through the code and provides a measure of the coverage the tests > actually provide that is high enough to instil some confidence in the > testing. There are indeed quite a few such paths (not just in the stubs > but also the intrinsics that cover the small cases) so I would expect > the test plan and test suite to be fairly large. Do you have such a test > plan and suite? > > Given your previous lack of success at testing your own code I'm not at > all happy to accept your say so that 'oh, the code is fine'. I'm > currently more inclined to ask you to revert your first patch and go > back to the original Boyer-Moore code we had before you injected this > bug (and who knows what others?). > >> btw: now this code is even faster, because less characters will be >> loaded and checked > Well, of course, you could make it even faster by deleting half the > code. 
If you don't place too much priority on correctness you can > achieve incredible performance. > > Unfortunately, speed has to be secondary to correctness. So, you need to > stop concentrating on shaving cycles and concentrate on writing > readable, maintainable code that clearly implements a well-defined > algorithm. Can you provide any credible assurance that this code is > worth keeping? If not then I'd personally recommend reversion of all > your changes. Of course, I'll see what Andrew Haley has to say before > pressing for that action. > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From doug.simon at oracle.com Thu Jan 10 15:09:36 2019 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 10 Jan 2019 16:09:36 +0100 Subject: [12] RFR(S): 8215313: [AOT] java/lang/String/Split.java fails with AOTed java.base Message-ID: Please review this fix supplied by Josef Haider for an incorrect compilation of String.split. When the String.indexOf intrinsic on AMD64 reaches the end of a string, it tries to vectorize the last compare operations by reading past the bounds of the character/byte array. This is not safe if the out-of-bounds read would cross a page boundary, so in that case characters are compared one-by-one. This is done with a `cmpl`-instruction, which only works as long as the bytes/chars are not sign extended. The fix is to simply `and` the characters we are searching for with `0xff`/`0xffff` in order to eliminate any erroneous sign extensions. 
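The effect described above can be modelled in plain Java. The following is only a sketch under an assumption: that the element loaded from the array is zero-extended before the 32-bit compare, while the search value may arrive sign-extended. The class and method names are hypothetical and this is not the Graal code from the webrev.

```java
// Hypothetical model of a zero-extended load followed by a 32-bit compare.
final class SignExtensionCompareSketch {

    // Loads one byte zero-extended and compares all 32 bits with the
    // search value, roughly what a cmpl on full registers would do.
    static boolean matches(byte[] data, int i, int searchValue) {
        int loaded = data[i] & 0xff;
        return loaded == searchValue;
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xe9 };
        int signExtended = data[0]; // widening a byte sign-extends: 0xffffffe9
        System.out.println(matches(data, 0, signExtended));        // false
        System.out.println(matches(data, 0, signExtended & 0xff)); // true
    }
}
```

Masking the search value with `0xff` (or `0xffff` for chars) makes both operands zero-extended, so the wide compare behaves like a byte or char compare.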
http://cr.openjdk.java.net/~dnsimon/8215313 https://bugs.openjdk.java.net/browse/JDK-8215313 -Doug From dean.long at oracle.com Thu Jan 10 18:04:12 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 10 Jan 2019 10:04:12 -0800 Subject: [12] RFR(S): 8215313: [AOT] java/lang/String/Split.java fails with AOTed java.base In-Reply-To: References: Message-ID: Is it OK to modify the values of searchValue[i]? If the search value is already sign-extended, how about sign-extending cmpResult instead of zero-extending searchValue? dl On 1/10/19 7:09 AM, Doug Simon wrote: > Please review this fix supplied by Josef Haider for an incorrect compilation of String.split. > > When the String.indexOf intrinsic on AMD64 reaches the end of a string, it tries to vectorize the last compare operations by reading past the bounds of the character/byte array. This is not safe if the out-of-bounds read would cross a page boundary, so in that case characters are compared one-by-one. This is done with a `cmpl`-instruction, which only works as long as the bytes/chars are not sign extended. > > The fix is to simply `and` the characters we are searching for with `0xff`/`0xffff` in order to eliminate any erroneous sign extensions. > > http://cr.openjdk.java.net/~dnsimon/8215313 > https://bugs.openjdk.java.net/browse/JDK-8215313 > > -Doug From dean.long at oracle.com Thu Jan 10 18:53:51 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 10 Jan 2019 10:53:51 -0800 Subject: [12] RFR(S): 8215313: [AOT] java/lang/String/Split.java fails with AOTed java.base In-Reply-To: References: Message-ID: Taking another look, it seems like cmpl could be replaced with the size-appropriate cmpb, cmpw, or cmpl based on byteMode(kind) and findTwoCharPrefix, but I guess AMD64Assembler only supports cmpl and cmpq right now. dl On 1/10/19 10:04 AM, dean.long at oracle.com wrote: > Is it OK to modify the values of searchValue[i]?
If the search value > is already sign-extended, how about sign-extending cmpResult instead > of zero-extending searchValue? > > dl > > On 1/10/19 7:09 AM, Doug Simon wrote: >> Please review this fix supplied by Josef Haider for an incorrect >> compilation of String.split. >> >> When the String.indexOf intrinsic on AMD64 reaches the end of a >> string, it tries to vectorize the last compare operations by reading >> past the bounds of the character/byte array. This is not safe if the >> out-of-bounds read would cross a page boundary, so in that case >> characters are compared one-by-one. This is done with a >> `cmpl`-instruction, which only works as long as the bytes/chars are >> not sign extended. >> >> The fix is to simply `and` the characters we are searching for with >> `0xff`/`0xffff` in order to eliminate any erroneous sign extensions. >> >> http://cr.openjdk.java.net/~dnsimon/8215313 >> https://bugs.openjdk.java.net/browse/JDK-8215313 >> >> -Doug > From josef.haider at khg.jku.at Thu Jan 10 21:52:53 2019 From: josef.haider at khg.jku.at (Josef Haider) Date: Thu, 10 Jan 2019 22:52:53 +0100 Subject: [12] RFR(S): 8215313: [AOT] java/lang/String/Split.java fails with AOTed java.base Message-ID: Agreed, cmpw/cmpb would make more sense here, I just wanted to keep the changeset minimal, since the entire method may soon be changed again, anyway. - Josef > Taking another look, it seems like cmpl could be replaced with the > size-appropriate cmpb, cmpw, or cmpl based on byteMode(kind) and > findTwoCharPrefix, but I guess AMD64Assembler only supports cmpl and > cmpq right now. > > dl > > On 1/10/19 10:04 AM, dean.long at oracle.com wrote: > > Is it OK to modify the values of searchValue[i]? If the search value > > is already sign-extended, how about sign-extending cmpResult instead > > of zero-extending searchValue? > > > > dl > > > > On 1/10/19 7:09 AM, Doug Simon wrote: > >> Please review this fix supplied by Josef Haider for an incorrect > >> compilation of String.split.
> >> > >> When the String.indexOf intrinsic on AMD64 reaches the end of a > >> string, it tries to vectorize the last compare operations by reading > >> past the bounds of the character/byte array. This is not safe if the > >> out-of-bounds read would cross a page boundary, so in that case > >> characters are compared one-by-one. This is done with a > >> `cmpl`-instruction, which only works as long as the bytes/chars are > >> not sign extended. > >> > >> The fix is to simply `and` the characters we are searching for with > >> `0xff`/`0xffff` in order to eliminate any erroneous sign extensions. > >> > >> http://cr.openjdk.java.net/~dnsimon/8215313 > >> https://bugs.openjdk.java.net/browse/JDK-8215313 > >> > >> -Doug > > From ekaterina.pavlova at oracle.com Thu Jan 10 22:03:30 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 10 Jan 2019 14:03:30 -0800 Subject: [13] RFR(T): 8216480: Typo in test/hotspot/jtreg/compiler/graalunit/README.md In-Reply-To: References: Message-ID: <2ebd8dec-7f1f-7a13-72af-697e41bb63f7@oracle.com> good, thanks for fixing this. -katya On 1/10/19 3:47 AM, Tobias Hartmann wrote: > Hi, > > please review the following trivial patch that fixes a typo in > https://bugs.openjdk.java.net/browse/JDK-8216480 > http://cr.openjdk.java.net/~thartmann/8216480/webrev.00/ > > Thanks, > Tobias > From vladimir.x.ivanov at oracle.com Fri Jan 11 02:01:59 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 10 Jan 2019 18:01:59 -0800 Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point Message-ID: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> http://cr.openjdk.java.net/~vlivanov/8215757/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8215757 Crash happens during SplitIf transformation when PhaseIdealLoop::spinup() erroneously uses the Region being eliminated as a post-dominating merge point (prior_n).
Sequence of events during PhaseIdealLoop pass which leads to the crash (IR in question [1]): #0: RegionNode 1722 (R1722) starts with IDOM(R1722) = IfNode 1511 (I1511) #1: Loop strip mining takes place and inserts new loop limit check: Loop: N1572/N1601 limit_check sfpts={ 1595 } ... Counted Loop: N1866/N1601 counted [2,int),+1 (-1 iters) ... Loop: N1865/N1864 limit_check Loop: N1866/N1601 limit_check counted [2,int),+1 (-1 iters) has_sfpt strip_mined #2: As part of loop limit check insertion, new IfNode is created (If 1854) and linked to R1722 as an input which causes R1722 IDOM to be updated [2]. It changes R1722 IDOM (I1511 => R1784), since dom_lca() normalizes the result using find_non_split_ctrl(). #3: SplitIf is performed on I1511 and Phi 1790 is being processed. It has 3 users (197, 198, 199) which are attached to R1710, R1716, and R1722 respectively. At this point: IDOM(R1710) = I1511 IDOM(R1716) = I1511 IDOM(R1722) = R1784 <== #4: PhaseIdealLoop::handle_use() works fine for 197 & 198: 197 =idom=> R1710 =idom=> I796 ( == iff_dom) 198 =idom=> R1716 =idom=> I796 but fails on 199 when it tries to process R1784 (being eliminated) in nested PhaseIdealLoop::spinup() call: 199 =idom=> R1722 =idom=> R1784 =idom=> I796 The root cause is that while PhaseIdealLoop::do_split_if() updates IDOM for If & its projections, it doesn't do that for the corresponding Region (R1784) until the splitting is finished [3]. Proposed fix is to take into account delayed IDOM update (region -> region_dom) and explicitly check for old Region in PhaseIdealLoop::spinup() treating it as iff_dom. Testing: failing test (replay), hs-precheckin-comp, hs-tier1, hs-tier2 (in progress) Thanks! 
Best regards, Vladimir Ivanov [1] http://cr.openjdk.java.net/~vlivanov/8215757/split_if_1511.png [2] http://hg.openjdk.java.net/jdk/jdk/file/2e1fd6414c4b/src/hotspot/share/opto/loopPredicate.cpp#l161 [3] http://hg.openjdk.java.net/jdk/jdk/file/2e1fd6414c4b/src/hotspot/share/opto/split_if.cpp#l525 void PhaseIdealLoop::do_split_if( Node *iff ) { ... // Lazy replace IDOM info with the region's dominator lazy_replace( iff, region_dom ); ... // Now make the original merge point go dead, by handling all its uses. ... lazy_replace( region, region_dom ); } From Nick.Gasson at arm.com Fri Jan 11 02:36:47 2019 From: Nick.Gasson at arm.com (Nick Gasson (Arm Technology China)) Date: Fri, 11 Jan 2019 02:36:47 +0000 Subject: [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock fast path not called In-Reply-To: <51023960-0e8f-56aa-20a4-279017251585@redhat.com> References: <680089e7-ec26-a4cd-6143-4d36182e971a@arm.com> <51023960-0e8f-56aa-20a4-279017251585@redhat.com> Message-ID: Hi all, On 09/01/2019 17:23, Andrew Haley wrote: > > HotSpot policy is that we can do minor cleanups as we go along: > experience has shown that unless you do so, cruft tends to > accumulate. These cleanups are OK for this patch. > Please see the updated webrev here: http://cr.openjdk.java.net/~ngasson/8216350/webrev.1/ Includes cleanups according to Derek's comments and updated the copyright year (thanks Felix). > 4) Slightly better comment for last instruction of fast_unlock (and explicitly use zr). > __ stlr(zr, tmp); // set unowned Note I needed to change the definition of load_store_exclusive to allow ZR here. I've checked that this is OK for the other instructions that use this. 
Thanks, Nick From vivek.r.deshpande at intel.com Fri Jan 11 06:58:05 2019 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Fri, 11 Jan 2019 06:58:05 +0000 Subject: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A9A148B9C@ORSMSX106.amr.corp.intel.com> Hi Tobias I have webrev for the fixes for the problems with the VNNI optimization. This has 3 fixes. 1) Fix for the crash by matching the operand by swapping to right positions. 2) Cost based generation of vpdpwssd instruction. 3) Fix generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have different control RangeCheck nodes for a[i] and a[i+1] accesses in same MulAddS2I node Bug ID: https://bugs.openjdk.java.net/browse/JDK-8216050 Webrev: http://cr.openjdk.java.net/~vdeshpande/8216050/webrev.00/ Could you please take a look and review it. Regards, Vivek From tobias.hartmann at oracle.com Fri Jan 11 08:22:44 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 11 Jan 2019 09:22:44 +0100 Subject: [13] RFR(T): 8216480: Typo in test/hotspot/jtreg/compiler/graalunit/README.md In-Reply-To: <2ebd8dec-7f1f-7a13-72af-697e41bb63f7@oracle.com> References: <2ebd8dec-7f1f-7a13-72af-697e41bb63f7@oracle.com> Message-ID: <46ebd33e-8e18-4b35-f1c7-fd8eac5d87e2@oracle.com> Thanks Katya. Best regards, Tobias On 10.01.19 23:03, Ekaterina Pavlova wrote: > good, thanks for fixing this.
> > -katya > > On 1/10/19 3:47 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following trivial patch that fixes a typo in >> https://bugs.openjdk.java.net/browse/JDK-8216480 >> http://cr.openjdk.java.net/~thartmann/8216480/webrev.00/ >> >> Thanks, >> Tobias >> > From doug.simon at oracle.com Fri Jan 11 09:02:24 2019 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 11 Jan 2019 10:02:24 +0100 Subject: [12] RFR(S): 8215313: [AOT] java/lang/String/Split.java fails with AOTed java.base In-Reply-To: References: Message-ID: <88068DB1-E235-448C-A7D8-B1208143CCCF@oracle.com> Hi Josef, > On 10 Jan 2019, at 22:52, Josef Haider wrote: > > Agreed, cmpw/cmpb would make more sense here, i just wanted > to keep the changeset minimal, since the entire method may soon be > changed again, anyway. > Can you please say more about this? Would you recommend applying your current patch as is to fix the crash or will you have the changes you mention ready soon? -Doug >> Taking another look, it seems like cmpl could be replaced with the >> size-appropriate cmpb, cmpw, or cmpl based on byteMode(kind) and >> findTwoCharPrefix, but I guess AMD64Assembler only supports cmpl and >> cmpq right now. >> >> dl >> >> On 1/10/19 10:04 AM, dean.long at oracle.com wrote: >> > Is it OK to modify the values of searchValue[i]? If the search value >> > is already sign-extended, how about sign-extending cmpResult instead >> > of zero-extending searchValue? >> > >> > dl >> > >> > On 1/10/19 7:09 AM, Doug Simon wrote: >> >> Please review this fix supplied by Josef Haider for an incorrect >> >> compilation of String.split. >> >> >> >> When the String.indexOf intrinsic on AMD64 reaches the end of a >> >> string, it tries to vectorize the last compare operations by reading >> >> past the bounds of the character/byte array. This is not safe if the >> >> out-of-bounds read would cross a page boundary, so in that case >> >> characters are compared one-by-one. 
This is done with a >> >> `cmpl`-instruction, which only works as long as the bytes/chars are >> >> not sign extended. >> >> >> >> The fix is to simply `and` the characters we are searching for with >> >> `0xff`/`0xffff` in order to eliminate any erroneous sign extensions. >> >> >> >> http://cr.openjdk.java.net/~dnsimon/8215313 >> >> https://bugs.openjdk.java.net/browse/JDK-8215313 >> >> >> >> -Doug >> > >> > > From rwestrel at redhat.com Fri Jan 11 09:16:53 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 11 Jan 2019 10:16:53 +0100 Subject: RFR(XS): 8216549: Mismatched unsafe access to non escaping object fails Message-ID: <877efbzh8a.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8216549/webrev.00/ test1(), test2() and test3() perform an unsafe access with a mismatched access. test1() compiles to an unschedulable graph and causes the compiler to crash. The memory input of the load from a non escaping allocation initially points to a membar but is set to bypass the membar while control stays set to the membar. The load is not eliminated because it's a mismatched memory access, an anti dependence is added between the membar and the load and the graph is unschedulable. test2() and test3() return wrong results: the access is mismatched and misaligned, it's given its own alias by c2 but the MergeMem right after the allocation only points to the allocation for actual fields of the newly allocated object. So the load memory input is set to the memory state on method entry and the load is optimized as zero. I simply propose to make non escaping allocations with mismatched accesses to be non scalar replaceable. Roland.
From tobias.hartmann at oracle.com Fri Jan 11 09:31:41 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 11 Jan 2019 10:31:41 +0100 Subject: [13] RFR(S): 8213249: compiler/graalunit/HotspotTest.java failed in ExplicitExceptionTest Message-ID: Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8213249 http://cr.openjdk.java.net/~thartmann/8213249/webrev.01/ The problem is C2's -XX:OmitStackTraceInFastThrow which is enabled by default. Instead of deoptimizing at frequent throws (such as the ArrayIndexOutOfBoundsException in this case), C2 emits code to throw a pre-allocated exception object (see code in GraphKit::builtin_throw()) which has a null message. Thanks, Tobias From rwestrel at redhat.com Fri Jan 11 09:53:41 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 11 Jan 2019 10:53:41 +0100 Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point In-Reply-To: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> Message-ID: <874lafzfiy.fsf@redhat.com> Hi Vladimir, > #2: As part of loop limit check insertion, new IfNode is created (If > 1854) and linked to R1722 as an input which causes R1722 IDOM to be > updated [2]. It changes R1722 IDOM (I1511 => R1784), since dom_lca() > normalizes the result using find_non_split_ctrl(). Isn't that the root cause: the idom of R1722 is still I1511 and not R1784? Roland. 
From erik.osterlund at oracle.com Fri Jan 11 09:53:23 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Fri, 11 Jan 2019 10:53:23 +0100 Subject: RFR: 8216427: ciMethodData::load_extra_data() does not always unpack the last entry Message-ID: <983e6611-417e-9c7a-0643-389fffa0e2cf@oracle.com> Hi, When unpacking the extra data section of the MDOs, the source and destination might not have the same number of entries, because there can be safepoints between cloning the extra data section of the MDO and unpacking the source entries to the destination entries. Therefore the unpacking loop loops through all the source entries and copies them to the destination. Except the last DataLayout::arg_info_data_tag entry, which never gets copied from the source to the destination. Therefore, if a safepoint occurred between cloning the extra data section and unpacking its entries in ciMethodData::load_extra_data(), the last entry could contain random bogus memory. It seems like the reason the last entry is not copied is that the copying of an entry requires a length, which is currently calculated by taking the difference between the current entry and the next entry in the loop. But there is no notion of a next entry when you are at the last DataLayout::arg_info_data_tag entry (because it is always the last one when present), so you can't do that. Therefore, the solution of choice seems to have been simply not copying the last DataLayout::arg_info_data_tag entry, instead of calculating what the length of that entry would be. This patch appropriately calculates the length of the entries instead (which is also defined for DataLayout::arg_info_data_tag) in the copying loop, allowing the last DataLayout::arg_info_data_tag entry to be copied as well. Webrev: http://cr.openjdk.java.net/~eosterlund/8216427/webrev.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8216427 Tested through hs-tier1-3.
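The idea of deriving each entry's length from the entry itself, rather than from the distance to the next entry, can be illustrated with a small model. This is a sketch only: the tags, the entry layout and all names below are invented for the example and do not correspond to the real DataLayout or ciMethodData code.

```java
// Hypothetical model of copying variable-length entries. The layout and
// tags are made up; this is not the HotSpot DataLayout format.
final class ExtraDataCopySketch {
    static final int TAG_NONE = 0;     // unused slot, ends the section
    static final int TAG_FIXED = 1;    // fixed-size entry: [tag, a, b]
    static final int TAG_ARG_INFO = 2; // trailing entry: [tag, n, v1..vn]

    // Size of the entry starting at 'offset', computed from the entry
    // itself, so it is also defined for the last entry.
    static int entrySize(int[] data, int offset) {
        switch (data[offset]) {
            case TAG_FIXED:
                return 3;
            case TAG_ARG_INFO:
                return 2 + data[offset + 1];
            default:
                throw new IllegalArgumentException("unknown tag " + data[offset]);
        }
    }

    // Copies every entry from src to dst, including the trailing one.
    // Assumes dst is at least as long as src.
    static void copyEntries(int[] src, int[] dst) {
        int offset = 0;
        while (offset < src.length && src[offset] != TAG_NONE) {
            int size = entrySize(src, offset);
            System.arraycopy(src, offset, dst, offset, size);
            if (src[offset] == TAG_ARG_INFO) {
                break; // always the last entry when present
            }
            offset += size;
        }
    }

    public static void main(String[] args) {
        int[] src = { TAG_FIXED, 10, 11, TAG_FIXED, 20, 21, TAG_ARG_INFO, 2, 7, 8 };
        int[] dst = new int[src.length];
        java.util.Arrays.fill(dst, -1); // stands in for stale memory
        copyEntries(src, dst);
        System.out.println(java.util.Arrays.equals(src, dst));
    }
}
```

With a "next minus current" length, the trailing entry in this model would have no defined size and would be left holding the stale `-1` values.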
Thanks,
/Erik

From tobias.hartmann at oracle.com Fri Jan 11 12:48:32 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Fri, 11 Jan 2019 13:48:32 +0100
Subject: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index
In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A9A148B9C@ORSMSX106.amr.corp.intel.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A9A148B9C@ORSMSX106.amr.corp.intel.com>
Message-ID: <045def80-a536-ae87-7384-09b30e5a8d78@oracle.com>

Hi Vivek,

On 11.01.19 07:58, Deshpande, Vivek R wrote:
> 1) Fix for the crash by matching the operand by swapping to right positions.

Looks good but the change to loopopts.cpp:530 screwed up the indentation around the ifs, please fix.

> 2) Cost based generation of vpdpwssd instruction.

Other instructions added by JDK-8214751 still miss a cost definition, for example:
http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7#l5.20

> 3) Fix generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have
> different control RangeCheck nodes
>     for a[i] and a[i+1] accesses in same MulAddS2I node

This is unrelated to the original bug, right? If so, this should be integrated with a separate RFE.

Thanks,
Tobias

From martin.doerr at sap.com Fri Jan 11 12:55:22 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 11 Jan 2019 12:55:22 +0000
Subject: RFR(S): 8216556: Unnecessary liveness computation with JVMTI
Message-ID: <88842ba1a169406d9628ab06665bd787@sap.com>

Hi,

I'd like to contribute a small JIT improvement for JVMTI to avoid calling raw_liveness_at_bci when its result is not needed.

Bug with description: https://bugs.openjdk.java.net/browse/JDK-8216556
Webrev: http://cr.openjdk.java.net/~mdoerr/8216556_JVMTI_liveness/webrev.00/

Please review.

Best regards,
Martin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From rwestrel at redhat.com Fri Jan 11 13:35:56 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 11 Jan 2019 14:35:56 +0100 Subject: RFR(T): 8216482: Shenandoah: typo in ShenandoahBarrierSetC2::clone_barrier_at_expansion() causes failed compilations In-Reply-To: <87d0p4zkgu.fsf@redhat.com> References: <87o98ozn1g.fsf@redhat.com> <9027c1f5-f562-5abf-6495-645284122da6@oracle.com> <87lg3szlxb.fsf@redhat.com> <51c87854-36c0-903e-c647-0c612ac42c5d@oracle.com> <87ftu0zkx2.fsf@redhat.com> <76c0f61e-9c2f-8d76-2beb-52ba927ed14c@oracle.com> <87d0p4zkgu.fsf@redhat.com> Message-ID: <87tvifxqo3.fsf@redhat.com> FTR, I pushed this one by mistake to jdk/jdk instead of jdk12. I read in: https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2019-January/024470.html that it's ok to push a change to 12 after jdk/jdk so I will do that. Roland. From rwestrel at redhat.com Fri Jan 11 13:51:29 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 11 Jan 2019 14:51:29 +0100 Subject: RFR(XS): 8216549: Mismatched unsafe access to non escaping object fails In-Reply-To: <877efbzh8a.fsf@redhat.com> References: <877efbzh8a.fsf@redhat.com> Message-ID: <87r2djxpy6.fsf@redhat.com> Also: I targeted this to 13 but I don't really have a strong opinion whether it should go in 12 or 13. Roland. From vladimir.x.ivanov at oracle.com Fri Jan 11 18:23:40 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 11 Jan 2019 10:23:40 -0800 Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point In-Reply-To: <874lafzfiy.fsf@redhat.com> References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> Message-ID: <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com> >> #2: As part of loop limit check insertion, new IfNode is created (If >> 1854) and linked to R1722 as an input which causes R1722 IDOM to be >> updated [2]. 
It changes R1722 IDOM (I1511 => R1784), since dom_lca() >> normalizes the result using find_non_split_ctrl(). > > Isn't that the root cause: the idom of R1722 is still I1511 and not > R1784? If it were the case, then PhaseIdealLoop::handle_use()/spinup() would reliably crash on all users of Phi 1790. There are 2 other Regions (R1710 and R1716) which keep their IDOM (I1511) intact and the transformation works fine for them. R1722 is changed during strip mining transformation and its IDOM is recomputed (I1511 => R1784). Then PhaseIdealLoop::handle_use()/spinup() crashes trying to process R1722 user (CallStaticJava 199) and the problem is caused by IDOM(R1722) which is still R1784 and not I798 (region_dom/iff_dom) as for the other Regions (that's the effect of "lazy_replace(iff, region_dom)" in PhaseIdealLoop::do_split_if()). Best regards, Vladimir Ivanov From ekaterina.pavlova at oracle.com Fri Jan 11 18:42:43 2019 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Fri, 11 Jan 2019 10:42:43 -0800 Subject: [13] RFR(S): 8213249: compiler/graalunit/HotspotTest.java failed in ExplicitExceptionTest In-Reply-To: References: Message-ID: The changes look good. thanks. -katya On 1/11/19 1:31 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8213249 > http://cr.openjdk.java.net/~thartmann/8213249/webrev.01/ > > The problem is C2's -XX:OmitStackTraceInFastThrow which is enabled by default. Instead of > deoptimizing at frequent throws (such as the ArrayIndexOutOfBoundsException in this case), C2 emits > code to throw a pre-allocated exception object (see code in GraphKit::builtin_throw()) which has a > null message. 
>
> Thanks,
> Tobias
>

From igor.ignatyev at oracle.com Fri Jan 11 18:47:14 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 11 Jan 2019 10:47:14 -0800
Subject: [13] RFR(S): 8213249: compiler/graalunit/HotspotTest.java failed in ExplicitExceptionTest
In-Reply-To:
References:
Message-ID: <0BE98960-D1DE-4DF9-A15A-BB19C47BA28D@oracle.com>

Hi Tobias,

the fix looks good to me.

Thanks,
-- Igor

> On Jan 11, 2019, at 10:42 AM, Ekaterina Pavlova wrote:
>
> The changes look good.
>
> thanks.
> -katya
>
> On 1/11/19 1:31 AM, Tobias Hartmann wrote:
>> Hi,
>> please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8213249
>> http://cr.openjdk.java.net/~thartmann/8213249/webrev.01/
>> The problem is C2's -XX:OmitStackTraceInFastThrow which is enabled by default. Instead of
>> deoptimizing at frequent throws (such as the ArrayIndexOutOfBoundsException in this case), C2 emits
>> code to throw a pre-allocated exception object (see code in GraphKit::builtin_throw()) which has a
>> null message.
>> Thanks,
>> Tobias
>

From dean.long at oracle.com Fri Jan 11 18:48:55 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Fri, 11 Jan 2019 10:48:55 -0800
Subject: [13] RFR(S): 8213249: compiler/graalunit/HotspotTest.java failed in ExplicitExceptionTest
In-Reply-To:
References:
Message-ID:

The fix seems reasonable. It's a little strange that the test needs to know about a C2 flag, but these tests are already strange because they care about exception messages exactly matching between compilers.

dl

On 1/11/19 1:31 AM, Tobias Hartmann wrote:
> Hi,
>
> please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8213249
> http://cr.openjdk.java.net/~thartmann/8213249/webrev.01/
>
> The problem is C2's -XX:OmitStackTraceInFastThrow which is enabled by default.
Instead of
> deoptimizing at frequent throws (such as the ArrayIndexOutOfBoundsException in this case), C2 emits
> code to throw a pre-allocated exception object (see code in GraphKit::builtin_throw()) which has a
> null message.
>
> Thanks,
> Tobias

From vladimir.x.ivanov at oracle.com Fri Jan 11 19:23:59 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 11 Jan 2019 11:23:59 -0800
Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point
In-Reply-To: <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com>
References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com>
Message-ID: <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com>

On 11/01/2019 10:23, Vladimir Ivanov wrote:
>
>>>     #2: As part of loop limit check insertion, new IfNode is created (If
>>> 1854) and linked to R1722 as an input which causes R1722 IDOM to be
>>> updated [2]. It changes R1722 IDOM (I1511 => R1784), since dom_lca()
>>> normalizes the result using find_non_split_ctrl().
>>
>> Isn't that the root cause: the idom of R1722 is still I1511 and not
>> R1784?
>
> If it were the case, then PhaseIdealLoop::handle_use()/spinup() would
> reliably crash on all users of Phi 1790. There are 2 other Regions
> (R1710 and R1716) which keep their IDOM (I1511) intact and the
> transformation works fine for them.
>
> R1722 is changed during strip mining transformation and its IDOM is
> recomputed (I1511 => R1784).
To elaborate a bit more on that: the only reason IDOM changes is due to the way it is computed:

    // rgn = R1722, new_iff = I1854
    Node* ridom = idom(rgn); // ridom = I1522 = IDOM(R1722)
    Node* nrdom = dom_lca(ridom, new_iff); // nrdom = R1784
    set_idom(rgn, nrdom, dom_depth(rgn));

    Node *dom_lca( Node *n1, Node *n2 ) const {
      return find_non_split_ctrl(dom_lca_internal(n1, n2));
    }

    dom_lca_internal(I1522, I1854) = I1522
    find_non_split_ctrl(I1522) = R1784

If IDOM info is recomputed from scratch, IDOM(R1722) remains I1511.

So, eager IDOM normalization (during initial construction) doesn't help: it would lead to consistently hitting the problem in PhaseIdealLoop::handle_use()/spinup() when processing dependent RegionNodes.

Best regards,
Vladimir Ivanov

From vivek.r.deshpande at intel.com Fri Jan 11 19:38:16 2019
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Fri, 11 Jan 2019 19:38:16 +0000
Subject: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index
In-Reply-To: <045def80-a536-ae87-7384-09b30e5a8d78@oracle.com>
References: <53E8E64DB2403849AFD89B7D4DAC8B2A9A148B9C@ORSMSX106.amr.corp.intel.com> <045def80-a536-ae87-7384-09b30e5a8d78@oracle.com>
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A9A14A45C@ORSMSX106.amr.corp.intel.com>

Hi Tobias,

Thanks for reviewing the patch. I have made the changes according to your suggestion in this webrev: http://cr.openjdk.java.net/~vdeshpande/8216050/webrev.01/

It contains the fix for the crash reported in 8216050. The lower cost is needed for generation of the vpdpwssd instruction, by combining AddVI and MulAddVS2VI. The other instructions, pmaddwd and vpmaddwd, get generated on platforms up to Skylake with the default cost. I have also updated the bug with the link to the webrev.
I have created a different bug JDK-8216580 for 3) Fix generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have different control RangeCheck nodes for a[i] and a[i+1] accesses in same MulAddS2I node

Thank you.

Regards,
Vivek

-----Original Message-----
From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com]
Sent: Friday, January 11, 2019 4:49 AM
To: Deshpande, Vivek R ; hotspot-compiler-dev at openjdk.java.net compiler
Cc: Vladimir Kozlov ; Viswanathan, Sandhya ; Raj, Guru
Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index

Hi Vivek,

On 11.01.19 07:58, Deshpande, Vivek R wrote:
> 1) Fix for the crash by matching the operand by swapping to right positions.

Looks good but the change to loopopts.cpp:530 screwed up the indentation around the ifs, please fix.

> 2) Cost based generation of vpdpwssd instruction.

Other instructions added by JDK-8214751 still miss a cost definition, for example:
http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7#l5.20

> 3) Fix generation of vector code by allowing adjacent LoadS nodes to
> be isomorphic when they have different control RangeCheck nodes
>     for a[i] and a[i+1] accesses in same MulAddS2I node

This is unrelated to the original bug, right? If so, this should be integrated with a separate RFE.

Thanks,
Tobias

From dean.long at oracle.com Fri Jan 11 19:46:23 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Fri, 11 Jan 2019 11:46:23 -0800
Subject: RFR(S): 8216556: Unnecessary liveness computation with JVMTI
In-Reply-To: <88842ba1a169406d9628ab06665bd787@sap.com>
References: <88842ba1a169406d9628ab06665bd787@sap.com>
Message-ID:

Hi Martin. Looks good to me.

dl

On 1/11/19 4:55 AM, Doerr, Martin wrote:
>
> Hi,
>
> I'd like to contribute a small JIT improvement for JVMTI to avoid
> calling raw_liveness_at_bci when its result is not needed.
> > Bug with description: > > https://bugs.openjdk.java.net/browse/JDK-8216556 > > Webrev: > > http://cr.openjdk.java.net/~mdoerr/8216556_JVMTI_liveness/webrev.00/ > > Please review. > > Best regards, > > Martin > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.x.ivanov at oracle.com Fri Jan 11 19:49:29 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 11 Jan 2019 11:49:29 -0800 Subject: RFR(XS): 8216549: Mismatched unsafe access to non escaping object fails In-Reply-To: <877efbzh8a.fsf@redhat.com> References: <877efbzh8a.fsf@redhat.com> Message-ID: <7962eba3-28c3-44d4-f88b-58ea9640f25e@oracle.com> Looks good. Best regards, Vladimir Ivanov On 11/01/2019 01:16, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8216549/webrev.00/ > > test1(), test2() and test3() perform an unsafe access with a mismatched > access. > > test1() compiles to an unschedulable graph and causes the compiler to > crash. The memory input of the load from a non escaping allocation > initially points to a membar but is set to bypass the membar while > control stays set to the membar. The load is not eliminated because it's > a mismatched memory access, an anti dependence is added between the > membar and the load and the graph is unschedulable. > > test2() and test3() return wrong results: the access is mismatched and > misaligned, it's given its own alias by c2 but the MergeMem right after > the allocation only points to the allocation for actual fields of the > newly allocated object. So the load memory input is set to the memory > state on method entry and the load is optimized as zero. > > I simply propose to make non escaping allocations with mismatched > accesses to be non scalar replaceable. > > Roland. 
>

From vivek.r.deshpande at intel.com Sat Jan 12 00:03:49 2019
From: vivek.r.deshpande at intel.com (Deshpande, Vivek R)
Date: Sat, 12 Jan 2019 00:03:49 +0000
Subject: RFR(XS):8216580:X86: Fix generation of VNNI vector code by allowing adjacent LoadS nodes to be isomorphic
Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A9A14A6DA@ORSMSX106.amr.corp.intel.com>

Hi Tobias,

The webrev for the bug JDK-8216580 is here: http://cr.openjdk.java.net/~vdeshpande/8216580/webrev.00/

This fixes generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have different control RangeCheck nodes for a[i] and a[i+1] accesses in same MulAddS2I node.

Could you please review it?

Regards,
Vivek

-----Original Message-----
From: Deshpande, Vivek R
Sent: Friday, January 11, 2019 11:38 AM
To: 'Tobias Hartmann' ; hotspot-compiler-dev at openjdk.java.net compiler
Cc: Vladimir Kozlov ; Viswanathan, Sandhya ; Raj, Guru
Subject: RE: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index

Hi Tobias,

Thanks for reviewing the patch. I have made the changes according to your suggestion in this webrev: http://cr.openjdk.java.net/~vdeshpande/8216050/webrev.01/

It contains the fix for the crash reported in 8216050. The lower cost is needed for generation of the vpdpwssd instruction, by combining AddVI and MulAddVS2VI. The other instructions, pmaddwd and vpmaddwd, get generated on platforms up to Skylake with the default cost. I have also updated the bug with the link to the webrev.

I have created a different bug JDK-8216580 for 3) Fix generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have different control RangeCheck nodes for a[i] and a[i+1] accesses in same MulAddS2I node

Thank you.
Regards,
Vivek

-----Original Message-----
From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com]
Sent: Friday, January 11, 2019 4:49 AM
To: Deshpande, Vivek R ; hotspot-compiler-dev at openjdk.java.net compiler
Cc: Vladimir Kozlov ; Viswanathan, Sandhya ; Raj, Guru
Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index

Hi Vivek,

On 11.01.19 07:58, Deshpande, Vivek R wrote:
> 1) Fix for the crash by matching the operand by swapping to right positions.

Looks good but the change to loopopts.cpp:530 screwed up the indentation around the ifs, please fix.

> 2) Cost based generation of vpdpwssd instruction.

Other instructions added by JDK-8214751 still miss a cost definition, for example:
http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7#l5.20

> 3) Fix generation of vector code by allowing adjacent LoadS nodes to
> be isomorphic when they have different control RangeCheck nodes
>     for a[i] and a[i+1] accesses in same MulAddS2I node

This is unrelated to the original bug, right? If so, this should be integrated with a separate RFE.

Thanks,
Tobias

From doug.simon at oracle.com Sat Jan 12 12:57:19 2019
From: doug.simon at oracle.com (Doug Simon)
Date: Sat, 12 Jan 2019 13:57:19 +0100
Subject: [12] RFR(S): 8215313: [AOT] java/lang/String/Split.java fails with AOTed java.base
In-Reply-To: <88068DB1-E235-448C-A7D8-B1208143CCCF@oracle.com>
References: <88068DB1-E235-448C-A7D8-B1208143CCCF@oracle.com>
Message-ID:

> On 11 Jan 2019, at 10:02, Doug Simon wrote:
>
> Hi Josef,
>
>> On 10 Jan 2019, at 22:52, Josef Haider wrote:
>>
>> Agreed, cmpw/cmpb would make more sense here, i just wanted
>> to keep the changeset minimal, since the entire method may soon be
>> changed again, anyway.
>>
> Can you please say more about this? Would you recommend applying your current patch as is to fix the crash or will you have the changes you mention ready soon?
Josef has updated his fix to use cmpw/cmpb: http://cr.openjdk.java.net/~dnsimon/8215313/ Previous webrev is now at http://cr.openjdk.java.net/~dnsimon/8215313.old/20190112_1342 Dean, can you please re-review. -Doug > > -Doug >>> Taking another look, it seems like cmpl could be replaced with the >>> size-appropriate cmpb, cmpw, or cmpl based on byteMode(kind) and >>> findTwoCharPrefix, but I guess AMD64Assembler only supports cmpl and >>> cmpq right now. >>> >>> dl >>> >>> On 1/10/19 10:04 AM, dean.long at oracle.com wrote: >>> > Is it OK to modify the values of searchValue[i]? If the search value >>> > is already sign-extended, how about sign-extending cmpResult instead >>> > of zero-extending searchValue? >>> > >>> > dl >>> > >>> > On 1/10/19 7:09 AM, Doug Simon wrote: >>> >> Please review this fix supplied by Josef Haider for an incorrect >>> >> compilation of String.split. >>> >> >>> >> When the String.indexOf intrinsic on AMD64 reaches the end of a >>> >> string, it tries to vectorize the last compare operations by reading >>> >> past the bounds of the character/byte array. This is not safe if the >>> >> out-of-bounds read would cross a page boundary, so in that case >>> >> characters are compared one-by-one. This is done with a >>> >> `cmpl`-instruction, which only works as long as the bytes/chars are >>> >> not sign extended. >>> >> >>> >> The fix is to simply `and` the characters we are searching for with >>> >> `0xff`/`0xffff` in order to eliminate any erroneous sign extensions. >>> >> >>> >> http://cr.openjdk.java.net/~dnsimon/8215313 >>> >> https://bugs.openjdk.java.net/browse/JDK-8215313 >>> >> >>> >> -Doug >>> > >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
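The sign-extension problem behind 8215313 can be reproduced in plain Java, since widening a byte to an int sign-extends it. This is only a sketch of the comparison semantics, not the Graal intrinsic: it assumes the array element is loaded zero-extended and compared with a full 32-bit compare against a search value that was widened with sign extension. The class and method names are made up for illustration.

```java
// Illustrative only: models the 32-bit compare, not the AMD64 code Graal emits.
final class SignExtensionCompareSketch {
    // A zero-extended element compared against a sign-extended search value.
    // For byte values with the top bit set the two widened values differ.
    static boolean compareSignExtended(byte loaded, byte search) {
        int loadedZeroExtended = loaded & 0xff; // like a zero-extending load
        int searchSignExtended = search;        // byte -> int widening sign-extends
        return loadedZeroExtended == searchSignExtended;
    }

    // The fix described in the review request: mask the search value so that
    // both operands of the wide compare are zero-extended.
    static boolean compareMasked(byte loaded, byte search) {
        return (loaded & 0xff) == (search & 0xff);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE9; // top bit set, so it widens to a negative int
        System.out.println("sign-extended compare: " + compareSignExtended(b, b));
        System.out.println("masked compare:        " + compareMasked(b, b));
    }
}
```

Using a size-appropriate compare (cmpb/cmpw), as the updated webrev does, avoids the issue in a different way: only the low 8 or 16 bits take part in the comparison, so the upper bits of either operand no longer matter.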
URL: From vladimir.kozlov at oracle.com Sat Jan 12 22:40:27 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sat, 12 Jan 2019 14:40:27 -0800 Subject: [13] RFR(S): 8213249: compiler/graalunit/HotspotTest.java failed in ExplicitExceptionTest In-Reply-To: References: Message-ID: <256a33be-08f5-9670-edf9-ff640f19a54c@oracle.com> Good. Thanks, Vladimir On 1/11/19 1:31 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8213249 > http://cr.openjdk.java.net/~thartmann/8213249/webrev.01/ > > The problem is C2's -XX:OmitStackTraceInFastThrow which is enabled by default. Instead of > deoptimizing at frequent throws (such as the ArrayIndexOutOfBoundsException in this case), C2 emits > code to throw a pre-allocated exception object (see code in GraphKit::builtin_throw()) which has a > null message. > > Thanks, > Tobias > From vladimir.kozlov at oracle.com Sun Jan 13 02:20:19 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sat, 12 Jan 2019 18:20:19 -0800 Subject: RFR: 8216424: Remove or clean up TimeLivenessAnalysis In-Reply-To: References: <9ab53da1-d6d8-d5f8-10b6-da960444aa6c@oracle.com> Message-ID: <52e3cb2a-37d4-6e84-f33c-8d9bf572de7e@oracle.com> Agree with removal. I never used it. Thanks, Vladimir On 1/9/19 8:54 AM, Tobias Hartmann wrote: > Hi Claes, > > Both webrevs look good to me but I would prefer removal as well. I haven't ever seen anyone using > that flag but let's wait for more opinions. 
>
> Best regards,
> Tobias
>
> On 09.01.19 16:10, Claes Redestad wrote:
>> Hi,
>>
>> implementation for the develop flag TimeLivenessAnalysis leaves a few
>> breadcrumbs in product builds (in particular TraceTime
>> constructors/destructors aren't being inlined, so the compiler doesn't
>> realize these objects aren't actually doing anything)
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216424
>>
>> This should be either cleaned up:
>> http://cr.openjdk.java.net/~redestad/8216424/cleanup.00/
>>
>> .. or the flag should be removed altogether:
>> http://cr.openjdk.java.net/~redestad/8216424/remove.00/
>>
>> I favor removal since the statistics collected by this analysis do
>> not seem very useful and any real performance effect could/should be
>> estimated using real profiling tools on product builds, anyhow.
>>
>> Thanks!
>>
>> /Claes

From vladimir.kozlov at oracle.com Sun Jan 13 02:36:48 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Sat, 12 Jan 2019 18:36:48 -0800
Subject: RFR(S): 8216556: Unnecessary liveness computation with JVMTI
In-Reply-To:
References: <88842ba1a169406d9628ab06665bd787@sap.com>
Message-ID: <36d50534-5f4d-d443-bf3c-4286d977faa5@oracle.com>

+1

Thanks,
Vladimir

On 1/11/19 11:46 AM, dean.long at oracle.com wrote:
> Hi Martin. Looks good to me.
>
> dl
>
> On 1/11/19 4:55 AM, Doerr, Martin wrote:
>>
>> Hi,
>>
>> I'd like to contribute a small JIT improvement for JVMTI to avoid calling raw_liveness_at_bci when its result is not
>> needed.
>>
>> Bug with description:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8216556
>>
>> Webrev:
>>
>> http://cr.openjdk.java.net/~mdoerr/8216556_JVMTI_liveness/webrev.00/
>>
>> Please review.
>> >> Best regards, >> >> Martin >> > From vladimir.kozlov at oracle.com Sun Jan 13 02:46:57 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sat, 12 Jan 2019 18:46:57 -0800 Subject: [12] RFR(S) 8216151: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.debug Message-ID: <077fcff1-28fc-acbf-6a8b-c299978ae0a2@oracle.com> http://cr.openjdk.java.net/~kvn/8216151/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8216151 Have to update default.policy after changes in jdk.internal.vm.compiler.management files done by JDK-8199755: "Update Graal". Ran CheckAccessClassInPackagePermissions.java test. -- Thanks, Vladimir From claes.redestad at oracle.com Sun Jan 13 11:59:46 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Sun, 13 Jan 2019 12:59:46 +0100 Subject: RFR: 8216424: Remove or clean up TimeLivenessAnalysis In-Reply-To: <52e3cb2a-37d4-6e84-f33c-8d9bf572de7e@oracle.com> References: <9ab53da1-d6d8-d5f8-10b6-da960444aa6c@oracle.com> <52e3cb2a-37d4-6e84-f33c-8d9bf572de7e@oracle.com> Message-ID: Thanks, Vladimir, I'll go ahead and remove it, then. /Claes On 2019-01-13 03:20, Vladimir Kozlov wrote: > Agree with removal. I never used it. > > Thanks, > Vladimir > > On 1/9/19 8:54 AM, Tobias Hartmann wrote: >> Hi Claes, >> >> Both webrevs look good to me but I would prefer removal as well. I >> haven't ever seen anyone using >> that flag but let's wait for more opinions. 
>>
>> Best regards,
>> Tobias
>>
>> On 09.01.19 16:10, Claes Redestad wrote:
>>> Hi,
>>>
>>> implementation for the develop flag TimeLivenessAnalysis leaves a few
>>> breadcrumbs in product builds (in particular TraceTime
>>> constructors/destructors aren't being inlined, so the compiler doesn't
>>> realize these objects aren't actually doing anything)
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8216424
>>>
>>> This should be either cleaned up:
>>> http://cr.openjdk.java.net/~redestad/8216424/cleanup.00/
>>>
>>> .. or the flag should be removed altogether:
>>> http://cr.openjdk.java.net/~redestad/8216424/remove.00/
>>>
>>> I favor removal since the statistics collected by this analysis do
>>> not seem very useful and any real performance effect could/should be
>>> estimated using real profiling tools on product builds, anyhow.
>>>
>>> Thanks!
>>>
>>> /Claes

From bsrbnd at gmail.com Sun Jan 13 17:10:07 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Sun, 13 Jan 2019 18:10:07 +0100
Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions
In-Reply-To:
References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com>
Message-ID:

On Thu, 10 Jan 2019 at 10:19, Andrew Haley wrote:
>
> On 1/9/19 12:13 PM, Roman Kennke wrote:
> > I cannot say if this has performance implication. I suspect not. If
> > it has, it's probably minuscule improvement. I can't see how it could be
> > worse though.
>
> I can. x86 can have some very weird performance characteristics. It'd be
> helpful to do some measurement.

I'm not sure we are really able to conclude anything from performance measurement on highly implementation-dependent instructions unless we make an average on a significant number of different x86_64 processors which might well change with future generations...
Shouldn't we follow a more pragmatic direction, considering that fewer instructions/registers and a better/smaller encoding are generally preferable, as Roman suggested, which is the purpose of complex instruction sets?

Bernard

From vladimir.kozlov at oracle.com Sun Jan 13 21:34:20 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Sun, 13 Jan 2019 13:34:20 -0800
Subject: RFR(M): 8216135: C2 assert(!had_error) failed: bad dominance
In-Reply-To: <87ef9m178o.fsf@redhat.com>
References: <87ef9m178o.fsf@redhat.com>
Message-ID: <690e8952-5996-6a69-949d-9f196e2b84d8@oracle.com>

Looks reasonable. Did you test with UseLoopPredicate switched off?

Thanks,
Vladimir

On 1/9/19 1:59 AM, Roland Westrelin wrote:
>
> http://cr.openjdk.java.net/~roland/8216135/webrev.00/
>
> Range check elimination is applied to a loop and then the loop is
> unrolled. After the loop is unrolled, the range of values for the
> induction variable conflicts with a range check CastII (the loop is over
> unrolled and the main loop would never be executed), the CastII's value
> becomes top, a data path dies but the corresponding control path is kept
> alive. This results in a broken graph.
>
> This scenario is supposed to be caught by the skeleton predicates added
> by 8193130 but it's not for 2 reasons:
>
> 1- With 8203915 & 8205033, Tobias extended skeleton predicates to cover
> not only the first value of the induction variable of the first loop
> iteration but also the last value of an unrolled loop. But his changes
> only apply to loop predicates, not range check elimination.
>
> 2- With 8203915 & 8205033, Tobias used an Opaque1 node as a place holder
> so on each unrolling, he could update the skeleton predicate with the
> new stride. The problem is that the Opaque1 node blocks type
> propagation and the skeleton predicate only has a chance to remove a
> dead main loop after loop opts are over. In the case of this bug, the
> CastII becomes dead before loop opts are finished.
>
> The problem with 2- is that if the Opaque1 node is not added, on the
> next unrolling there's no way to find what predicate and what part of
> the predicate to update. The fix I propose is to keep 3 predicates
> after the first unrolling:
>
> 1 for the first value of the first iteration
> 1 for the last value of the last iteration, without an Opaque1 node
> 1 with an Opaque1 node that can be used as a template
>
> On the next unrolling pass, the 1st and 2nd predicates above could have
> been optimized out. Rather than try to locate and update the 2nd
> predicate, the 1st and 2nd predicates are removed if they are found and,
> once the code finds the 3rd predicate, it clones it once to produce the
> check on the first value again and a second time to produce an updated
> check on the new last value.
>
> Roland.
>

From vladimir.kozlov at oracle.com Sun Jan 13 21:57:54 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Sun, 13 Jan 2019 13:57:54 -0800
Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point
In-Reply-To: <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com>
References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com> <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com>
Message-ID: <14b4d2c8-0cf4-4c49-309b-1838da8536a0@oracle.com>

On 1/11/19 11:23 AM, Vladimir Ivanov wrote:
>
>
> On 11/01/2019 10:23, Vladimir Ivanov wrote:
>>
>>>>     #2: As part of loop limit check insertion, new IfNode is created (If
>>>> 1854) and linked to R1722 as an input which causes R1722 IDOM to be
>>>> updated [2]. It changes R1722 IDOM (I1511 => R1784), since dom_lca()
>>>> normalizes the result using find_non_split_ctrl().
>>>
>>> Isn't that the root cause: the idom of R1722 is still I1511 and not
>>> R1784?
>>
>> If it were the case, then PhaseIdealLoop::handle_use()/spinup() would reliably crash on all users of Phi 1790.
There
>> are 2 other Regions (R1710 and R1716) which keep their IDOM (I1511) intact and the transformation works fine for them.
>>
>> R1722 is changed during strip mining transformation and its IDOM is recomputed (I1511 => R1784).
>
> To elaborate a bit more on that: the only reason IDOM changes is due to the way it is computed:
>     // rgn = R1722, new_iff = I1854
>     Node* ridom = idom(rgn); // ridom = I1522 = IDOM(R1722)

Is it a typo? Should it be I1511? I don't see I1522 in the graph's picture.

>     Node* nrdom = dom_lca(ridom, new_iff); // nrdom = R1784
>     set_idom(rgn, nrdom, dom_depth(rgn));
>
>     Node *dom_lca( Node *n1, Node *n2 ) const {
>       return find_non_split_ctrl(dom_lca_internal(n1, n2));
>     }
>
>     dom_lca_internal(I1522, I1854) = I1522

I assume it is 1511.

>     find_non_split_ctrl(I1522) = R1784
>
> If IDOM info is recomputed from scratch, IDOM(R1722) remains I1511.

Can you explain this point more? Why is the result different if it is recomputed from scratch?

Thanks,
Vladimir K

>
> So, eager IDOM normalization (during initial construction) doesn't help: it would lead to consistently hitting the
> problem in PhaseIdealLoop::handle_use()/spinup() when processing dependent RegionNodes.
>
> Best regards,
> Vladimir Ivanov

From vladimir.kozlov at oracle.com Sun Jan 13 22:05:04 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Sun, 13 Jan 2019 14:05:04 -0800
Subject: RFR(XS): 8216549: Mismatched unsafe access to non escaping object fails
In-Reply-To: <87r2djxpy6.fsf@redhat.com>
References: <877efbzh8a.fsf@redhat.com> <87r2djxpy6.fsf@redhat.com>
Message-ID: <579a07cb-7617-6c39-38ba-595e369a41b2@oracle.com>

I would suggest pushing it into JDK 12. It is a P3 and a nasty one.

Thanks,
Vladimir

On 1/11/19 5:51 AM, Roland Westrelin wrote:
>
> Also: I targeted this to 13 but I don't really have a strong opinion
> whether it should go in 12 or 13.
>
> Roland.
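Returning to the IDOM discussion in the 8215757 thread above: the way an incrementally updated IDOM can differ from the stored one is easier to see on a tiny dominator tree. The sketch below is not HotSpot code. It models nodes as array indices, walks idom links by depth to find the common dominator, and uses a hypothetical normalize table as a stand-in for what the thread attributes to find_non_split_ctrl(); the concrete mapping is an assumption made purely for illustration.

```java
// Illustrative model: node numbers, arrays and the normalize table are assumptions.
final class DomLcaSketch {
    // idom[n] is the immediate dominator of n (the root is its own idom),
    // depth[n] is n's depth in the dominator tree (root = 0).
    static int domLca(int[] idom, int[] depth, int n1, int n2) {
        while (n1 != n2) {
            if (depth[n1] >= depth[n2]) {
                n1 = idom[n1]; // n1 is at least as deep, move it up
            } else {
                n2 = idom[n2];
            }
        }
        return n1;
    }

    // LCA followed by a normalization step, standing in for the
    // find_non_split_ctrl(dom_lca_internal(n1, n2)) composition quoted above.
    static int normalizedDomLca(int[] idom, int[] depth, int[] normalize, int n1, int n2) {
        return normalize[domLca(idom, depth, n1, n2)];
    }

    public static void main(String[] args) {
        // 0: root, 1: region R, 2: if I (under R), 3: new if (under I), 4: region (under I)
        int[] idom = {0, 0, 1, 2, 2};
        int[] depth = {0, 1, 2, 3, 3};
        int[] normalize = {0, 1, 1, 3, 4}; // the split node 2 normalizes to region 1
        int plain = domLca(idom, depth, idom[4], 3);
        int updated = normalizedDomLca(idom, depth, normalize, idom[4], 3);
        System.out.println("plain LCA = " + plain + ", normalized LCA = " + updated);
    }
}
```

In this toy graph the plain LCA of the region's old idom and the new if is the old idom itself (node 2), yet the normalized result is node 1, so an update that stores the normalized LCA moves the region's idom even though a from-scratch computation would leave it at node 2.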
> From vladimir.kozlov at oracle.com Sun Jan 13 22:05:46 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sun, 13 Jan 2019 14:05:46 -0800 Subject: RFR(XS): 8216549: Mismatched unsafe access to non escaping object fails In-Reply-To: <7962eba3-28c3-44d4-f88b-58ea9640f25e@oracle.com> References: <877efbzh8a.fsf@redhat.com> <7962eba3-28c3-44d4-f88b-58ea9640f25e@oracle.com> Message-ID: <04834367-91bc-e60d-f510-736a8ec2fedd@oracle.com> +1 Thanks, Vladimir K. On 1/11/19 11:49 AM, Vladimir Ivanov wrote: > Looks good. > > Best regards, > Vladimir Ivanov > > On 11/01/2019 01:16, Roland Westrelin wrote: >> >> http://cr.openjdk.java.net/~roland/8216549/webrev.00/ >> >> test1(), test2() and test3() perform an unsafe access with a mismatched >> access. >> >> test1() compiles to an unschedulable graph and causes the compiler to >> crash. The memory input of the load from a non escaping allocation >> initially points to a membar but is set to bypass the membar while >> control stays set to the membar. The load is not eliminated because it's >> a mismatched memory access, an anti dependence is added between the >> membar and the load and the graph is unschedulable. >> >> test2() and test3() return wrong results: the access is mismatched and >> misaligned, it's given its own alias by c2 but the MergeMem right after >> the allocation only points to the allocation for actual fields of the >> newly allocated object. So the load memory input is set to the memory >> state on method entry and the load is optimized as zero. >> >> I simply propose to make non escaping allocations with mismatched >> accesses to be non scalar replaceable. >> >> Roland. 
>>

From dean.long at oracle.com Sun Jan 13 22:07:14 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Sun, 13 Jan 2019 14:07:14 -0800
Subject: [12] RFR(S): 8215313: [AOT] java/lang/String/Split.java fails with AOTed java.base
In-Reply-To:
References: <88068DB1-E235-448C-A7D8-B1208143CCCF@oracle.com>
Message-ID:

Looks good. Please update copyright with 2019 end year.

dl

On 1/12/19 4:57 AM, Doug Simon wrote:
>
>
>> On 11 Jan 2019, at 10:02, Doug Simon wrote:
>>
>> Hi Josef,
>>
>>> On 10 Jan 2019, at 22:52, Josef Haider wrote:
>>>
>>> Agreed, cmpw/cmpb would make more sense here, I just wanted
>>> to keep the changeset minimal, since the entire method may soon be
>>> changed again, anyway.
>>>
>> Can you please say more about this? Would you recommend applying your
>> current patch as is to fix the crash or will you have the changes you
>> mention ready soon?
>
> Josef has updated his fix to use cmpw/cmpb:
>
> http://cr.openjdk.java.net/~dnsimon/8215313/
>
> Previous webrev is now at
> http://cr.openjdk.java.net/~dnsimon/8215313.old/20190112_1342
>
> Dean, can you please re-review.
>
> -Doug
>
>>
>> -Doug
>>>> Taking another look, it seems like cmpl could be replaced with the
>>>> size-appropriate cmpb, cmpw, or cmpl based on byteMode(kind) and
>>>> findTwoCharPrefix, but I guess AMD64Assembler only supports cmpl and
>>>> cmpq right now.
>>>>
>>>> dl
>>>>
>>>> On 1/10/19 10:04 AM, dean.long at oracle.com wrote:
>>>> > Is it OK to modify the values of searchValue[i]? If the search value
>>>> > is already sign-extended, how about sign-extending cmpResult instead
>>>> > of zero-extending searchValue?
>>>> >
>>>> > dl
>>>> >
>>>> > On 1/10/19 7:09 AM, Doug Simon wrote:
>>>> >> Please review this fix supplied by Josef Haider for an incorrect
>>>> >> compilation of String.split.
>>>> >>
>>>> >> When the String.indexOf intrinsic on AMD64 reaches the end of a
>>>> >> string, it tries to vectorize the last compare operations by reading
>>>> >> past the bounds of the character/byte array. This is not safe if the
>>>> >> out-of-bounds read would cross a page boundary, so in that case
>>>> >> characters are compared one-by-one. This is done with a
>>>> >> `cmpl`-instruction, which only works as long as the bytes/chars are
>>>> >> not sign extended.
>>>> >>
>>>> >> The fix is to simply `and` the characters we are searching for with
>>>> >> `0xff`/`0xffff` in order to eliminate any erroneous sign extensions.
>>>> >>
>>>> >> http://cr.openjdk.java.net/~dnsimon/8215313
>>>> >> https://bugs.openjdk.java.net/browse/JDK-8215313
>>>> >>
>>>> >> -Doug
>>>> >
>>>>
>>>
>>>
>>
>

From vladimir.kozlov at oracle.com Sun Jan 13 22:11:20 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Sun, 13 Jan 2019 14:11:20 -0800
Subject: RFR: 8216427: ciMethodData::load_extra_data() does not always unpack the last entry
In-Reply-To: <983e6611-417e-9c7a-0643-389fffa0e2cf@oracle.com>
References: <983e6611-417e-9c7a-0643-389fffa0e2cf@oracle.com>
Message-ID:

Looks good.

Please run hs-precheckin-comp too.

Thanks,
Vladimir

On 1/11/19 1:53 AM, Erik Österlund wrote:
> Hi,
>
> When unpacking the extra data section of the MDOs, the source and destination might not have the same number of entries,
> because there can be safepoints between cloning the extra data section of the MDO and unpacking the source entries to
> the destination entries.
>
> Therefore the unpacking loop loops through all the source entries and copies them to the destination. Except the last
> DataLayout::arg_info_data_tag entry, that never gets copied from the source to the destination. Therefore, if a
> safepoint occurred between cloning the extra data section and unpacking its entries in ciMethodData::load_extra_data(),
> the last entry could contain random bogus memory.
> > It seems like the reason the last entry is not copied is because the copying of an entry requires a length which is > currently calculated by taking the difference between the current entry and the next entry in the loop. But as there is > no notion of a next entry when you are at the last DataLayout::arg_info_data_tag entry (because it is always the last > one when present), so you can't do that. Therefore, the solution of choice seems to have been simply not copying the > last DataLayout::arg_info_data_tag entry, instead of calculating what the length of that entry would be. > > This patch appropriately calculates the length of the entries instead (which is also defined for > DataLayout::arg_info_data_tag) in the copying loop, allowing the last DataLayout::arg_info_data_tag entry to be copied > as well. > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8216427/webrev.00/ > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8216427 > > Tested through hs-tier1-3. > > Thanks, > /Erik From erik.osterlund at oracle.com Sun Jan 13 22:24:24 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Sun, 13 Jan 2019 14:24:24 -0800 (PST) Subject: RFR: 8216427: ciMethodData::load_extra_data() does not always unpack the last entry In-Reply-To: References: <983e6611-417e-9c7a-0643-389fffa0e2cf@oracle.com> Message-ID: <9605F7DB-5227-45B0-9F7D-FEA3FE050760@oracle.com> Hi Vladimir, Thanks for the review. /Erik > On 13 Jan 2019, at 23:11, Vladimir Kozlov wrote: > > Looks good. > > Please run hs-precheckin-comp too. > > Thanks, > Vladimir > >> On 1/11/19 1:53 AM, Erik ?sterlund wrote: >> Hi, >> When unpacking the extra data section of the MDOs, the source and destination might not have the same number of entries, because there can be safepoints between cloning the extra data section of the MDO and unpacking the source entries to the destination entries. >> Therefore the unpacking loop loops through all the source entries and copies them to the destination. 
Except the last DataLayout::arg_info_data_tag entry, that never gets copied form the source to the destination. Therefore, if a safepoint occurred between cloning the extra data section and unpacking its entries in ciMethodData::load_extra_data(), the last entry could contain random bogus memory. >> It seems like the reason the last entry is not copied is because the copying of an entry requires a length which is currently calculated by taking the difference between the current entry and the next entry in the loop. But as there is no notion of a next entry when you are at the last DataLayout::arg_info_data_tag entry (because it is always the last one when present), so you can't do that. Therefore, the solution of choice seems to have been simply not copying the last DataLayout::arg_info_data_tag entry, instead of calculating what the length of that entry would be. >> This patch appropriately calculates the length of the entries instead (which is also defined for DataLayout::arg_info_data_tag) in the copying loop, allowing the last DataLayout::arg_info_data_tag entry to be copied as well. >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8216427/webrev.00/ >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8216427 >> Tested through hs-tier1-3. >> Thanks, >> /Erik From martin.doerr at sap.com Mon Jan 14 08:30:33 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 14 Jan 2019 08:30:33 +0000 Subject: RFR(S): 8216556: Unnecessary liveness computation with JVMTI In-Reply-To: <3a600790198e4bbbb6f253daf0af8ff0@sap.com> References: <88842ba1a169406d9628ab06665bd787@sap.com> <9c7afb40-cc2b-9ae8-fb70-4ac3bacb72da@oracle.com> <3a600790198e4bbbb6f253daf0af8ff0@sap.com> Message-ID: Hi Claes, excellent proposal. Thanks. I had not noticed that it currently is in a cpp file. New webrev: http://cr.openjdk.java.net/~mdoerr/8216556_JVMTI_liveness/webrev.01/ What I still don't really like is that we're passing MethodLivenessResult objects on stack via 3 compilation units. 
But I don't know if it's worth refactoring the code.

Best regards,
Martin

-----Original Message-----
From: Claes Redestad
Sent: Friday, 11 January 2019 16:45
To: Doerr, Martin
Subject: Re: RFR(S): 8216556: Unnecessary liveness computation with JVMTI

Hi,

just a random thought, but if you're optimizing this and got some measure where it matters(?), maybe you should also try inlining ciEnv::should_retain_local_variables(), i.e., move the definition to ciEnv.hpp. If it doesn't bloat static binary size it seems like it won't hurt, at least.

/Claes

On 2019-01-11 13:55, Doerr, Martin wrote:
> Hi,
>
> I'd like to contribute a small JIT improvement for JVMTI to avoid
> calling raw_liveness_at_bci when its result is not needed.
>
> Bug with description:
>
> https://bugs.openjdk.java.net/browse/JDK-8216556
>
> Webrev:
>
> http://cr.openjdk.java.net/~mdoerr/8216556_JVMTI_liveness/webrev.00/
>
> Please review.
>
> Best regards,
>
> Martin
>

From rwestrel at redhat.com Mon Jan 14 08:32:00 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Mon, 14 Jan 2019 09:32:00 +0100
Subject: RFR(M): 8216135: C2 assert(!had_error) failed: bad dominance
In-Reply-To: <690e8952-5996-6a69-949d-9f196e2b84d8@oracle.com>
References: <87ef9m178o.fsf@redhat.com> <690e8952-5996-6a69-949d-9f196e2b84d8@oracle.com>
Message-ID: <87k1j7y70f.fsf@redhat.com>

> Looks reasonable.

Thanks for the review.

> Did you test with switched off UseLoopPredicate?

I didn't and sure, that makes sense. I'll run that on my system tonight. Or maybe Tobias can run the same testing he ran with UseLoopPredicate off?

Roland.
From tobias.hartmann at oracle.com Mon Jan 14 08:40:42 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 14 Jan 2019 09:40:42 +0100 Subject: [13] RFR(S): 8213249: compiler/graalunit/HotspotTest.java failed in ExplicitExceptionTest In-Reply-To: <256a33be-08f5-9670-edf9-ff640f19a54c@oracle.com> References: <256a33be-08f5-9670-edf9-ff640f19a54c@oracle.com> Message-ID: Katya, Igor, Dean, Vladimir, thanks for the reviews! Best regards, Tobias On 12.01.19 23:40, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 1/11/19 1:31 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8213249 >> http://cr.openjdk.java.net/~thartmann/8213249/webrev.01/ >> >> The problem is C2's -XX:OmitStackTraceInFastThrow which is enabled by default. Instead of >> deoptimizing at frequent throws (such as the ArrayIndexOutOfBoundsException in this case), C2 emits >> code to throw a pre-allocated exception object (see code in GraphKit::builtin_throw()) which has a >> null message. >> >> Thanks, >> Tobias >> From tobias.hartmann at oracle.com Mon Jan 14 08:42:31 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 14 Jan 2019 09:42:31 +0100 Subject: RFR(XS): 8216549: Mismatched unsafe access to non escaping object fails In-Reply-To: <877efbzh8a.fsf@redhat.com> References: <877efbzh8a.fsf@redhat.com> Message-ID: Hi Roland, looks good to me as well. I'm fine with pushing to JDK 12. Best regards, Tobias On 11.01.19 10:16, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8216549/webrev.00/ > > test1(), test2() and test3() perform an unsafe access with a mismatched > access. > > test1() compiles to an unschedulable graph and causes the compiler to > crash. The memory input of the load from a non escaping allocation > initially points to a membar but is set to bypass the membar while > control stays set to the membar. 
The load is not eliminated because it's > a mismatched memory access, an anti dependence is added between the > membar and the load and the graph is unschedulable. > > test2() and test3() return wrong results: the access is mismatched and > misaligned, it's given its own alias by c2 but the MergeMem right after > the allocation only points to the allocation for actual fields of the > newly allocated object. So the load memory input is set to the memory > state on method entry and the load is optimized as zero. > > I simply propose to make non escaping allocations with mismatched > accesses to be non scalar replaceable. > > Roland. > From tobias.hartmann at oracle.com Mon Jan 14 08:43:52 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 14 Jan 2019 09:43:52 +0100 Subject: RFR(M): 8216135: C2 assert(!had_error) failed: bad dominance In-Reply-To: <87k1j7y70f.fsf@redhat.com> References: <87ef9m178o.fsf@redhat.com> <690e8952-5996-6a69-949d-9f196e2b84d8@oracle.com> <87k1j7y70f.fsf@redhat.com> Message-ID: <3a6364a3-0483-f454-87e9-7894cf8e8055@oracle.com> Hi Roland, On 14.01.19 09:32, Roland Westrelin wrote: > I didn't and sure, that makes sense. I'll run that on my system > tonight. Or maybe Tobias can run the same testing he ran with > UseLoopPredicate off? Yes, I'll do that. Best regards, Tobias From doug.simon at oracle.com Mon Jan 14 09:16:31 2019 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 14 Jan 2019 10:16:31 +0100 Subject: [12] RFR(S): 8215313: [AOT] java/lang/String/Split.java fails with AOTed java.base In-Reply-To: References: <88068DB1-E235-448C-A7D8-B1208143CCCF@oracle.com> Message-ID: > On 13 Jan 2019, at 23:07, dean.long at oracle.com wrote: > > Looks good. Thanks for the review. > Please update copyright with 2019 end year. Done. 
-Doug > > On 1/12/19 4:57 AM, Doug Simon wrote: >> >> >>> On 11 Jan 2019, at 10:02, Doug Simon > wrote: >>> >>> Hi Josef, >>> >>>> On 10 Jan 2019, at 22:52, Josef Haider > wrote: >>>> >>>> Agreed, cmpw/cmpb would make more sense here, i just wanted >>>> to keep the changeset minimal, since the entire method may soon be >>>> changed again, anyway. >>>> >>> Can you please say more about this? Would you recommend applying your current patch as is to fix the crash or will you have the changes you mention ready soon? >> >> Josef has updated his fix to use cmpw/cmpb: >> >> http://cr.openjdk.java.net/~dnsimon/8215313/ >> >> Previous webrev is now at http://cr.openjdk.java.net/~dnsimon/8215313.old/20190112_1342 >> >> Dean, can you please re-review. >> >> -Doug >> >>> >>> -Doug >>>>> Taking another look, it seems like cmpl could be replaced with the >>>>> size-appropriate cmpb, cmpw, or cmpl based on byteMode(kind) and >>>>> findTwoCharPrefix, but I guess AMD64Assembler only supports cmpl and >>>>> cmpq right now. >>>>> >>>>> dl >>>>> >>>>> On 1/10/19 10:04 AM, dean.long at oracle.com wrote: >>>>> > Is it OK to modify the values of searchValue[i]? If the search value >>>>> > is already sign-extended, how about sign-extending cmpResult instead >>>>> > of zero-extending searchValue? >>>>> > >>>>> > dl >>>>> > >>>>> > On 1/10/19 7:09 AM, Doug Simon wrote: >>>>> >> Please review this fix supplied by Josef Haider for an incorrect >>>>> >> compilation of String.split. >>>>> >> >>>>> >> When the String.indexOf intrinsic on AMD64 reaches the end of a >>>>> >> string, it tries to vectorize the last compare operations by reading >>>>> >> past the bounds of the character/byte array. This is not safe if the >>>>> >> out-of-bounds read would cross a page boundary, so in that case >>>>> >> characters are compared one-by-one. This is done with a >>>>> >> `cmpl`-instruction, which only works as long as the bytes/chars are >>>>> >> not sign extended. 
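To make the sign-extension problem described above concrete, here is a minimal Java sketch. It is not the Graal intrinsic or the patch under review: the class and method names are hypothetical, and the 32-bit `cmpl` is only modeled with Java `int` arithmetic. It assumes the array element is zero-extended when loaded while the search value arrives sign-extended, which is the situation the thread describes.

```java
// Illustrative sketch only; names are hypothetical and not taken from the patch.
final class SignExtensionSketch {

    // Models the one-by-one compare: the array element is zero-extended to
    // 32 bits and then compared against a 32-bit search value, like cmpl.
    static boolean compare32(byte element, int searchValue) {
        int loaded = element & 0xff;   // zero-extended load of one byte
        return loaded == searchValue;  // full 32-bit comparison
    }

    // Same compare, but the search value is masked to its low 8 bits first,
    // which removes any sign extension it may carry.
    static boolean compare32Masked(byte element, int searchValue) {
        return compare32(element, searchValue & 0xff);
    }

    public static void main(String[] args) {
        byte element = (byte) 0xE9;      // a byte with the top bit set
        int signExtended = element;      // widening a byte to int sign-extends it

        // The unmasked compare misses the match for bytes >= 0x80.
        System.out.println("unmasked: " + compare32(element, signExtended));
        // Masking the search value restores the match.
        System.out.println("masked:   " + compare32Masked(element, signExtended));
        // Values below 0x80 are unaffected either way.
        System.out.println("ascii:    " + compare32((byte) 0x41, 0x41));
    }
}
```

Masking the search value with `0xff` (or `0xffff` for chars) makes both sides of the compare zero-extended, which is the approach taken in the first version of the fix; the later revision switches to size-appropriate `cmpb`/`cmpw` instead.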
>>>>> >>
>>>>> >> The fix is to simply `and` the characters we are searching for with
>>>>> >> `0xff`/`0xffff` in order to eliminate any erroneous sign extensions.
>>>>> >>
>>>>> >> http://cr.openjdk.java.net/~dnsimon/8215313
>>>>> >> https://bugs.openjdk.java.net/browse/JDK-8215313
>>>>> >>
>>>>> >> -Doug
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>

From adinn at redhat.com Mon Jan 14 09:55:59 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 14 Jan 2019 09:55:59 +0000
Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions
In-Reply-To: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com>
References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com>
Message-ID:

On 09/01/2019 12:13, Roman Kennke wrote:
> While poking around x86_64.ad's cmovP instructions (because I needed it
> for an experiment in Shenandoah), I noticed that 2 of them are
> disabled/commented-out: cmovP_mem and cmovP_memU. This means that a
> cmovp with a 2nd argument that is a LoadP will generate two instructions:
>
>   mov %r1, $mem
>   cmov %r2, %1
>
> instead of just one:
>
>   cmov %r2, $mem
>
> The comment there says that adlc doesn't compute the bottom-type
> correctly, and that implicit null-checking is broken, but I couldn't
> confirm either of those. I checked hg annotate, but the commented-out
> block stems from revision #1 and cannot be traced to a bug or so.

I'm not an expert on adlc but I suspect that the first comment cannot simply be ignored -- even if it appears to work in the cases you have tried.

adlc needs to know bottom types both for memory nodes and for machine nodes which coalesce memory ops via rule reductions. This is necessary in order to ensure that ops which affect the same memory slices are scheduled in the correct order.

Code in files output_h.cpp and output_c.cpp generates implementations of a virtual method that retrieves the bottom type for such nodes.
CMoveP instructions are handled as a special case (in output_h.cpp) by computing the meet of the bottom types of the first and second ins for the associated node. That's ok when the ins correspond to standard form inputs. However, I'm not sure it will correctly handle a rule containing a rule with a memory form input. Memory inputs are a fiction which corresponds to more than one in node. I think this may end up computing the bottom type using the bottom type of the base address without taking into account any offset. That might well cause nasty errors in the computation of some types. Perhaps someone from the compiler team can comment on this? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From tobias.hartmann at oracle.com Mon Jan 14 10:04:45 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 14 Jan 2019 11:04:45 +0100 Subject: [12] RFR(S) 8216151: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.debug In-Reply-To: <077fcff1-28fc-acbf-6a8b-c299978ae0a2@oracle.com> References: <077fcff1-28fc-acbf-6a8b-c299978ae0a2@oracle.com> Message-ID: Hi Vladimir, looks good to me. Best regards, Tobias On 13.01.19 03:46, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8216151/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8216151 > > Have to update default.policy after changes in jdk.internal.vm.compiler.management files done by > JDK-8199755: "Update Graal". > > Ran CheckAccessClassInPackagePermissions.java test. 
> From tobias.hartmann at oracle.com Mon Jan 14 10:17:51 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 14 Jan 2019 11:17:51 +0100 Subject: RFR: 8216427: ciMethodData::load_extra_data() does not always unpack the last entry In-Reply-To: <983e6611-417e-9c7a-0643-389fffa0e2cf@oracle.com> References: <983e6611-417e-9c7a-0643-389fffa0e2cf@oracle.com> Message-ID: Hi Erik, looks good to me too. Best regards, Tobias On 11.01.19 10:53, Erik ?sterlund wrote: > Hi, > > When unpacking the extra data section of the MDOs, the source and destination might not have the > same number of entries, because there can be safepoints between cloning the extra data section of > the MDO and unpacking the source entries to the destination entries. > > Therefore the unpacking loop loops through all the source entries and copies them to the > destination. Except the last DataLayout::arg_info_data_tag entry, that never gets copied form the > source to the destination. Therefore, if a safepoint occurred between cloning the extra data section > and unpacking its entries in ciMethodData::load_extra_data(), the last entry could contain random > bogus memory. > > It seems like the reason the last entry is not copied is because the copying of an entry requires a > length which is currently calculated by taking the difference between the current entry and the next > entry in the loop. But as there is no notion of a next entry when you are at the last > DataLayout::arg_info_data_tag entry (because it is always the last one when present), so you can't > do that. Therefore, the solution of choice seems to have been simply not copying the last > DataLayout::arg_info_data_tag entry, instead of calculating what the length of that entry would be. > > This patch appropriately calculates the length of the entries instead (which is also defined for > DataLayout::arg_info_data_tag) in the copying loop, allowing the last DataLayout::arg_info_data_tag > entry to be copied as well. 
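The difference between the two copying schemes described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the record layout (`[tag, payloadLength, payload...]` packed in an `int[]`), the explicit `starts` array, and all names are hypothetical and do not reflect the real `DataLayout` format or the code in the webrev. The zero-filled tail of the destination stands in for the stale memory mentioned in the description.

```java
import java.util.Arrays;

// Illustrative sketch only; the layout and all names are hypothetical.
final class ExtraDataCopySketch {
    static final int HEADER_WORDS = 2; // each entry is [tag, payloadLength, payload...]

    // Size of the entry at 'start', derived from the entry's own header.
    static int sizeOf(int[] buf, int start) {
        return HEADER_WORDS + buf[start + 1];
    }

    // Length taken as the distance to the next entry: the last entry has no
    // successor, so it is never copied.
    static int[] copyByNextOffset(int[] src, int[] starts) {
        int[] dst = new int[src.length];
        for (int i = 0; i + 1 < starts.length; i++) {
            int len = starts[i + 1] - starts[i];
            System.arraycopy(src, starts[i], dst, starts[i], len);
        }
        return dst;
    }

    // Length computed from each entry itself: every entry, including the
    // last one, is copied.
    static int[] copyByOwnLength(int[] src, int[] starts) {
        int[] dst = new int[src.length];
        for (int start : starts) {
            System.arraycopy(src, start, dst, start, sizeOf(src, start));
        }
        return dst;
    }

    public static void main(String[] args) {
        // Three entries: sizes 3, 4 and 5 words; the last plays the role of
        // the trailing arg-info entry.
        int[] src = {1, 1, 10,   2, 2, 20, 21,   9, 3, 30, 31, 32};
        int[] starts = {0, 3, 7};

        System.out.println(Arrays.toString(copyByNextOffset(src, starts)));
        System.out.println(Arrays.toString(copyByOwnLength(src, starts)));
    }
}
```

In the first variant the destination words of the final entry are left untouched, which in the real code means whatever was cloned earlier stays in place; the second variant is the shape of the fix as the description explains it.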
> > Webrev: > http://cr.openjdk.java.net/~eosterlund/8216427/webrev.00/ > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8216427 > > Tested through hs-tier1-3. > > Thanks, > /Erik From Alan.Bateman at oracle.com Mon Jan 14 10:27:23 2019 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 14 Jan 2019 10:27:23 +0000 Subject: [12] RFR(S) 8216151: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.debug In-Reply-To: <077fcff1-28fc-acbf-6a8b-c299978ae0a2@oracle.com> References: <077fcff1-28fc-acbf-6a8b-c299978ae0a2@oracle.com> Message-ID: On 13/01/2019 02:46, Vladimir Kozlov wrote: > http://cr.openjdk.java.net/~kvn/8216151/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8216151 > > Have to update default.policy after changes in > jdk.internal.vm.compiler.management files done by JDK-8199755: "Update > Graal". > > Ran CheckAccessClassInPackagePermissions.java test. > cc'ing security-dev as that is where is the security policy file is maintained. One thing is double check is that code in jdk.internal.vm.compiler.management really needs to access members of classes in the listed packages. I ask because the module definition doesn't export some of these packages to jdk.internal.vm.compiler.management so they aren't accessible even when not running with a security manager. 
-Alan From tobias.hartmann at oracle.com Mon Jan 14 10:40:13 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 14 Jan 2019 11:40:13 +0100 Subject: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A9A14A45C@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A9A148B9C@ORSMSX106.amr.corp.intel.com> <045def80-a536-ae87-7384-09b30e5a8d78@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A9A14A45C@ORSMSX106.amr.corp.intel.com> Message-ID: <87483987-e5dd-a42b-ff94-da645e041019@oracle.com> Hi Vivek, thanks for making these changes. Looks good to me! A second review would be good. Best regards, Tobias On 11.01.19 20:38, Deshpande, Vivek R wrote: > Hi Tobias > > Thanks for reviewing the patch. > I have made the changes according to your suggestion. > In this webrev: http://cr.openjdk.java.net/~vdeshpande/8216050/webrev.01/ > I have fix for the crash reported in the 8216050. > > The lower cost is needed for generation of vpdpwssd instruction, by combining AddVI and MulAddVS2VI. > For other instructions pmaddwd and vpmaddwd, they get generated on platforms upto skylake with default cost. > > I have updated the bug also with the link to webrev. > > I have created a different bug JDK-8216580 for > 3) Fix generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have different control RangeCheck nodes > for a[i] and a[i+1] accesses in same MulAddS2I node > > Thank you. 
> Regards,
> Vivek
>
> -----Original Message-----
> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com]
> Sent: Friday, January 11, 2019 4:49 AM
> To: Deshpande, Vivek R; hotspot-compiler-dev at openjdk.java.net compiler
> Cc: Vladimir Kozlov; Viswanathan, Sandhya; Raj, Guru
> Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index
>
> Hi Vivek,
>
> On 11.01.19 07:58, Deshpande, Vivek R wrote:
>> 1) Fix for the crash by matching the operand by swapping to right positions.
>
> Looks good but the change to loopopts.cpp:530 screwed up the indentation around the ifs, please fix.
>
>> 2) Cost based generation of vpdpwssd instruction.
>
> Other instructions added by JDK-8214751 still miss a cost definition, for example:
> http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7#l5.20
>
>> 3) Fix generation of vector code by allowing adjacent LoadS nodes to
>> be isomorphic when they have different control RangeCheck nodes
>>     for a[i] and a[i+1] accesses in same MulAddS2I node
>
> This is unrelated to the original bug, right? If so, this should be integrated with a separate RFE.
>
> Thanks,
> Tobias
>

From rkennke at redhat.com Mon Jan 14 10:57:10 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 14 Jan 2019 11:57:10 +0100
Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions
In-Reply-To:
References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com>
Message-ID: <176827a6-32e5-8b9a-3441-3dcca9bc2759@redhat.com>

Hi Andrew,

>> While poking around x86_64.ad's cmovP instructions (because I needed it
>> for an experiment in Shenandoah), I noticed that 2 of them are
>> disabled/commented-out: cmovP_mem and cmovP_memU. This means that a
This means that a >> cmovp with a 2nd argument that is a LoadP will generate two instructions: >> >> mov %r1, $mem >> cmov %r2, %1 >> >> instead of just one: >> >> cmov %r2, $mem >> >> The comment there says that adlc doesn't compute the bottom-type >> correctly, and that implicit null-checking is broken, but I couldn't >> confirm either of those. I checked hg annotate, but the commented-out >> block stems from revision #1 and cannot be traced to a bug or so. > > I'm not an expert on aldc but I suspect that the first comment cannot > simply be ignored -- even if it appears to work in the cases you have tried. > > adlc needs to know bottom types both for memory nodes and for machine > nodes which coalesce memory ops via rule reductions. This is necessary > in order to ensure that ops which affect the same memory slices are > scheduled in the correct order. > > Code in files output_h.cpp ad output_c.cpp generates implementations of > a virtual method that retrieves the bottom type for such nodes. CMoveP > instructions are handled as a special case (in output_h.cpp) by > computing the meet of the bottom types of the first and second ins for > the associated node. > > That's ok when the ins correspond to standard form inputs. However, I'm > not sure it will correctly handle a rule containing a rule with a memory > form input. Memory inputs are a fiction which corresponds to more than > one in node. I think this may end up computing the bottom type using the > bottom type of the base address without taking into account any offset. > That might well cause nasty errors in the computation of some types. > > Perhaps someone from the compiler team can comment on this? Yeah, I agree, but I couldn't tell or figure out where and how exactly it's wrong. And since the comment is from changeset #1, and no bug referenced, etc, it's hard to find out. And it might also be possible that it was due to the buggy impl that it was (using 32bit reg instead of 64). 
Would be good if somebody from compiler team in Oracle could comment on it. Roman -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From erik.osterlund at oracle.com Mon Jan 14 11:26:23 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Mon, 14 Jan 2019 12:26:23 +0100 Subject: RFR: 8216427: ciMethodData::load_extra_data() does not always unpack the last entry In-Reply-To: References: <983e6611-417e-9c7a-0643-389fffa0e2cf@oracle.com> Message-ID: Hi Tobias, Thanks for the review. /Erik > On 14 Jan 2019, at 11:17, Tobias Hartmann wrote: > > Hi Erik, > > looks good to me too. > > Best regards, > Tobias > >> On 11.01.19 10:53, Erik ?sterlund wrote: >> Hi, >> >> When unpacking the extra data section of the MDOs, the source and destination might not have the >> same number of entries, because there can be safepoints between cloning the extra data section of >> the MDO and unpacking the source entries to the destination entries. >> >> Therefore the unpacking loop loops through all the source entries and copies them to the >> destination. Except the last DataLayout::arg_info_data_tag entry, that never gets copied form the >> source to the destination. Therefore, if a safepoint occurred between cloning the extra data section >> and unpacking its entries in ciMethodData::load_extra_data(), the last entry could contain random >> bogus memory. >> >> It seems like the reason the last entry is not copied is because the copying of an entry requires a >> length which is currently calculated by taking the difference between the current entry and the next >> entry in the loop. But as there is no notion of a next entry when you are at the last >> DataLayout::arg_info_data_tag entry (because it is always the last one when present), so you can't >> do that. 
Therefore, the solution of choice seems to have been simply not copying the last >> DataLayout::arg_info_data_tag entry, instead of calculating what the length of that entry would be. >> >> This patch appropriately calculates the length of the entries instead (which is also defined for >> DataLayout::arg_info_data_tag) in the copying loop, allowing the last DataLayout::arg_info_data_tag >> entry to be copied as well. >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8216427/webrev.00/ >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8216427 >> >> Tested through hs-tier1-3. >> >> Thanks, >> /Erik From tobias.hartmann at oracle.com Mon Jan 14 11:35:54 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 14 Jan 2019 12:35:54 +0100 Subject: RFR(M): 8216135: C2 assert(!had_error) failed: bad dominance In-Reply-To: <3a6364a3-0483-f454-87e9-7894cf8e8055@oracle.com> References: <87ef9m178o.fsf@redhat.com> <690e8952-5996-6a69-949d-9f196e2b84d8@oracle.com> <87k1j7y70f.fsf@redhat.com> <3a6364a3-0483-f454-87e9-7894cf8e8055@oracle.com> Message-ID: <0989b7d2-00f5-20d9-6696-9f88029a47d2@oracle.com> Hi Roland, all tests passed. Best regards, Tobias On 14.01.19 09:43, Tobias Hartmann wrote: > Hi Roland, > > On 14.01.19 09:32, Roland Westrelin wrote: >> I didn't and sure, that makes sense. I'll run that on my system >> tonight. Or maybe Tobias can run the same testing he ran with >> UseLoopPredicate off? > > Yes, I'll do that. > > Best regards, > Tobias > From rwestrel at redhat.com Mon Jan 14 12:51:32 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 14 Jan 2019 13:51:32 +0100 Subject: RFR(XS): 8216549: Mismatched unsafe access to non escaping object fails In-Reply-To: References: <877efbzh8a.fsf@redhat.com> Message-ID: <87ef9fxuzv.fsf@redhat.com> Thanks for the reviews, Vladimir I, Vladimir K and Tobias. I will push it to 12. Roland. 
From rwestrel at redhat.com Mon Jan 14 14:10:16 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 14 Jan 2019 15:10:16 +0100 Subject: RFR(M): 8216135: C2 assert(!had_error) failed: bad dominance In-Reply-To: <0989b7d2-00f5-20d9-6696-9f88029a47d2@oracle.com> References: <87ef9m178o.fsf@redhat.com> <690e8952-5996-6a69-949d-9f196e2b84d8@oracle.com> <87k1j7y70f.fsf@redhat.com> <3a6364a3-0483-f454-87e9-7894cf8e8055@oracle.com> <0989b7d2-00f5-20d9-6696-9f88029a47d2@oracle.com> Message-ID: <87a7k3xrcn.fsf@redhat.com> Thanks for testing it. Roland. From erik.osterlund at oracle.com Mon Jan 14 15:17:31 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 14 Jan 2019 16:17:31 +0100 Subject: 8216987: ciMethodData::load_data() unpacks MDOs with non-atomic copy Message-ID: <4ed95ecc-91ec-07e8-4adc-1a48be644f1c@oracle.com> Hi, The ciMethodData::load_data() member function copies a raw MDO to the compiler mirror of said MDO. However, the copy is performed using a non-atomic copy function, despite being updated concurrently. This could potentially cause word tearing when reading metadata pointers, causing the VM to crash... in theory. While this is not a problem when unpacking the extra data section, because it is done under a lock, the same can not be said about the rest of the MDO. So it should either be protected by a lock, or use an atomic copy function instead. This patch adds an extra seat belt by performing atomic heap word copy instead.
Webrev: http://cr.openjdk.java.net/~eosterlund/8216987/webrev.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8216987 Thanks, /Erik From martin.doerr at sap.com Mon Jan 14 15:30:09 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 14 Jan 2019 15:30:09 +0000 Subject: 8216987: ciMethodData::load_data() unpacks MDOs with non-atomic copy In-Reply-To: <4ed95ecc-91ec-07e8-4adc-1a48be644f1c@oracle.com> References: <4ed95ecc-91ec-07e8-4adc-1a48be644f1c@oracle.com> Message-ID: Hi Erik, this looks good. Best regards, Martin -----Original Message----- From: hotspot-compiler-dev On Behalf Of Erik Österlund Sent: Monday, 14 January 2019 16:18 To: hotspot compiler Subject: 8216987: ciMethodData::load_data() unpacks MDOs with non-atomic copy Hi, The ciMethodData::load_data() member function copies a raw MDO to the compiler mirror of said MDO. However, the copy is performed using a non-atomic copy function, despite being updated concurrently. This could potentially cause word tearing when reading metadata pointers, causing the VM to crash... in theory. While this is not a problem when unpacking the extra data section, because it is done under a lock, the same can not be said about the rest of the MDO. So it should either be protected by a lock, or use an atomic copy function instead. This patch adds an extra seat belt by performing atomic heap word copy instead. Webrev: http://cr.openjdk.java.net/~eosterlund/8216987/webrev.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8216987 Thanks, /Erik From erik.osterlund at oracle.com Mon Jan 14 15:32:05 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 14 Jan 2019 16:32:05 +0100 Subject: 8216987: ciMethodData::load_data() unpacks MDOs with non-atomic copy In-Reply-To: References: <4ed95ecc-91ec-07e8-4adc-1a48be644f1c@oracle.com> Message-ID: Hi Martin, Thanks for the review. /Erik On 2019-01-14 16:30, Doerr, Martin wrote: > Hi Erik, > > this looks good.
> > Best regards, > Martin > > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Erik Österlund > Sent: Monday, 14 January 2019 16:18 > To: hotspot compiler > Subject: 8216987: ciMethodData::load_data() unpacks MDOs with non-atomic copy > > Hi, > > The ciMethodData::load_data() member function copies a raw MDO to the > compiler mirror of said MDO. However, the copy is performed using a > non-atomic copy function, despite being updated concurrently. This could > potentially cause word tearing when reading metadata pointers, causing > the VM to crash... in theory. > > While this is not a problem when unpacking the extra data section, > because it is done under a lock, the same can not be said about the rest > of the MDO. So it should either be protected by a lock, or use an atomic > copy function instead. > > This patch adds an extra seat belt by performing atomic heap word copy > instead. > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8216987/webrev.00/ > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8216987 > > Thanks, > /Erik From tobias.hartmann at oracle.com Mon Jan 14 16:12:18 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 14 Jan 2019 17:12:18 +0100 Subject: 8216987: ciMethodData::load_data() unpacks MDOs with non-atomic copy In-Reply-To: <4ed95ecc-91ec-07e8-4adc-1a48be644f1c@oracle.com> References: <4ed95ecc-91ec-07e8-4adc-1a48be644f1c@oracle.com> Message-ID: <124bdce0-dce0-0ab2-6b2c-514d336c54a7@oracle.com> Hi Erik, looks good. Best regards, Tobias On 14.01.19 16:17, Erik Österlund wrote: > Hi, > > The ciMethodData::load_data() member function copies a raw MDO to the compiler mirror of said MDO. > However, the copy is performed using a non-atomic copy function, despite being updated concurrently. > This could potentially cause word tearing when reading metadata pointers, causing the VM to crash... > in theory.
> > While this is not a problem when unpacking the extra data section, because it is done under a lock, > the same can not be said about the rest of the MDO. So it should either be protected by a lock, or > use an atomic copy function instead. > > This patch adds an extra seat belt by performing atomic heap word copy instead. > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8216987/webrev.00/ > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8216987 > > Thanks, > /Erik From erik.osterlund at oracle.com Mon Jan 14 16:21:20 2019 From: erik.osterlund at oracle.com (Erik Österlund) Date: Mon, 14 Jan 2019 17:21:20 +0100 Subject: 8216987: ciMethodData::load_data() unpacks MDOs with non-atomic copy In-Reply-To: <124bdce0-dce0-0ab2-6b2c-514d336c54a7@oracle.com> References: <4ed95ecc-91ec-07e8-4adc-1a48be644f1c@oracle.com> <124bdce0-dce0-0ab2-6b2c-514d336c54a7@oracle.com> Message-ID: <0337f39d-6bf4-0c08-91bc-e0a040ea6646@oracle.com> Hi Tobias, Thanks for the review. /Erik On 2019-01-14 17:12, Tobias Hartmann wrote: > Hi Erik, > > looks good. > > Best regards, > Tobias > > On 14.01.19 16:17, Erik Österlund wrote: >> Hi, >> >> The ciMethodData::load_data() member function copies a raw MDO to the compiler mirror of said MDO. >> However, the copy is performed using a non-atomic copy function, despite being updated concurrently. >> This could potentially cause word tearing when reading metadata pointers, causing the VM to crash... >> in theory. >> >> While this is not a problem when unpacking the extra data section, because it is done under a lock, >> the same can not be said about the rest of the MDO. So it should either be protected by a lock, or >> use an atomic copy function instead. >> >> This patch adds an extra seat belt by performing atomic heap word copy instead.
>> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8216987/webrev.00/ >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8216987 >> >> Thanks, >> /Erik From patric.hedlin at oracle.com Mon Jan 14 16:47:35 2019 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Mon, 14 Jan 2019 17:47:35 +0100 Subject: RFR(S): 8210392: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: <2f4e12e4-459d-b96b-6cf2-50d6dba098d9@oracle.com> References: <28011331-bd43-2c32-dba4-e41879ffe28a@oracle.com> <99f3f410-7200-5fb1-fccd-c39e35c20288@oracle.com> <2f4e12e4-459d-b96b-6cf2-50d6dba098d9@oracle.com> Message-ID: <0aece297-3929-7db5-7054-190163fe65fd@oracle.com> Thanks for reviewing Tobias, On 12/18/18 1:37 PM, Tobias Hartmann wrote: > Hi Patric, > > were you able to reproduce this with a test (I see that one is attached to the bug)? If so, please > add it to the webrev. Please also remove the extra newlines (for example, in line 1146). > > The comment in line 1027 says "Use same limit as split_if_with_blocks_post". I think this is > outdated right? Updated webrev with test-case. Fixed #?%#. Best regards, Patric > Best regards, > Tobias > > On 18.12.18 12:48, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8210392 >> >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8210392/ >> >> >> 8210392: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: >> Live Node limit exceeded limit >> >> Avoid excessive split-if through a crude throttling approach.
>> >> >> Testing: hs-tier1-4, hs-precheckin-comp >> >> >> Best regards, >> Patric From tobias.hartmann at oracle.com Mon Jan 14 16:52:39 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 14 Jan 2019 17:52:39 +0100 Subject: RFR(S): 8210392: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: <0aece297-3929-7db5-7054-190163fe65fd@oracle.com> References: <28011331-bd43-2c32-dba4-e41879ffe28a@oracle.com> <99f3f410-7200-5fb1-fccd-c39e35c20288@oracle.com> <2f4e12e4-459d-b96b-6cf2-50d6dba098d9@oracle.com> <0aece297-3929-7db5-7054-190163fe65fd@oracle.com> Message-ID: Hi Patric, thanks for adding the test. This looks good to me. Best regards, Tobias On 14.01.19 17:47, Patric Hedlin wrote: > Thanks for reviewing Tobias, > > On 12/18/18 1:37 PM, Tobias Hartmann wrote: >> Hi Patric, >> >> were you able to reproduce this with a test (I see that one is attached to the bug)? If so, please >> add it to the webrev. Please also remove the extra newlines (for example, in line 1146). >> >> The comment in line 1027 says "Use same limit as split_if_with_blocks_post". I think this is >> outdated right? > > Updated webrev with test-case. > > Fixed #?%#. > > Best regards, > Patric > >> Best regards, >> Tobias >> >> On 18.12.18 12:48, Patric Hedlin wrote: >>> Dear all, >>> >>> I would like to ask for help to review the following change/update: >>> >>> Issue: https://bugs.openjdk.java.net/browse/JDK-8210392 >>> >>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8210392/ >>> >>> >>> 8210392: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: >>> Live Node limit exceeded limit >>> >>> Avoid excessive split-if through a crude throttling approach.
>>> >>> >>> Testing: hs-tier1-4, hs-precheckin-comp >>> >>> >>> Best regards, >>> Patric From vladimir.kozlov at oracle.com Mon Jan 14 17:03:16 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Jan 2019 09:03:16 -0800 Subject: 8216987: ciMethodData::load_data() unpacks MDOs with non-atomic copy In-Reply-To: <4ed95ecc-91ec-07e8-4adc-1a48be644f1c@oracle.com> References: <4ed95ecc-91ec-07e8-4adc-1a48be644f1c@oracle.com> Message-ID: <428e592e-fe7c-80e5-21a0-122774baccf7@oracle.com> Good. Thanks, Vladimir On 1/14/19 7:17 AM, Erik Österlund wrote: > Hi, > > The ciMethodData::load_data() member function copies a raw MDO to the compiler mirror of said MDO. However, the copy is > performed using a non-atomic copy function, despite being updated concurrently. This could potentially cause word > tearing when reading metadata pointers, causing the VM to crash... in theory. > > While this is not a problem when unpacking the extra data section, because it is done under a lock, the same can not be > said about the rest of the MDO. So it should either be protected by a lock, or use an atomic copy function instead. > > This patch adds an extra seat belt by performing atomic heap word copy instead. > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8216987/webrev.00/ > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8216987 > > Thanks, > /Erik From erik.osterlund at oracle.com Mon Jan 14 17:04:24 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Mon, 14 Jan 2019 18:04:24 +0100 Subject: 8216987: ciMethodData::load_data() unpacks MDOs with non-atomic copy In-Reply-To: <428e592e-fe7c-80e5-21a0-122774baccf7@oracle.com> References: <4ed95ecc-91ec-07e8-4adc-1a48be644f1c@oracle.com> <428e592e-fe7c-80e5-21a0-122774baccf7@oracle.com> Message-ID: Hi Vladimir, Thanks for the review. /Erik > On 14 Jan 2019, at 18:03, Vladimir Kozlov wrote: > > Good.
> > Thanks, > Vladimir > >> On 1/14/19 7:17 AM, Erik Österlund wrote: >> Hi, >> The ciMethodData::load_data() member function copies a raw MDO to the compiler mirror of said MDO. However, the copy is performed using a non-atomic copy function, despite being updated concurrently. This could potentially cause word tearing when reading metadata pointers, causing the VM to crash... in theory. >> While this is not a problem when unpacking the extra data section, because it is done under a lock, the same can not be said about the rest of the MDO. So it should either be protected by a lock, or use an atomic copy function instead. >> This patch adds an extra seat belt by performing atomic heap word copy instead. >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8216987/webrev.00/ >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8216987 >> Thanks, >> /Erik From vladimir.kozlov at oracle.com Mon Jan 14 17:06:42 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Jan 2019 09:06:42 -0800 Subject: [12] RFR(S) 8216151: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.debug In-Reply-To: References: <077fcff1-28fc-acbf-6a8b-c299978ae0a2@oracle.com> Message-ID: <138390b2-7542-619c-eef1-b775e5bb2064@oracle.com> Thank you, Tobias Vladimir On 1/14/19 2:04 AM, Tobias Hartmann wrote: > Hi Vladimir, > > looks good to me. > > Best regards, > Tobias > > > On 13.01.19 03:46, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/8216151/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8216151 >> >> Have to update default.policy after changes in jdk.internal.vm.compiler.management files done by >> JDK-8199755: "Update Graal". >> >> Ran CheckAccessClassInPackagePermissions.java test.
>> From vladimir.kozlov at oracle.com Mon Jan 14 17:39:10 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Jan 2019 09:39:10 -0800 Subject: [12] RFR(S) 8216151: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.debug In-Reply-To: References: <077fcff1-28fc-acbf-6a8b-c299978ae0a2@oracle.com> Message-ID: Thank you, Alan On 1/14/19 2:27 AM, Alan Bateman wrote: > On 13/01/2019 02:46, Vladimir Kozlov wrote: >> http://cr.openjdk.java.net/~kvn/8216151/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8216151 >> >> Have to update default.policy after changes in jdk.internal.vm.compiler.management files done by JDK-8199755: "Update >> Graal". >> >> Ran CheckAccessClassInPackagePermissions.java test. >> > cc'ing security-dev as that is where the security policy file is maintained. > > One thing to double check is that code in jdk.internal.vm.compiler.management really needs to access members of classes > in the listed packages. I ask because the module definition doesn't export some of these packages to > jdk.internal.vm.compiler.management so they aren't accessible even when not running with a security manager. I verified that all listed packages are used by compiler.management and I listed only the needed ones in default.policy. I used CheckAccessClassInPackagePermissions.java test to find which permissions are needed.
Thanks, Vladimir > > -Alan From mandy.chung at oracle.com Mon Jan 14 18:29:39 2019 From: mandy.chung at oracle.com (Mandy Chung) Date: Mon, 14 Jan 2019 10:29:39 -0800 Subject: [12] RFR(S) 8216151: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.debug In-Reply-To: References: <077fcff1-28fc-acbf-6a8b-c299978ae0a2@oracle.com> Message-ID: On 1/14/19 9:39 AM, Vladimir Kozlov wrote: > Thank you, Alan > > On 1/14/19 2:27 AM, Alan Bateman wrote: >> On 13/01/2019 02:46, Vladimir Kozlov wrote: >>> http://cr.openjdk.java.net/~kvn/8216151/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8216151 >>> >>> Have to update default.policy after changes in >>> jdk.internal.vm.compiler.management files done by JDK-8199755: >>> "Update Graal". >>> >>> Ran CheckAccessClassInPackagePermissions.java test. >>> >> cc'ing security-dev as that is where the security policy file is >> maintained. >> >> One thing to double check is that code in >> jdk.internal.vm.compiler.management really needs to access members of >> classes in the listed packages. I ask because the module definition >> doesn't export some of these packages to >> jdk.internal.vm.compiler.management so they aren't accessible even >> when not running with a security manager. > > I verified that all listed packages are used by compiler.management > and I listed only the needed ones in default.policy. I used > CheckAccessClassInPackagePermissions.java test to find which > permissions are needed. > I reviewed the change and the list matches the list of qualified exports from jdk.internal.vm.compiler to jdk.internal.vm.compiler.management. The security team has been looking into removing the private VM call out to ClassLoader::checkPackageAccess. When that's removed, we would not need to maintain these accessClassInPackage permission to access any new qualified exports. Mandy
From vladimir.kozlov at oracle.com Mon Jan 14 18:31:07 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Jan 2019 10:31:07 -0800 Subject: [12] RFR(S) 8216151: [Graal] Module jdk.internal.vm.compiler.management has not been granted accessClassInPackage.org.graalvm.compiler.debug In-Reply-To: References: <077fcff1-28fc-acbf-6a8b-c299978ae0a2@oracle.com> Message-ID: <2ad1f183-f3fd-eb0e-f56b-64c5746c6c08@oracle.com> Thank you Mandy for review. Vladimir On 1/14/19 10:29 AM, Mandy Chung wrote: > > > On 1/14/19 9:39 AM, Vladimir Kozlov wrote: >> Thank you, Alan >> >> On 1/14/19 2:27 AM, Alan Bateman wrote: >>> On 13/01/2019 02:46, Vladimir Kozlov wrote: >>>> http://cr.openjdk.java.net/~kvn/8216151/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8216151 >>>> >>>> Have to update default.policy after changes in jdk.internal.vm.compiler.management files done by JDK-8199755: >>>> "Update Graal". >>>> >>>> Ran CheckAccessClassInPackagePermissions.java test. >>>> >>> cc'ing security-dev as that is where the security policy file is maintained. >>> >>> One thing to double check is that code in jdk.internal.vm.compiler.management really needs to access members of >>> classes in the listed packages. I ask because the module definition doesn't export some of these packages to >>> jdk.internal.vm.compiler.management so they aren't accessible even when not running with a security manager. >> >> I verified that all listed packages are used by compiler.management and I listed only the needed ones in default.policy. I used >> CheckAccessClassInPackagePermissions.java test to find which permissions are needed. >> > > I reviewed the change and the list matches the list of qualified exports from jdk.internal.vm.compiler to > jdk.internal.vm.compiler.management. > > The security team has been looking into removing the private VM call out to ClassLoader::checkPackageAccess.
When > that's removed, we would not need to maintain these accessClassInPackage permission to access any new qualified exports. > > Mandy From vivek.r.deshpande at intel.com Mon Jan 14 19:40:10 2019 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Mon, 14 Jan 2019 19:40:10 +0000 Subject: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index In-Reply-To: <87483987-e5dd-a42b-ff94-da645e041019@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A9A148B9C@ORSMSX106.amr.corp.intel.com> <045def80-a536-ae87-7384-09b30e5a8d78@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A9A14A45C@ORSMSX106.amr.corp.intel.com> <87483987-e5dd-a42b-ff94-da645e041019@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A9A14C5B9@ORSMSX106.amr.corp.intel.com> Thanks Tobias for reviewing it. Regards, Vivek -----Original Message----- From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] Sent: Monday, January 14, 2019 2:40 AM To: Deshpande, Vivek R ; hotspot-compiler-dev at openjdk.java.net compiler Cc: Vladimir Kozlov ; Viswanathan, Sandhya ; Raj, Guru Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index Hi Vivek, thanks for making these changes. Looks good to me! A second review would be good. Best regards, Tobias On 11.01.19 20:38, Deshpande, Vivek R wrote: > Hi Tobias > > Thanks for reviewing the patch. > I have made the changes according to your suggestion. > In this webrev: > http://cr.openjdk.java.net/~vdeshpande/8216050/webrev.01/ > I have fix for the crash reported in the 8216050. > > The lower cost is needed for generation of vpdpwssd instruction, by combining AddVI and MulAddVS2VI. > For other instructions pmaddwd and vpmaddwd, they get generated on platforms upto skylake with default cost. > > I have updated the bug also with the link to webrev. 
> > I have created a different bug JDK-8216580 for > 3) Fix generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have different control RangeCheck nodes > for a[i] and a[i+1] accesses in same MulAddS2I node > > Thank you. > Regards, > Vivek > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Friday, January 11, 2019 4:49 AM > To: Deshpande, Vivek R ; > hotspot-compiler-dev at openjdk.java.net compiler > > Cc: Vladimir Kozlov ; Viswanathan, Sandhya > ; Raj, Guru > Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails > with assert(0 <= i && i < _len) failed: illegal index > > Hi Vivek, > > On 11.01.19 07:58, Deshpande, Vivek R wrote: >> 1) Fix for the crash by matching the operand by swapping to right positions. > > Looks good but the change to loopopts.cpp:530 screwed up the indentation around the ifs, please fix. > >> 2) Cost based generation of vpdpwssd instruction. > > Other instructions added by JDK-8214751 still miss a cost definition, for example: > http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7#l5.20 > >> 3) Fix generation of vector code by allowing adjacent LoadS nodes to >> be isomorphic when they have different control RangeCheck nodes >> for a[i] and a[i+1] accesses in same MulAddS2I node > > This is unrelated to the original bug, right? If so, this should be integrated with a separate RFE. > > Thanks, > Tobias > From tom.rodriguez at oracle.com Mon Jan 14 20:14:11 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Mon, 14 Jan 2019 12:14:11 -0800 Subject: [12] RFR(S): 8215313: [AOT] java/lang/String/Split.java fails with AOTed java.base In-Reply-To: References: <88068DB1-E235-448C-A7D8-B1208143CCCF@oracle.com> Message-ID: <8fcf202b-7115-7dac-87b1-d2be550ad8c1@oracle.com> Looks good.
tom Doug Simon wrote on 1/12/19 4:57 AM: > > >> On 11 Jan 2019, at 10:02, Doug Simon wrote: >> >> Hi Josef, >> >>> On 10 Jan 2019, at 22:52, Josef Haider wrote: >>> >>> Agreed, cmpw/cmpb would make more sense here, I just wanted >>> to keep the changeset minimal, since the entire method may soon be >>> changed again, anyway. >>> >> Can you please say more about this? Would you recommend applying your >> current patch as is to fix the crash or will you have the changes you >> mention ready soon? > > Josef has updated his fix to use cmpw/cmpb: > > http://cr.openjdk.java.net/~dnsimon/8215313/ > > Previous webrev is now at > http://cr.openjdk.java.net/~dnsimon/8215313.old/20190112_1342 > > Dean, can you please re-review. > > -Doug > >> >> -Doug >>>> Taking another look, it seems like cmpl could be replaced with the >>>> size-appropriate cmpb, cmpw, or cmpl based on byteMode(kind) and >>>> findTwoCharPrefix, but I guess AMD64Assembler only supports cmpl and >>>> cmpq right now. >>>> >>>> dl >>>> >>>> On 1/10/19 10:04 AM, dean.long at oracle.com >>>> wrote: >>>> > Is it OK to modify the values of searchValue[i]? If the search value > is already sign-extended, how about sign-extending cmpResult instead > of zero-extending searchValue? > > dl > > On 1/10/19 7:09 AM, Doug Simon wrote: >> Please review this fix supplied by Josef Haider for an incorrect >> compilation of String.split. >> >> When the String.indexOf intrinsic on AMD64 reaches the end of a >> string, it tries to vectorize the last compare operations by reading >> past the bounds of the character/byte array. This is not safe if the >> out-of-bounds read would cross a page boundary, so in that case >> characters are compared one-by-one. This is done with a >> `cmpl`-instruction, which only works as long as the bytes/chars are >> not sign extended.
>> >> The fix is to simply `and` the characters we are searching for with >> `0xff`/`0xffff` in order to eliminate any erroneous sign extensions. >> >> http://cr.openjdk.java.net/~dnsimon/8215313 >> https://bugs.openjdk.java.net/browse/JDK-8215313 >> >> -Doug > >>>> >>> >>> >> > From vladimir.kozlov at oracle.com Mon Jan 14 20:25:55 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Jan 2019 12:25:55 -0800 Subject: [12] RFR(S): 8215313: [AOT] java/lang/String/Split.java fails with AOTed java.base In-Reply-To: <8fcf202b-7115-7dac-87b1-d2be550ad8c1@oracle.com> References: <88068DB1-E235-448C-A7D8-B1208143CCCF@oracle.com> <8fcf202b-7115-7dac-87b1-d2be550ad8c1@oracle.com> Message-ID: <112920cf-46d5-0082-e2ca-79e21fdb7ce6@oracle.com> +1 Thanks, Vladimir On 1/14/19 12:14 PM, Tom Rodriguez wrote: > Looks good. > > tom > > Doug Simon wrote on 1/12/19 4:57 AM: >> >> >>> On 11 Jan 2019, at 10:02, Doug Simon wrote: >>> >>> Hi Josef, >>> >>>> On 10 Jan 2019, at 22:52, Josef Haider wrote: >>>> >>>> Agreed, cmpw/cmpb would make more sense here, I just wanted >>>> to keep the changeset minimal, since the entire method may soon be >>>> changed again, anyway. >>>> >>> Can you please say more about this? Would you recommend applying your current patch as is to fix the crash or will >>> you have the changes you mention ready soon? >> >> Josef has updated his fix to use cmpw/cmpb: >> >> http://cr.openjdk.java.net/~dnsimon/8215313/ >> >> Previous webrev is now at http://cr.openjdk.java.net/~dnsimon/8215313.old/20190112_1342 >> >> Dean, can you please re-review. >> >> -Doug >> >>> >>> -Doug >>>>> Taking another look, it seems like cmpl could be replaced with the >>>>> size-appropriate cmpb, cmpw, or cmpl based on byteMode(kind) and >>>>> findTwoCharPrefix, but I guess AMD64Assembler only supports cmpl and >>>>> cmpq right now.
>>>>> >>>>> dl >>>>> >>>>> On 1/10/19 10:04 AM, dean.long at oracle.com >>>>> wrote: >>>>> > Is it OK to modify the values of searchValue[i]? If the search value > is already sign-extended, how about sign-extending cmpResult instead > of zero-extending searchValue? > > dl > > On 1/10/19 7:09 AM, Doug Simon wrote: >> Please review this fix supplied by Josef Haider for an incorrect >> compilation of String.split. >> >> When the String.indexOf intrinsic on AMD64 reaches the end of a >> string, it tries to vectorize the last compare operations by reading >> past the bounds of the character/byte array. This is not safe if the >> out-of-bounds read would cross a page boundary, so in that case >> characters are compared one-by-one. This is done with a >> `cmpl`-instruction, which only works as long as the bytes/chars are >> not sign extended. >> >> The fix is to simply `and` the characters we are searching for with >> `0xff`/`0xffff` in order to eliminate any erroneous sign extensions. >> >> http://cr.openjdk.java.net/~dnsimon/8215313 >> https://bugs.openjdk.java.net/browse/JDK-8215313 >> >> -Doug > >>>>> >>>> >>>> >>> >> From vladimir.x.ivanov at oracle.com Mon Jan 14 21:55:37 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 14 Jan 2019 13:55:37 -0800 Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point In-Reply-To: <14b4d2c8-0cf4-4c49-309b-1838da8536a0@oracle.com> References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com> <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com> <14b4d2c8-0cf4-4c49-309b-1838da8536a0@oracle.com> Message-ID: <6ebc3b71-bc58-0a14-20c9-aaac3a705f91@oracle.com> >>> If it were the case, then PhaseIdealLoop::handle_use()/spinup() would >>> reliably crash on all users of Phi 1790.
There are 2 other Regions >>> (R1710 and R1716) which keep their IDOM (I1511) intact and the >>> transformation works fine for them. >>> >>> R1722 is changed during strip mining transformation and its IDOM is >>> recomputed (I1511 => R1784). >> >> To elaborate a bit more on that: the only reason IDOM changes is due >> to the way it is computed: >> // rgn = R1722, new_iff = I1854 >> Node* ridom = idom(rgn); // ridom = I1522 = IDOM(R1722) > > Is it typo? Should it be I1511? I don't see I1522 in graph's picture. Yes, it should be I1511. Sorry for the confusion. >> Node* nrdom = dom_lca(ridom, new_iff); // nrdom = R1784 >> set_idom(rgn, nrdom, dom_depth(rgn)); >> >> Node *dom_lca( Node *n1, Node *n2 ) const { >> return find_non_split_ctrl(dom_lca_internal(n1, n2)); >> } >> >> dom_lca_internal(I1522, I1854) = I1522 > > I assume it is 1511. > >> find_non_split_ctrl(I1522) = R1784 >> >> If IDOM info is recomputed from scratch, IDOM(R1722) remains I1511. > > Can you explain more this point? Why result is different if it is from > scratch? PhaseIdealLoop::Dominators() doesn't adjust IDOM for Regions. So, initial IDOM values are and that's the same dom_lca_internal() computes for them: IDOM(R1710) = IDOM(R1716) = IDOM(R1722) = I1511 Then IdealLoopTree::counted_loop() strip mines some of the loops and it causes a change in R1722 which causes recomputation of IDOM using dom_lca() which does normalize the IDOM. If IDOM is rebuilt from scratch at this point, initial IDOM will stay the same (because no strip mining takes place): IDOM(R1710) = IDOM(R1716) = IDOM(R1722) = I1511 And that's the other way to fix the crash: initiate new PhaseIdealLoop iteration right away if any strip mined loops are introduced. But it looks more like a workaround and I decided to go with the fix in PhaseIdealLoop::spinup() because I don't see a reason why IDOM recomputation can't be triggered from other places.
Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Mon Jan 14 22:36:48 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Jan 2019 14:36:48 -0800 Subject: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A9A14A45C@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A9A148B9C@ORSMSX106.amr.corp.intel.com> <045def80-a536-ae87-7384-09b30e5a8d78@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A9A14A45C@ORSMSX106.amr.corp.intel.com> Message-ID: <14067057-ec56-8c07-8f79-d1a29c7e20b7@oracle.com> Hi Vivek, I do not understand changes in superword.cpp. muladds2i will never be packed in follow_def_uses() since you return 'false' for muladds2i in all cases when u1 != u2 (even when i1 == i2). Is it intentional? Thanks, Vladimir On 1/11/19 11:38 AM, Deshpande, Vivek R wrote: > Hi Tobias > > Thanks for reviewing the patch. > I have made the changes according to your suggestion. > In this webrev: http://cr.openjdk.java.net/~vdeshpande/8216050/webrev.01/ > I have fix for the crash reported in the 8216050. > > The lower cost is needed for generation of vpdpwssd instruction, by combining AddVI and MulAddVS2VI. > For other instructions pmaddwd and vpmaddwd, they get generated on platforms upto skylake with default cost. > > I have updated the bug also with the link to webrev. > > I have created a different bug JDK-8216580 for > 3) Fix generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have different control RangeCheck nodes > for a[i] and a[i+1] accesses in same MulAddS2I node > > Thank you. 
> Regards, > Vivek > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Friday, January 11, 2019 4:49 AM > To: Deshpande, Vivek R ; hotspot-compiler-dev at openjdk.java.net compiler > Cc: Vladimir Kozlov ; Viswanathan, Sandhya ; Raj, Guru > Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index > > Hi Vivek, > > On 11.01.19 07:58, Deshpande, Vivek R wrote: >> 1) Fix for the crash by matching the operand by swapping to right positions. > > Looks good but the change to loopopts.cpp:530 screwed up the indentation around the ifs, please fix. > >> 2) Cost based generation of vpdpwssd instruction. > > Other instructions added by JDK-8214751 still miss a cost definition, for example: > http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7#l5.20 > >> 3) Fix generation of vector code by allowing adjacent LoadS nodes to >> be isomorphic when they have different control RangeCheck nodes >> ????for a[i] and a[i+1] accesses in same MulAddS2I node > > This is unrelated to the original bug, right? If so, this should be integrated with a separate RFE. 
>
> Thanks,
> Tobias
>

From vladimir.kozlov at oracle.com Mon Jan 14 23:25:40 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 14 Jan 2019 15:25:40 -0800
Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point
In-Reply-To: <6ebc3b71-bc58-0a14-20c9-aaac3a705f91@oracle.com>
References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com> <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com> <14b4d2c8-0cf4-4c49-309b-1838da8536a0@oracle.com> <6ebc3b71-bc58-0a14-20c9-aaac3a705f91@oracle.com>
Message-ID: <5e4f7d3b-6d7d-3017-2926-13e932820205@oracle.com>

On 1/14/19 1:55 PM, Vladimir Ivanov wrote:
>
>>>> R1722 is changed during strip mining transformation and its IDOM is recomputed (I1511 => R1784).
>>>
>>> If IDOM info is recomputed from scratch, IDOM(R1722) remains I1511.
>>
>> Can you explain more this point? Why result is different if it is from scratch?
>
> PhaseIdealLoop::Dominators() doesn't adjust IDOM for Regions. So the initial IDOM values are as follows, and they match
> what dom_lca_internal() computes for them:
>   IDOM(R1710) = IDOM(R1716) = IDOM(R1722) = I1511
>
> Then IdealLoopTree::counted_loop() strip mines some of the loops and it causes a change in R1722 which causes
> recomputation of IDOM using dom_lca() which does normalize the IDOM.
>
> If IDOM is rebuilt from scratch at this point, initial IDOM will stay the same (because no strip mining takes place):
>   IDOM(R1710) = IDOM(R1716) = IDOM(R1722) = I1511
>
>
> And that's the other way to fix the crash: initiate new PhaseIdealLoop iteration right away if any strip mined loops are
> introduced.

Got it. So the issue is that strip mining invalidated IDOM information generated at the beginning of PhaseIdealLoop::build_and_optimize().
>
> But it looks more like a workaround and I decided to go with the fix in PhaseIdealLoop::spinup() because I don't see a
> reason why IDOM recomputation can't be triggered from other places.

I am not sure your changes help in all cases. It may indeed help the split_if optimization, but dominator information is used before it too. I see Shenandoah's optimize_loops() uses information before split_if.

Can we correctly recalculate IDOM after counted_loop() if a strip mining loop was inserted? Maybe we can simplify the strip mining code if we know that IDOM will be recalculated.

Would be nice to hear Roland's opinion too.

On the other hand, I think your point fix is good for JDK 12. Maybe do what I suggest in JDK 13 later if it is too complex.

Thanks,
Vladimir

>
> Best regards,
> Vladimir Ivanov

From vladimir.x.ivanov at oracle.com Mon Jan 14 23:49:15 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Mon, 14 Jan 2019 15:49:15 -0800
Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point
In-Reply-To: <5e4f7d3b-6d7d-3017-2926-13e932820205@oracle.com>
References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com> <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com> <14b4d2c8-0cf4-4c49-309b-1838da8536a0@oracle.com> <6ebc3b71-bc58-0a14-20c9-aaac3a705f91@oracle.com> <5e4f7d3b-6d7d-3017-2926-13e932820205@oracle.com>
Message-ID: <5eb3f544-7080-b6cc-6ebe-d9c87db19717@oracle.com>

>>
>> But it looks more like a workaround and I decided to go with the fix
>> in PhaseIdealLoop::spinup() because I don't see a reason why IDOM
>> recomputation can't be triggered from other places.
>
> I am not sure your changes help in all cases. It may indeed help the
> split_if optimization, but dominator information is used before it too. I
> see Shenandoah's optimize_loops() uses information before split_if.

I try to address only the PhaseIdealLoop::spinup() case.
There may be other bugs lurking in other places.

> Can we correctly recalculate IDOM after counted_loop() if a strip mining
> loop was inserted? Maybe we can simplify the strip mining code if we know
> that IDOM will be recalculated.

The simplest fix I can come up with (and most reliable IMO w.r.t. other possible bugs which aren't uncovered yet) is to set C->major_progress() if strip mining happened and return early to initiate the next round of PhaseIdealLoop and recompute IDOM info. In that case, transformations will see only IDOM computed by Dominators(), but it means repeated IDOM & loop info computations when strip mining happens.

> Would be nice to hear Roland's opinion too.

Yes, same here.

As for me:

 * I find it ugly that Dominators() and dom_lca() aren't consistent;

 * I'm in favor of normalized info (dom_lca() variant) to be computed from the very beginning;

 * I still believe PhaseIdealLoop::spinup() has a bug which should be fixed (irrespective of whether IDOM is normalized or not);

Best regards,
Vladimir Ivanov

From vladimir.kozlov at oracle.com Tue Jan 15 00:08:37 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 14 Jan 2019 16:08:37 -0800
Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point
In-Reply-To: <5eb3f544-7080-b6cc-6ebe-d9c87db19717@oracle.com>
References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com> <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com> <14b4d2c8-0cf4-4c49-309b-1838da8536a0@oracle.com> <6ebc3b71-bc58-0a14-20c9-aaac3a705f91@oracle.com> <5e4f7d3b-6d7d-3017-2926-13e932820205@oracle.com> <5eb3f544-7080-b6cc-6ebe-d9c87db19717@oracle.com>
Message-ID: <4748e941-763b-a38e-8686-26a0e59da581@oracle.com>

On 1/14/19 3:49 PM, Vladimir Ivanov wrote:
>
>>>
>>> But it looks more like a workaround and I decided to go with the fix in PhaseIdealLoop::spinup() because I don't see
>>> a reason why IDOM recomputation can't be triggered from other places.
>>
>> I am not sure your changes help in all cases. It may indeed help the split_if optimization, but dominator information
>> is used before it too. I see Shenandoah's optimize_loops() uses information before split_if.
>
> I try to address only the PhaseIdealLoop::spinup() case. There may be other bugs lurking in other places.
>
>> Can we correctly recalculate IDOM after counted_loop() if a strip mining loop was inserted? Maybe we can simplify the strip
>> mining code if we know that IDOM will be recalculated.
>
> The simplest fix I can come up with (and most reliable IMO w.r.t. other possible bugs which aren't uncovered yet) is to
> set C->major_progress() if strip mining happened and return early to initiate the next round of PhaseIdealLoop and
> recompute IDOM info. In that case, transformations will see only IDOM computed by Dominators(), but it means repeated
> IDOM & loop info computations when strip mining happens.

Yes. It is the safest, most conservative solution. The only issue is that we have several targeted individual calls to PhaseIdealLoop before we go into optimize_loops(), which calls PhaseIdealLoop in a loop. So the initial PhaseIdealLoop call sequence could be altered if we bail out too soon.

>
>> Would be nice to hear Roland's opinion too.
>
> Yes, same here.
>
> As for me:
>
>  * I find it ugly that Dominators() and dom_lca() aren't consistent;

Agree. Should be fixed (but not urgent for 12).

>
>  * I'm in favor of normalized info (dom_lca() variant) to be computed from the very beginning;

File an RFE.

>
>  * I still believe PhaseIdealLoop::spinup() has a bug which should be fixed (irrespective of whether IDOM is normalized
> or not);

I agree with that too.

Again, I agree with your fix for JDK 12. Let's clean up this mess after that.
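
The "set major progress and return early" idea discussed above can be modeled with a small toy driver. This is only a sketch under stated assumptions: ToyCompile, toy_ideal_loop_pass() and toy_optimize_loops() are hypothetical names, the counters merely stand in for Dominators() and the later transformations, and the real PhaseIdealLoop sequencing is more involved.

```cpp
#include <cassert>

// Toy compilation state; all names here are hypothetical, not HotSpot's.
struct ToyCompile {
  bool major_progress      = false;
  int  loops_to_strip_mine = 0; // loops still waiting to be strip mined
  int  dominator_builds    = 0; // how often IDOM info was built from scratch
  int  other_transforms    = 0; // rounds in which later transformations ran
};

// One round of the toy loop-opts pass.
void toy_ideal_loop_pass(ToyCompile& c) {
  c.major_progress = false;
  ++c.dominator_builds;            // stands in for Dominators()
  if (c.loops_to_strip_mine > 0) {
    c.loops_to_strip_mine = 0;     // strip mine every candidate loop in this round
    c.major_progress = true;       // IDOM info is now considered stale
    return;                        // bail out before any IDOM consumer runs
  }
  ++c.other_transforms;            // consumers only ever see freshly built IDOM
}

// Driver: rerun the pass while it reports major progress.
int toy_optimize_loops(ToyCompile& c, int max_rounds) {
  assert(max_rounds > 0);
  int rounds = 0;
  do {
    toy_ideal_loop_pass(c);
    ++rounds;
  } while (c.major_progress && rounds < max_rounds);
  return rounds;
}
```

In this model a compilation with strip mined loops pays for one extra round, and therefore one extra dominator build, before the remaining transformations run. That is the cost mentioned above, and it also shows how an early bailout shifts work into a later round.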
Thanks, Vladimir > > Best regards, > Vladimir Ivanov From vivek.r.deshpande at intel.com Tue Jan 15 00:17:05 2019 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Tue, 15 Jan 2019 00:17:05 +0000 Subject: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index In-Reply-To: <14067057-ec56-8c07-8f79-d1a29c7e20b7@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A9A148B9C@ORSMSX106.amr.corp.intel.com> <045def80-a536-ae87-7384-09b30e5a8d78@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A9A14A45C@ORSMSX106.amr.corp.intel.com> <14067057-ec56-8c07-8f79-d1a29c7e20b7@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A9A14CD5E@ORSMSX106.amr.corp.intel.com> Hi Vladimir Thanks for looking at the patch. The MulAddS2I node gets packed in follow_use_defs() with this approach in which we just perform swaps in follow_def_uses and return false. This way MulAddS2I nodes gets the right alignment of multiple of 4 from its outs. If we return true after the swaps in follow_def_uses(), it gets alignment as multiple of 2(from LoadS) for packing, instead of multiple of 4. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, January 14, 2019 2:37 PM To: Deshpande, Vivek R ; Tobias Hartmann ; hotspot-compiler-dev at openjdk.java.net compiler Cc: Raj, Guru Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index Hi Vivek, I do not understand changes in superword.cpp. muladds2i will never be packed in follow_def_uses() since you return 'false' for muladds2i in all cases when u1 != u2 (even when i1 == i2). Is it intentional? Thanks, Vladimir On 1/11/19 11:38 AM, Deshpande, Vivek R wrote: > Hi Tobias > > Thanks for reviewing the patch. > I have made the changes according to your suggestion. 
> In this webrev: > http://cr.openjdk.java.net/~vdeshpande/8216050/webrev.01/ > I have fix for the crash reported in the 8216050. > > The lower cost is needed for generation of vpdpwssd instruction, by combining AddVI and MulAddVS2VI. > For other instructions pmaddwd and vpmaddwd, they get generated on platforms upto skylake with default cost. > > I have updated the bug also with the link to webrev. > > I have created a different bug JDK-8216580 for > 3) Fix generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have different control RangeCheck nodes > for a[i] and a[i+1] accesses in same MulAddS2I node > > Thank you. > Regards, > Vivek > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Friday, January 11, 2019 4:49 AM > To: Deshpande, Vivek R ; > hotspot-compiler-dev at openjdk.java.net compiler > > Cc: Vladimir Kozlov ; Viswanathan, Sandhya > ; Raj, Guru > Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails > with assert(0 <= i && i < _len) failed: illegal index > > Hi Vivek, > > On 11.01.19 07:58, Deshpande, Vivek R wrote: >> 1) Fix for the crash by matching the operand by swapping to right positions. > > Looks good but the change to loopopts.cpp:530 screwed up the indentation around the ifs, please fix. > >> 2) Cost based generation of vpdpwssd instruction. > > Other instructions added by JDK-8214751 still miss a cost definition, for example: > http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7#l5.20 > >> 3) Fix generation of vector code by allowing adjacent LoadS nodes to >> be isomorphic when they have different control RangeCheck nodes >> ????for a[i] and a[i+1] accesses in same MulAddS2I node > > This is unrelated to the original bug, right? If so, this should be integrated with a separate RFE. 
> > Thanks, > Tobias > From vladimir.kozlov at oracle.com Tue Jan 15 00:26:15 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 14 Jan 2019 16:26:15 -0800 Subject: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A9A14CD5E@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A9A148B9C@ORSMSX106.amr.corp.intel.com> <045def80-a536-ae87-7384-09b30e5a8d78@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A9A14A45C@ORSMSX106.amr.corp.intel.com> <14067057-ec56-8c07-8f79-d1a29c7e20b7@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A9A14CD5E@ORSMSX106.amr.corp.intel.com> Message-ID: <2a9920a1-0ec3-87cf-2c71-7bdcfcb796be@oracle.com> On 1/14/19 4:17 PM, Deshpande, Vivek R wrote: > Hi Vladimir > > Thanks for looking at the patch. > The MulAddS2I node gets packed in follow_use_defs() with this approach in which we just perform swaps in follow_def_uses and return false. Got it. I confused follow_use_defs() with follow_def_uses(). Changes are good. Vladimir > This way MulAddS2I nodes gets the right alignment of multiple of 4 from its outs. > If we return true after the swaps in follow_def_uses(), it gets alignment as multiple of 2(from LoadS) for packing, instead of multiple of 4. > > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, January 14, 2019 2:37 PM > To: Deshpande, Vivek R ; Tobias Hartmann ; hotspot-compiler-dev at openjdk.java.net compiler > Cc: Raj, Guru > Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index > > Hi Vivek, > > I do not understand changes in superword.cpp. > > muladds2i will never be packed in follow_def_uses() since you return 'false' for muladds2i in all cases when u1 != u2 (even when i1 == i2). Is it intentional? 
> > Thanks, > Vladimir > > On 1/11/19 11:38 AM, Deshpande, Vivek R wrote: >> Hi Tobias >> >> Thanks for reviewing the patch. >> I have made the changes according to your suggestion. >> In this webrev: >> http://cr.openjdk.java.net/~vdeshpande/8216050/webrev.01/ >> I have fix for the crash reported in the 8216050. >> >> The lower cost is needed for generation of vpdpwssd instruction, by combining AddVI and MulAddVS2VI. >> For other instructions pmaddwd and vpmaddwd, they get generated on platforms upto skylake with default cost. >> >> I have updated the bug also with the link to webrev. >> >> I have created a different bug JDK-8216580 for >> 3) Fix generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have different control RangeCheck nodes >> for a[i] and a[i+1] accesses in same MulAddS2I node >> >> Thank you. >> Regards, >> Vivek >> >> -----Original Message----- >> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] >> Sent: Friday, January 11, 2019 4:49 AM >> To: Deshpande, Vivek R ; >> hotspot-compiler-dev at openjdk.java.net compiler >> >> Cc: Vladimir Kozlov ; Viswanathan, Sandhya >> ; Raj, Guru >> Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails >> with assert(0 <= i && i < _len) failed: illegal index >> >> Hi Vivek, >> >> On 11.01.19 07:58, Deshpande, Vivek R wrote: >>> 1) Fix for the crash by matching the operand by swapping to right positions. >> >> Looks good but the change to loopopts.cpp:530 screwed up the indentation around the ifs, please fix. >> >>> 2) Cost based generation of vpdpwssd instruction. 
>>
>> Other instructions added by JDK-8214751 still miss a cost definition, for example:
>> http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7#l5.20
>>
>>> 3) Fix generation of vector code by allowing adjacent LoadS nodes to
>>> be isomorphic when they have different control RangeCheck nodes
>>>     for a[i] and a[i+1] accesses in same MulAddS2I node
>>
>> This is unrelated to the original bug, right? If so, this should be integrated with a separate RFE.
>>
>> Thanks,
>> Tobias
>>

From tom.rodriguez at oracle.com Tue Jan 15 07:09:12 2019
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 14 Jan 2019 23:09:12 -0800
Subject: [12] RFR(XS) 8215748: Application fails when executed with Graal
Message-ID: <63a38944-2d0a-ef61-bea5-e709b4623692@oracle.com>

http://cr.openjdk.java.net/~never/8215748/webrev
https://bugs.openjdk.java.net/browse/JDK-8215748

If an interface method attempts to invoke an array clone method, JVMCI doesn't let you resolve the invoke properly, which can result in performance problems or unexpected NullPointerExceptions. clone is publicly visible on arrays but is protected in Object. HotSpot doesn't have an actual Method* for the array clone operations; it just reuses Object.clone. This is accomplished with some trickery in linkResolver.cpp that adjusts the visibility during resolution if an array class is involved. JVMCI only deals with concrete methods, so when a call site is resolved you get back the real Object.clone. If you try to use resolveMethod on it then it will resolve it relative to Object instead of using the array type. This works ok when the accessing class is a class, but for interface types it fails. In benign cases Graal just ends up falling back to a regular call which is slower than normal. In this case we were attempting to resolve an invoke for a profiled call site and got back null, which shouldn't happen.
The fix is to use the array class as the method type in this particular case, which mirrors the logic in linkResolver.cpp that adjusts the visibility check.

Tested with Spark and the new unit test. mach5 testing is ongoing.

From rwestrel at redhat.com Tue Jan 15 09:03:21 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 15 Jan 2019 10:03:21 +0100
Subject: RFR(S): 8217042: Shenandoah: write barrier on backedge of strip mined loop causes c2 crash at expansion time
Message-ID: <874laaxpgm.fsf@redhat.com>

http://cr.openjdk.java.net/~roland/8217042/webrev.00/

If a write barrier is in the body of the outer strip mined loop, expanding it causes loop strip mining verification code to fail. This is worked around by turning the strip mined loop nest into a regular counted loop nest so verification code doesn't trigger. The logic that takes care of that breaks when the write barrier is on the backedge of the strip mined loop because it is applied after the barrier is expanded. The fix I propose is to move that logic before barrier expansion.

Roland.

From rwestrel at redhat.com Tue Jan 15 09:19:20 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 15 Jan 2019 10:19:20 +0100
Subject: RFR(S): 8217043: Shenandoah: SIGSEGV in Type::meet_helper() at barrier expansion time
Message-ID: <87y37mwa5j.fsf@redhat.com>

http://cr.openjdk.java.net/~roland/8217043/webrev.00/

ShenandoahBarrierNode::needs_barrier_impl() encounters a CallLeafNode (from a write barrier) and tries to get the type of n, which is a tuple, not a pointer, and this causes a null pointer dereference. The write barrier runtime call should anyway prevent an optimization of the barrier and, to be on the safe side, any call should.

Roland.

From shade at redhat.com Tue Jan 15 09:26:44 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 15 Jan 2019 10:26:44 +0100
Subject: RFR(S): 8217042: Shenandoah: write barrier on backedge of strip mined loop causes c2 crash at expansion time
In-Reply-To: <874laaxpgm.fsf@redhat.com>
References: <874laaxpgm.fsf@redhat.com>
Message-ID: <0b1a05ce-5cd6-4791-fad4-d70c7ac9be24@redhat.com>

On 1/15/19 10:03 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8217042/webrev.00/

Cannot comment on the patch itself, it looks fine to my untrained eye.

You might want to fix indents in two places before pushing, these should be inside the if-s?

Here:

2666 if (loop->_head->is_OuterStripMinedLoop()) {
2667 // Expanding a barrier here will break loop strip mining
2668 // verification. Transform the loop so the loop nest doesn't
2669 // appear as strip mined.

and here:

2680 if (loop->_head->is_OuterStripMinedLoop()) {
2681 // Expanding a barrier here will break loop strip mining
2682 // verification. Transform the loop so the loop nest doesn't
2683 // appear as strip mined.

-Aleksey

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL:

From shade at redhat.com Tue Jan 15 10:01:40 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 15 Jan 2019 11:01:40 +0100
Subject: RFR(S): 8217043: Shenandoah: SIGSEGV in Type::meet_helper() at barrier expansion time
In-Reply-To: <87y37mwa5j.fsf@redhat.com>
References: <87y37mwa5j.fsf@redhat.com>
Message-ID:

On 1/15/19 10:19 AM, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8217043/webrev.00/
>
> The ShenandoahBarrierNode::needs_barrier_impl() encounters a
> CallLeafNode (from a write barrier) and tries to get the type of n which
> is a tuple, not a pointer and this causes a null pointer
> dereference.
> The write barrier runtime call should anyway prevent an
> optimization of the barrier and to be on the safe side, any call should.

Looks good to me.

-Aleksey

From aph at redhat.com Tue Jan 15 10:14:54 2019
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Jan 2019 10:14:54 +0000
Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions
In-Reply-To:
References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com>
Message-ID:

On 1/13/19 5:10 PM, B. Blaser wrote:
> On Thu, 10 Jan 2019 at 10:19, Andrew Haley wrote:
>>
>> On 1/9/19 12:13 PM, Roman Kennke wrote:
>>> I cannot say if this has performance implication. I suspect not. If
>>> it has, it's probably minuscule improvement. I can't see how it could be
>>> worse though.
>>
>> I can. x86 can have some very weird performance characteristics. It'd be
>> helpful to do some measurement.
>
> I'm not sure we are really able to conclude anything from performance
> measurement on highly implementation-dependent instructions unless we
> make an average on a significant number of different x86_64 processors
> which might well change with future generations...
>
> Shouldn't we follow a more pragmatic direction considering that less
> instructions/registers and a better/smaller encoding is generally
> preferable, as Roman suggested, which is the purpose of complex
> instruction sets?

I'm not sure that CISC has a purpose, as such.

See the analysis of GCC performance in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309 :

Quick summary: Conditional moves on Intel Core/Xeon and AMD Bulldozer architectures should probably be avoided "as a rule."

History: Conditional moves were beneficial for the Intel Pentium 4, and also (but less-so) for AMD Athlon/Phenom chips.
In the AMD Athlon/Phenom case the performance of cmov vs cmp+branch is determined more by the alignment of the target of the branch, than by the prediction rate of the branch. The instruction decoders would incur penalties on certain types of unaligned branch targets (when taken), or when decoding sequences of instructions that contained multiple branches within a 16byte "fetch" window (taken or not). cmov was sometimes handy for avoiding those. With regard to more current Intel Core and AMD Bulldozer/Bobcat architecture: I have found that use of conditional moves (cmov) is only beneficial if the branch that the move is replacing is badly mis-predicted. In my tests, the cmov only became clearly "optimal" when the branch was predicted correctly less than 92% of the time, which is abysmal by modern branch predictor standards and rarely occurs in practice. Above 97% prediction rates, cmov is typically slower than cmp+branch. Inside loops that contain branches with prediction rates approaching 100% (as is the case presented by the OP), cmov becomes a severe performance bottleneck. This holds true for both Core and Bulldozer. Bulldozer has less efficient branching than the i7, but is also severely bottlenecked by its limited fetch/decode. Cmov requires executing more total instructions, and that makes Bulldozer very unhappy. Note that my tests involved relatively simple loops that did not suffer from the added register pressure that cmov introduces. In practice, the prognosis for cmov being "optimal" is even worse than what I've observed in a controlled environment. Furthermore, to my knowledge the status of cmov vs. branch performance on x86 will not be changing anytime soon. cmov will continue to be a liability well into the next couple architecture releases from Intel and AMD. Piledriver will have added fetch/decode resources but should also have a smaller mispredict penalty, so its doubtful cmov will gain much advantages there either. 
Therefore I would recommend setting -fno-tree-loop-if-convert for all -march matching Intel Core and AMD Bulldozer/Bobcat families. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Tue Jan 15 10:17:19 2019 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 15 Jan 2019 11:17:19 +0100 Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions In-Reply-To: References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com> Message-ID: >>>> I cannot say if if this has performance implication. I suspect not. If >>>> it has, it's probably miniscule improvement. I can't see how it could be >>>> worse though. >>> >>> I can. x86 can have some very weird performance characteristics. It'd be >>> helpful to do some measurement. >> >> I'm not sure we are really able to conclude anything from performance >> measurement on highly implementation-dependent instructions unless we >> make an average on a significant number of different x86_64 processors >> which might well change with future generations... >> >> Shouldn't we follow a more pragmatic direction considering that less >> instructions/registers and a better/smaller encoding is generally >> preferable, as Roman suggested, which is the purpose of complex >> instruction sets? > > I'm not sure that CISC has a purpose, as such. > > See the analysis of GCC performance in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309 : > > > Quick summary: Conditional moves on Intel Core/Xeon and AMD Bulldozer > architectures should probably be avoided "as a rule." > > History: Conditional moves were beneficial for the Intel Pentium 4, and also > (but less-so) for AMD Athlon/Phenom chips. In the AMD Athlon/Phenom case the > performance of cmov vs cmp+branch is determined more by the alignment of the > target of the branch, than by the prediction rate of the branch. 
The > instruction decoders would incur penalties on certain types of unaligned branch > targets (when taken), or when decoding sequences of instructions that contained > multiple branches within a 16byte "fetch" window (taken or not). cmov was > sometimes handy for avoiding those. > > With regard to more current Intel Core and AMD Bulldozer/Bobcat architecture: > > I have found that use of conditional moves (cmov) is only beneficial if the > branch that the move is replacing is badly mis-predicted. In my tests, the > cmov only became clearly "optimal" when the branch was predicted correctly less > than 92% of the time, which is abysmal by modern branch predictor standards and > rarely occurs in practice. Above 97% prediction rates, cmov is typically > slower than cmp+branch. Inside loops that contain branches with prediction > rates approaching 100% (as is the case presented by the OP), cmov becomes a > severe performance bottleneck. This holds true for both Core and Bulldozer. > Bulldozer has less efficient branching than the i7, but is also severely > bottlenecked by its limited fetch/decode. Cmov requires executing more total > instructions, and that makes Bulldozer very unhappy. > > Note that my tests involved relatively simple loops that did not suffer from > the added register pressure that cmov introduces. In practice, the prognosis > for cmov being "optimal" is even worse than what I've observed in a controlled > environment. Furthermore, to my knowledge the status of cmov vs. branch > performance on x86 will not be changing anytime soon. cmov will continue to be > a liability well into the next couple architecture releases from Intel and AMD. > Piledriver will have added fetch/decode resources but should also have a > smaller mispredict penalty, so its doubtful cmov will gain much advantages > there either. > > Therefore I would recommend setting -fno-tree-loop-if-convert for all -march > matching Intel Core and AMD Bulldozer/Bobcat families. 
>

I agree with that. However, note that this is not about using cmov vs. branches. This is about generating a load followed by a cmov on the resulting register vs. generating a cmov that also does the load and avoids the register. It's pretty much the same data-dependency-wise, except that it avoids using the extra register and encodes smaller.

Roman

From tobias.hartmann at oracle.com Tue Jan 15 10:32:15 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 15 Jan 2019 11:32:15 +0100
Subject: [12] RFR(XS) 8215748: Application fails when executed with Graal
In-Reply-To: <63a38944-2d0a-ef61-bea5-e709b4623692@oracle.com>
References: <63a38944-2d0a-ef61-bea5-e709b4623692@oracle.com>
Message-ID:

Hi Tom,

this looks good to me. You might want to reference the related code in LinkResolver::check_method_accessability in your comment (no new webrev required).

Best regards,
Tobias

On 15.01.19 08:09, Tom Rodriguez wrote:
> http://cr.openjdk.java.net/~never/8215748/webrev
> https://bugs.openjdk.java.net/browse/JDK-8215748
>
> If an interface method attempts to invoke an array clone method, JVMCI doesn't let you resolve the
> invoke properly which can result in performance problems or unexpected NullPointerExceptions. clone
> is publicly visible on arrays but is protected in Object. HotSpot doesn't have an actual Method*
> for the array clone operations, it just reuses Object.clone. This is accomplished with some
> trickery in the linkResolver.cpp that adjusts the visibility during resolution if an array class is
> involved. JVMCI only deals with concrete methods so when a call site is resolved you get back the
> real Object.clone. If you try to use resolveMethod on it then it will resolve it relative to Object
> instead of using the array type.
> This works ok when the accessing class is a class but for
> interface types it fails. In benign cases Graal just ends up falling back to a regular call which
> is slower than normal. In this case we were attempting to resolve an invoke for a profiled call
> site and got back null which shouldn't happen. The fix is to use the array class as the method
> type in this particular case which mirrors the logic in the linkResolver.cpp that adjusts the
> visibility check. Tested with Spark and the new unit test. mach5 testing is ongoing.

From aph at redhat.com Tue Jan 15 10:44:33 2019
From: aph at redhat.com (Andrew Haley)
Date: Tue, 15 Jan 2019 10:44:33 +0000
Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions
In-Reply-To:
References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com>
Message-ID:

On 1/15/19 10:17 AM, Roman Kennke wrote:
> I agree with that. However, note that this is not about using cmov vs.
> branches. This is about generating a load followed by a cmov on the
> resulting register vs generating a cmov that also does the load and
> avoids the register. It's pretty much the same data-dependency-wise,
> except that it avoids using the extra register and encodes smaller.

Sure, I get that. But, for the reasons given, CMOV is a rather dusty corner of the ISA. Intel themselves recommend not using it unless you know that the branch is always unpredictable. They say "Use the SETCC and CMOV instructions to eliminate unpredictable conditional branches where possible. Do not do this for predictable branches." It really couldn't be clearer.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Tue Jan 15 10:56:38 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 15 Jan 2019 11:56:38 +0100 Subject: RFR(XS):8216580:X86: Fix generation of VNNI vector code by allowing adjacent LoadS nodes to be isomorphic In-Reply-To: <53E8E64DB2403849AFD89B7D4DAC8B2A9A14A6DA@ORSMSX106.amr.corp.intel.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A9A14A6DA@ORSMSX106.amr.corp.intel.com> Message-ID: Hi Vivek, please add parentheses around the == comparison in lines 1225,1226. Otherwise this looks reasonable to me but I'm not too familiar with that code. Best regards, Tobias On 12.01.19 01:03, Deshpande, Vivek R wrote: > Hi Tobias > > The webrev for the bug JDK-821650 is here: > http://cr.openjdk.java.net/~vdeshpande/8216580/webrev.00/ > This fixes generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have different control RangeCheck nodes for a[i] and a[i+1] accesses in same MulAddS2I node. > Could you please review it. > > Regards, > Vivek > > -----Original Message----- > From: Deshpande, Vivek R > Sent: Friday, January 11, 2019 11:38 AM > To: 'Tobias Hartmann' ; hotspot-compiler-dev at openjdk.java.net compiler > Cc: Vladimir Kozlov ; Viswanathan, Sandhya ; Raj, Guru > Subject: RE: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index > > Hi Tobias > > Thanks for reviewing the patch. > I have made the changes according to your suggestion. > In this webrev: http://cr.openjdk.java.net/~vdeshpande/8216050/webrev.01/ > I have fix for the crash reported in the 8216050. > > The lower cost is needed for generation of vpdpwssd instruction, by combining AddVI and MulAddVS2VI. > For other instructions pmaddwd and vpmaddwd, they get generated on platforms upto skylake with default cost. > > I have updated the bug also with the link to webrev. 
> > I have created a different bug JDK-8216580 for > 3) Fix generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have different control RangeCheck nodes > for a[i] and a[i+1] accesses in same MulAddS2I node > > Thank you. > Regards, > Vivek > > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Friday, January 11, 2019 4:49 AM > To: Deshpande, Vivek R ; hotspot-compiler-dev at openjdk.java.net compiler > Cc: Vladimir Kozlov ; Viswanathan, Sandhya ; Raj, Guru > Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index > > Hi Vivek, > > On 11.01.19 07:58, Deshpande, Vivek R wrote: >> 1) Fix for the crash by matching the operand by swapping to right positions. > > Looks good but the change to loopopts.cpp:530 screwed up the indentation around the ifs, please fix. > >> 2) Cost based generation of vpdpwssd instruction. > > Other instructions added by JDK-8214751 still miss a cost definition, for example: > http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7#l5.20 > >> 3) Fix generation of vector code by allowing adjacent LoadS nodes to >> be isomorphic when they have different control RangeCheck nodes >> ????for a[i] and a[i+1] accesses in same MulAddS2I node > > This is unrelated to the original bug, right? If so, this should be integrated with a separate RFE. > > Thanks, > Tobias > From martin.doerr at sap.com Tue Jan 15 11:05:44 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 15 Jan 2019 11:05:44 +0000 Subject: RFR(S): 8216556: Unnecessary liveness computation with JVMTI In-Reply-To: References: <88842ba1a169406d9628ab06665bd787@sap.com> <9c7afb40-cc2b-9ae8-fb70-4ac3bacb72da@oracle.com> <3a600790198e4bbbb6f253daf0af8ff0@sap.com> Message-ID: <01121e2319ea44bf8aee088ffb32a617@sap.com> Hi Vladimir, Dean and Claes, thank you for reviewing. 
I assume the version which moves the implementation of should_retain_local_variables() to the hpp file (as suggested by Claes) is fine. I'll push this version if there are no objections. Best regards, Martin -----Original Message----- From: hotspot-compiler-dev On Behalf Of Doerr, Martin Sent: Montag, 14. Januar 2019 09:31 To: Claes Redestad ; 'hotspot-compiler-dev at openjdk.java.net' Subject: [CAUTION] RE: RFR(S): 8216556: Unnecessary liveness computation with JVMTI Hi Claes, excellent proposal. Thanks. I had not noticed that it currently is in a cpp file. New webrev: http://cr.openjdk.java.net/~mdoerr/8216556_JVMTI_liveness/webrev.01/ What I still don't really like is that we're passing MethodLivenessResult objects on stack via 3 compilation units. But I don't know if it's worth refactoring the code. Best regards, Martin -----Original Message----- From: Claes Redestad Sent: Freitag, 11. Januar 2019 16:45 To: Doerr, Martin Subject: Re: RFR(S): 8216556: Unnecessary liveness computation with JVMTI Hi, just a random thought, but if you're optimizing this and got some measure where it matters(?), maybe you should also try inlining ciEnv::should_retain_local_variables(), i.e., move definition to ciEnv.hpp. If it doesn't bloat static binary size it seems like it won't hurt, at least. /Claes On 2019-01-11 13:55, Doerr, Martin wrote: > Hi, > > I'd like to contribute a small JIT improvement for JVMTI to avoid > calling raw_liveness_at_bci when its result is not needed. > > Bug with description: > > https://bugs.openjdk.java.net/browse/JDK-8216556 > > Webrev: > > http://cr.openjdk.java.net/~mdoerr/8216556_JVMTI_liveness/webrev.00/ > > Please review. 
> > Best regards, > > Martin > From rkennke at redhat.com Tue Jan 15 11:16:42 2019 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 15 Jan 2019 12:16:42 +0100 Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions In-Reply-To: References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com> Message-ID: <81dc5407-4985-883c-83fc-2e1fa2b77e66@redhat.com> >> I agree with that. However, note that this is not about using cmov vs. >> branches. This is about generating a load followed by a cmov on the >> resulting register vs generating a cmov that also does the load and >> avoids the register. It's pretty much the same data-dependency-wise, >> except that it avoids using the extra register and encodes smaller. > > Sure, I get that. But, for the reasons given, CMOV is a rather dusty > corner of the ISA. Intel themselves recommend not using it unless you > know that the branch is always unpredictable. They say "Use the SETCC > and CMOV instructions to eliminate unpredictable conditional branches > where possible. Do not do this for predictable branches." It really > couldn't be clearer. Well yeah, but again, this patch isn't about generating cmov or not, it only changes that a cmov preceded by a load (mov) is generated as single instruction rather than two instructions for object loads, pretty much as it's done for all the other types. However, it's not very important to me, and probably anybody else, otherwise this wouldn't have been commented-out. I'd withdraw the patch unless somebody steps up and really wants it. Roman -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From lutz.schmidt at sap.com Tue Jan 15 11:26:55 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Tue, 15 Jan 2019 11:26:55 +0000 Subject: RFR (M): 8216314: SIGILL in CodeHeapState::print_names() Message-ID: <98FFB93F-674D-4994-953F-B35572E316A2@sap.com> Dear all, may I please request reviews for this fix, hardening CodeHeap Analytics to not fail when used in high-load (stress) scenarios. There was quite a bit of preliminary discussion, all documented in the "Comments" section of the bug. Bug: https://bugs.openjdk.java.net/browse/JDK-8216314 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8216314.01/ Thank you! Lutz From tobias.hartmann at oracle.com Tue Jan 15 11:45:51 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 15 Jan 2019 12:45:51 +0100 Subject: RFR (M): 8216314: SIGILL in CodeHeapState::print_names() In-Reply-To: <98FFB93F-674D-4994-953F-B35572E316A2@sap.com> References: <98FFB93F-674D-4994-953F-B35572E316A2@sap.com> Message-ID: <1c39e69b-2830-0ced-bbb2-8b5003972695@oracle.com> Hi Lutz, thanks for the discussions and making these changes. The fix looks good to me. Minor style issue (no new webrev required) in codeHeapState.cpp:1289/1290/1305/1305/1306: Please add a newline after '{' (and before '}') or at least a whitespace. Best regards, Tobias On 15.01.19 12:26, Schmidt, Lutz wrote: > Dear all, > > may I please request reviews for this fix, hardening CodeHeap Analytics to not fail when used in high-load (stress) scenarios. There was quite a bit of preliminary discussion, all documented in the "Comments" section of the bug. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8216314 > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8216314.01/ > > Thank you! 
> Lutz
>
>

From lutz.schmidt at sap.com Tue Jan 15 12:57:25 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 15 Jan 2019 12:57:25 +0000
Subject: RFR (M): 8216314: SIGILL in CodeHeapState::print_names()
In-Reply-To: <1c39e69b-2830-0ced-bbb2-8b5003972695@oracle.com>
References: <98FFB93F-674D-4994-953F-B35572E316A2@sap.com> <1c39e69b-2830-0ced-bbb2-8b5003972695@oracle.com>
Message-ID:

Thanks for the review, Tobias! The discussions were very helpful to zero in on a good solution. The single-line if statements are now three-liners.

Regards,
Lutz

On 15.01.19, 12:45, "Tobias Hartmann" wrote:

    Hi Lutz,

    thanks for the discussions and making these changes. The fix looks good to me.

    Minor style issue (no new webrev required) in codeHeapState.cpp:1289/1290/1305/1305/1306: Please add a newline after '{' (and before '}') or at least a whitespace.

    Best regards,
    Tobias

    On 15.01.19 12:26, Schmidt, Lutz wrote:
    > Dear all,
    >
    > may I please request reviews for this fix, hardening CodeHeap Analytics to not fail when used in high-load (stress) scenarios. There was quite a bit of preliminary discussion, all documented in the "Comments" section of the bug.
    >
    > Bug: https://bugs.openjdk.java.net/browse/JDK-8216314
    > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8216314.01/
    >
    > Thank you!
> Lutz > > From rwestrel at redhat.com Tue Jan 15 13:38:11 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 15 Jan 2019 14:38:11 +0100 Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point In-Reply-To: <5e4f7d3b-6d7d-3017-2926-13e932820205@oracle.com> References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com> <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com> <14b4d2c8-0cf4-4c49-309b-1838da8536a0@oracle.com> <6ebc3b71-bc58-0a14-20c9-aaac3a705f91@oracle.com> <5e4f7d3b-6d7d-3017-2926-13e932820205@oracle.com> Message-ID: <87va2qvy64.fsf@redhat.com> Hi Vladimir K & Vladimir I, >> And that's the other way to fix the crash: initiate new PhaseIdealLoop iteration right away if any strip mined loops are >> introduced. > > Got it. So the issue is that strip mining invalidated IDOM information generated at the beginning of > PhaseIdealLoop::build_and_optimize(). I don't think that's accurate. The idom is changed when a loop limit check is inserted (so that's unrelated to strip mining AFAICT). As Vladimir said, when the loop limit check is inserted, the idom of the region is fixed by: Node* nrdom = dom_lca(ridom, new_iff); set_idom(rgn, nrdom, dom_depth(rgn)); which does: Node *dom_lca( Node *n1, Node *n2 ) const { return find_non_split_ctrl(dom_lca_internal(n1, n2)); } and because of the find_non_split_ctrl(), the idom is set to a region rather than an if. That's broken and I'm confused as to why a straightforward change of the logic above: diff --git a/src/hotspot/share/opto/loopPredicate.cpp b/src/hotspot/share/opto/loopPredicate.cpp --- a/src/hotspot/share/opto/loopPredicate.cpp +++ b/src/hotspot/share/opto/loopPredicate.cpp @@ -160,7 +160,7 @@ // When called from beautify_loops() idom is not constructed yet. 
if (_idom != NULL) { Node* ridom = idom(rgn); - Node* nrdom = dom_lca(ridom, new_iff); + Node* nrdom = dom_lca_internal(ridom, new_iff); set_idom(rgn, nrdom, dom_depth(rgn)); } is not good enough. Roland. From nils.eliasson at oracle.com Tue Jan 15 13:38:11 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 15 Jan 2019 14:38:11 +0100 Subject: RFR(S): 8210392: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: Live Node limit exceeded limit In-Reply-To: References: <28011331-bd43-2c32-dba4-e41879ffe28a@oracle.com> <99f3f410-7200-5fb1-fccd-c39e35c20288@oracle.com> <2f4e12e4-459d-b96b-6cf2-50d6dba098d9@oracle.com> <0aece297-3929-7db5-7054-190163fe65fd@oracle.com> Message-ID: <7dd1865d-d29a-26cb-46cf-a818c0b0f305@oracle.com> +1 Looks good! // Nils On 2019-01-14 17:52, Tobias Hartmann wrote: > Hi Patric, > > thanks for adding the test. This looks good to me. > > Best regards, > Tobias > > > On 14.01.19 17:47, Patric Hedlin wrote: >> Thanks for reviewing Tobias, >> >> On 12/18/18 1:37 PM, Tobias Hartmann wrote: >>> Hi Patric, >>> >>> were you able to reproduce this with a test (I see that one is attached to the bug)? If so, please >>> add it to the webrev. Please also remove the extra newlines (for example, in line 1146). >>> >>> The comment in line 1027 says "Use same limit as split_if_with_blocks_post". I think this is >>> outdated right? >> Updated webrev with test-case. >> >> Fixed #?%#. >> >> Best regards, >> Patric >> >>> Best regards, >>> Tobias >>> >>> On 18.12.18 12:48, Patric Hedlin wrote: >>>> Dear all, >>>> >>>> I would like to ask for help to review the following change/update: >>>> >>>> Issue:? https://bugs.openjdk.java.net/browse/JDK-8210392 >>>> >>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8210392/ >>>> >>>> >>>> 8210392: assert(Compile::current()->live_nodes() < Compile::current()->max_node_limit()) failed: >>>> Live Node limit exceeded limit >>>> >>>> ???? 
Avoid excessive split-if through a crude throttling approach. >>>> >>>> >>>> Testing: hs-tier1-4, hs-precheckin-comp >>>> >>>> >>>> Best regards, >>>> Patric From magnus.ihse.bursie at oracle.com Tue Jan 15 14:05:23 2019 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Tue, 15 Jan 2019 15:05:23 +0100 Subject: RFR(M): 8215902: Add support for SoftFloat-3e library In-Reply-To: <4497ca084b9f48dbb8f6de1aa35c83653fd7acfb.camel@gmail.com> References: <4497ca084b9f48dbb8f6de1aa35c83653fd7acfb.camel@gmail.com> Message-ID: <7f69fc73-1c10-6b68-d657-c9e758d4bf1d@oracle.com> On 2018-12-25 16:19, Jakub Van?k wrote: > Hi, > > please review this webrev. It is a successor of the softfloat-3 [patch] > thread (first email > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2018-November/031311.html > ) > > Changes since the last patch (v6): > > - renamed --with-softloat* to --with-sflt* (it is more compact and it > corresponds to the old --with-sflt-lib=... option) > > - license is now obtained via --with-sflt-license switch (so it is not > included in OpenJDK source tree) > > - updated documentation (slight rewording, added the license option) > > - checks for default --with/--without behavior are in place again > (I forgot them when I changed the way the library is detected) > > - added a simple testcase - I found a disrepancy between softfloat and > system function behavior. When a float with bits 0x003FFFFF is > added to 0x00000001, the correct result is 0x00400000, but the > default software floating point implementation returns 0x00000000. > However I'm not sure where to put this test - now it is in > test/hotspot/jtreg/compiler/floatingpoint. > > - comments in code refer to CR 6757269 and newly JDK-8215902 too. > > I have created a repository with SoftFloat-3e with build configuration > specifically for OpenJDK on armel: > https://github.com/ev3dev-lang-java/softfloat-openjdk > > I can add a link to it to the documentation. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8215902 > Webrev: http://cr.openjdk.java.net/~jakvanek/8215902/webrev.02/ Hi Jakub, In general this looks good. Some comments: I agree with Erik that you can add a link to your github project; compiling SoftFloat is outside the scope of the OpenJDK build instructions, but it can sure be helpful to lower the bar for users wanting to do that. Just one question: any particular reason you didn't create your github repo by forking the official https://github.com/ucb-bar/berkeley-softfloat-3? That way, it would have been easy for users to see that you were not adding any malicious or suspicious code to the original SoftFloat distribution. On the other hand, I think the link to http://mail.openjdk.java.net/pipermail/aarch32-port-dev/2016-November/000611.html is unnecessary and just creates clutter in the documentation. Please remove it. /Magnus > CI build: https://ci.adoptopenjdk.net/view/ev3dev/job/openjdk12_build_ev3_linux/67/ > > Cheers, > > Jakub > From linuxtardis at gmail.com Tue Jan 15 16:31:52 2019 From: linuxtardis at gmail.com (Jakub =?UTF-8?Q?Van=C4=9Bk?=) Date: Tue, 15 Jan 2019 17:31:52 +0100 Subject: RFR(M)(round 2): 8215902: Add support for SoftFloat-3e library In-Reply-To: <7f69fc73-1c10-6b68-d657-c9e758d4bf1d@oracle.com> References: <4497ca084b9f48dbb8f6de1aa35c83653fd7acfb.camel@gmail.com> <7f69fc73-1c10-6b68-d657-c9e758d4bf1d@oracle.com> Message-ID: Hi Magnus and Erik, I have added the link to the repository to README and I have removed the link to the mailing list thread. I have also recreated the GitHub repository. Now it is a fork of the mentioned repository with two extra commits containing README and the build scripts. New webrev URL: http://cr.openjdk.java.net/~jakvanek/8215902/webrev.04/ Bug: https://bugs.openjdk.java.net/browse/JDK-8215902 Regards, Jakub On 2019-01-15 at 15:05 +0100, Magnus Ihse Bursie wrote: > On 2018-12-25 16:19, Jakub Van?k wrote: > > Hi, > > > > please review this webrev. 
It is a successor of the softfloat-3 > > [patch] > > thread (first email > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2018-November/031311.html > > ) > > > > Changes since the last patch (v6): > > > > - renamed --with-softloat* to --with-sflt* (it is more compact and > > it > > corresponds to the old --with-sflt-lib=... option) > > > > - license is now obtained via --with-sflt-license switch (so it is > > not > > included in OpenJDK source tree) > > > > - updated documentation (slight rewording, added the license > > option) > > > > - checks for default --with/--without behavior are in place again > > (I forgot them when I changed the way the library is detected) > > > > - added a simple testcase - I found a disrepancy between softfloat > > and > > system function behavior. When a float with bits 0x003FFFFF is > > added to 0x00000001, the correct result is 0x00400000, but the > > default software floating point implementation returns > > 0x00000000. > > However I'm not sure where to put this test - now it is in > > test/hotspot/jtreg/compiler/floatingpoint. > > > > - comments in code refer to CR 6757269 and newly JDK-8215902 too. > > > > I have created a repository with SoftFloat-3e with build > > configuration > > specifically for OpenJDK on armel: > > https://github.com/ev3dev-lang-java/softfloat-openjdk > > > > I can add a link to it to the documentation. > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8215902 > > Webrev: http://cr.openjdk.java.net/~jakvanek/8215902/webrev.02/ > > Hi Jakub, > > In general this looks good. > > Some comments: > > I agree with Erik that you can add a link to your github project; > compiling SoftFloat is outside the scope of the OpenJDK build > instructions, but it can sure be helpful to lower the bar for users > wanting to do that. Just one question: any particular reason you > didn't > create your github repo by forking the official > https://github.com/ucb-bar/berkeley-softfloat-3? 
That way, it would > have > been easy for users to see that you were not adding any malicious or > suspicious code to the original SoftFloat distribution. > > On the other hand, I think the link to > http://mail.openjdk.java.net/pipermail/aarch32-port-dev/2016-November/000611.html > > is unnecessary and just creates clutter in the documentation. Please > remove it. > > /Magnus > > CI build: > > https://ci.adoptopenjdk.net/view/ev3dev/job/openjdk12_build_ev3_linux/67/ > > > > Cheers, > > > > Jakub > > > > From john.r.rose at oracle.com Tue Jan 15 16:43:56 2019 From: john.r.rose at oracle.com (John Rose) Date: Tue, 15 Jan 2019 08:43:56 -0800 Subject: RFR(XS): 8216549: Mismatched unsafe access to non escaping object fails In-Reply-To: <877efbzh8a.fsf@redhat.com> References: <877efbzh8a.fsf@redhat.com> Message-ID: <35639127-2BC2-4005-BDDE-8324C366F0E3@oracle.com> On Jan 11, 2019, at 1:16 AM, Roland Westrelin wrote: > > I simply propose to make non escaping allocations with mismatched > accesses to be non scalar replaceable. That's a good fix for now. At some point, we may want to make the JIT more lenient in Valhalla, at least for value type buffers[1]. The reason is that there are legitimate reasons to process a small value type of multiple small fields in terms of a larger primitive type. The reason I'm thinking of is vectorizing operations like comparison and hash-code on the small value type. When we get Java-level support for vectors (Panama Vector API) some value type operations can be handled in an operation or two on a single vector. Example: For `__ByValue class IntPair { int x, y; }`, the comparison operator can perhaps be optimized (by a platform-specific binder) as a `long` comparison, or an MMX comparison. It's not a concern yet, but here's a little bookmark FTR, showing Mandy's work on buffers? ? 
John

[1]: http://cr.openjdk.java.net/~mchung/valhalla/webrevs/unsafe/private-buffer.00/

From igor.veresov at oracle.com Tue Jan 15 16:59:07 2019
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 15 Jan 2019 08:59:07 -0800
Subject: [12] RFR(S) 8196568: [Graal] LongMulOverflowTest.java fails with "runTestOverflow() did not overflow"
Message-ID: <4FBF36AA-ACF6-4FFA-899B-D9A1E91EA828@oracle.com>

Exact math intrinsics in Graal are failing TCK. Since we're out of time for 12, in order to make Graal compliant, I'd like to turn off emission of the floating exact math nodes. This fix is only for 12, and is not going upstream. For 13 I'll work on a proper fix.

Webrev: http://cr.openjdk.java.net/~iveresov/8196568/webrev.00/
JBS: https://bugs.openjdk.java.net/browse/JDK-8196568

Thanks,
Igor

From vladimir.kozlov at oracle.com Tue Jan 15 17:07:34 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 15 Jan 2019 09:07:34 -0800
Subject: [12] RFR(S) 8196568: [Graal] LongMulOverflowTest.java fails with "runTestOverflow() did not overflow"
In-Reply-To: <4FBF36AA-ACF6-4FFA-899B-D9A1E91EA828@oracle.com>
References: <4FBF36AA-ACF6-4FFA-899B-D9A1E91EA828@oracle.com>
Message-ID: <4b81189c-4e40-7b9f-60f2-2b9b0f731230@oracle.com>

Looks good.

Thanks,
Vladimir

On 1/15/19 8:59 AM, Igor Veresov wrote:
> Exact math intrinsics in Graal are failing TCK. Since we're out of time for 12, in order to make Graal compliant, I'd like to turn off emission of the floating exact math nodes. This fix is only for 12, and is not going upstream. For 13 I'll work on a proper fix.
>
> Webrev: http://cr.openjdk.java.net/~iveresov/8196568/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8196568
>
>
> Thanks,
> Igor
>

From dean.long at oracle.com Tue Jan 15 17:10:04 2019
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Tue, 15 Jan 2019 09:10:04 -0800
Subject: [12] RFR(S) 8196568: [Graal] LongMulOverflowTest.java fails with "runTestOverflow() did not overflow"
In-Reply-To: <4b81189c-4e40-7b9f-60f2-2b9b0f731230@oracle.com>
References: <4FBF36AA-ACF6-4FFA-899B-D9A1E91EA828@oracle.com> <4b81189c-4e40-7b9f-60f2-2b9b0f731230@oracle.com>
Message-ID: <7a228f09-a32d-3c46-efb5-f0d3ad144281@oracle.com>

+1

dl

On 1/15/19 9:07 AM, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir
>
> On 1/15/19 8:59 AM, Igor Veresov wrote:
>> Exact math intrinsics in Graal are failing TCK. Since we're out of
>> time for 12, in order to make Graal compliant, I'd like to turn off
>> emission of the floating exact math nodes. This fix is only for 12,
>> and is not going upstream. For 13 I'll work on a proper fix.
>>
>> Webrev: http://cr.openjdk.java.net/~iveresov/8196568/webrev.00/
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8196568
>>
>>
>> Thanks,
>> Igor
>>

From tom.rodriguez at oracle.com Tue Jan 15 17:34:43 2019
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Tue, 15 Jan 2019 09:34:43 -0800
Subject: [12] RFR(S) 8196568: [Graal] LongMulOverflowTest.java fails with "runTestOverflow() did not overflow"
In-Reply-To: <4FBF36AA-ACF6-4FFA-899B-D9A1E91EA828@oracle.com>
References: <4FBF36AA-ACF6-4FFA-899B-D9A1E91EA828@oracle.com>
Message-ID:

Looks good.

tom

Igor Veresov wrote on 1/15/19 8:59 AM:
> Exact math intrinsics in Graal are failing TCK. Since we're out of time for 12, in order to make Graal compliant, I'd like to turn off emission of the floating exact math nodes. This fix is only for 12, and is not going upstream. For 13 I'll work on a proper fix.
> > Webrev: http://cr.openjdk.java.net/~iveresov/8196568/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8196568 > > > Thanks, > Igor > From vladimir.x.ivanov at oracle.com Tue Jan 15 17:46:02 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 15 Jan 2019 09:46:02 -0800 Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point In-Reply-To: <87va2qvy64.fsf@redhat.com> References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com> <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com> <14b4d2c8-0cf4-4c49-309b-1838da8536a0@oracle.com> <6ebc3b71-bc58-0a14-20c9-aaac3a705f91@oracle.com> <5e4f7d3b-6d7d-3017-2926-13e932820205@oracle.com> <87va2qvy64.fsf@redhat.com> Message-ID: On 15/01/2019 05:38, Roland Westrelin wrote: > > Hi Vladimir K & Vladimir I, > >>> And that's the other way to fix the crash: initiate new PhaseIdealLoop iteration right away if any strip mined loops are >>> introduced. >> >> Got it. So the issue is that strip mining invalidated IDOM information generated at the beginning of >> PhaseIdealLoop::build_and_optimize(). > > I don't think that's accurate. The idom is changed when a loop limit > check is inserted (so that's unrelated to strip mining AFAICT). As > Vladimir said, when the loop limit check is inserted, the idom of the > region is fixed by: > > Node* nrdom = dom_lca(ridom, new_iff); > set_idom(rgn, nrdom, dom_depth(rgn)); > > which does: > > Node *dom_lca( Node *n1, Node *n2 ) const { > return find_non_split_ctrl(dom_lca_internal(n1, n2)); > } > > and because of the find_non_split_ctrl(), the idom is set to a region > rather than an if. 
> > That's broken and I'm confused as to why a straightforward change of the > logic above: > > diff --git a/src/hotspot/share/opto/loopPredicate.cpp b/src/hotspot/share/opto/loopPredicate.cpp > --- a/src/hotspot/share/opto/loopPredicate.cpp > +++ b/src/hotspot/share/opto/loopPredicate.cpp > @@ -160,7 +160,7 @@ > // When called from beautify_loops() idom is not constructed yet. > if (_idom != NULL) { > Node* ridom = idom(rgn); > - Node* nrdom = dom_lca(ridom, new_iff); > + Node* nrdom = dom_lca_internal(ridom, new_iff); > set_idom(rgn, nrdom, dom_depth(rgn)); > } > > is not good enough. Fair point. So you're saying that dom_lca()/find_non_split_ctrl() should never be used to set IDOM, right? And all the places which require non-split point for an IDOM should explicitly normalize it? I checked the codebase and all places, where dom_lca()/find_non_split_ctrl() are used, IDOM is left intact except (PhaseIdealLoop::create_new_if_for_predicate). So, I'm fine with the fix you propose (though I'm still not happy about the distinction between IDOM & dom_lca()/find_non_split_ctrl()). Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Tue Jan 15 17:56:09 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 15 Jan 2019 09:56:09 -0800 Subject: RFR (M): 8216314: SIGILL in CodeHeapState::print_names() In-Reply-To: <1c39e69b-2830-0ced-bbb2-8b5003972695@oracle.com> References: <98FFB93F-674D-4994-953F-B35572E316A2@sap.com> <1c39e69b-2830-0ced-bbb2-8b5003972695@oracle.com> Message-ID: <3719e6a2-54a5-283d-ee2b-fedb0c0110a2@oracle.com> +1. Looks good. Thanks, Vladimir On 1/15/19 3:45 AM, Tobias Hartmann wrote: > Hi Lutz, > > thanks for the discussions and making these changes. The fix looks good to me. > > Minor style issue (no new webrev required) in codeHeapState.cpp:1289/1290/1305/1305/1306: Please add > a newline after '{' (and before '}') or at least a whitespace. 
> > Best regards, > Tobias > > On 15.01.19 12:26, Schmidt, Lutz wrote: >> Dear all, >> >> may I please request reviews for this fix, hardening CodeHeap Analytics to not fail when used in high-load (stress) scenarios. There was quite a bit of preliminary discussion, all documented in the "Comments" section of the bug. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8216314 >> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8216314.01/ >> >> Thank you! >> Lutz >> >> From vladimir.kozlov at oracle.com Tue Jan 15 18:31:05 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 15 Jan 2019 10:31:05 -0800 Subject: [12] RFR(XS) 8215748: Application fails when executed with Graal In-Reply-To: <63a38944-2d0a-ef61-bea5-e709b4623692@oracle.com> References: <63a38944-2d0a-ef61-bea5-e709b4623692@oracle.com> Message-ID: <6b7700a3-c85a-1b9d-d314-4cd57c58c74e@oracle.com> Looks good. Thanks, Vladimir On 1/14/19 11:09 PM, Tom Rodriguez wrote: > http://cr.openjdk.java.net/~never/8215748/webrev > https://bugs.openjdk.java.net/browse/JDK-8215748 > > If an interface method attempts to invoke an array clone method, JVMCI doesn't let you resolve the > invoke properly which can result in performance problems or unexpected NullPointerExceptions.? clone > is publicly visible on arrays but is protected in Object.? HotSpot doesn't have an actual Method* > for the array clone operations, it just reuses Object.clone.? This is accomplished with some > trickery in the linkResolver.cpp that adjusts the visibility during resolution if an array class is > involved.? JVMCI only deals with concrete methods so when a call site is resolved you get back the > real Object.clone.? If you try to use resolveMethod on it then it will resolve it relative to Object > instead of using the array type.? This works ok when the accessing class is an class but for > interface types it fails.? In benign cases Graal just ends up falling back to a regular call which > is slower than normal. 
In this case we were attempting to resolve an invoke for a profiled call
> site and got back null which shouldn't happen. The fix is to use the array class as the method
> type in this particular case which mirrors the logic in the linkResolver.cpp that adjusts the
> visibility check. Tested with Spark and the new unit test. mach5 testing is ongoing.

From igor.veresov at oracle.com Tue Jan 15 18:37:53 2019
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 15 Jan 2019 10:37:53 -0800
Subject: [12] RFR(S) 8196568: [Graal] LongMulOverflowTest.java fails with "runTestOverflow() did not overflow"
In-Reply-To:
References: <4FBF36AA-ACF6-4FFA-899B-D9A1E91EA828@oracle.com>
Message-ID:

Vladimir, Dean, and Tom,

Thanks for the reviews!

Igor

> On Jan 15, 2019, at 9:34 AM, Tom Rodriguez wrote:
>
> Looks good.
>
> tom
>
> Igor Veresov wrote on 1/15/19 8:59 AM:
>> Exact math intrinsics in Graal are failing TCK. Since we're out of time for 12, in order to make Graal compliant, I'd like to turn off emission of the floating exact math nodes. This fix is only for 12, and is not going upstream. For 13 I'll work on a proper fix.
>> Webrev: http://cr.openjdk.java.net/~iveresov/8196568/webrev.00/
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8196568
>> Thanks,
>> Igor

From igor.veresov at oracle.com Tue Jan 15 18:43:27 2019
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 15 Jan 2019 10:43:27 -0800
Subject: [12] RFR(XS) 8215748: Application fails when executed with Graal
In-Reply-To: <63a38944-2d0a-ef61-bea5-e709b4623692@oracle.com>
References: <63a38944-2d0a-ef61-bea5-e709b4623692@oracle.com>
Message-ID: <97D08F9B-06A9-40FE-8FCB-48D3F36C12DE@oracle.com>

Looks good.
Igor > On Jan 14, 2019, at 11:09 PM, Tom Rodriguez wrote: > > http://cr.openjdk.java.net/~never/8215748/webrev > https://bugs.openjdk.java.net/browse/JDK-8215748 > > If an interface method attempts to invoke an array clone method, JVMCI doesn't let you resolve the invoke properly which can result in performance problems or unexpected NullPointerExceptions. clone is publicly visible on arrays but is protected in Object. HotSpot doesn't have an actual Method* for the array clone operations, it just reuses Object.clone. This is accomplished with some trickery in the linkResolver.cpp that adjusts the visibility during resolution if an array class is involved. JVMCI only deals with concrete methods so when a call site is resolved you get back the real Object.clone. If you try to use resolveMethod on it then it will resolve it relative to Object instead of using the array type. This works ok when the accessing class is a class but for interface types it fails. In benign cases Graal just ends up falling back to a regular call which is slower than normal. In this case we were attempting to resolve an invoke for a profiled call site and got back null which shouldn't happen. The fix is to use the array class as the method type in this particular case which mirrors the logic in the linkResolver.cpp that adjusts the visibility check. Tested with Spark and the new unit test. mach5 testing is ongoing.
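For readers who want to see the source-level shape described in the 8215748 report above, here is a minimal sketch of an interface default method that calls clone() on an array. It is not the unit test from the webrev and is not claimed to reproduce the Graal failure; the class, interface and method names (ArrayCloneFromInterface, IntArrayCopier, copy, copyViaInterface) are made up for illustration. It only demonstrates the language rule the report relies on: clone() is public on array types even though Object.clone() is protected.

```java
import java.util.Arrays;

public class ArrayCloneFromInterface {

    // Hypothetical interface, named for illustration only.
    interface IntArrayCopier {
        // The call site below sits in an interface method and targets an
        // array type. It compiles because array types expose clone() as a
        // public member, although Object.clone() is declared protected.
        default int[] copy(int[] src) {
            return src.clone();
        }
    }

    // Small helper (also hypothetical) so the call can be exercised
    // without naming the nested interface from outside.
    public static int[] copyViaInterface(int[] src) {
        IntArrayCopier copier = new IntArrayCopier() {};
        return copier.copy(src);
    }

    public static void main(String[] args) {
        int[] original = {1, 2, 3};
        int[] copy = copyViaInterface(original);
        System.out.println(Arrays.toString(copy)); // prints [1, 2, 3]
        System.out.println(copy != original);      // prints true: a distinct array
    }
}
```

Whether a call site like this actually hits the JVMCI resolution path discussed in the thread depends on the compiler in use and on profiling, which this sketch does not control.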
From vivek.r.deshpande at intel.com Tue Jan 15 19:27:48 2019 From: vivek.r.deshpande at intel.com (Deshpande, Vivek R) Date: Tue, 15 Jan 2019 19:27:48 +0000 Subject: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index In-Reply-To: <2a9920a1-0ec3-87cf-2c71-7bdcfcb796be@oracle.com> References: <53E8E64DB2403849AFD89B7D4DAC8B2A9A148B9C@ORSMSX106.amr.corp.intel.com> <045def80-a536-ae87-7384-09b30e5a8d78@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A9A14A45C@ORSMSX106.amr.corp.intel.com> <14067057-ec56-8c07-8f79-d1a29c7e20b7@oracle.com> <53E8E64DB2403849AFD89B7D4DAC8B2A9A14CD5E@ORSMSX106.amr.corp.intel.com> <2a9920a1-0ec3-87cf-2c71-7bdcfcb796be@oracle.com> Message-ID: <53E8E64DB2403849AFD89B7D4DAC8B2A9A14DF41@ORSMSX106.amr.corp.intel.com> Thanks Vladimir and Tobias for the review. I have pushed the change. Regards, Vivek -----Original Message----- From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] Sent: Monday, January 14, 2019 4:26 PM To: Deshpande, Vivek R ; Tobias Hartmann ; hotspot-compiler-dev at openjdk.java.net compiler Cc: Raj, Guru ; Viswanathan, Sandhya Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails with assert(0 <= i && i < _len) failed: illegal index On 1/14/19 4:17 PM, Deshpande, Vivek R wrote: > Hi Vladimir > > Thanks for looking at the patch. > The MulAddS2I node gets packed in follow_use_defs() with this approach in which we just perform swaps in follow_def_uses and return false. Got it. I confused follow_use_defs() with follow_def_uses(). Changes are good. Vladimir > This way MulAddS2I nodes gets the right alignment of multiple of 4 from its outs. > If we return true after the swaps in follow_def_uses(), it gets alignment as multiple of 2(from LoadS) for packing, instead of multiple of 4. 
> > Regards, > Vivek > > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Monday, January 14, 2019 2:37 PM > To: Deshpande, Vivek R ; Tobias Hartmann > ; hotspot-compiler-dev at openjdk.java.net > compiler > Cc: Raj, Guru > Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails > with assert(0 <= i && i < _len) failed: illegal index > > Hi Vivek, > > I do not understand changes in superword.cpp. > > muladds2i will never be packed in follow_def_uses() since you return 'false' for muladds2i in all cases when u1 != u2 (even when i1 == i2). Is it intentional? > > Thanks, > Vladimir > > On 1/11/19 11:38 AM, Deshpande, Vivek R wrote: >> Hi Tobias >> >> Thanks for reviewing the patch. >> I have made the changes according to your suggestion. >> In this webrev: >> http://cr.openjdk.java.net/~vdeshpande/8216050/webrev.01/ >> I have fix for the crash reported in the 8216050. >> >> The lower cost is needed for generation of vpdpwssd instruction, by combining AddVI and MulAddVS2VI. >> For other instructions pmaddwd and vpmaddwd, they get generated on platforms upto skylake with default cost. >> >> I have updated the bug also with the link to webrev. >> >> I have created a different bug JDK-8216580 for >> 3) Fix generation of vector code by allowing adjacent LoadS nodes to be isomorphic when they have different control RangeCheck nodes >> for a[i] and a[i+1] accesses in same MulAddS2I node >> >> Thank you. 
>> Regards, >> Vivek >> >> -----Original Message----- >> From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] >> Sent: Friday, January 11, 2019 4:49 AM >> To: Deshpande, Vivek R ; >> hotspot-compiler-dev at openjdk.java.net compiler >> >> Cc: Vladimir Kozlov ; Viswanathan, >> Sandhya ; Raj, Guru >> >> Subject: Re: RFR(S):8216050:X86: Fix for Superword optimization fails >> with assert(0 <= i && i < _len) failed: illegal index >> >> Hi Vivek, >> >> On 11.01.19 07:58, Deshpande, Vivek R wrote: >>> 1) Fix for the crash by matching the operand by swapping to right positions. >> >> Looks good but the change to loopopts.cpp:530 screwed up the indentation around the ifs, please fix. >> >>> 2) Cost based generation of vpdpwssd instruction. >> >> Other instructions added by JDK-8214751 still miss a cost definition, for example: >> http://hg.openjdk.java.net/jdk/jdk/rev/4bb6e0871bf7#l5.20 >> >>> 3) Fix generation of vector code by allowing adjacent LoadS nodes to >>> be isomorphic when they have different control RangeCheck nodes >>> ????for a[i] and a[i+1] accesses in same MulAddS2I node >> >> This is unrelated to the original bug, right? If so, this should be integrated with a separate RFE. >> >> Thanks, >> Tobias >> From dean.long at oracle.com Tue Jan 15 20:07:51 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 15 Jan 2019 12:07:51 -0800 Subject: RFR(S): 8216556: Unnecessary liveness computation with JVMTI In-Reply-To: <01121e2319ea44bf8aee088ffb32a617@sap.com> References: <88842ba1a169406d9628ab06665bd787@sap.com> <9c7afb40-cc2b-9ae8-fb70-4ac3bacb72da@oracle.com> <3a600790198e4bbbb6f253daf0af8ff0@sap.com> <01121e2319ea44bf8aee088ffb32a617@sap.com> Message-ID: <3e7b089a-82dc-1123-7b91-7af1a741033d@oracle.com> +1 dl On 1/15/19 3:05 AM, Doerr, Martin wrote: > Hi Vladimir, Dean and Claes, > > thank you for reviewing. 
> I assume the version which moves the implementation of should_retain_local_variables() to the hpp file (as suggested by Claes) is fine. > I'll push this version if there are no objections. > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-compiler-dev On Behalf Of Doerr, Martin > Sent: Montag, 14. Januar 2019 09:31 > To: Claes Redestad ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: [CAUTION] RE: RFR(S): 8216556: Unnecessary liveness computation with JVMTI > > Hi Claes, > > excellent proposal. Thanks. I had not noticed that it currently is in a cpp file. > > New webrev: > http://cr.openjdk.java.net/~mdoerr/8216556_JVMTI_liveness/webrev.01/ > > What I still don't really like is that we're passing MethodLivenessResult objects on stack via 3 compilation units. > But I don't know if it's worth refactoring the code. > > Best regards, > Martin > > > -----Original Message----- > From: Claes Redestad > Sent: Freitag, 11. Januar 2019 16:45 > To: Doerr, Martin > Subject: Re: RFR(S): 8216556: Unnecessary liveness computation with JVMTI > > Hi, > > just a random thought, but if you're optimizing this and got some > measure where it matters(?), maybe you should also try inlining > ciEnv::should_retain_local_variables(), i.e., move definition to > ciEnv.hpp. If it doesn't bloat static binary size it seems like it won't > hurt, at least. > > /Claes > > On 2019-01-11 13:55, Doerr, Martin wrote: >> Hi, >> >> I'd like to contribute a small JIT improvement for JVMTI to avoid >> calling raw_liveness_at_bci when its result is not needed. >> >> Bug with description: >> >> https://bugs.openjdk.java.net/browse/JDK-8216556 >> >> Webrev: >> >> http://cr.openjdk.java.net/~mdoerr/8216556_JVMTI_liveness/webrev.00/ >> >> Please review. 
>> >> Best regards, >> >> Martin >> From vladimir.x.ivanov at oracle.com Wed Jan 16 01:21:04 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 15 Jan 2019 17:21:04 -0800 Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point In-Reply-To: References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com> <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com> <14b4d2c8-0cf4-4c49-309b-1838da8536a0@oracle.com> <6ebc3b71-bc58-0a14-20c9-aaac3a705f91@oracle.com> <5e4f7d3b-6d7d-3017-2926-13e932820205@oracle.com> <87va2qvy64.fsf@redhat.com> Message-ID: <07fe96e2-72fd-48c4-32e8-af43840aef4c@oracle.com> Updated webrev (with Roland's proposal): http://cr.openjdk.java.net/~vlivanov/8215757/webrev.01/ Testing: failing test (replay), hs-precheckin-comp, hs-tier1, hs-tier2 (in progress) Best regards, Vladimir Ivanov On 15/01/2019 09:46, Vladimir Ivanov wrote: > > > On 15/01/2019 05:38, Roland Westrelin wrote: >> >> Hi Vladimir K & Vladimir I, >> >>>> And that's the other way to fix the crash: initiate new >>>> PhaseIdealLoop iteration right away if any strip mined loops are >>>> introduced. >>> >>> Got it. So the issue is that strip mining invalidated IDOM >>> information generated at the beginning of >>> PhaseIdealLoop::build_and_optimize(). >> >> I don't think that's accurate. The idom is changed when a loop limit >> check is inserted (so that's unrelated to strip mining AFAICT). As >> Vladimir said, when the loop limit check is inserted, the idom of the >> region is fixed by: >> >> ???? Node* nrdom = dom_lca(ridom, new_iff); >> ???? set_idom(rgn, nrdom, dom_depth(rgn)); >> >> which does: >> >> ?? Node *dom_lca( Node *n1, Node *n2 ) const { >> ???? return find_non_split_ctrl(dom_lca_internal(n1, n2)); >> ?? } >> >> and because of the find_non_split_ctrl(), the idom is set to a region >> rather than an if. 
>> >> That's broken and I'm confused as to why a straightforward change of the >> logic above: >> >> diff --git a/src/hotspot/share/opto/loopPredicate.cpp >> b/src/hotspot/share/opto/loopPredicate.cpp >> --- a/src/hotspot/share/opto/loopPredicate.cpp >> +++ b/src/hotspot/share/opto/loopPredicate.cpp >> @@ -160,7 +160,7 @@ >> ??? // When called from beautify_loops() idom is not constructed yet. >> ??? if (_idom != NULL) { >> ????? Node* ridom = idom(rgn); >> -??? Node* nrdom = dom_lca(ridom, new_iff); >> +??? Node* nrdom = dom_lca_internal(ridom, new_iff); >> ????? set_idom(rgn, nrdom, dom_depth(rgn)); >> ??? } >> is not good enough. > > Fair point. So you're saying that dom_lca()/find_non_split_ctrl() should > never be used to set IDOM, right? And all the places which require > non-split point for an IDOM should explicitly normalize it? > > I checked the codebase and all places, where > dom_lca()/find_non_split_ctrl() are used, IDOM is left intact except > (PhaseIdealLoop::create_new_if_for_predicate). > > So, I'm fine with the fix you propose (though I'm still not happy about > the distinction between IDOM & dom_lca()/find_non_split_ctrl()). > > Best regards, > Vladimir Ivanov From vladimir.kozlov at oracle.com Wed Jan 16 01:56:48 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 15 Jan 2019 17:56:48 -0800 Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point In-Reply-To: <07fe96e2-72fd-48c4-32e8-af43840aef4c@oracle.com> References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com> <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com> <14b4d2c8-0cf4-4c49-309b-1838da8536a0@oracle.com> <6ebc3b71-bc58-0a14-20c9-aaac3a705f91@oracle.com> <5e4f7d3b-6d7d-3017-2926-13e932820205@oracle.com> <87va2qvy64.fsf@redhat.com> <07fe96e2-72fd-48c4-32e8-af43840aef4c@oracle.com> Message-ID: Looks good. 
I tried to look on history of this code and it was from the first day of loop predicates implementation: http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/b2b6a9bf6238#l5.184 Thanks, Vladimir On 1/15/19 5:21 PM, Vladimir Ivanov wrote: > Updated webrev (with Roland's proposal): > ? http://cr.openjdk.java.net/~vlivanov/8215757/webrev.01/ > > Testing: failing test (replay), hs-precheckin-comp, hs-tier1, hs-tier2 (in progress) > > Best regards, > Vladimir Ivanov > > On 15/01/2019 09:46, Vladimir Ivanov wrote: >> >> >> On 15/01/2019 05:38, Roland Westrelin wrote: >>> >>> Hi Vladimir K & Vladimir I, >>> >>>>> And that's the other way to fix the crash: initiate new PhaseIdealLoop iteration right away if >>>>> any strip mined loops are >>>>> introduced. >>>> >>>> Got it. So the issue is that strip mining invalidated IDOM information generated at the >>>> beginning of >>>> PhaseIdealLoop::build_and_optimize(). >>> >>> I don't think that's accurate. The idom is changed when a loop limit >>> check is inserted (so that's unrelated to strip mining AFAICT). As >>> Vladimir said, when the loop limit check is inserted, the idom of the >>> region is fixed by: >>> >>> ???? Node* nrdom = dom_lca(ridom, new_iff); >>> ???? set_idom(rgn, nrdom, dom_depth(rgn)); >>> >>> which does: >>> >>> ?? Node *dom_lca( Node *n1, Node *n2 ) const { >>> ???? return find_non_split_ctrl(dom_lca_internal(n1, n2)); >>> ?? } >>> >>> and because of the find_non_split_ctrl(), the idom is set to a region >>> rather than an if. >>> >>> That's broken and I'm confused as to why a straightforward change of the >>> logic above: >>> >>> diff --git a/src/hotspot/share/opto/loopPredicate.cpp b/src/hotspot/share/opto/loopPredicate.cpp >>> --- a/src/hotspot/share/opto/loopPredicate.cpp >>> +++ b/src/hotspot/share/opto/loopPredicate.cpp >>> @@ -160,7 +160,7 @@ >>> ??? // When called from beautify_loops() idom is not constructed yet. >>> ??? if (_idom != NULL) { >>> ????? Node* ridom = idom(rgn); >>> -??? 
Node* nrdom = dom_lca(ridom, new_iff); >>> +??? Node* nrdom = dom_lca_internal(ridom, new_iff); >>> ????? set_idom(rgn, nrdom, dom_depth(rgn)); >>> ??? } >>> is not good enough. >> >> Fair point. So you're saying that dom_lca()/find_non_split_ctrl() should never be used to set >> IDOM, right? And all the places which require non-split point for an IDOM should explicitly >> normalize it? >> >> I checked the codebase and all places, where dom_lca()/find_non_split_ctrl() are used, IDOM is >> left intact except (PhaseIdealLoop::create_new_if_for_predicate). >> >> So, I'm fine with the fix you propose (though I'm still not happy about the distinction between >> IDOM & dom_lca()/find_non_split_ctrl()). >> >> Best regards, >> Vladimir Ivanov From tom.rodriguez at oracle.com Wed Jan 16 07:00:35 2019 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Tue, 15 Jan 2019 23:00:35 -0800 Subject: [12] RFR(XS) 8215748: Application fails when executed with Graal In-Reply-To: References: <63a38944-2d0a-ef61-bea5-e709b4623692@oracle.com> Message-ID: <9ad421e8-3075-16ff-6aaf-80517a24c4d1@oracle.com> Tobias Hartmann wrote on 1/15/19 2:32 AM: > Hi Tom, > > this looks good to me. You might want to reference the related code in > LinkResolver::check_method_accessability in your comment(no new webrev required). Good suggestion. I added a mention of that at the end of the comment. Thanks! tom > > Best regards, > Tobias > > On 15.01.19 08:09, Tom Rodriguez wrote: >> http://cr.openjdk.java.net/~never/8215748/webrev >> https://bugs.openjdk.java.net/browse/JDK-8215748 >> >> If an interface method attempts to invoke an array clone method, JVMCI doesn't let you resolve the >> invoke properly which can result in performance problems or unexpected NullPointerExceptions.? clone >> is publicly visible on arrays but is protected in Object.? HotSpot doesn't have an actual Method* >> for the array clone operations, it just reuses Object.clone.? 
This is accomplished with some >> trickery in the linkResolver.cpp that adjusts the visibility during resolution if an array class is >> involved. JVMCI only deals with concrete methods so when a call site is resolved you get back the >> real Object.clone. If you try to use resolveMethod on it then it will resolve it relative to Object >> instead of using the array type. This works ok when the accessing class is a class but for >> interface types it fails. In benign cases Graal just ends up falling back to a regular call which >> is slower than normal. In this case we were attempting to resolve an invoke for a profiled call >> site and got back null which shouldn't happen. The fix is to use the array class as the method >> type in this particular case which mirrors the logic in the linkResolver.cpp that adjusts the >> visibility check. Tested with Spark and the new unit test. mach5 testing is ongoing. From lutz.schmidt at sap.com Wed Jan 16 08:30:29 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 16 Jan 2019 08:30:29 +0000 Subject: RFR (M): 8216314: SIGILL in CodeHeapState::print_names() In-Reply-To: <3719e6a2-54a5-283d-ee2b-fedb0c0110a2@oracle.com> References: <98FFB93F-674D-4994-953F-B35572E316A2@sap.com> <1c39e69b-2830-0ced-bbb2-8b5003972695@oracle.com> <3719e6a2-54a5-283d-ee2b-fedb0c0110a2@oracle.com> Message-ID: <4E570A00-3567-4333-A268-3C0100DF0417@sap.com> Thank you, Vladimir! I'll go ahead and push. Regards, Lutz On 15.01.19, 18:56, "Vladimir Kozlov" wrote: +1. Looks good. Thanks, Vladimir On 1/15/19 3:45 AM, Tobias Hartmann wrote: > Hi Lutz, > > thanks for the discussions and making these changes. The fix looks good to me. > > Minor style issue (no new webrev required) in codeHeapState.cpp:1289/1290/1305/1305/1306: Please add > a newline after '{' (and before '}') or at least a whitespace.
> > Best regards, > Tobias > > On 15.01.19 12:26, Schmidt, Lutz wrote: >> Dear all, >> >> may I please request reviews for this fix, hardening CodeHeap Analytics to not fail when used in high-load (stress) scenarios. There was quite a bit of preliminary discussion, all documented in the "Comments" section of the bug. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8216314 >> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8216314.01/ >> >> Thank you! >> Lutz >> >> From rwestrel at redhat.com Wed Jan 16 08:43:28 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 16 Jan 2019 09:43:28 +0100 Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point In-Reply-To: <07fe96e2-72fd-48c4-32e8-af43840aef4c@oracle.com> References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com> <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com> <14b4d2c8-0cf4-4c49-309b-1838da8536a0@oracle.com> <6ebc3b71-bc58-0a14-20c9-aaac3a705f91@oracle.com> <5e4f7d3b-6d7d-3017-2926-13e932820205@oracle.com> <87va2qvy64.fsf@redhat.com> <07fe96e2-72fd-48c4-32e8-af43840aef4c@oracle.com> Message-ID: <87pnsxvvpr.fsf@redhat.com> > http://cr.openjdk.java.net/~vlivanov/8215757/webrev.01/ Looks good to me. Roland. From rkennke at redhat.com Wed Jan 16 09:42:24 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 16 Jan 2019 10:42:24 +0100 Subject: RFR(S): 8217042: Shenandoah: write barrier on backedge of strip mined loop causes c2 crash at expansion time In-Reply-To: <874laaxpgm.fsf@redhat.com> References: <874laaxpgm.fsf@redhat.com> Message-ID: Looks good to me. Fix the comments as Aleksey already noted. Thanks, Roman > http://cr.openjdk.java.net/~roland/8217042/webrev.00/ > > If a write barrier is in the body of the outer strip mined loop, > expanding it causes loop strip mining verification code to fail. 
This is > worked around by turning the strip mined loop nest into a regular > counted loop nest so verification code doesn't trigger. The logic that > takes care of that breaks when the write barrier is on the backedge of > the strip mined loop because it is applied after the barrier is > expanded. The fix I propose is to move that logic before barrier > expansion. > > Roland. > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From rkennke at redhat.com Wed Jan 16 09:43:05 2019 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 16 Jan 2019 10:43:05 +0100 Subject: RFR(S): 8217043: Shenandoah: SIGSEGV in Type::meet_helper() at barrier expansion time In-Reply-To: <87y37mwa5j.fsf@redhat.com> References: <87y37mwa5j.fsf@redhat.com> Message-ID: <98bfac69-a4ed-efa3-e00d-be267b15b932@redhat.com> Ok. Thanks, Roman > http://cr.openjdk.java.net/~roland/8217043/webrev.00/ > > The ShenandoahBarrierNode::needs_barrier_impl() encounters a > CallLeafNode (from a write barrier) and tries to get the type of n which > is a tuple, not a pointer and this causes a null pointer > dereference. The write barrier runtime call should anyway prevent an > optimization of the barrier and to be on the safe side, any call should. > > Roland. > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From jatin.bhateja at intel.com Wed Jan 16 09:53:04 2019 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Wed, 16 Jan 2019 09:53:04 +0000 Subject: [aarch64-port-dev ] RFR(M): 8212043: Add floating-point Math.min/max intrinsics In-Reply-To: References: <5bf1c593-2e96-8a10-88c6-98afdd9a04f2@redhat.com> <0c7de175-17d8-f3f5-a47b-2b9b3f45af71@redhat.com> <1e7af2c4-8610-2ee9-9955-298ffb715fa7@redhat.com> <06048878-effe-7d24-bb87-b140e662aeb8@redhat.com> <7c97719b-e83a-ba40-43a3-8cec8273df1c@redhat.com> <3df16666-a10b-41bb-7439-b967e1d76735@redhat.com> <4a10fa17-197b-2da9-7890-9544a407832f@redhat.com> Message-ID: Hi Pengfei, Your final patch (http://cr.openjdk.java.net/~pli/rfr/8212043/webrev.04/) to support floating point scalar max/min intrinsic also included following test case which is not up streamed to jdk repository. test/hotspot/jtreg/compiler/intrinsics/math/TestFpMinMaxIntrinsics.java Can you kindly add this test case, I?m working on supporting these new intrinsics for X86 platform and will like to use the test case you created. Thanks and Regards, Jatin Bhateja From: Pengfei Li (Arm Technology China) > ---------- Forwarded message --------- Date: Wed, Dec 19, 2018 at 6:38 PM Subject: RE: [aarch64-port-dev ] RFR(M): 8212043: Add floating-point Math.min/max intrinsics To: Andrew Dinn >, Andrew Haley >, hotspot-compiler-dev at openjdk.java.net >, aarch64-port-dev at openjdk.java.net > Cc: nd > Hi > Pengfei, I am sure you will be pleased to know this has finally been pushed to > the dev repo. Thanks a lot Andrew Dinn and Andrew Haley! This could not happen without your help. And for the next step, the follow-up patch for vectorization is almost ready. I will post it in another new thread soon later. -- Thanks, Pengfei -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From adinn at redhat.com Wed Jan 16 11:23:10 2019 From: adinn at redhat.com (Andrew Dinn) Date: Wed, 16 Jan 2019 11:23:10 +0000 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <69510788-52e6-815b-1ed7-a6f4886d0398@oracle.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <09c92b1a-c6da-e16b-bb68-553876e8a6ea@oracle.com> <2a0a385d-81ee-df66-f147-e4dd9aa5b72e@oracle.com> <8b2ab749-20f1-8dd3-3cc7-64db5d45bc7d@redhat.com> <0aae37aa-7797-fde5-63d5-96c8eb961183@oracle.com> <86a1988a-a8d2-b6af-0985-11a94d6d76a5@redhat.com> <69510788-52e6-815b-1ed7-a6f4886d0398@oracle.com> Message-ID: <3e3c4f7d-049e-4aec-c165-f2664e7c98ef@redhat.com> Hi Alan/Brian, I have finally been able to shelve other commitments and return to this JEP (apologies for the hiatus). https://openjdk.java.net/jeps/8207851 The JEP has been reviewed positively by Stuart Marks (core libs) and Vladimir Kozlov (intrinsics). It has also been warmly welcomed by several potential users in Red Hat and Intel (including, respectively, Jonathan Halliday and Sandya Viswanathan both in cc). I believe I have addressed all outstanding comments on the JEP per se, including those made by Alan. Is it now possible for one of you to endorse the JEP so it can be submitted? I am aware that I still need to address a few details in the draft implementation that are not present in the latest webrev. I believe there are two changes requested by Vladimir: 1. correct the type of cache writeback memory nodes to generic memory 2. use the JVM to inject a flag setting which enables/disables mapping of persistent buffers and also one change requested by Alan: make method MappedByteBuffer.isPersistent private rather than public Is there any other impediment to submitting the JEP and proceeding to code review? 
regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From tobias.hartmann at oracle.com Wed Jan 16 11:25:16 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 16 Jan 2019 12:25:16 +0100 Subject: RFR(S): 8217043: Shenandoah: SIGSEGV in Type::meet_helper() at barrier expansion time In-Reply-To: <87y37mwa5j.fsf@redhat.com> References: <87y37mwa5j.fsf@redhat.com> Message-ID: Hi Roland, looks good to me. Best regards, Tobias On 15.01.19 10:19, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8217043/webrev.00/ > > The ShenandoahBarrierNode::needs_barrier_impl() encounters a > CallLeafNode (from a write barrier) and tries to get the type of n which > is a tuple, not a pointer and this causes a null pointer > dereference. The write barrier runtime call should anyway prevent an > optimization of the barrier and to be on the safe side, any call should. > > Roland. > From tobias.hartmann at oracle.com Wed Jan 16 11:26:05 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 16 Jan 2019 12:26:05 +0100 Subject: RFR(S): 8217042: Shenandoah: write barrier on backedge of strip mined loop causes c2 crash at expansion time In-Reply-To: <874laaxpgm.fsf@redhat.com> References: <874laaxpgm.fsf@redhat.com> Message-ID: Hi Roland, looks reasonable to me. Best regards, Tobias On 15.01.19 10:03, Roland Westrelin wrote: > > http://cr.openjdk.java.net/~roland/8217042/webrev.00/ > > If a write barrier is in the body of the outer strip mined loop, > expanding it causes loop strip mining verification code to fail. This is > worked around by turning the strip mined loop nest into a regular > counted loop nest so verification code doesn't trigger. 
The logic that > takes care of that breaks when the write barrier is on the backedge of > the strip mined loop because it is applied after the barrier is > expanded. The fix I propose is to move that logic before barrier > expansion. > > Roland. > From rwestrel at redhat.com Wed Jan 16 12:34:57 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 16 Jan 2019 13:34:57 +0100 Subject: RFR(S): 8217043: Shenandoah: SIGSEGV in Type::meet_helper() at barrier expansion time In-Reply-To: References: <87y37mwa5j.fsf@redhat.com> Message-ID: <87imyowzke.fsf@redhat.com> Thanks for the reviews Aleksey, Roman & Tobias. Roland. From rwestrel at redhat.com Wed Jan 16 12:35:58 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 16 Jan 2019 13:35:58 +0100 Subject: RFR(S): 8217042: Shenandoah: write barrier on backedge of strip mined loop causes c2 crash at expansion time In-Reply-To: References: <874laaxpgm.fsf@redhat.com> Message-ID: <87fttswzip.fsf@redhat.com> Thanks for the reviews Roman & Tobias, and for the comments, Aleksey. Roland. From lutz.schmidt at sap.com Wed Jan 16 14:52:38 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 16 Jan 2019 14:52:38 +0000 Subject: RFR(S, tedious): 8217250: Optimize CodeHeap Analytics Message-ID: Dear all, may I please have reviews for this (semantically) small change. Its purpose is to reduce the bufferedStream buffer flushes while printing CodeHeap Analytics. Bug: https://bugs.openjdk.java.net/browse/JDK-8217250 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8217250.00/ Thank you! 
Lutz From vladimir.x.ivanov at oracle.com Wed Jan 16 17:48:59 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 16 Jan 2019 09:48:59 -0800 Subject: [12] RFR (S): 8215757: C2: PhaseIdealLoop::spinup() computes wrong post-dominating point In-Reply-To: <87pnsxvvpr.fsf@redhat.com> References: <13b5bc19-75a3-2692-92a1-8ac731ebf671@oracle.com> <874lafzfiy.fsf@redhat.com> <09abefa3-37b4-5657-22f9-e06f144a1867@oracle.com> <21fc0b3a-87d5-0c31-0d63-75eca3c05e5b@oracle.com> <14b4d2c8-0cf4-4c49-309b-1838da8536a0@oracle.com> <6ebc3b71-bc58-0a14-20c9-aaac3a705f91@oracle.com> <5e4f7d3b-6d7d-3017-2926-13e932820205@oracle.com> <87va2qvy64.fsf@redhat.com> <07fe96e2-72fd-48c4-32e8-af43840aef4c@oracle.com> <87pnsxvvpr.fsf@redhat.com> Message-ID: Thanks, Vladimir & Roland. Best regards, Vladimir Ivanov On 16/01/2019 00:43, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~vlivanov/8215757/webrev.01/ > > Looks good to me. > > Roland. > From vladimir.kozlov at oracle.com Wed Jan 16 18:10:21 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 16 Jan 2019 10:10:21 -0800 Subject: RFR(S, tedious): 8217250: Optimize CodeHeap Analytics In-Reply-To: References: Message-ID: Hi Lutz, I see that you have only one usage in all cases for: BUFFEREDSTREAM_FLUSH_IF("", 512) Can you simple declare simplified macro for this? Otherwise looks good. Thanks, Vladimir On 1/16/19 6:52 AM, Schmidt, Lutz wrote: > Dear all, > > may I please have reviews for this (semantically) small change. Its purpose is to reduce the bufferedStream buffer flushes while printing CodeHeap Analytics. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8217250 > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8217250.00/ > > Thank you! 
> Lutz > > From derekw at marvell.com Wed Jan 16 19:44:33 2019 From: derekw at marvell.com (Derek White) Date: Wed, 16 Jan 2019 19:44:33 +0000 Subject: [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock fast path not called In-Reply-To: References: <680089e7-ec26-a4cd-6143-4d36182e971a@arm.com> <51023960-0e8f-56aa-20a4-279017251585@redhat.com> Message-ID: Hi Nick, Looks good to me! - Derek > -----Original Message----- > From: Nick Gasson (Arm Technology China) > Sent: Thursday, January 10, 2019 9:37 PM > To: Andrew Haley ; Derek White > ; hotspot-compiler-dev at openjdk.java.net compiler > > Cc: nd ; aarch64-port-dev at openjdk.java.net > Subject: [EXT] Re: [aarch64-port-dev ] RFR: 8216350: AArch64: monitor > unlock fast path not called > > External Email > > ---------------------------------------------------------------------- > Hi all, > > On 09/01/2019 17:23, Andrew Haley wrote: > > > > HotSpot policy is that we can do minor cleanups as we go along: > > experience has shown that unless you do so, cruft tends to accumulate. > > These cleanups are OK for this patch. > > > > Please see the updated webrev here: > > http://cr.openjdk.java.net/~ngasson/8216350/webrev.1/ > > Includes cleanups according to Derek's comments and updated the copyright > year (thanks Felix). > > > 4) Slightly better comment for last instruction of fast_unlock (and > explicitly use zr). > > __ stlr(zr, tmp); // set unowned > > Note I needed to change the definition of load_store_exclusive to allow ZR > here. I've checked that this is OK for the other instructions that use this. > > Thanks, > Nick From lutz.schmidt at sap.com Wed Jan 16 20:37:23 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Wed, 16 Jan 2019 20:37:23 +0000 Subject: RFR(S, tedious): 8217250: Optimize CodeHeap Analytics In-Reply-To: References: Message-ID: Hi Vladimir, thanks a lot for looking at this so quickly. Sure, I could declare a specialized "BUFFEREDSTREAM_FLUSH_512" for this. 
The "512" originated from the thought "it's large enough for a well-behaved line and small enough to save some flushes".

I was also thinking about a "BUFFEREDSTREAM_FLUSH_AUTO", where the spare space is derived from the buffer capacity, maybe something like 10 percent of the capacity, 256 bytes minimum. I wasn't sure if that could be categorized as over-engineered.

Your thoughts?

Thanks,
Lutz

On 16.01.19, 19:10, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote:

Hi Lutz,

I see that you have only one usage in all cases for:
BUFFEREDSTREAM_FLUSH_IF("", 512)

Can you simple declare simplified macro for this?

Otherwise looks good.

Thanks,
Vladimir

On 1/16/19 6:52 AM, Schmidt, Lutz wrote:
> Dear all,
>
> may I please have reviews for this (semantically) small change. Its purpose is to reduce the bufferedStream buffer flushes while printing CodeHeap Analytics.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8217250
> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8217250.00/
>
> Thank you!
> Lutz
>
>

From bsrbnd at gmail.com Wed Jan 16 20:46:32 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Wed, 16 Jan 2019 21:46:32 +0100
Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions
In-Reply-To: <81dc5407-4985-883c-83fc-2e1fa2b77e66@redhat.com>
References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com> <81dc5407-4985-883c-83fc-2e1fa2b77e66@redhat.com>
Message-ID:

On Tue, 15 Jan 2019 at 12:16, Roman Kennke wrote:
>
> >> I agree with that. However, note that this is not about using cmov vs.
> >> branches. This is about generating a load followed by a cmov on the
> >> resulting register vs generating a cmov that also does the load and
> >> avoids the register. It's pretty much the same data-dependency-wise,
> >> except that it avoids using the extra register and encodes smaller.
> >
> > Sure, I get that. But, for the reasons given, CMOV is a rather dusty
Intel themselves recommend not using it unless you > > know that the branch is always unpredictable. They say "Use the SETCC > > and CMOV instructions to eliminate unpredictable conditional branches > > where possible. Do not do this for predictable branches." It really > > couldn't be clearer. > > Well yeah, but again, this patch isn't about generating cmov or not, it > only changes that a cmov preceded by a load (mov) is generated as single > instruction rather than two instructions for object loads, pretty much > as it's done for all the other types. However, it's not very important > to me, and probably anybody else, otherwise this wouldn't have been > commented-out. I'd withdraw the patch unless somebody steps up and > really wants it. To answer Andrew Haley, one of the major difference between CISC and RISC is specifically the load/store architecture of the latter which is part of most instructions of the former; I don't see many good reasons to generate RISC-like load/store code using only a subset of instructions and to juggle with registers. Note also that if a 'mov+cmov' would now appear to be faster than a sole 'cmov' on some processors, there is a high probability to see the opposite behavior in future generations. Of course, if you use another idiom like 'cmp+branch' which isn't the purpose of this fix, you might have benefits for predictable branches or not for unpredictable branches. At my mind, It'd be unfortunate to withdraw this patch as the current policy seems to go in Roman's direction, see 'cmovL_mem' which uses an adequate prefix: http://hg.openjdk.java.net/jdk/jdk/file/d3aa93570779/src/hotspot/cpu/x86/x86_64.ad#l7083 I'm not skilled enough in the pointer type area to answer Andrew Dinn's question but maybe adding predicates to validate them if Roman's prefix correction isn't sufficient would be a possible solution? In any case, explanations about the initial 'cmovP_mem' intention and the reason for disabling it would be helpful. 
Bernard

From vladimir.kozlov at oracle.com Wed Jan 16 21:53:48 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 16 Jan 2019 13:53:48 -0800
Subject: RFR(S, tedious): 8217250: Optimize CodeHeap Analytics
In-Reply-To:
References:
Message-ID: <2d7b7963-61be-95b8-017b-956f2752c8f3@oracle.com>

On 1/16/19 12:37 PM, Schmidt, Lutz wrote:
> Hi Vladimir,
>
> thanks a lot for looking at this so quickly.
>
> Sure, I could declare a specialized "BUFFEREDSTREAM_FLUSH_512" for this. The "512" originated from the thought "it's large enough for a well-behaved line and small enough to save some flushes".
>
> I was also thinking about a "BUFFEREDSTREAM_FLUSH_AUTO", where the spare space is derived from the buffer capacity, maybe something like 10 percent of the capacity, 256 bytes minimum. I wasn't sure if that could be categorized as over-engineered.

Yes, I think BUFFEREDSTREAM_FLUSH_AUTO is better than fixed size.

Vladimir

>
> Your thoughts?
>
> Thanks,
> Lutz
>
> On 16.01.19, 19:10, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote:
>
> Hi Lutz,
>
> I see that you have only one usage in all cases for:
> BUFFEREDSTREAM_FLUSH_IF("", 512)
>
> Can you simple declare simplified macro for this?
>
> Otherwise looks good.
>
> Thanks,
> Vladimir
>
> On 1/16/19 6:52 AM, Schmidt, Lutz wrote:
> > Dear all,
> >
> > may I please have reviews for this (semantically) small change. Its purpose is to reduce the bufferedStream buffer flushes while printing CodeHeap Analytics.
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8217250
> > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8217250.00/
> >
> > Thank you!
> > Lutz > > > > > > From gromero at linux.vnet.ibm.com Wed Jan 16 21:57:37 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 16 Jan 2019 19:57:37 -0200 Subject: [12] RFR(S) 8215317: [GRAAL] unit test CheckGraalIntrinsics failed after 8213754 In-Reply-To: <5df6d8d0-4fba-4515-ffee-870cf8cff9d3@oracle.com> References: <5df6d8d0-4fba-4515-ffee-870cf8cff9d3@oracle.com> Message-ID: <23eb2db4-fc81-e5e4-4c8f-67a617426353@linux.vnet.ibm.com> Hi Vladimir, I would like to request the approval to backport the change: 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace https://bugs.openjdk.java.net/browse/JDK-8213754 to jdk11u, but if it gets integrated before 8215687 it will break Graal test HotspotTest.java/CheckGraalIntrinsics.java again, as expected. Are you fine if I request the approval to backport first this change, i.e. 8215687? Actually I'll have to tweak a bit and s/isJDK12OrHigher/isJDK11OrHigher/, right? Thank you. Best regards, Gustavo On 12/13/2018 02:10 AM, Vladimir Kozlov wrote: > https://bugs.openjdk.java.net/browse/JDK-8215317 > > JDK-8213754 added new intrinsics which cause Graal's unit test failure. > > CheckGraalIntrinsics test is adjusted for new intrinsics: > > src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java > @@ -376,6 +376,14 @@ > "jdk/jfr/internal/JVM.getEventWriter()Ljava/lang/Object;"); > } > > + if (isJDK12OrHigher()) { > + add(toBeInvestigated, > + "java/lang/CharacterDataLatin1.isDigit(I)Z", > + "java/lang/CharacterDataLatin1.isLowerCase(I)Z", > + "java/lang/CharacterDataLatin1.isUpperCase(I)Z", > + "java/lang/CharacterDataLatin1.isWhitespace(I)Z"); > + } > + > if (!config.inlineNotify()) { > add(ignore, "java/lang/Object.notify()V"); > } > > Tested tier1 and tier3-graal (where test is run). > > I also pushed changes into Lab's Graal repo so this test will be updated during next sync. 
> But I want to push fix into JDK because JDK 12 will be forked very soon.
>

From xxinliu at amazon.com Wed Jan 16 22:04:27 2019
From: xxinliu at amazon.com (Liu, Xin)
Date: Wed, 16 Jan 2019 22:04:27 +0000
Subject: Why does call_site_target keep changing for a Nashorn method?
Message-ID: <616C8E42-4B18-405B-B28A-C9F062EC9B6C@amazon.com>

In one of our applications, C1/C2 keeps compiling a JavaScript method generated by Nashorn, but the code fails a dependency check right before it is installed in the code cache. This is with JDK tip.

It can't pass 'Dependencies::check_call_site_target_value'.

[C2 Parsing]
[Validating compilation dependencies]

It's related to the GWT methodHandle. The 2 mismatched method handles are very similar except for argL3, which is an int[2]. Even though argL0 to argL2 are not identical objects, their contents are the same.

(gdb) call java_lang_invoke_CallSite::target(call_site)->print()
java.lang.invoke.BoundMethodHandle$Species_LLLL
{0x00000000f586ca98} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LLLL'
 - ---- fields (total size 6 words):
 - 'customizationCount' 'B' @12 0
 - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x00000000e21e2878} = (Ljava/lang/Object;Ljdk/nashorn/internal/runtime/Undefined;Ljava/lang/Object;)Ljava/lang/Object; (e21e2878)
 - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x00000000e1e4a670} => a 'java/lang/invoke/MemberName'{0x00000000e1e4a938} = {method} {0x00007fffa512cb68} 'guard''(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH' (e1e4a670)
 - 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0)
 - final 'argL0' 'Ljava/lang/Object;' @28 a 'java/lang/invoke/BoundMethodHandle$Species_LL'{0x00000000f586c9e8} (f586c9e8)
 - final 'argL1' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f586ca28} (f586ca28)
 - final 'argL2'
'Ljava/lang/Object;' @36 a 'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f586ca60} (f586ca60) - final 'argL3' 'Ljava/lang/Object;' @40 [I{0x00000000f586ca10} (f586ca10) (gdb) call method_handle->print() java.lang.invoke.BoundMethodHandle$Species_LLLL {0x00000000f6b18500} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LLLL' - ---- fields (total size 6 words): - 'customizationCount' 'B' @12 0 - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x00000000e21e2878} = (Ljava/lang/Object;Ljdk/nashorn/internal/runtime/Undefined;Ljava/lang/Object;)Ljava/lang/Object; (e21e2878) - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x00000000e1e4a670} => a 'java/lang/invoke/MemberName'{0x00000000e1e4a938} = {method} {0x00007fffa512cb68} 'guard''(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH' (e1e4a670) - 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0) - final 'argL0' 'Ljava/lang/Object;' @28 a 'java/lang/invoke/BoundMethodHandle$Species_LL'{0x00000000f6b18450} (f6b18450) - final 'argL1' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f6b18490} (f6b18490) - final 'argL2' 'Ljava/lang/Object;' @36 a 'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f6b184c8} (f6b184c8) - final 'argL3' 'Ljava/lang/Object;' @40 [I{0x00000000f6b18478} (f6b18478) My guess is argL3 is counters in Java.lang.invoke.MethodHandleImpl. // Intrinsified by C2. Counters are used during parsing to calculate branch frequencies. @LambdaForm.Hidden @jdk.internal.HotSpotIntrinsicCandidate static boolean profileBoolean(boolean result, int[] counters) { // Profile is int[2] where [0] and [1] correspond to false and true occurrences respectively. int idx = result ? 
1 : 0;
    try {
        counters[idx] = Math.addExact(counters[idx], 1);
    } catch (ArithmeticException e) {
        // Avoid continuous overflow by halving the problematic count.
        counters[idx] = counters[idx] / 2;
    }
    return result;
}

I am still struggling to understand the source code in java.lang.invoke.*. Could anybody enlighten me as to why the target of the call site changes every time here? Is it related to this profiling? In the validation log, it has validated the dependency "dependency type='call_site_target_value' x0='1556' x='1866'" above. Why can't it pass afterwards? My guess is that one MH object has been changed by another Java thread.

One interesting fact is that the compiler thread can't pass the 22nd dependency. My intuition is that it goes over an unknown threshold.

The 2nd question is about ciEnv::validate_compile_task_dependencies. Why does a failure of call_site_target_value_changed not count as a deopt? The flag _inc_decompile_count_on_failure = false stops the MDO from marking this method 'not_compileable'. C2 doesn't set the flag, so C2 ends up compiling it over and over, which makes C2 a CPU hog.

Here's the code in validate_compile_task_dependencies:

    bool counter_changed = system_dictionary_modification_counter_changed();
    Dependencies::DepType result = dependencies()->validate_dependencies(_task, counter_changed);
    if (result != Dependencies::end_marker) {
      if (result == Dependencies::call_site_target_value) {
        _inc_decompile_count_on_failure = false;
        record_failure("call site target change");

Maybe the right thing to do is to count this as a deopt and change the deopt limit computation to take into account the size of the method in nodes, just as done for abandoning compilation if the graph is too big.

Thanks,
--lx
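
The counter update quoted in the message above can be tried in isolation. The following is a standalone sketch, not the JDK source: it copies the arithmetic of profileBoolean into a hypothetical class (ProfileCounterSketch), leaves out the JDK-internal annotations, and does not model the C2 intrinsic or the call-site dependency check in any way. It only shows how the int[2] profile saturates and is halved on overflow.

```java
// Standalone sketch of the counter update quoted above; the class name is
// hypothetical and the JDK-internal annotations are intentionally omitted.
final class ProfileCounterSketch {

    // counters[0] counts false results, counters[1] counts true results.
    static boolean profileBoolean(boolean result, int[] counters) {
        int idx = result ? 1 : 0;
        try {
            counters[idx] = Math.addExact(counters[idx], 1);
        } catch (ArithmeticException e) {
            // Avoid continuous overflow by halving the problematic count.
            counters[idx] = counters[idx] / 2;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] counters = new int[2];
        profileBoolean(true, counters);
        profileBoolean(true, counters);
        profileBoolean(false, counters);
        System.out.println(counters[0] + " false, " + counters[1] + " true");

        // Saturated counter: the increment overflows, so the count is halved.
        counters[1] = Integer.MAX_VALUE;
        profileBoolean(true, counters);
        System.out.println(counters[1] == Integer.MAX_VALUE / 2);
    }
}
```

Note that the array is mutated in place on every call, so any code that holds a reference to the same int[2] observes the changing counts.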
From sandhya.viswanathan at intel.com Thu Jan 17 00:00:31 2019
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Thu, 17 Jan 2019 00:00:31 +0000
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory
In-Reply-To: <3e3c4f7d-049e-4aec-c165-f2664e7c98ef@redhat.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <09c92b1a-c6da-e16b-bb68-553876e8a6ea@oracle.com> <2a0a385d-81ee-df66-f147-e4dd9aa5b72e@oracle.com> <8b2ab749-20f1-8dd3-3cc7-64db5d45bc7d@redhat.com> <0aae37aa-7797-fde5-63d5-96c8eb961183@oracle.com> <86a1988a-a8d2-b6af-0985-11a94d6d76a5@redhat.com> <69510788-52e6-815b-1ed7-a6f4886d0398@oracle.com> <3e3c4f7d-049e-4aec-c165-f2664e7c98ef@redhat.com>
Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A50345@FMSMSX126.amr.corp.intel.com>

It would be wonderful to have the persistent MappedByteBuffer feature proposed by Andrew Dinn in JDK 13. To us it looks like a seamless extension of the existing API; it provides a very good building block for persistent memory support within the current Java paradigm and is directly applicable to a class of workloads. Many Big Data frameworks, such as Apache HBase, use FileChannel.map and MappedByteBuffer as the underlying mechanism and so can use the proposed feature to utilize non-volatile memory. We have also reviewed the implementation and provided initial feedback to Andrew.
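
For context, the existing mechanism referred to above looks roughly like the sketch below. It is a minimal illustration that assumes a throwaway temporary file; the class and method names (MappedBufferSketch, roundTrip) are made up. It uses only the long-standing FileChannel.map and MappedByteBuffer.force calls and nothing from the proposed JEP, whose API was still under discussion in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: names are hypothetical, the file is a temporary file.
final class MappedBufferSketch {

    // Maps one page of a temporary file read-write, stores a value at
    // offset 0, flushes the mapping, and reads the value back through a
    // second mapping of the same region.
    static long roundTrip(long value) {
        try {
            Path file = Files.createTempFile("mapped-sketch", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // A READ_WRITE mapping extends the file to the mapped size.
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                buffer.putLong(0, value);
                // Ask for the mapped content to be written to the storage device.
                buffer.force();

                MappedByteBuffer again =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                return again.getLong(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L));
    }
}
```

With an ordinary file system, force() goes through the operating system's file write-back path; the thread is about making the equivalent step cheaper when the mapping is backed by non-volatile memory.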
Best Regards, Sandhya -----Original Message----- From: Andrew Dinn [mailto:adinn at redhat.com] Sent: Wednesday, January 16, 2019 3:23 AM To: Alan Bateman ; Brian Goetz Cc: core-libs-dev at openjdk.java.net; hotspot compiler ; Jonathan Halliday ; Viswanathan, Sandhya Subject: Re: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory Hi Alan/Brian, I have finally been able to shelve other commitments and return to this JEP (apologies for the hiatus). https://openjdk.java.net/jeps/8207851 The JEP has been reviewed positively by Stuart Marks (core libs) and Vladimir Kozlov (intrinsics). It has also been warmly welcomed by several potential users in Red Hat and Intel (including, respectively, Jonathan Halliday and Sandya Viswanathan both in cc). I believe I have addressed all outstanding comments on the JEP per se, including those made by Alan. Is it now possible for one of you to endorse the JEP so it can be submitted? I am aware that I still need to address a few details in the draft implementation that are not present in the latest webrev. I believe there are two changes requested by Vladimir: 1. correct the type of cache writeback memory nodes to generic memory 2. use the JVM to inject a flag setting which enables/disables mapping of persistent buffers and also one change requested by Alan: make method MappedByteBuffer.isPersistent private rather than public Is there any other impediment to submitting the JEP and proceeding to code review? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From vladimir.kozlov at oracle.com Thu Jan 17 01:40:19 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 16 Jan 2019 17:40:19 -0800 Subject: [12] RFR(S) 8215317: [GRAAL] unit test CheckGraalIntrinsics failed after 8213754 In-Reply-To: <23eb2db4-fc81-e5e4-4c8f-67a617426353@linux.vnet.ibm.com> References: <5df6d8d0-4fba-4515-ffee-870cf8cff9d3@oracle.com> <23eb2db4-fc81-e5e4-4c8f-67a617426353@linux.vnet.ibm.com> Message-ID: <7de3a2e1-5dc2-2d23-aec9-92085d7d7cff@oracle.com> Hi Gustavo, You should combine both changes and request them at the same time. Changeset will have to list both changes. Originally your changes should include CheckGraalIntrinsics.java fix. Note, for 8213754 corresponding Graal test fix is 8215317 (and not 8215687, that one is for 8212043 Math.min/max). Yes, Graal fix in jdk11u is different - new intrinsics should be listed under existing condition isJDK11OrHigher(): http://hg.openjdk.java.net/jdk-updates/jdk11u/file/5fc74655f16d/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l371 Regards, Vladimir On 1/16/19 1:57 PM, Gustavo Romero wrote: > Hi Vladimir, > > I would like to request the approval to backport the change: > > 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace > https://bugs.openjdk.java.net/browse/JDK-8213754 > > to jdk11u, but if it gets integrated before 8215687 it will break Graal > test HotspotTest.java/CheckGraalIntrinsics.java again, as expected. > > Are you fine if I request the approval to backport first this change, i.e. > 8215687? > > Actually I'll have to tweak a bit and s/isJDK12OrHigher/isJDK11OrHigher/, > right? > > Thank you. 
>
> Best regards,
> Gustavo
>
> On 12/13/2018 02:10 AM, Vladimir Kozlov wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8215317
>>
>> JDK-8213754 added new intrinsics which cause Graal's unit test failure.
>>
>> CheckGraalIntrinsics test is adjusted for new intrinsics:
>>
>> src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java
>>
>> @@ -376,6 +376,14 @@
>>                               "jdk/jfr/internal/JVM.getEventWriter()Ljava/lang/Object;");
>>           }
>>
>> +        if (isJDK12OrHigher()) {
>> +            add(toBeInvestigated,
>> +                            "java/lang/CharacterDataLatin1.isDigit(I)Z",
>> +                            "java/lang/CharacterDataLatin1.isLowerCase(I)Z",
>> +                            "java/lang/CharacterDataLatin1.isUpperCase(I)Z",
>> +                            "java/lang/CharacterDataLatin1.isWhitespace(I)Z");
>> +        }
>> +
>>           if (!config.inlineNotify()) {
>>               add(ignore, "java/lang/Object.notify()V");
>>           }
>>
>> Tested tier1 and tier3-graal (where test is run).
>>
>> I also pushed changes into Lab's Graal repo so this test will be updated during next sync.
>> But I want to push fix into JDK because JDK 12 will be forked very soon.
>>
>

From Pengfei.Li at arm.com Thu Jan 17 02:06:59 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Thu, 17 Jan 2019 02:06:59 +0000
Subject: [aarch64-port-dev ] RFR(M): 8212043: Add floating-point Math.min/max intrinsics
In-Reply-To:
References: <5bf1c593-2e96-8a10-88c6-98afdd9a04f2@redhat.com> <0c7de175-17d8-f3f5-a47b-2b9b3f45af71@redhat.com> <1e7af2c4-8610-2ee9-9955-298ffb715fa7@redhat.com> <06048878-effe-7d24-bb87-b140e662aeb8@redhat.com> <7c97719b-e83a-ba40-43a3-8cec8273df1c@redhat.com> <3df16666-a10b-41bb-7439-b967e1d76735@redhat.com> <4a10fa17-197b-2da9-7890-9544a407832f@redhat.com>
Message-ID:

Hi Jatin,

> test/hotspot/jtreg/compiler/intrinsics/math/TestFpMinMaxIntrinsics.java
>
> Can you kindly add this test case, I'm working on supporting these new intrinsics for X86 platform and will like to use the test case you created.

Thanks for pointing that out. The newly created test file was indeed missing from the push. But I'm NOT a committer, so I can't add it either. Perhaps you could just use the file for testing and upstream it together with your code change. I think every file I uploaded to my cr.openjdk.java.net space may be used.

--
Thanks,
Pengfei

From mikael.vidstedt at oracle.com Thu Jan 17 02:22:43 2019
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Wed, 16 Jan 2019 18:22:43 -0800
Subject: RFR (XS): 8217266: Remove dead LIR_List::compare_to and LIR_Code::lir_compare_to
Message-ID: <39E5F5F4-598E-46C8-8BAD-C95D16439EDA@oracle.com>

Please review this small change which removes some long since dead code:

bug: https://bugs.openjdk.java.net/browse/JDK-8217266
webrev: http://cr.openjdk.java.net/~mikael/webrevs/8217266/webrev.00/open/webrev/

I went back to see when the method was last actively used, but it's at least 10 years ago (it was already dead in hsx14 from 2009), so suffice to say it's been a while.

Running tier1 now for good luck.

Cheers,
Mikael
From Nick.Gasson at arm.com Thu Jan 17 06:51:10 2019
From: Nick.Gasson at arm.com (Nick Gasson (Arm Technology China))
Date: Thu, 17 Jan 2019 06:51:10 +0000
Subject: [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock fast path not called
In-Reply-To:
References: <680089e7-ec26-a4cd-6143-4d36182e971a@arm.com> <51023960-0e8f-56aa-20a4-279017251585@redhat.com>
Message-ID: <2c32a14b-151e-5d05-5de0-07a984727f20@arm.com>

Thanks Derek! Is there anyone who can help me push this? (BTW in the last webrev I removed the Contributed-by line and added my hg username, hope this is correct...)

Nick

On 17/01/2019 03:44, Derek White wrote:
> Hi Nick,
>
> Looks good to me!
>
> - Derek
>
>> -----Original Message-----
>> From: Nick Gasson (Arm Technology China)
>> Sent: Thursday, January 10, 2019 9:37 PM
>> To: Andrew Haley ; Derek White
>> ; hotspot-compiler-dev at openjdk.java.net compiler
>>
>> Cc: nd ; aarch64-port-dev at openjdk.java.net
>> Subject: [EXT] Re: [aarch64-port-dev ] RFR: 8216350: AArch64: monitor
>> unlock fast path not called
>>
>> External Email
>>
>> ----------------------------------------------------------------------
>> Hi all,
>>
>> On 09/01/2019 17:23, Andrew Haley wrote:
>>>
>>> HotSpot policy is that we can do minor cleanups as we go along:
>>> experience has shown that unless you do so, cruft tends to accumulate.
>>> These cleanups are OK for this patch.
>>>
>>
>> Please see the updated webrev here:
>>
>> http://cr.openjdk.java.net/~ngasson/8216350/webrev.1/
>>
>> Includes cleanups according to Derek's comments and updated the copyright
>> year (thanks Felix).
>>
>>> 4) Slightly better comment for last instruction of fast_unlock (and
>> explicitly use zr).
>>> __ stlr(zr, tmp); // set unowned
>>
>> Note I needed to change the definition of load_store_exclusive to allow ZR
>> here. I've checked that this is OK for the other instructions that use this.
>> >> Thanks, >> Nick From rwestrel at redhat.com Thu Jan 17 08:33:08 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 17 Jan 2019 09:33:08 +0100 Subject: RFR (XS): 8217266: Remove dead LIR_List::compare_to and LIR_Code::lir_compare_to In-Reply-To: <39E5F5F4-598E-46C8-8BAD-C95D16439EDA@oracle.com> References: <39E5F5F4-598E-46C8-8BAD-C95D16439EDA@oracle.com> Message-ID: <877ef3wunv.fsf@redhat.com> > webrev: http://cr.openjdk.java.net/~mikael/webrevs/8217266/webrev.00/open/webrev/ That looks good to me. Roland. From aph at redhat.com Thu Jan 17 09:16:57 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 17 Jan 2019 09:16:57 +0000 Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions In-Reply-To: References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com> <81dc5407-4985-883c-83fc-2e1fa2b77e66@redhat.com> Message-ID: On 1/16/19 8:46 PM, B. Blaser wrote: > To answer Andrew Haley, one of the major difference between CISC and > RISC is specifically the load/store architecture of the latter which > is part of most instructions of the former; I don't see many good > reasons to generate RISC-like load/store code using only a subset of > instructions and to juggle with registers. Well, yes, but the question remains: does this change actually help anything. And if it does, by how much? All we have now is > I cannot say if if this has performance implication. I suspect > not. If it has, it's probably miniscule improvement. I can't see how > it could be worse though. We can measure, and we should. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Jan 17 09:36:01 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 17 Jan 2019 09:36:01 +0000 Subject: [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock fast path not called In-Reply-To: <2c32a14b-151e-5d05-5de0-07a984727f20@arm.com> References: <680089e7-ec26-a4cd-6143-4d36182e971a@arm.com> <51023960-0e8f-56aa-20a4-279017251585@redhat.com> <2c32a14b-151e-5d05-5de0-07a984727f20@arm.com> Message-ID: <4f470d09-bc37-55b6-f42d-373934d5aba4@redhat.com> On 1/17/19 6:51 AM, Nick Gasson (Arm Technology China) wrote: > Thanks Derek! Is there anyone who can help me push this? (BTW in the > last webrev I removed the Contributed-by line and added my hg username, > hope this is correct...) We need more committers. These people have contributed to the AArch64 port: adinn aph avoitylov bulasevich coleenp dchuyko dholmes dlong dpochepk dsamersoff egahlin eosterlund erikj fyang gdub goetz gziemski hseigel ihse iveresov jcbeyler jcm jwilhelm kbarrett kvn lana lfoltan lucy mbaesken mdoerr mikael njian pli pliden prr rehn rkennke roland rraghavan shade smonteith stefank stuefe thartmann tschatzl vlivanov yzhang zyao I see that we have quite a few authors without committer access. njian has 16 committed patches by now, and should surely be a committer. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From felix.yang at huawei.com Thu Jan 17 12:21:32 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Thu, 17 Jan 2019 12:21:32 +0000 Subject: [aarch64-port-dev ] RFR: 8216350: AArch64: monitor unlock fast path not called In-Reply-To: <2c32a14b-151e-5d05-5de0-07a984727f20@arm.com> References: <680089e7-ec26-a4cd-6143-4d36182e971a@arm.com> <51023960-0e8f-56aa-20a4-279017251585@redhat.com> <2c32a14b-151e-5d05-5de0-07a984727f20@arm.com> Message-ID: Hi, As this patch changes one testcase, to be conservative, I submitted the patch to the submit repo last week: http://hg.openjdk.java.net/jdk/submit/rev/7dfc2583c8b9 The Email I got shows that it passed all the oracle internal tests. Will push the patch. Thanks, Felix > > Thanks Derek! Is there anyone who can help me push this? (BTW in the > last webrev I removed the Contributed-by line and added my hg username, > hope this is correct...) > From Alan.Bateman at oracle.com Thu Jan 17 12:53:54 2019 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Thu, 17 Jan 2019 12:53:54 +0000 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <3e3c4f7d-049e-4aec-c165-f2664e7c98ef@redhat.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <600e8c67-80a4-fef5-b441-72c51c6ccddb@oracle.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <09c92b1a-c6da-e16b-bb68-553876e8a6ea@oracle.com> <2a0a385d-81ee-df66-f147-e4dd9aa5b72e@oracle.com> <8b2ab749-20f1-8dd3-3cc7-64db5d45bc7d@redhat.com> <0aae37aa-7797-fde5-63d5-96c8eb961183@oracle.com> <86a1988a-a8d2-b6af-0985-11a94d6d76a5@redhat.com> <69510788-52e6-815b-1ed7-a6f4886d0398@oracle.com> <3e3c4f7d-049e-4aec-c165-f2664e7c98ef@redhat.com> Message-ID: <34cfc530-8517-ac1a-0c04-446dc3dc2436@oracle.com> On 16/01/2019 11:23, Andrew Dinn wrote: > Hi Alan/Brian, > > I have finally been able to shelve other commitments and return to 
this
> JEP (apologies for the hiatus).
>
> https://openjdk.java.net/jeps/8207851
>
> The JEP has been reviewed positively by Stuart Marks (core libs) and
> Vladimir Kozlov (intrinsics). It has also been warmly welcomed by
> several potential users in Red Hat and Intel (including, respectively,
> Jonathan Halliday and Sandya Viswanathan both in cc).

I think the proposal is good as a short term/tactical solution, especially as you were able to reduce the API surface down to new FileChannel map modes. I think it can be looked at again once Project Panama is further along and there is some notion of "memory region" that is backed by NVM.

I skimmed through the current draft. In the most recent discussion I think we had converged on "SYNC" rather than "PERSISTENT", the reasoning being that there is persistence already with regular mapped files; it also aligns with the MAP_SYNC flag to mmap. I don't recall if the discussion on isPersistent concluded; that was more of a naming issue, and whether you include an isXXX method or not is not critical to the proposal. The overload of the force method to specify a range is a good addition, irrespective of the JEP.

One thing to clarify is the heading "Proposed Restricted Public JDK API Changes". The proposal (and the early webrevs) exposed writebackMemory in the internal Unsafe, not sun.misc.Unsafe, which I think is right. This makes it a JDK internal API, so it doesn't need to be in the JEP.

Did you get any feedback on the Testing section? Given that the feature needs special hardware, it will need a commitment to test it on a regular basis. It's a similar issue to the draft "JEP 337: RDMA Network Sockets", where special hardware is needed to fully test the feature. In the case of JEP 337, some testing with emulation is possible.

Vladimir and I have reviewed the JEP; it will need an area lead to endorse it. I think that can be Brian or Mikael in this case.
-Alan

From martin.doerr at sap.com Thu Jan 17 13:18:13 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 17 Jan 2019 13:18:13 +0000
Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
In-Reply-To: <301fd43a-e5b5-d970-7a1a-2458dbaeec36@linux.vnet.ibm.com>
References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> <9863276de30643338249ead2a6ac7fe9@sap.com> <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com> <37d9e6f3b2b4400d8963f54d2fe7767f@sap.com> <406db3e3-2ac3-dc16-f384-99a314e62a42@linux.vnet.ibm.com> <180a6c0b-7abe-9d5c-51e6-dffbb23570d3@linux.vnet.ibm.com> <301fd43a-e5b5-d970-7a1a-2458dbaeec36@linux.vnet.ibm.com>
Message-ID:

Hi,

the rebased webrev.01 now applies on jdk/jdk (after JDK-8216376), so the issue Gustavo had observed no longer exists.
http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/

I have updated copyrights and retested it.

Best regards,
Martin

-----Original Message-----
From: Gustavo Romero
Sent: Montag, 7. Januar 2019 14:52
To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net'
Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays

Hi Martin,

On 01/07/2019 11:49 AM, Doerr, Martin wrote:
> I want to check all places where we use "mr(R1_SP, R21_sender_SP)". There may be more issues with that. I'll probably handle that in a separate change and push this CRC change afterwards.

I see. Thanks for letting me know.

Best regards,
Gustavo

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero
> Sent: Freitag, 4.
Januar 2019 19:55 > To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' > Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays > > Hi Martin, > > On 01/04/2019 02:13 PM, Doerr, Martin wrote: >> Hi Gustavo, >> >> when called from the interpreter (the scenario you observed), R21 is set before resizing the frame to avoid wasted stack space (InterpreterMacroAssembler::call_from_interpreter). > > Got it. Thanks a lot for the explanations. > > I think it doesn't currently matter in practice, but I'm wondering if to be > consistent we should cut back the stack back earlier also in > TemplateInterpreterGenerator::generate_CRC32_update_entry()? > > diff -r a35f8c35d8c9 src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp > --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Fri Jan 04 10:09:00 2019 +0100 > +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Fri Jan 04 13:44:37 2019 -0500 > @@ -1840,11 +1840,12 @@ > #endif > __ lwz(crc, 2*wordSize, argP); // Current crc state, zero extend to 64 bit to have a clean register. > > + // Restore caller sp for c2i case and return. > + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. > + > StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table); > __ kernel_crc32_singleByte(crc, data, dataLen, table, tmp, true); > > - // Restore caller sp for c2i case and return. > - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started. > __ blr(); > > // Generate a vanilla native entry as the slow path. > > Currently there is no issue probably because generated code is simpler and does > no spills. > > Best regards, > Gustavo > >> When called from compiled methods, R21 is set by a c2i adapter which extends the compiled frame by space for arguments (gen_c2i_adapter). 
>> >> "mr(R1_SP, R21_sender_SP)" is more error-prone than "resize_frame_absolute" so I think the latter would be better (though it takes more registers and instructions), but I don't want to replace that as part of this CRC change. >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: Gustavo Romero >> Sent: Freitag, 4. Januar 2019 14:44 >> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' >> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays >> >> Hi Martin, >> >> On 01/04/2019 07:30 AM, Doerr, Martin wrote: >>> thank you very much for confirming. This makes sense. We use different frame headers depending on whether the frame is the top Java frame or not (and on whether it's a debug build or not). Setting R1_SP to sender_SP is a shortcut for leaf calls which relies on having an unmodified stack until this point. So the patch fixes the issue. >> >> Glad to help! Thanks for the additional information, I was not aware that the >> selection of different frame headers could be done at compile time. One last >> question only for my education: what exactly advanced (incremented) R1_SP so it >> has to be cut back using sender_SP value, i.e. sender_SP tracks the frame for >> which function exactly or "who" is the caller exactly here? >> >> Thank you. >> >> Best regards, >> Gustavo >> >>> New webrev: >>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/ >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Gustavo Romero >>> Sent: Donnerstag, 3. Januar 2019 19:36 >>> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' >>> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays >>> >>> Hi Martin, >>> >>> On 01/03/2019 03:34 PM, Doerr, Martin wrote: >>>> Unfortunately, I can't reproduce the crash. 
>>>> TestCRC32C runs stably on our machine (with fastdbg build).
>>>> I guess that the frameless spills mess up the stack. Can you check if the patch below helps?
>>>
>>> Thanks for providing a fix so I can try it.
>>> Yes, I confirm the patch below indeed fixes the sigsegv crash when CRC32C update() method is used.
>>> I also confirm that I don't observe the crash on the fastdebug build, only on the release build.
>>> It also only affects the Interpreter mode, so passing -Xcomp avoids the crash on the release build.
>>>
>>> Just as reference, I can reproduce it on the release build with the following trivial code:
>>>
>>> import java.util.zip.CRC32C;
>>>
>>> class CRC32C_v1 {
>>>   public static void main(String[] arg) {
>>>     byte[] b = new byte[1024];
>>>
>>>     CRC32C crc32c = new CRC32C();
>>>     crc32c.update(b, 0, b.length);
>>>
>>>     System.out.println(crc32c.getValue());
>>>   }
>>> }
>>>
>>> Thanks for fixing the typos.
>>>
>>>
>>> Best regards,
>>> Gustavo
>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>> diff -r a33f49d5998c src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp
>>>> --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 17:30:03 2019 +0100
>>>> +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu Jan 03 18:33:16 2019 +0100
>>>> @@ -1924,6 +1924,9 @@
>>>>      __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE));
>>>>    }
>>>>
>>>> +  // Restore caller sp for c2i case.
>>>> +  __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
>>>> +
>>>>    StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table);
>>>>
>>>>    if (!VM_Version::has_vpmsumb()) {
>>>> @@ -1933,8 +1936,6 @@
>>>>      __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, true);
>>>>    }
>>>>
>>>> -  // Restore caller sp for c2i case and return.
>>>> -  __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
>>>>    __ blr();
>>>>
>>>>    // Generate a vanilla native entry as the slow path.
>>>> @@ -2014,6 +2015,9 @@
>>>>      __ addi(data, data, arrayOopDesc::base_offset_in_bytes(T_BYTE));
>>>>    }
>>>>
>>>> +  // Restore caller sp for c2i case.
>>>> +  __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
>>>> +
>>>>    StubRoutines::ppc64::generate_load_crc32c_table_addr(_masm, table);
>>>>
>>>>    if (!VM_Version::has_vpmsumb()) {
>>>> @@ -2023,8 +2027,6 @@
>>>>      __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, tc0, tc1, tc2, false);
>>>>    }
>>>>
>>>> -  // Restore caller sp for c2i case and return.
>>>> -  __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller started.
>>>>    __ blr();
>>>>
>>>>    BLOCK_COMMENT("} CRC32C_update{Bytes|DirectByteBuffer}");
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Gustavo Romero
>>>> Sent: Thursday, 3 January 2019 17:13
>>>> To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net'
>>>> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
>>>>
>>>> Hi Martin,
>>>>
>>>> oh that's nice. You removed the 512-byte block constraint and also wired it up to the Interpreter :)
>>>>
>>>> For the worst case, unaligned 512 byte array, I see the gap to aligned 512 byte array reduced by about ~5.7x.
>>>>
>>>> On the Interpreter I see an improvement of at least 50% for 1024 bytes.
>>>>
>>>> This is all for the CRC32 class.
>>>>
>>>> On CRC32C I'm getting a SIGSEGV that can be reproduced running against ./test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java.
>>>>
>>>> I've uploaded a full log to http://cr.openjdk.java.net/~gromero/logs/crc32c_sigsegv/
>>>>
>>>> I'm leaving for lunch and I'll take a closer look when back. But probably you will figure it out before I sit to appreciate the meal :)
>>>>
>>>> Finally, since the change does some cleanup, I wonder if it would be worth fixing the following typos:
>>>>
>>>> I think it's Barrett const., not Barret. Probably 'barret' is used in the code as a short version
>>>> for Barrett but it should be changed in
>>>>
>>>> +  // Point to Barret constants
>>>> +  add_const_optimized(cur_const, constants, outer_consts_size + inner_consts_size);
>>>> +
>>>>
>>>> ?
>>>>
>>>> s/not/note/ in:
>>>> cpu/ppc/macroAssembler_ppc.cpp:3977:// A not on the lookup table address(es):
>>>>
>>>> d/lives/ in:
>>>> cpu/ppc/macroAssembler_ppc.cpp:4265: mtvrwz(VCRC, crc); // crc lives lives in VCRC, now
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>> On 01/03/2019 12:17 PM, Doerr, Martin wrote:
>>>>> Hi,
>>>>>
>>>>> the JVM on PPC64 currently misses usage of the fast vector implementation in the interpreter code.
>>>>>
>>>>> In addition, performance is not good for short arrays (unaligned 512 byte arrays or shorter arrays) because the current vector implementation needs at least 512 bytes.
>>>>>
>>>>> Bug:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8216060
>>>>>
>>>>> I have addressed these 2 issues + some cleanup with the following webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/
>>>>>
>>>>> Please review.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Martin

From gromero at linux.vnet.ibm.com Thu Jan 17 14:07:58 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Thu, 17 Jan 2019 12:07:58 -0200
Subject: [12] RFR(S) 8215317: [GRAAL] unit test CheckGraalIntrinsics failed after 8213754
In-Reply-To: <7de3a2e1-5dc2-2d23-aec9-92085d7d7cff@oracle.com>
References: <5df6d8d0-4fba-4515-ffee-870cf8cff9d3@oracle.com> <23eb2db4-fc81-e5e4-4c8f-67a617426353@linux.vnet.ibm.com> <7de3a2e1-5dc2-2d23-aec9-92085d7d7cff@oracle.com>
Message-ID: <0af27551-f6f6-28b9-b4b6-cae6427fefad@linux.vnet.ibm.com>

Hi Vladimir,

On 01/16/2019 11:40 PM, Vladimir Kozlov wrote:
> You should combine both changes and request them at the same time. Changeset will have to list both changes. Originally your changes should include CheckGraalIntrinsics.java fix.

Thanks a lot for advising. Just one question: by "Changeset will have to list both changes" you mean the commit title (or body) must say explicitly the change is a combination of 8213754 + 8215317?

> Note, for 8213754 corresponding Graal test fix is 8215317 (and not 8215687, that one is for 8212043 Math.min/max).

Thanks for the detailed note. I commented on the right thread but pasted the wrong bug!

Thank you and best regards,
Gustavo

> Yes, Graal fix in jdk11u is different - new intrinsics should be listed under existing condition isJDK11OrHigher():
>
> http://hg.openjdk.java.net/jdk-updates/jdk11u/file/5fc74655f16d/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l371
>
> Regards,
> Vladimir
>
> On 1/16/19 1:57 PM, Gustavo Romero wrote:
>> Hi Vladimir,
>>
>> I would like to request the approval to backport the change:
>>
>> 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace
>> https://bugs.openjdk.java.net/browse/JDK-8213754
>>
>> to jdk11u, but if it gets integrated before 8215687 it will break Graal
>> test HotspotTest.java/CheckGraalIntrinsics.java again, as expected.
>>
>> Are you fine if I request the approval to backport first this change, i.e.
>> 8215687?
>>
>> Actually I'll have to tweak a bit and s/isJDK12OrHigher/isJDK11OrHigher/,
>> right?
>>
>> Thank you.
>>
>> Best regards,
>> Gustavo
>>
>> On 12/13/2018 02:10 AM, Vladimir Kozlov wrote:
>>> https://bugs.openjdk.java.net/browse/JDK-8215317
>>>
>>> JDK-8213754 added new intrinsics which cause Graal's unit test failure.
>>>
>>> CheckGraalIntrinsics test is adjusted for new intrinsics:
>>>
>>> src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java
>>> @@ -376,6 +376,14 @@
>>>                  "jdk/jfr/internal/JVM.getEventWriter()Ljava/lang/Object;");
>>>      }
>>>
>>> +    if (isJDK12OrHigher()) {
>>> +        add(toBeInvestigated,
>>> +            "java/lang/CharacterDataLatin1.isDigit(I)Z",
>>> +            "java/lang/CharacterDataLatin1.isLowerCase(I)Z",
>>> +            "java/lang/CharacterDataLatin1.isUpperCase(I)Z",
>>> +            "java/lang/CharacterDataLatin1.isWhitespace(I)Z");
>>> +    }
>>> +
>>>      if (!config.inlineNotify()) {
>>>          add(ignore, "java/lang/Object.notify()V");
>>>      }
>>>
>>> Tested tier1 and tier3-graal (where test is run).
>>>
>>> I also pushed changes into Lab's Graal repo so this test will be updated during next sync.
>>> But I want to push fix into JDK because JDK 12 will be forked very soon.

From adinn at redhat.com Thu Jan 17 14:27:44 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 17 Jan 2019 14:27:44 +0000
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory
In-Reply-To: <34cfc530-8517-ac1a-0c04-446dc3dc2436@oracle.com>
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <09c92b1a-c6da-e16b-bb68-553876e8a6ea@oracle.com> <2a0a385d-81ee-df66-f147-e4dd9aa5b72e@oracle.com> <8b2ab749-20f1-8dd3-3cc7-64db5d45bc7d@redhat.com> <0aae37aa-7797-fde5-63d5-96c8eb961183@oracle.com> <86a1988a-a8d2-b6af-0985-11a94d6d76a5@redhat.com> <69510788-52e6-815b-1ed7-a6f4886d0398@oracle.com> <3e3c4f7d-049e-4aec-c165-f2664e7c98ef@redhat.com> <34cfc530-8517-ac1a-0c04-446dc3dc2436@oracle.com>
Message-ID: <21ef0e11-3f3d-e9a4-5dc6-898d4ac18efa@redhat.com>

Hi Alan,

Thanks for your response.

On 17/01/2019 12:53, Alan Bateman wrote:
> I skimmed through the current draft. In the most recent discussion then
> I think we had converged on "SYNC" rather than "PERSISTENT", the
> reasoning being that there is persistence already with regular file
> mapped files, also it aligns with the MAP_SYNC flag to mmap. I don't
> recall if the discussion on isPersistent concluded, that was more of a
> naming issue and whether you include an isXXX method or not is not
> critical to the proposal. The overload of the force method to specify a
> range is a good addition, irrespective of the JEP.

Ok, thanks. At least sync is now being used consistently in the public API. I will look at renaming internal vars/methods to use sync when I publish the next webrev.

> One thing to clarify is the heading "Proposed Restricted Public JDK API
> Changes". The proposal (and the early webrevs) exposed writebackMemory
> in the internal Unsafe, not sun.misc.Unsafe, which I think is right.
> This makes it a JDK internal API so it doesn't need to be in the JEP.

I am happy to remove it from the JEP if needed. Does it do any harm to leave it?

> Did you get any feedback on the Testing section? Given that the feature
> needs special hardware then it will need commitment to test it on a
> regular basis. It's a similar issue to the draft "JEP 337: RDMA Network
> Sockets" where special hardware is needed to fully test the feature. In
> the case of JEP 337 then some testing with emulation is possible.

I believe I received no specific feedback on that topic. Some of the other Red Hat dev teams (i.e. not OpenJDK) and also dev staff at Intel are very keen to base some of their future work on this feature. So, it will certainly get tested /after/ JDK release :-)

Red Hat does have the Intel hardware needed to test this feature but, so far, nothing that can be used to test on AArch64. Our OpenJDK team can access this kit for one-off testing but it is not currently available for continuous integration testing. I will propose to my manager that we acquire the relevant kit and ensure that all JDKs which implement this JEP are tested prior to release. We should also be able to test AArch64 using volatile memory to simulate a non-volatile memory device up to the point where the requisite AArch64-based NVM hardware becomes available. I am fairly confident this plan will be agreeable to the overlords whom I humbly serve.

Perhaps Intel also could provide help with testing? [Sadhya, is this an option?]

My bigger concern was that crash recovery tests may never be 100% reliable. A 100% guarantee requires the ability to engineer a machine crash at a precisely defined critical point of execution and some of the relevant critical locations will be embedded in the middle of JITted code making it hard to provoke the crash. So, the situations where a crash /can/ be engineered may not fully reflect those that occur in live deployments. That said, a dash of artificiality in test code is, perhaps, not so worthy of remark . . .

> Vladimir and I have reviewed the JEP, it will need an area lead to
> endorse, I think it can be Brian or Mikael in this case.

Ok, thanks for the above answers. Looking forward to hearing further from Brian and/or Mikael (Vidstedt, I assume? :-).

regards,


Andrew Dinn
-----------

From lutz.schmidt at sap.com Thu Jan 17 15:47:00 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Thu, 17 Jan 2019 15:47:00 +0000
Subject: RFR(S, tedious): 8217250: Optimize CodeHeap Analytics
In-Reply-To: <2d7b7963-61be-95b8-017b-956f2752c8f3@oracle.com>
References: <2d7b7963-61be-95b8-017b-956f2752c8f3@oracle.com>
Message-ID: <9AD26B9E-015A-4BD4-A44F-12DDE2793ED0@sap.com>

Hi Vladimir & all,

there is a new webrev available: http://cr.openjdk.java.net/~lucy/webrevs/8217250.01/

What's new (in addition to some comments) is the macro

// Flush the buffer contents if the remaining capacity is less
// than the calculated threshold (256 bytes + capacity/16)
// That should suffice for all reasonably sized output lines.
#define BUFFEREDSTREAM_FLUSH_AUTO(_termString) \
        BUFFEREDSTREAM_FLUSH_IF(_termString, 256+(_capacity>>4))

It replaced the previous BUFFEREDSTREAM_FLUSH_IF("string", 512) occurrences.

Regards,
Lutz

On 16.01.19, 22:53, "Vladimir Kozlov" wrote:

On 1/16/19 12:37 PM, Schmidt, Lutz wrote:
> Hi Vladimir,
>
> thanks a lot for looking at this so quickly.
>
> Sure, I could declare a specialized "BUFFEREDSTREAM_FLUSH_512" for this. The "512" originated from the thought "it's large enough for a well-behaved line and small enough to save some flushes".
>
> I was also thinking about a "BUFFEREDSTREAM_FLUSH_AUTO", where the spare space is derived from the buffer capacity, maybe something like 10 percent of the capacity, 256 bytes minimum. I wasn't sure if that could be categorized as over-engineered.

Yes, I think BUFFEREDSTREAM_FLUSH_AUTO is better than fixed size.

Vladimir

>
> Your thoughts?
>
> Thanks,
> Lutz
>
> On 16.01.19, 19:10, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote:
>
> Hi Lutz,
>
> I see that you have only one usage in all cases for:
> BUFFEREDSTREAM_FLUSH_IF("", 512)
>
> Can you simply declare a simplified macro for this?
>
> Otherwise looks good.
>
> Thanks,
> Vladimir
>
> On 1/16/19 6:52 AM, Schmidt, Lutz wrote:
> > Dear all,
> >
> > may I please have reviews for this (semantically) small change. Its purpose is to reduce the bufferedStream buffer flushes while printing CodeHeap Analytics.
> >
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8217250
> > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8217250.00/
> >
> > Thank you!
> > Lutz

From gromero at linux.vnet.ibm.com Thu Jan 17 16:18:39 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Thu, 17 Jan 2019 14:18:39 -0200
Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
In-Reply-To:
References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> <9863276de30643338249ead2a6ac7fe9@sap.com> <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com> <37d9e6f3b2b4400d8963f54d2fe7767f@sap.com> <406db3e3-2ac3-dc16-f384-99a314e62a42@linux.vnet.ibm.com> <180a6c0b-7abe-9d5c-51e6-dffbb23570d3@linux.vnet.ibm.com> <301fd43a-e5b5-d970-7a1a-2458dbaeec36@linux.vnet.ibm.com>
Message-ID:

Hi Martin,

On 01/17/2019 11:18 AM, Doerr, Martin wrote:
> Hi,
>
> the rebased webrev.01 now applies on jdk/jdk (after JDK-8216376). So the issue Gustavo had observed no longer exists.
> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/
>
> I have updated copyrights and retested it.

I tested it when JDK-8216376 was submitted for review, but I retested this rebase from webrev.01 on top of the most recent changes (just in case) and all looks fine to me.

Thank you.

Best regards,
Gustavo

From vladimir.kozlov at oracle.com Thu Jan 17 18:12:42 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 17 Jan 2019 10:12:42 -0800
Subject: RFR (XS): 8217266: Remove dead LIR_List::compare_to and LIR_Code::lir_compare_to
In-Reply-To: <877ef3wunv.fsf@redhat.com>
References: <39E5F5F4-598E-46C8-8BAD-C95D16439EDA@oracle.com> <877ef3wunv.fsf@redhat.com>
Message-ID:

+1

Vladimir

On 1/17/19 12:33 AM, Roland Westrelin wrote:
>
>> webrev: http://cr.openjdk.java.net/~mikael/webrevs/8217266/webrev.00/open/webrev/
>
> That looks good to me.
>
> Roland.
>

From vladimir.kozlov at oracle.com Thu Jan 17 18:37:13 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 17 Jan 2019 10:37:13 -0800
Subject: [12] RFR(S) 8215317: [GRAAL] unit test CheckGraalIntrinsics failed after 8213754
In-Reply-To: <0af27551-f6f6-28b9-b4b6-cae6427fefad@linux.vnet.ibm.com>
References: <5df6d8d0-4fba-4515-ffee-870cf8cff9d3@oracle.com> <23eb2db4-fc81-e5e4-4c8f-67a617426353@linux.vnet.ibm.com> <7de3a2e1-5dc2-2d23-aec9-92085d7d7cff@oracle.com> <0af27551-f6f6-28b9-b4b6-cae6427fefad@linux.vnet.ibm.com>
Message-ID:

On 1/17/19 6:07 AM, Gustavo Romero wrote:
> Hi Vladimir,
>
> On 01/16/2019 11:40 PM, Vladimir Kozlov wrote:
>> You should combine both changes and request them at the same time. Changeset will have to list
>> both changes. Originally your changes should include CheckGraalIntrinsics.java fix.
>
> Thanks a lot for advising.
Just one question: by "Changeset will have to > list both changes" you mean the commit title (or body) must say explicitly > the change is a combination of 8213754 + 821531? Separate lines per bug subject. Here is example: http://hg.openjdk.java.net/jdk/jdk/rev/c139884bd80e 8213348: jdk.internal.vm.compiler.management service providers missing in module descriptor 8211781: re-building fails after changing Graal sources Reviewed-by: erikj, mchung Vladimir > > >> Note, for 8213754 corresponding Graal test fix is 8215317 (and not 8215687, that one is for >> 8212043 Math.min/max). > > Thanks for the detailed note. I commented on the right thread but pasted > the wrong bug! > > Thank you and best regards, > Gustavo > >> Yes, Graal fix in jdk11u is different - new intrinsics should be listed under existing condition >> isJDK11OrHigher(): >> >> http://hg.openjdk.java.net/jdk-updates/jdk11u/file/5fc74655f16d/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l371 >> >> >> Regards, >> Vladimir >> >> On 1/16/19 1:57 PM, Gustavo Romero wrote: >>> Hi Vladimir, >>> >>> I would like to request the approval to backport the change: >>> >>> 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace >>> https://bugs.openjdk.java.net/browse/JDK-8213754 >>> >>> to jdk11u, but if it gets integrated before 8215687 it will break Graal >>> test HotspotTest.java/CheckGraalIntrinsics.java again, as expected. >>> >>> Are you fine if I request the approval to backport first this change, i.e. >>> 8215687? >>> >>> Actually I'll have to tweak a bit and s/isJDK12OrHigher/isJDK11OrHigher/, >>> right? >>> >>> Thank you. >>> >>> Best regards, >>> Gustavo >>> >>> On 12/13/2018 02:10 AM, Vladimir Kozlov wrote: >>>> https://bugs.openjdk.java.net/browse/JDK-8215317 >>>> >>>> JDK-8213754 added new intrinsics which cause Graal's unit test failure. 
>>>>
>>>> CheckGraalIntrinsics test is adjusted for new intrinsics:
>>>>
>>>> src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java
>>>>
>>>> @@ -376,6 +376,14 @@
>>>>                               "jdk/jfr/internal/JVM.getEventWriter()Ljava/lang/Object;");
>>>>           }
>>>>
>>>> +        if (isJDK12OrHigher()) {
>>>> +            add(toBeInvestigated,
>>>> +                            "java/lang/CharacterDataLatin1.isDigit(I)Z",
>>>> +                            "java/lang/CharacterDataLatin1.isLowerCase(I)Z",
>>>> +                            "java/lang/CharacterDataLatin1.isUpperCase(I)Z",
>>>> +                            "java/lang/CharacterDataLatin1.isWhitespace(I)Z");
>>>> +        }
>>>> +
>>>>           if (!config.inlineNotify()) {
>>>>               add(ignore, "java/lang/Object.notify()V");
>>>>           }
>>>>
>>>> Tested tier1 and tier3-graal (where test is run).
>>>>
>>>> I also pushed changes into Lab's Graal repo so this test will be updated during next sync.
>>>> But I want to push fix into JDK because JDK 12 will be forked very soon.
>>>> >>> >> > From vladimir.kozlov at oracle.com Thu Jan 17 18:39:34 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 17 Jan 2019 10:39:34 -0800 Subject: RFR(S, tedious): 8217250: Optimize CodeHeap Analytics In-Reply-To: <9AD26B9E-015A-4BD4-A44F-12DDE2793ED0@sap.com> References: <2d7b7963-61be-95b8-017b-956f2752c8f3@oracle.com> <9AD26B9E-015A-4BD4-A44F-12DDE2793ED0@sap.com> Message-ID: <4e61a8a5-3c6e-3c4f-0a2c-68d4b8bc2f9f@oracle.com> Looks good Thanks, Vladimir On 1/17/19 7:47 AM, Schmidt, Lutz wrote: > Hi Vladimir & all, > there is a new webrev available: http://cr.openjdk.java.net/~lucy/webrevs/8217250.01/ > What's new (in addition to some comments) is the macro > > // Flush the buffer contents if the remaining capacity is less > // than the calculated threshold (256 bytes + capacity/16) > // That should suffice for all reasonably sized output lines. > #define BUFFEREDSTREAM_FLUSH_AUTO(_termString) \ > BUFFEREDSTREAM_FLUSH_IF(_termString, 256+(_capacity>>4)) > > It replaced the previous BUFFEREDSTREAM_FLUSH_IF("string", 512) occurrences. > Regards, > Lutz > > ?On 16.01.19, 22:53, "Vladimir Kozlov" wrote: > > On 1/16/19 12:37 PM, Schmidt, Lutz wrote: > > Hi Vladimir, > > > > thanks a lot for looking at this so quickly. > > > > Sure, I could declare a specialized "BUFFEREDSTREAM_FLUSH_512" for this. The "512" originated from the thought "its large enough for a well-behaved line and small enough to save some flushes". > > > > I was also thinking about a "BUFFEREDSTREAM_FLUSH_AUTO", where the spare space is derived from the buffer capacity, maybe something like 10 percent of the capacity, 256 bytes minimum. I wasn't sure if that could be categorized as over-engineered. > > Yes, I think BUFFEREDSTREAM_FLUSH_AUTO is better than fixed size. > > Vladimir > > > > > Your thoughts? 
> > > > Thanks, > > Lutz > > > > On 16.01.19, 19:10, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: > > > > Hi Lutz, > > > > I see that you have only one usage in all cases for: > > BUFFEREDSTREAM_FLUSH_IF("", 512) > > > > Can you simple declare simplified macro for this? > > > > Otherwise looks good. > > > > Thanks, > > Vladimir > > > > On 1/16/19 6:52 AM, Schmidt, Lutz wrote: > > > Dear all, > > > > > > may I please have reviews for this (semantically) small change. Its purpose is to reduce the bufferedStream buffer flushes while printing CodeHeap Analytics. > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8217250 > > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8217250.00/ > > > > > > Thank you! > > > Lutz > > > > > > > > > > > > From gromero at linux.vnet.ibm.com Thu Jan 17 18:44:01 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Thu, 17 Jan 2019 16:44:01 -0200 Subject: [12] RFR(S) 8215317: [GRAAL] unit test CheckGraalIntrinsics failed after 8213754 In-Reply-To: References: <5df6d8d0-4fba-4515-ffee-870cf8cff9d3@oracle.com> <23eb2db4-fc81-e5e4-4c8f-67a617426353@linux.vnet.ibm.com> <7de3a2e1-5dc2-2d23-aec9-92085d7d7cff@oracle.com> <0af27551-f6f6-28b9-b4b6-cae6427fefad@linux.vnet.ibm.com> Message-ID: On 01/17/2019 04:37 PM, Vladimir Kozlov wrote: > On 1/17/19 6:07 AM, Gustavo Romero wrote: >> Hi Vladimir, >> >> On 01/16/2019 11:40 PM, Vladimir Kozlov wrote: >>> You should combine both changes and request them at the same time. Changeset will have to list both changes. Originally your changes should include CheckGraalIntrinsics.java fix. >> >> Thanks a lot for advising. Just one question: by "Changeset will have to >> list both changes" you mean the commit title (or body) must say explicitly >> the change is a combination of 8213754 + 821531? > > Separate lines per bug subject. 
Here is example: > > http://hg.openjdk.java.net/jdk/jdk/rev/c139884bd80e > > 8213348: jdk.internal.vm.compiler.management service providers missing in module descriptor > 8211781: re-building fails after changing Graal sources > Reviewed-by: erikj, mchung Got it. I'll send the change to jdk-updates for review before tagging the bugs with "jdk11u-fix-request" so. Thanks, Vladimir. Regards, Gustavo > Vladimir > >> >> >>> Note, for 8213754 corresponding Graal test fix is 8215317 (and not 8215687, that one is for 8212043 Math.min/max). >> >> Thanks for the detailed note. I commented on the right thread but pasted >> the wrong bug! >> >> Thank you and best regards, >> Gustavo >> >>> Yes, Graal fix in jdk11u is different - new intrinsics should be listed under existing condition isJDK11OrHigher(): >>> >>> http://hg.openjdk.java.net/jdk-updates/jdk11u/file/5fc74655f16d/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java#l371 >>> >>> Regards, >>> Vladimir >>> >>> On 1/16/19 1:57 PM, Gustavo Romero wrote: >>>> Hi Vladimir, >>>> >>>> I would like to request the approval to backport the change: >>>> >>>> 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace >>>> https://bugs.openjdk.java.net/browse/JDK-8213754 >>>> >>>> to jdk11u, but if it gets integrated before 8215687 it will break Graal >>>> test HotspotTest.java/CheckGraalIntrinsics.java again, as expected. >>>> >>>> Are you fine if I request the approval to backport first this change, i.e. >>>> 8215687? >>>> >>>> Actually I'll have to tweak a bit and s/isJDK12OrHigher/isJDK11OrHigher/, >>>> right? >>>> >>>> Thank you. >>>> >>>> Best regards, >>>> Gustavo >>>> >>>> On 12/13/2018 02:10 AM, Vladimir Kozlov wrote: >>>>> https://bugs.openjdk.java.net/browse/JDK-8215317 >>>>> >>>>> JDK-8213754 added new intrinsics which cause Graal's unit test failure. 
>>>>>
>>>>> CheckGraalIntrinsics test is adjusted for new intrinsics:
>>>>>
>>>>> src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.hotspot.test/src/org/graalvm/compiler/hotspot/test/CheckGraalIntrinsics.java
>>>>> @@ -376,6 +376,14 @@
>>>>>                               "jdk/jfr/internal/JVM.getEventWriter()Ljava/lang/Object;");
>>>>>           }
>>>>>
>>>>> +        if (isJDK12OrHigher()) {
>>>>> +            add(toBeInvestigated,
>>>>> +                            "java/lang/CharacterDataLatin1.isDigit(I)Z",
>>>>> +                            "java/lang/CharacterDataLatin1.isLowerCase(I)Z",
>>>>> +                            "java/lang/CharacterDataLatin1.isUpperCase(I)Z",
>>>>> +                            "java/lang/CharacterDataLatin1.isWhitespace(I)Z");
>>>>> +        }
>>>>> +
>>>>>           if (!config.inlineNotify()) {
>>>>>               add(ignore, "java/lang/Object.notify()V");
>>>>>           }
>>>>>
>>>>> Tested tier1 and tier3-graal (where test is run).
>>>>>
>>>>> I also pushed changes into Lab's Graal repo so this test will be updated during next sync.
>>>>> But I want to push fix into JDK because JDK 12 will be forked very soon.
>>>>>
>>>>
>>>
>>
>

From lutz.schmidt at sap.com Thu Jan 17 19:45:13 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Thu, 17 Jan 2019 19:45:13 +0000
Subject: RFR(S, tedious): 8217250: Optimize CodeHeap Analytics
In-Reply-To: <4e61a8a5-3c6e-3c4f-0a2c-68d4b8bc2f9f@oracle.com>
References: <2d7b7963-61be-95b8-017b-956f2752c8f3@oracle.com> <9AD26B9E-015A-4BD4-A44F-12DDE2793ED0@sap.com> <4e61a8a5-3c6e-3c4f-0a2c-68d4b8bc2f9f@oracle.com>
Message-ID:

Thank you, Vladimir! Have a great day!
Lutz ?On 17.01.19, 19:39, "Vladimir Kozlov" wrote: Looks good Thanks, Vladimir On 1/17/19 7:47 AM, Schmidt, Lutz wrote: > Hi Vladimir & all, > there is a new webrev available: http://cr.openjdk.java.net/~lucy/webrevs/8217250.01/ > What's new (in addition to some comments) is the macro > > // Flush the buffer contents if the remaining capacity is less > // than the calculated threshold (256 bytes + capacity/16) > // That should suffice for all reasonably sized output lines. > #define BUFFEREDSTREAM_FLUSH_AUTO(_termString) \ > BUFFEREDSTREAM_FLUSH_IF(_termString, 256+(_capacity>>4)) > > It replaced the previous BUFFEREDSTREAM_FLUSH_IF("string", 512) occurrences. > Regards, > Lutz > > On 16.01.19, 22:53, "Vladimir Kozlov" wrote: > > On 1/16/19 12:37 PM, Schmidt, Lutz wrote: > > Hi Vladimir, > > > > thanks a lot for looking at this so quickly. > > > > Sure, I could declare a specialized "BUFFEREDSTREAM_FLUSH_512" for this. The "512" originated from the thought "its large enough for a well-behaved line and small enough to save some flushes". > > > > I was also thinking about a "BUFFEREDSTREAM_FLUSH_AUTO", where the spare space is derived from the buffer capacity, maybe something like 10 percent of the capacity, 256 bytes minimum. I wasn't sure if that could be categorized as over-engineered. > > Yes, I think BUFFEREDSTREAM_FLUSH_AUTO is better than fixed size. > > Vladimir > > > > > Your thoughts? > > > > Thanks, > > Lutz > > > > On 16.01.19, 19:10, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: > > > > Hi Lutz, > > > > I see that you have only one usage in all cases for: > > BUFFEREDSTREAM_FLUSH_IF("", 512) > > > > Can you simple declare simplified macro for this? > > > > Otherwise looks good. > > > > Thanks, > > Vladimir > > > > On 1/16/19 6:52 AM, Schmidt, Lutz wrote: > > > Dear all, > > > > > > may I please have reviews for this (semantically) small change. Its purpose is to reduce the bufferedStream buffer flushes while printing CodeHeap Analytics. 
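As a rough illustration of the threshold arithmetic in the BUFFEREDSTREAM_FLUSH_AUTO macro quoted above (256 bytes plus one sixteenth of the buffer capacity), here is a minimal sketch. The helper names flush_threshold and should_flush and the sample capacities are hypothetical and do not come from the webrev; only the expression 256 + (capacity >> 4) is taken from the macro, and the sketch assumes used <= capacity.

```cpp
#include <cstddef>

// Hypothetical helper mirroring the expression in BUFFEREDSTREAM_FLUSH_AUTO:
// spare space to keep free = 256 bytes + capacity/16.
constexpr std::size_t flush_threshold(std::size_t capacity) {
    return 256 + (capacity >> 4);
}

// Hypothetical predicate: flush once the remaining space drops below the
// threshold. Assumes used <= capacity.
constexpr bool should_flush(std::size_t capacity, std::size_t used) {
    return capacity - used < flush_threshold(capacity);
}

// Sample capacities are made up for illustration.
static_assert(flush_threshold(4096) == 512, "4096-byte buffer: 256 + 256");
static_assert(flush_threshold(64 * 1024) == 4352, "64 KB buffer: 256 + 4096");
static_assert(!should_flush(4096, 3000), "1096 bytes left: no flush yet");
static_assert(should_flush(4096, 3700), "396 bytes left: flush");
```

For a buffer of 4096 bytes the derived threshold happens to equal the previous fixed value of 512; larger buffers keep proportionally more headroom.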
> > >
> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8217250
> > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8217250.00/
> > >
> > > Thank you!
> > > Lutz
> > >
> > >
> > >
> > >

From bsrbnd at gmail.com Thu Jan 17 19:51:06 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Thu, 17 Jan 2019 20:51:06 +0100
Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions
In-Reply-To:
References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com> <81dc5407-4985-883c-83fc-2e1fa2b77e66@redhat.com>
Message-ID:

On Thu, 17 Jan 2019 at 10:17, Andrew Haley wrote:
>
> On 1/16/19 8:46 PM, B. Blaser wrote:
>
> > To answer Andrew Haley, one of the major differences between CISC and
> > RISC is specifically the load/store architecture of the latter which
> > is part of most instructions of the former; I don't see many good
> > reasons to generate RISC-like load/store code using only a subset of
> > instructions and to juggle with registers.
>
> Well, yes, but the question remains: does this change actually help
> anything. And if it does, by how much?

Here it is on intel xeon with 5*10^9 iterations:
* mov+cmov = 10.94s
* cmov = 10.15s

Thoughts?

Thanks,
Bernard

$ cat cmov.c
// $ gcc -S cmov.c
// $ cat cmov.s
// $ gcc cmov.s
// $ time ./a.out
#include <stdio.h>
#include <time.h>

void main() {
    struct timespec start, stop;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);

    for (long i=0; i<5000000000L; i++) {
        asm ("clc");
        asm ("movq -8(%rbp), %rbx");
        asm ("cmovncq %rbx, %rax");
//      asm ("cmovncq -8(%rbp), %rax");
    }

    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &stop);

    long t = ((long)stop.tv_sec) * 1000000000L + stop.tv_nsec;
    t -= ((long)start.tv_sec) * 1000000000L + start.tv_nsec;
    printf("nsec: %ld\n", t);
}
$ gcc -S cmov.c
$ gcc cmov.s
$ time ./a.out
nsec: 10942890857

real 0m10.951s
user 0m10.941s
sys 0m0.003s

$ cat cmov.c
[...]
    for (long i=0; i<5000000000L; i++) {
        asm ("clc");
//      asm ("movq -8(%rbp), %rbx");
//      asm ("cmovncq %rbx, %rax");
        asm ("cmovncq -8(%rbp), %rax");
    }
[...]
$ gcc -S cmov.c
$ gcc cmov.s
$ time ./a.out
nsec: 10149026430

real 0m10.157s
user 0m10.150s
sys 0m0.001s

From mikael.vidstedt at oracle.com Thu Jan 17 21:54:11 2019
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Thu, 17 Jan 2019 13:54:11 -0800
Subject: RFR (XS): 8217266: Remove dead LIR_List::compare_to and LIR_Code::lir_compare_to
In-Reply-To:
References: <39E5F5F4-598E-46C8-8BAD-C95D16439EDA@oracle.com> <877ef3wunv.fsf@redhat.com>
Message-ID:

Roland/Vladimir, thanks for the reviews. Change pushed.

Cheers,
Mikael

> On Jan 17, 2019, at 10:12 AM, Vladimir Kozlov wrote:
>
> +1
>
> Vladimir
>
> On 1/17/19 12:33 AM, Roland Westrelin wrote:
>>> webrev: http://cr.openjdk.java.net/~mikael/webrevs/8217266/webrev.00/open/webrev/
>> That looks good to me.
>> Roland.

From vladimir.x.ivanov at oracle.com Thu Jan 17 22:35:15 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 17 Jan 2019 14:35:15 -0800
Subject: Why does call_site_target keep changing for a Nashorn method?
In-Reply-To: <616C8E42-4B18-405B-B28A-C9F062EC9B6C@amazon.com>
References: <616C8E42-4B18-405B-B28A-C9F062EC9B6C@amazon.com>
Message-ID: <186d49e7-daa9-a0db-b0c6-1b9d4ff2adda@oracle.com>

C1/C2 optimistically inline through CallSite instances even if those are mutable (MutableCallSite/VolatileCallSite). It requires a nmethod dependency and once CallSite target changes, all dependent nmethods should be invalidated. If such change happens during compilation, nmethod installation fails.

That's exactly what you observe: the dependency is recorded during inlining, but failed verification during installation.

Regarding the observed behavior, it is well-known [1] [2] and was a deliberate choice. As JDK-7087838 [1] states:

"The consensus among language runtime implementors is that they want control over switch points (and thus call sites) and so it's their responsibility to handle extensive invalidation of such."
So, such pathological behavior is treated as a bug in user code (Nashorn in this particular case). There's an RFE filed [3] to consider alternative options for unstable calls.

Best regards,
Vladimir Ivanov

[1] https://bugs.openjdk.java.net/browse/JDK-7087838
[2] https://bugs.openjdk.java.net/browse/JDK-7177745
[3] https://bugs.openjdk.java.net/browse/JDK-8147550

On 16/01/2019 14:04, Liu, Xin wrote:
> In one of our applications, C1/C2 keeps compiling a Javascript method
> generated by Nashorn but the code fails a dependency check right before
> installing in the code cache. This is with JDK tip.
>
> It can't pass "Dependencies::check_call_site_target_value".
>
> [C2 Parsing]
>
> [Validating compilation dependencies]
>
> witness='jdk/nashorn/internal/runtime/linker/LinkerCallSite'
> stamp='1113.578'/>
>
> It's related to the GWT methodHandle. The 2 mismatched methodhandles
> are very similar except for argL3, which is an int[2].
>
> Even though arg0-2 are not identical objects, their contents are same.
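The failing check compares the call site's current target with the method handle that was recorded at compile time. Below is a minimal sketch of that situation, assuming the comparison is by object identity (which is what the gdb output that follows suggests, since the two handles print the same contents at different addresses). The types MethodHandle, CallSite and RecordedDependency and the functions still_valid, same_contents and relink_scenario are hypothetical stand-ins, not HotSpot code.

```cpp
// Hypothetical stand-ins; these are not HotSpot types.
struct MethodHandle {
    int form_id;       // stands in for the shared LambdaForm
    int counters[2];   // stands in for argL3, the int[2] profile
};

struct CallSite {
    const MethodHandle* target;   // may be re-pointed at any time
};

// What the compiler recorded while inlining through the call site.
struct RecordedDependency {
    const CallSite* call_site;
    const MethodHandle* expected_target;
};

// Identity check: passes only if the call site still points at the very
// same object that was seen during inlining.
inline bool still_valid(const RecordedDependency& dep) {
    return dep.call_site->target == dep.expected_target;
}

// Content comparison, shown only for contrast with the identity check.
inline bool same_contents(const MethodHandle& a, const MethodHandle& b) {
    return a.form_id == b.form_id &&
           a.counters[0] == b.counters[0] &&
           a.counters[1] == b.counters[1];
}

// Scenario: between inlining and installation the target is replaced by a
// distinct handle with equal contents. Returns whether the dependency holds.
inline bool relink_scenario() {
    MethodHandle first{1, {0, 0}};
    MethodHandle second{1, {0, 0}};
    CallSite site{&first};
    RecordedDependency dep{&site, site.target};
    site.target = &second;   // relinked to an equal-looking object
    return still_valid(dep); // false: the identity no longer matches
}
```

Under this assumption, a new handle object fails the check even when every field matches the old one.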
> > (gdb)call java_lang_invoke_CallSite::target(call_site)->print() > > java.lang.invoke.BoundMethodHandle$Species_LLLL > > {0x00000000f586ca98}- > klass:'java/lang/invoke/BoundMethodHandle$Species_LLLL' > > - ---- fields(total size 6 words): > > -'customizationCount''B'@12 0 > > - private final'type''Ljava/lang/invoke/MethodType;'@16 > a'java/lang/invoke/MethodType'{0x00000000e21e2878}=(Ljava/lang/Object;Ljdk/nashorn/internal/runtime/Undefined;Ljava/lang/Object;)Ljava/lang/Object;(e21e2878) > > - final'form''Ljava/lang/invoke/LambdaForm;'@20 > a'java/lang/invoke/LambdaForm'{0x00000000e1e4a670}=>a'java/lang/invoke/MemberName'{0x00000000e1e4a938}={method}{0x00007fffa512cb68}'guard''(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;'in'java/lang/invoke/LambdaForm$MH'(e1e4a670) > > -'asTypeCache''Ljava/lang/invoke/MethodHandle;'@24 NULL(0) > > - final'argL0''Ljava/lang/Object;'@28 > a'java/lang/invoke/BoundMethodHandle$Species_LL'{0x00000000f586c9e8}(f586c9e8) > > - final'argL1''Ljava/lang/Object;'@32 > a'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f586ca28}(f586ca28) > > - final'argL2''Ljava/lang/Object;'@36 > a'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f586ca60}(f586ca60) > > - final'argL3''Ljava/lang/Object;'@40 [I{0x00000000f586ca10}(f586ca10) > > (gdb)call method_handle->print() > > java.lang.invoke.BoundMethodHandle$Species_LLLL > > {0x00000000f6b18500}- > klass:'java/lang/invoke/BoundMethodHandle$Species_LLLL' > > - ---- fields(total size 6 words): > > -'customizationCount''B'@12 0 > > - private final'type''Ljava/lang/invoke/MethodType;'@16 > a'java/lang/invoke/MethodType'{0x00000000e21e2878}=(Ljava/lang/Object;Ljdk/nashorn/internal/runtime/Undefined;Ljava/lang/Object;)Ljava/lang/Object;(e21e2878) > > - final'form''Ljava/lang/invoke/LambdaForm;'@20 > 
a'java/lang/invoke/LambdaForm'{0x00000000e1e4a670}=>a'java/lang/invoke/MemberName'{0x00000000e1e4a938}={method}{0x00007fffa512cb68}'guard''(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;'in'java/lang/invoke/LambdaForm$MH'(e1e4a670)
>
> -'asTypeCache''Ljava/lang/invoke/MethodHandle;'@24 NULL(0)
>
> - final'argL0''Ljava/lang/Object;'@28
> a'java/lang/invoke/BoundMethodHandle$Species_LL'{0x00000000f6b18450}(f6b18450)
>
> - final'argL1''Ljava/lang/Object;'@32
> a'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f6b18490}(f6b18490)
>
> - final'argL2''Ljava/lang/Object;'@36
> a'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f6b184c8}(f6b184c8)
>
> - final'argL3''Ljava/lang/Object;'@40 [I{0x00000000f6b18478}(f6b18478)
>
> My guess is argL3 is counters in java.lang.invoke.MethodHandleImpl.
>
> // Intrinsified by C2. Counters are used during parsing to calculate branch frequencies.
> @LambdaForm.Hidden
> @jdk.internal.HotSpotIntrinsicCandidate
> static boolean profileBoolean(boolean result, int[] counters) {
>     // Profile is int[2] where [0] and [1] correspond to false and true occurrences respectively.
>     int idx = result ? 1 : 0;
>     try {
>         counters[idx] = Math.addExact(counters[idx], 1);
>     } catch (ArithmeticException e) {
>         // Avoid continuous overflow by halving the problematic count.
>         counters[idx] = counters[idx] / 2;
>     }
>     return result;
> }
>
> I am still struggling to understand the source code in
> java.lang.invoke.*. Could anybody enlighten me why the target of the
> callsite changes every time here? Is it related to this profiling thing?
>
> In the validation log, it has validated the dep "dependency
> type='call_site_target_value' x0='1556' x='1866'" above. Why can't it
> pass afterwards? My guess is one MH object has been changed by
> another Java thread.
>
> One interesting fact is that the compiler thread can't pass the 22nd dep.
> My intuition is that it goes over an unknown threshold.
>
> The 2nd question is about ciEnv::validate_compile_task_dependencies.
> Why does failure of call_site_target_value_changed not count as a deopt?
>
> The flag _inc_decompile_count_on_failure = false stops the MDO from marking this
> method "not_compileable". C2 doesn't set the flag, so C2 ends up
> compiling it over and over, which makes C2 a cpu hog. Here's the code in
> validate_compile_task_dependencies
>
>   bool counter_changed = system_dictionary_modification_counter_changed();
>   Dependencies::DepType result =
>       dependencies()->validate_dependencies(_task, counter_changed);
>   if (result != Dependencies::end_marker) {
>     if (result == Dependencies::call_site_target_value) {
>       _inc_decompile_count_on_failure = false;
>       record_failure("call site target change");
>
> Maybe the right thing to do is to count this as a deopt and change the
> deopt limit computation to take into account the size of the method in
> nodes, just as done for abandoning compilation if the graph is too big.
> > Thanks, > > --lx > From sandhya.viswanathan at intel.com Fri Jan 18 01:13:35 2019 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Fri, 18 Jan 2019 01:13:35 +0000 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <21ef0e11-3f3d-e9a4-5dc6-898d4ac18efa@redhat.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <09c92b1a-c6da-e16b-bb68-553876e8a6ea@oracle.com> <2a0a385d-81ee-df66-f147-e4dd9aa5b72e@oracle.com> <8b2ab749-20f1-8dd3-3cc7-64db5d45bc7d@redhat.com> <0aae37aa-7797-fde5-63d5-96c8eb961183@oracle.com> <86a1988a-a8d2-b6af-0985-11a94d6d76a5@redhat.com> <69510788-52e6-815b-1ed7-a6f4886d0398@oracle.com> <3e3c4f7d-049e-4aec-c165-f2664e7c98ef@redhat.com> <34cfc530-8517-ac1a-0c04-446dc3dc2436@oracle.com> <21ef0e11-3f3d-e9a4-5dc6-898d4ac18efa@redhat.com> Message-ID: <02FCFB8477C4EF43A2AD8E0C60F3DA2BB1A50E3B@FMSMSX126.amr.corp.intel.com> Hi Andrew, >>> Perhaps Intel also could provide help with testing? [Sadhya, is this an option?] Yes, we can help with testing this feature as needed. Best Regards, Sandhya -----Original Message----- From: Andrew Dinn [mailto:adinn at redhat.com] Sent: Thursday, January 17, 2019 6:28 AM To: Alan Bateman ; Brian Goetz Cc: core-libs-dev at openjdk.java.net; hotspot compiler ; Jonathan Halliday ; Viswanathan, Sandhya ; Mikael Vidstedt Subject: Re: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory Hi Alan, Thanks for your response. On 17/01/2019 12:53, Alan Bateman wrote: > I skimmed through the current draft. In the most recent discussion > then I think we had converged on "SYNC" rather than "PERSISTENT", the > reasoning being that there is persistence already with regular file > mapped files, also it aligns with the MAP_SYNC flag to mmap. 
I don't > recall if the discussion on isPersistent concluded, that was more of a > naming issue and whether you include an isXXX method or not is not > critical to the proposal. The overload of the force method to specify > a range is a good addition, irrespective of the JEP. Ok, thanks. At least sync is now being used consistently in the public API. I will look at renaming internal vars/methods to use sync when I publish the next webrev. > One thing to clarify is the heading "Proposed Restricted Public JDK > API Changes". The proposal (and the early webrevs) exposed > writebackMemory in the internal Unsafe, not sun.misc.Unsafe, which I think is right. > This makes it a JDK internal API so it doesn't need to be in JEP. I am happy to remove it from the JEP if needed. Does it do any harm to leave it? > Did you get any feedback on the Testing section? Given that the > feature needs special hardware then it will need commitment to test is > on a regular basis. It's a similar issue to the draft "JEP 337: RDMA > Network Sockets" where special hardware is needed to full test the > feature. In the case of JEP 337 then some testing with emulation is possible. I believe I received no specific feedback on that topic. Some of the other Red Hat dev teams (i.e. not OpenJDK) and also dev staff at Intel are very keen to base some of their future work on this feature. So, it will certainly get tested /after/ JDK release :-) Red Hat does have the Intel hardware needed to test this feature but, so far, nothing that can be used to test on AArch64. Our OpenJDK team can access this kit for one-off testing but it is not currently available for continuous integration testing. I will propose to my manager that we acquire the relevant kit and ensure that all JDKs which implement this JEP are tested prior to release. 
We should also be able to test AArch64 using volatile memory to simulate a non-volatile memory device up to the point where the requisite AArch64-based NVM hardware becomes available. I am fairly confident this plan will be agreeable to the overlords whom I humbly serve. Perhaps Intel also could provide help with testing? [Sadhya, is this an option?] My bigger concern was that crash recovery tests may never be 100% reliable. A 100% guarantee requires the ability to engineer a machine crash at a precisely defined critical point of execution and some of the relevant critical locations will be embedded in the middle of JITted code making it hard to provoke the crash. So, the situations where a crash /can/ be engineered may not fully reflect those that occur in live deployments. That said, a dash of artificiality in test code is, perhaps, not so worthy of remark . . . > Vladimir and I have reviewed the JEP, it will need an area lead to > endorse, I think it can be Brian or Mikael in this case. Ok, thanks for the above answers. Looking forward to hearing further from Brian and/or Mikael (Vidstedt, I assume? :-). regards, Andrew Dinn ----------- From felix.yang at huawei.com Fri Jan 18 05:36:11 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Fri, 18 Jan 2019 05:36:11 +0000 Subject: [RFR] 8217359: C2 compiler triggers SIGSEGV after tranformation in ConvI2LNode::Ideal Message-ID: Hi, Can someone help review this change to the C2 compiler? Bug: https://bugs.openjdk.java.net/browse/JDK-8217359 Webrev: http://cr.openjdk.java.net/~fyang/8217359/webrev.00/ The bug triggers when C2 compiler does the following transformation in function ConvI2LNode::Ideal: // Convert ConvI2L(AddI(x, y)) to AddL(ConvI2L(x), ConvI2L(y)) ...... 395 Node* cx = phase->C->constrained_convI2L(phase, x, TypeInt::make(rxlo, rxhi, widen), NULL); 396 Node* cy = phase->C->constrained_convI2L(phase, y, TypeInt::make(rylo, ryhi, widen), NULL); .... 
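For readers unfamiliar with the transformation named in the comment above, the following sketch shows the arithmetic identity behind it. It is an illustration only, not C2 code: add_i models 32-bit wrapping addition (AddI), conv_i2l models sign extension (ConvI2L), and all function names are made up here. The identity holds only when x + y does not overflow 32 bits, which is presumably why the code narrows the input ranges through constrained_convI2L.

```cpp
#include <cstdint>

// AddI: 32-bit addition with wrap-around, modelled with 64-bit arithmetic
// so that the sketch itself never overflows a signed type.
constexpr std::int32_t add_i(std::int32_t x, std::int32_t y) {
    const std::int64_t wide = std::int64_t{x} + std::int64_t{y};
    const std::int64_t two32 = std::int64_t{1} << 32;
    return static_cast<std::int32_t>(wide > INT32_MAX ? wide - two32
                                   : wide < INT32_MIN ? wide + two32
                                   : wide);
}

// ConvI2L: sign-extend a 32-bit value to 64 bits.
constexpr std::int64_t conv_i2l(std::int32_t x) { return x; }

// Shape before the transformation: ConvI2L(AddI(x, y)).
constexpr std::int64_t before(std::int32_t x, std::int32_t y) {
    return conv_i2l(add_i(x, y));
}

// Shape after the transformation: AddL(ConvI2L(x), ConvI2L(y)).
constexpr std::int64_t after(std::int32_t x, std::int32_t y) {
    return conv_i2l(x) + conv_i2l(y);
}

static_assert(before(40, 2) == after(40, 2), "no overflow: both shapes agree");
static_assert(before(INT32_MAX, 1) == INT32_MIN, "AddI wraps around");
static_assert(before(INT32_MAX, 1) != after(INT32_MAX, 1), "overflow: shapes differ");
```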
Here is the process of how it triggers:

// =========================================================
// Before line 395, x is an AddINode (id: 202). y is also an AddINode (id: 553) and x is a subtree of y.
// The ideal graph looks like:
//
//      ...  ...  ...     ...
//         \  |  /         |
//          86_Phi      33_ConI
//             |         /
//              \   |   /
//              202_AddI
//
//      ...      ...  ...  ...
//       |          \  |  /
//     27_ConI     202_AddI --------- (node x)
//       |           /
//        \    |    /
//         549_SubI
//             |     ...
//              \  |  /
//             553_AddI ---------- (node y)
//
// ==========================================================
// After line 395, x is converted to cx and cx is an AddLNode (id: 1274).
// At this point, everything looks fine.
//          ...     ...  ...
//             \    /    /
//          1271_ConvI2L   1273_ConL
//               |         /
//                \   |   /
//               1274_AddL
//
// ==========================================================
// In line 396, y will be converted to cy. In this process, y
// and its subnodes will all be converted recursively. This is
// a rather long process. The conversion of y is like this:
//
// Node 27_ConI will be converted to node 1278_ConL.
//
// Since x(202) is the input edge of node 549, it will be
// converted again. And the result cx_2 is node 1282_AddL.
// The structure of cx_2 is the same as cx. After GVN(hash_find_insert()),
// 1282_AddL is replaced with 1274_AddL.
//
// Then 549_SubI will be converted to 1283_SubL and the ideal graph looks like:
//                  ...   ...   ...
//                     \   /    /
//                 1271_ConvI2L  1273_ConL
//       ...            |        /
//        |              \   |  /
//     1278_ConL        1274_AddL
//        |              /
//         \     |      /
//          1283_SubL
//
// After that, C2 will do the following transformation to node 1283_SubL:
// x - (y + cons) ==> (x - y) - cons
//
// When this is done, node 1283_SubL is converted to node 1286_AddL:
//      ...      ...    ...
//       |        |     /
//    1278_ConL  1271_ConvI2L
//       |        /          ...
//        \   |  /           /
//        1284_SubL      1285_ConL
//            |           /
//             \    |    /
//             1286_AddL
//
// Then in function subsume_node(), 1283_SubL is replaced with 1286_AddL.
// During this process, the following operations will be carried out:
// | In function subsume_node(), 1283_SubL will be regarded as a
// | dead_node since it is replaced by 1286_AddL. The same inspection
// | of dead node will be carried out to the subnodes of 1283
// | recursively. And remove_dead_node() function will be called
// | by subsume_node() to replace the input edges of dead node to NULL.
// | 1274_AddL is node cx. At this moment, 1274_AddL has only one
// | output edge, that is 1283. Since 1283 is a dead node, 1274 will
// | also be regarded as a dead node. Then input edges of 1274_AddL
// | will be set to NULL. After that, cx will be an isolated node which
// | has neither input edge nor output edge.
//
// ==========================================
// After all of this, program continues and cx->in(2) is used in addnode.cpp:163.
// Since now cx has no input edges, the program crashes.

The proposed solution is fairly straightforward: after the conversion of x, build a hook node to add a use to cx and prevent it from dying. When the conversion of y is finished, this new output of cx is removed.

JTreg tested with both x86_64 fastdebug & release build.
Is it OK?

Thanks,
Felix

From Nick.Gasson at arm.com Fri Jan 18 08:40:25 2019
From: Nick.Gasson at arm.com (Nick Gasson (Arm Technology China))
Date: Fri, 18 Jan 2019 08:40:25 +0000
Subject: RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered
Message-ID: <895ba862-6c8e-486a-2eff-99057d692074@arm.com>

Hi,

While I was cleaning up the patch for 8216350 I noticed an issue in the
implementation of recursive locking in aarch64_enc_fast_lock:

Bug: https://bugs.openjdk.java.net/browse/JDK-8217368
Webrev: http://cr.openjdk.java.net/~ngasson/8217368/webrev.0/

First we load the markOop of the object we want to lock and OR it with
markOopDesc::unlocked_value (1).
Then we do a CAS to exchange the
address of the box on our thread's stack with the object's header word
iff it's equal to the (markOop | 1) we just computed. If this fails,
then we should check for a recursive lock by comparing

  (~(page size - 1) | 3) & (markOop - SP) == 0

Where "markOop" is the current object header word loaded by the failed
CAS. This checks that the lock bits are zero (locked) and the stack
address of the displaced header is within one page of the current SP.
But on AArch64 we actually do this:

  (~(page size - 1) | 3) & ((old markOop | 1) - SP) == 0

Where "old markOop | 1" is the compare-to value used for the CAS. This
is always false as the result has at least bit #0 set. This only affects
C2, the C1_MacroAssembler version has the correct test.

The diff looks big but all it does is swap the usage of registers `tmp'
and `disp_hdr' in the first section so the markOop loaded by the CAS
ends up in disp_hdr and tmp holds the (markOop | 1) compare-to value.

Ran jtreg, plus jcstress with -XX:+UseLSE and -XX:-UseLSE. Also added
another microbenchmark to
micro/org/openjdk/bench/vm/lang/LockUnlock.java as I couldn't find an
existing JMH case that triggered this.

Without patch:

Result "org.openjdk.bench.vm.lang.LockUnlock.testRecursiveSynchronizationNoBias":
  510.781 ±(99.9%) 1.196 ns/op [Average]
  (min, avg, max) = (508.769, 510.781, 513.854), stdev = 1.597
  CI (99.9%): [509.585, 511.977] (assumes normal distribution)

With patch:

Result "org.openjdk.bench.vm.lang.LockUnlock.testRecursiveSynchronizationNoBias":
  197.038 ±(99.9%) 0.096 ns/op [Average]
  (min, avg, max) = (196.886, 197.038, 197.296), stdev = 0.128
  CI (99.9%): [196.942, 197.134] (assumes normal distribution)

Two other minor things:

* Does anyone know what the comment "// Load Compare Value application
register." means? It's present in the PPC and S390 ports too.

* The x86 port #ifdef LP64 uses "7 - os::vm_page_size()" as the mask in
the recursive lock test.
I think the "7" here is markOopDesc::biased_lock_mask and is presumably there to prevent a silent mutual exclusion failure if a markOop with the bias locking bits set ends up the fast_lock path (although this should never happen). Should we change markOopDesc::lock_mask_in_place to markOopDesc::biased_lock_mask_in_place in the AArch64 port too? Thanks, Nick From aph at redhat.com Fri Jan 18 09:36:31 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 18 Jan 2019 09:36:31 +0000 Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered In-Reply-To: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> Message-ID: On 1/18/19 8:40 AM, Nick Gasson (Arm Technology China) wrote: > Hi, > > While I was cleaning up the patch for 8216350 I noticed an issue in the > implementation of recursive locking in aarch64_enc_fast_lock: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8217368 > Webrev: http://cr.openjdk.java.net/~ngasson/8217368/webrev.0/ > > First we load the markOop of the object we want to lock and OR it with > markOopDesc::unlocked_value (1). Then we do a CAS to exchange the > address of the box on our thread's stack with the object's header word > iff it's equal to the (markOop | 1) we just computed. If this fails, > then we should check for a recursive lock by comparing > > (~(page size - 1) | 3) & (markOop - SP) == 0 > > Where "markOop" is the current object header word loaded by the failed > CAS. This checks that the lock bits are zero (locked) and the stack > address of the displaced header is within one page of the current SP. > But on AArch64 we actually do this: > > (~(page size - 1) | 3) & ((old markOop | 1) - SP) == 0 > > Where "old markOop | 1" is the compare-to value used for the CAS. This > is always false as the result has at least bit #0 set. This only affects > C2, the C1_MacroAssembler version has the correct test. 
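The two tests quoted above can be checked with plain integer arithmetic. This is a sketch under assumptions: a 4096-byte page, made-up stack addresses, and hypothetical helper names (recursive_lock_mask, looks_recursive). It only models the mask expression from the description, not the generated AArch64 code.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096;   // assumed page size

// (~(page size - 1) | 3): keeps the two low lock bits and every bit from
// the page size upward.
constexpr std::uint64_t recursive_lock_mask() {
    return ~(kPageSize - 1) | 3;
}

// Intended test: mask & (markOop - SP) == 0.
constexpr bool looks_recursive(std::uint64_t mark, std::uint64_t sp) {
    return (recursive_lock_mask() & (mark - sp)) == 0;
}

constexpr std::uint64_t kSp   = 0x7fff0000;   // made-up, aligned stack pointer
constexpr std::uint64_t kMark = kSp + 0x40;   // header points at a box just above SP, lock bits 00

static_assert(looks_recursive(kMark, kSp),
              "header within one page of SP with zero lock bits: recursive case");
static_assert(!looks_recursive(kMark | 1, kSp),
              "testing (markOop | 1) leaves bit 0 set, so the check never passes");
static_assert(!looks_recursive(kSp + 2 * kPageSize, kSp),
              "address more than a page away from SP: not a recursive lock");
```

The second assertion mirrors the reported problem: because SP is aligned, subtracting it from a value with bit 0 set still leaves bit 0 set, and the mask keeps that bit.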
> > The diff looks big but all it does is swap the usage of registers `tmp' > and `disp_hdr' in the first section so the markOop loaded by the CAS > ends up in disp_hdr and tmp holds the (markOop | 1) compare-to value. The patch looks good. However, I don't understand why we aren't using MacroAssembler::cmpxchgptr here. It looks like we should be, and you'd end up with a less complex result. > Two other minor things: > > * Does anyone know what the comment "// Load Compare Value application > register." means? It's present in the PPC and S390 ports too. Probably no-one can remember. We'll have inherited it from x86. > * The x86 port #ifdef LP64 uses "7 - os::vm_page_size()" as the mask in > the recursive lock test. I think the "7" here is > markOopDesc::biased_lock_mask and is presumably there to prevent a > silent mutual exclusion failure if a markOop with the bias locking bits > set ends up the fast_lock path (although this should never happen). > Should we change markOopDesc::lock_mask_in_place to > markOopDesc::biased_lock_mask_in_place in the AArch64 port too? I wouldn't think so. You're describing a change that by definition we can't test. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Fri Jan 18 09:49:14 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 18 Jan 2019 10:49:14 +0100 Subject: RFR(S, tedious): 8217250: Optimize CodeHeap Analytics In-Reply-To: <4e61a8a5-3c6e-3c4f-0a2c-68d4b8bc2f9f@oracle.com> References: <2d7b7963-61be-95b8-017b-956f2752c8f3@oracle.com> <9AD26B9E-015A-4BD4-A44F-12DDE2793ED0@sap.com> <4e61a8a5-3c6e-3c4f-0a2c-68d4b8bc2f9f@oracle.com> Message-ID: <52136751-929b-4976-477d-93282ce0a0d7@oracle.com> Hi Lutz, looks good to me too. 
Best regards,
Tobias

On 17.01.19 19:39, Vladimir Kozlov wrote:
> Looks good
>
> Thanks,
> Vladimir
>
> On 1/17/19 7:47 AM, Schmidt, Lutz wrote:
>> Hi Vladimir & all,
>> there is a new webrev available: http://cr.openjdk.java.net/~lucy/webrevs/8217250.01/
>> What's new (in addition to some comments) is the macro
>>
>>    // Flush the buffer contents if the remaining capacity is less
>>    // than the calculated threshold (256 bytes + capacity/16)
>>    // That should suffice for all reasonably sized output lines.
>>    #define BUFFEREDSTREAM_FLUSH_AUTO(_termString)                \
>>        BUFFEREDSTREAM_FLUSH_IF(_termString, 256+(_capacity>>4))
>>
>> It replaced the previous BUFFEREDSTREAM_FLUSH_IF("string", 512) occurrences.
>> Regards,
>> Lutz
>>
>> On 16.01.19, 22:53, "Vladimir Kozlov" wrote:
>>
>>     On 1/16/19 12:37 PM, Schmidt, Lutz wrote:
>>     > Hi Vladimir,
>>     >
>>     > thanks a lot for looking at this so quickly.
>>     >
>>     > Sure, I could declare a specialized "BUFFEREDSTREAM_FLUSH_512" for this. The "512" originated from the thought "it's large enough for a well-behaved line and small enough to save some flushes".
>>     >
>>     > I was also thinking about a "BUFFEREDSTREAM_FLUSH_AUTO", where the spare space is derived from the buffer capacity, maybe something like 10 percent of the capacity, 256 bytes minimum. I wasn't sure if that could be categorized as over-engineered.
>>
>>     Yes, I think BUFFEREDSTREAM_FLUSH_AUTO is better than fixed size.
>>
>>     Vladimir
>>
>>     >
>>     > Your thoughts?
>>     >
>>     > Thanks,
>>     > Lutz
>>     >
>>     > On 16.01.19, 19:10, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote:
>>     >
>>     >      Hi Lutz,
>>     >
>>     >      I see that you have only one usage in all cases for:
>>     >      BUFFEREDSTREAM_FLUSH_IF("", 512)
>>     >
>>     >      Can you simply declare a simplified macro for this?
>>     >
>>     >      Otherwise looks good.
>>     >
>>     >      Thanks,
>>     >      Vladimir
>>     >
>>     >      On 1/16/19 6:52 AM, Schmidt, Lutz wrote:
>>     >      > Dear all,
>>     >      >
>>     >      > may I please have reviews for this (semantically) small change. Its purpose is to reduce the bufferedStream buffer flushes while printing CodeHeap Analytics.
>>     >      >
>>     >      > Bug:    https://bugs.openjdk.java.net/browse/JDK-8217250
>>     >      > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8217250.00/
>>     >      >
>>     >      > Thank you!
>>     >      > Lutz
>>     >      >
>>     >      >
>>     >
>>     >
>>

From Nick.Gasson at arm.com Fri Jan 18 09:52:50 2019
From: Nick.Gasson at arm.com (Nick Gasson (Arm Technology China))
Date: Fri, 18 Jan 2019 09:52:50 +0000
Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered
In-Reply-To:
References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com>
Message-ID: <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com>

Hi Andrew,

On 18/01/2019 17:36, Andrew Haley wrote:
>
> The patch looks good. However, I don't understand why we aren't using
> MacroAssembler::cmpxchgptr here. It looks like we should be, and you'd
> end up with a less complex result.
>

It's not exactly the same though: MacroAssembler::cmpxchgptr adds a "dmb ish" to the failure path which I don't think is required here.

>> * Does anyone know what the comment "// Load Compare Value application
>> register." means? It's present in the PPC and S390 ports too.
>
> Probably no-one can remember. We'll have inherited it from x86.

Let's delete it then.
Thanks, Nick From aph at redhat.com Fri Jan 18 09:54:52 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 18 Jan 2019 09:54:52 +0000 Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions In-Reply-To: References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com> <81dc5407-4985-883c-83fc-2e1fa2b77e66@redhat.com> Message-ID: <3dd85d2c-f4d8-e360-21a2-68254b3c5e2b@redhat.com> On 1/17/19 7:51 PM, B. Blaser wrote: > Here it is on intel xeon with 5*10e9 iterations: > * mov+cmov = 10.94s > * cmov = 10.15s > > Thoughts? It looks like there's not much of a performance difference, but it might help by freeing a register. OTOH, we'd still need to be sure we weren't introducing a regression. We'd have to make sure that implicit null checks work. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Fri Jan 18 10:23:23 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 18 Jan 2019 11:23:23 +0100 Subject: [RFR] 8217359: C2 compiler triggers SIGSEGV after tranformation in ConvI2LNode::Ideal In-Reply-To: References: Message-ID: <32982b31-3a91-58fb-a6b8-b1cd9f7cdb41@oracle.com> Hi Felix, Could you please add the regression test as jtreg test? Otherwise, the fix looks reasonable to me. Nice analysis! Thanks, Tobias On 18.01.19 06:36, Yangfei (Felix) wrote: > Hi, > > Can someone help review this change to the C2 compiler? > > Bug: https://bugs.openjdk.java.net/browse/JDK-8217359 > Webrev: http://cr.openjdk.java.net/~fyang/8217359/webrev.00/ > > The bug triggers when C2 compiler does the following transformation in function ConvI2LNode::Ideal: > // Convert ConvI2L(AddI(x, y)) to AddL(ConvI2L(x), ConvI2L(y)) > ...... > 395 Node* cx = phase->C->constrained_convI2L(phase, x, TypeInt::make(rxlo, rxhi, widen), NULL); > 396 Node* cy = phase->C->constrained_convI2L(phase, y, TypeInt::make(rylo, ryhi, widen), NULL); > .... 
> > Here is the process of how it triggers: > > // ========================================================= > // Before line 395, x is an AddINode (id: 202). y is also an AddINode (id: 553) and x is a subtree of y. > // The ideal graph looks like: > // > // ... ... ... ... > // \ | / | > // 86_Phi 33_ConI > // | / > // \ | / > // 202_AddI > // > // ... ... ... ... > // | \ | / > // 27_ConI 202_AddI --------- (node x) > // | / > // \ | / > // 549_SubI > // | ... > // \ | / > // 553_AddI ---------- (node y) > // > // ========================================================== > // After line 395, x is converted to cx and cx is an AddLNode (id: 1274). > // At this point, everything looks fine. > // ... ... ... > // \ / / > // 1271_ConvI2L 1273_ConL > // | / > // \ | / > // 1274_AddL > // > // ========================================================== > // In line 396, y will be converted to cy. In this progress, y > // and its subnode will all be converted recursively. This is > // a rather long progress. The convertion of y is like this: > // > // Node 27_ConI will be converted to node 1278_ConL. > // > // Since x(202) is the input edge of node 549, it will be > // converted again. And the result cx_2 is node 1282_AddL. > // The structure of cx_2 is the same as cx. After GVN(hash_find_insert()), > // 1282_AddL is replaced with 1274_AddL. > // > // Then 549_SubI will be converted to 1283_SubL and the ideal graph looks like: > // ... ... ... > // \ / / > // 1271_ConvI2L 1273_ConL > // ... | / > // | \ | / > // 1278_ConL 1274_AddL > // | / > // \ | / > // 1283_SubL > // > // After that, C2 will do the following transformation to node 1283_SubL: > // x - (y + cons) ==> (x - y) - cons > // > // When this is done, node 1283_SubL is converted to node 1286_AddL: > // ... ... ... > // | | / > // 1278_ConL 1271_ConI2L > // | / ... 
> // \ | / / > // 1284_SubL 1285_ConL > // | / > // \ | / > // 1286_AddL > // > // Then in function subsume_node(), 1283_SubL is replaced with 1286_AddL. > // During this progress, following operations will be carried out: > // | In function subsume_node(), 1283_SubL will be regarded as a > // | dead_node since it is replaced by 1286_AddL. The same inspection > // | of dead node will be carried out to the subnodes of 1283 > // | recursively. And remove_dead_node() function will be called > // | by subsume_node() to replace the input edges of dead node to NULL. > // | 1274_AddL is node cx. At this moment, 1274_AddL has only one > // | output edge, that is 1283. Since 1283 is a dead node, 1274 will > // | also be regarded as a dead node. Then input edges of 1274_AddL > // | will be set to NULL. After that, cx will be an isolated node which > // | has neither input edge nor output edge. > // > // ========================================== > // After all of this, program continues and cx->in(2) is used in addnode.cpp:163. > // Since now cx has no input edges, the program crashes. > > The proposed solution is fairly straight-forward: > After the conversion of x, build a hook node add a use to cx to prevent it from dying. > When conversion of y is finished, this new output of cx is removed. > > JTreg tested with both x86_64 fastdebug & release build. Is it OK? 
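The effect of the proposed hook node can be illustrated with a toy use-counting graph. This is a hypothetical model written in Java for illustration only; it does not use C2's Node or PhaseGVN classes, and all names in it are assumptions. It mimics the sequence described above: cx is built, its only real user (the SubL) is replaced and dies, and cx is then needed again for the final AddL.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "a node dies when its last use goes away".
// Not HotSpot code; every name here is made up for the sketch.
final class HookNodeSketch {
    static final class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();
        int uses; // number of nodes that currently have this node as an input

        Node(String name, Node... ins) {
            this.name = name;
            for (Node in : ins) {
                inputs.add(in);
                in.uses++;
            }
        }

        // Drop all input edges; inputs left without users are killed recursively.
        void kill() {
            List<Node> old = new ArrayList<>(inputs);
            inputs.clear();
            for (Node in : old) {
                if (--in.uses == 0) {
                    in.kill();
                }
            }
        }
    }

    // Returns how many inputs cx still has when the final AddL(cx, cy) is built.
    static int inputsOfCxAfterConversion(boolean useHook) {
        Node cx = new Node("AddL", new Node("ConvI2L"), new Node("ConL"));
        Node hook = useHook ? new Node("Hook", cx) : null; // extra use keeps cx alive
        Node sub = new Node("SubL", new Node("ConL"), cx); // only real user of cx
        sub.kill();                                        // SubL is replaced and dies
        Node result = new Node("AddL", cx, new Node("cy"));
        if (hook != null) {
            hook.kill();                                   // release the temporary use
        }
        return cx.inputs.size();
    }

    public static void main(String[] args) {
        System.out.println(inputsOfCxAfterConversion(false)); // cx was stripped of its inputs
        System.out.println(inputsOfCxAfterConversion(true));  // cx kept both inputs
    }
}
```

Without the hook, cx has lost its inputs by the time it is used again, which corresponds to the crash on cx->in(2); with the hook in place it keeps both of them.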
> > Thanks, > Felix > From goetz.lindenmaier at sap.com Fri Jan 18 11:03:15 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 18 Jan 2019 11:03:15 +0000 Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays In-Reply-To: References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> <9863276de30643338249ead2a6ac7fe9@sap.com> <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com> <37d9e6f3b2b4400d8963f54d2fe7767f@sap.com> <406db3e3-2ac3-dc16-f384-99a314e62a42@linux.vnet.ibm.com> <180a6c0b-7abe-9d5c-51e6-dffbb23570d3@linux.vnet.ibm.com> <301fd43a-e5b5-d970-7a1a-2458dbaeec36@linux.vnet.ibm.com> Message-ID: <8b1ca2bdba334f42a3c2b044a557dd8c@sap.com> Hi Martin, I had a look at your change. Overall looks good. According to Gustavos mail a nice improvement! I think though that the way to select the algorithm is quite messy: In templateInterpreter vpmsumb is checked and the methods are called directly. In stubGenerator, generate_CRC32...() vpmsumb is tested to decide on vector_constants = R2. and generic generate_CRC_updateBytes is called, which again checks whether verctor_constants == R2. I think generate_CRC_updateBytes() or some other generic function should be located in macroAssembler_ppc and be called from both locations. What do you think? Best regards, Goetz > -----Original Message----- > From: Doerr, Martin > Sent: Donnerstag, 17. Januar 2019 14:18 > To: Gustavo Romero ; 'hotspot-compiler- > dev at openjdk.java.net' ; > Lindenmaier, Goetz > Subject: RE: RFR(M): 8216060: [PPC64] Vector CRC implementation should be > used by interpreter and be faster for short arrays > > Hi, > > the rebased webrev.01 applies on jdk/jdk, now (after JDK-8216376). So the > issue Gustavo had observed does not longer exist. > http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/ > > I have updated copyrights and retested it. 
> > Best regards, > Martin > > > -----Original Message----- > From: Gustavo Romero > Sent: Montag, 7. Januar 2019 14:52 > To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be > used by interpreter and be faster for short arrays > > Hi Martin, > > On 01/07/2019 11:49 AM, Doerr, Martin wrote: > > I want to check all places where we use "mr(R1_SP, R21_sender_SP)". > There may be more issues with that. I'll probably handle that in a separate > change and push this CRC change afterwards. > > I see. Thanks for letting me know. > > Best regards, > Gustavo > > > Best regards, > > Martin > > > > > > -----Original Message----- > > From: Gustavo Romero > > Sent: Freitag, 4. Januar 2019 19:55 > > To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > > Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should > be used by interpreter and be faster for short arrays > > > > Hi Martin, > > > > On 01/04/2019 02:13 PM, Doerr, Martin wrote: > >> Hi Gustavo, > >> > >> when called from the interpreter (the scenario you observed), R21 is set > before resizing the frame to avoid wasted stack space > (InterpreterMacroAssembler::call_from_interpreter). > > > > Got it. Thanks a lot for the explanations. > > > > I think it doesn't currently matter in practice, but I'm wondering if to be > > consistent we should cut back the stack back earlier also in > > TemplateInterpreterGenerator::generate_CRC32_update_entry()? > > > > diff -r a35f8c35d8c9 > src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp > > --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Fri Jan > 04 10:09:00 2019 +0100 > > +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Fri Jan > 04 13:44:37 2019 -0500 > > @@ -1840,11 +1840,12 @@ > > #endif > > __ lwz(crc, 2*wordSize, argP); // Current crc state, zero extend to 64 > bit to have a clean register. 
> > > > + // Restore caller sp for c2i case and return. > > + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller > started. > > + > > StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table); > > __ kernel_crc32_singleByte(crc, data, dataLen, table, tmp, true); > > > > - // Restore caller sp for c2i case and return. > > - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller > started. > > __ blr(); > > > > // Generate a vanilla native entry as the slow path. > > > > Currently there is no issue probably because generated code is simpler and > does > > no spills. > > > > Best regards, > > Gustavo > > > >> When called from compiled methods, R21 is set by a c2i adapter which > extends the compiled frame by space for arguments (gen_c2i_adapter). > >> > >> "mr(R1_SP, R21_sender_SP)" is more error-prone than > "resize_frame_absolute" so I think the latter would be better (though it takes > more registers and instructions), but I don't want to replace that as part of > this CRC change. > >> > >> Best regards, > >> Martin > >> > >> > >> -----Original Message----- > >> From: Gustavo Romero > >> Sent: Freitag, 4. Januar 2019 14:44 > >> To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > >> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should > be used by interpreter and be faster for short arrays > >> > >> Hi Martin, > >> > >> On 01/04/2019 07:30 AM, Doerr, Martin wrote: > >>> thank you very much for confirming. This makes sense. We use different > frame headers depending on whether the frame is the top Java frame or not > (and on whether it's a debug build or not). Setting R1_SP to sender_SP is a > shortcut for leaf calls which relies on having an unmodified stack until this > point. So the patch fixes the issue. > >> > >> Glad to help! Thanks for the additional information, I was not aware that > the > >> selection of different frame headers could be done at compile time. 
One > last > >> question only for my education: what exactly advanced (incremented) > R1_SP so it > >> has to be cut back using sender_SP value, i.e. sender_SP tracks the frame > for > >> which function exactly or "who" is the caller exactly here? > >> > >> Thank you. > >> > >> Best regards, > >> Gustavo > >> > >>> New webrev: > >>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/ > >>> > >>> Best regards, > >>> Martin > >>> > >>> > >>> -----Original Message----- > >>> From: Gustavo Romero > >>> Sent: Donnerstag, 3. Januar 2019 19:36 > >>> To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > >>> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation > should be used by interpreter and be faster for short arrays > >>> > >>> Hi Martin, > >>> > >>> On 01/03/2019 03:34 PM, Doerr, Martin wrote: > >>>> Unfortunately, I can't reproduce the crash. TestCRC32C works stable on > our machine (with fastdbg build). > >>>> I guess that the frameless spills mess up the stack. Can you check if the > patch below helps? > >>> > >>> Thanks for providing a fix so I can try it. > >>> Yes, I confirm the patch below indeed fixes the sigsegv crash when > CRC32C update() method is used. > >>> I also confirm that I don't observe the crash on the fastdebug build, only > on the release build. > >>> It also only affects the Interpreter mode, so passing -Xcomp avoids the > crash on the release build. > >>> > >>> Just as reference, I can reproduce it on the release build with the > following trivial code: > >>> > >>> import java.util.zip.CRC32C; > >>> > >>> class CRC32C_v1 { > >>> public static void main(String[] arg) { > >>> byte[] b = new byte[1024]; > >>> > >>> CRC32C crc32c = new CRC32C(); > >>> crc32c.update(b, 0, b.length); > >>> > >>> System.out.println(crc32c.getValue()); > >>> } > >>> } > >>> > >>> Thanks for fixing the typos. 
> >>> > >>> > >>> Best regards, > >>> Gustavo > >>> > >>>> Best regards, > >>>> Martin > >>>> > >>>> > >>>> diff -r a33f49d5998c > src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp > >>>> --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu > Jan 03 17:30:03 2019 +0100 > >>>> +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp > Thu Jan 03 18:33:16 2019 +0100 > >>>> @@ -1924,6 +1924,9 @@ > >>>> __ addi(data, data, > arrayOopDesc::base_offset_in_bytes(T_BYTE)); > >>>> } > >>>> > >>>> + // Restore caller sp for c2i case. > >>>> + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the > caller started. > >>>> + > >>>> StubRoutines::ppc64::generate_load_crc_table_addr(_masm, > table); > >>>> > >>>> if (!VM_Version::has_vpmsumb()) { > >>>> @@ -1933,8 +1936,6 @@ > >>>> __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, > tc0, tc1, tc2, true); > >>>> } > >>>> > >>>> - // Restore caller sp for c2i case and return. > >>>> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the > caller started. > >>>> __ blr(); > >>>> > >>>> // Generate a vanilla native entry as the slow path. > >>>> @@ -2014,6 +2015,9 @@ > >>>> __ addi(data, data, > arrayOopDesc::base_offset_in_bytes(T_BYTE)); > >>>> } > >>>> > >>>> + // Restore caller sp for c2i case. > >>>> + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the > caller started. > >>>> + > >>>> StubRoutines::ppc64::generate_load_crc32c_table_addr(_masm, > table); > >>>> > >>>> if (!VM_Version::has_vpmsumb()) { > >>>> @@ -2023,8 +2027,6 @@ > >>>> __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, > tc0, tc1, tc2, false); > >>>> } > >>>> > >>>> - // Restore caller sp for c2i case and return. > >>>> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the > caller started. 
> >>>> __ blr(); > >>>> > >>>> BLOCK_COMMENT("} CRC32C_update{Bytes|DirectByteBuffer}"); > >>>> > >>>> > >>>> -----Original Message----- > >>>> From: Gustavo Romero > >>>> Sent: Donnerstag, 3. Januar 2019 17:13 > >>>> To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > >>>> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation > should be used by interpreter and be faster for short arrays > >>>> > >>>> Hi Martin, > >>>> > >>>> oh that's nice. You removed the 512-byte block constraint and also > wired it up to the Interpreter :) > >>>> > >>>> For the worst case, unaligned 512 byte array, I see the gap to aligned > 512 byte array reduced by about ~5.7x. > >>>> > >>>> On the Interpreter I see an improvement of at least 50% for 1024 bytes. > >>>> > >>>> This is all for the CRC32 class. > >>>> > >>>> On CRC32C I'm getting a SIGSEV that can be reproduced running against > ./test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java. > >>>> > >>>> I've upload a full log into > http://cr.openjdk.java.net/~gromero/logs/crc32c_sigsegv/ > >>>> > >>>> I'm leaving for the lunch and I'll take a closer look when back. But > probably you will figure it out before I sit to appreciate the meal :) > >>>> > >>>> Finally, since the change does some cleanup, I wonder if it would be > worth fixing the following typos: > >>>> > >>>> I think it's Barrett const., not Barret. Probably 'barret' is used in the > code as a short version > >>>> for Barrett but it should be changed in > >>>> > >>>> + // Point to Barret constants > >>>> + add_const_optimized(cur_const, constants, outer_consts_size + > inner_consts_size); > >>>> + > >>>> > >>>> ? 
> >>>> > >>>> s/not/note/ in: > >>>> cpu/ppc/macroAssembler_ppc.cpp:3977:// A not on the lookup table > address(es): > >>>> > >>>> d/lives/ in: > >>>> cpu/ppc/macroAssembler_ppc.cpp:4265: mtvrwz(VCRC, crc); // crc > lives lives in VCRC, now > >>>> > >>>> Best regards, > >>>> Gustavo > >>>> > >>>> On 01/03/2019 12:17 PM, Doerr, Martin wrote: > >>>>> Hi, > >>>>> > >>>>> the JVM on PPC64 currently misses usage of the fast vector > implementation in the interpreter code. > >>>>> > >>>>> In addition, performance is not good for short arrays (unaligned 512 > byte arrays or shorter arrays) because the current vector implementation > needs at least 512 bytes. > >>>>> > >>>>> Bug: > >>>>> > >>>>> https://bugs.openjdk.java.net/browse/JDK-8216060 > >>>>> > >>>>> I have addressed these 2 issues + some cleanup with the following > webrev: > >>>>> > >>>>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/ > > >>>>> > >>>>> Please review. > >>>>> > >>>>> Best regards, > >>>>> > >>>>> Martin > >>>>> > >>>> > >>> > >> > > > From Alan.Bateman at oracle.com Fri Jan 18 13:32:37 2019 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Fri, 18 Jan 2019 13:32:37 +0000 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <21ef0e11-3f3d-e9a4-5dc6-898d4ac18efa@redhat.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <09c92b1a-c6da-e16b-bb68-553876e8a6ea@oracle.com> <2a0a385d-81ee-df66-f147-e4dd9aa5b72e@oracle.com> <8b2ab749-20f1-8dd3-3cc7-64db5d45bc7d@redhat.com> <0aae37aa-7797-fde5-63d5-96c8eb961183@oracle.com> <86a1988a-a8d2-b6af-0985-11a94d6d76a5@redhat.com> <69510788-52e6-815b-1ed7-a6f4886d0398@oracle.com> <3e3c4f7d-049e-4aec-c165-f2664e7c98ef@redhat.com> <34cfc530-8517-ac1a-0c04-446dc3dc2436@oracle.com> <21ef0e11-3f3d-e9a4-5dc6-898d4ac18efa@redhat.com> Message-ID: On 17/01/2019 14:27, Andrew Dinn wrote: > : >> 
Vladimir and I have reviewed the JEP, it will need an area lead to
>> endorse, I think it can be Brian or Mikael in this case.
> Ok, thanks for the above answers. Looking forward to hearing further
> from Brian and/or Mikael (Vidstedt, I assume? :-).

I had a brief discussion with Brian about this yesterday. He brought up the same concern about using MBB as it's not the right API for this in the longer term. So this JEP is very much about a short term/tactical solution as we've already concluded here. This leads to the question as to whether this JEP needs to evolve the standard/Java SE API or not. It's convenient for the implementation of course but we should at least explore doing this as a JDK-specific feature.

To that end, one approach to explore is allowing the FC.map method to accept map modes beyond those defined by MapMode. There is precedent for extensibility in this area already, e.g. FC.open allows you to specify options beyond the standard options specified by the method. It would require MapMode to define a protected constructor and would require a bit of plumbing to support MapMode defined in a JDK-specific module but there are examples to point to.

Another approach is another class in a JDK-specific module to define the map method. It would require the same plumbing under the covers but would avoid touching the FC spec.

-Alan

From rkennke at redhat.com Fri Jan 18 13:37:46 2019
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 18 Jan 2019 14:37:46 +0100
Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions
In-Reply-To: <3dd85d2c-f4d8-e360-21a2-68254b3c5e2b@redhat.com>
References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com> <81dc5407-4985-883c-83fc-2e1fa2b77e66@redhat.com> <3dd85d2c-f4d8-e360-21a2-68254b3c5e2b@redhat.com>
Message-ID: <2f209ec9-e7f9-8da3-64a2-20ac909b4931@redhat.com>

> On 1/17/19 7:51 PM, B. Blaser wrote:
>> Here it is on intel xeon with 5*10e9 iterations:
>> * mov+cmov = 10.94s
>> * cmov = 10.15s
>>
>> Thoughts?
>
> It looks like there's not much of a performance difference, but it might
> help by freeing a register. OTOH, we'd still need to be sure we weren't
> introducing a regression. We'd have to make sure that implicit null checks
> work.

I'm pretty sure that null-checks work, in general. I used the cmov instructions in an experiment with Shenandoah barriers, which I'm pretty sure would have blown up badly if they didn't.

One thing I'm not sure of is: does cmov generate a SIGSEGV on a bad address, even if the condition is not true? I doubt it, because then we couldn't use this for other types (long, int, etc).

I'm more worried about the bottom-type issue that is mentioned in the comment and by Andrew Dinn, and it would be very helpful if anybody knows about it and could clarify. Failing that we could dig deeper and/or do extensive testing?

Roman

From peter.levart at gmail.com Fri Jan 18 14:11:57 2019
From: peter.levart at gmail.com (Peter Levart)
Date: Fri, 18 Jan 2019 15:11:57 +0100
Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory
In-Reply-To:
References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <09c92b1a-c6da-e16b-bb68-553876e8a6ea@oracle.com> <2a0a385d-81ee-df66-f147-e4dd9aa5b72e@oracle.com> <8b2ab749-20f1-8dd3-3cc7-64db5d45bc7d@redhat.com> <0aae37aa-7797-fde5-63d5-96c8eb961183@oracle.com> <86a1988a-a8d2-b6af-0985-11a94d6d76a5@redhat.com> <69510788-52e6-815b-1ed7-a6f4886d0398@oracle.com> <3e3c4f7d-049e-4aec-c165-f2664e7c98ef@redhat.com> <34cfc530-8517-ac1a-0c04-446dc3dc2436@oracle.com> <21ef0e11-3f3d-e9a4-5dc6-898d4ac18efa@redhat.com>
Message-ID:

Hi Alan,

On 1/18/19 2:32 PM, Alan Bateman wrote:
> On 17/01/2019 14:27, Andrew Dinn
wrote:
>> :
>>> Vladimir and I have reviewed the JEP, it will need an area lead to
>>> endorse, I think it can be Brian or Mikael in this case.
>> Ok, thanks for the above answers. Looking forward to hearing further
>> from Brian and/or Mikael (Vidstedt, I assume? :-).
> I had a brief discussion with Brian about this yesterday. He brought
> up the same concern about using MBB as it's not the right API for this
> in the longer term. So this JEP is very much about a short
> term/tactical solution as we've already concluded here. This leads to
> the question as to whether this JEP needs to evolve the standard/Java
> SE API or not. It's convenient for the implementation of course but we
> should at least explore doing this as a JDK-specific feature.
>
> To that end, one approach to explore is allowing the FC.map method
> to accept map modes beyond those defined by MapMode. There is precedent
> for extensibility in this area already, e.g. FC.open allows you to
> specify options beyond the standard options specified by the method.
> It would require MapMode to define a protected constructor and would
> require a bit of plumbing to support MapMode defined in a JDK-specific
> module but there are examples to point to.

You meant a package-private constructor, right? A protected constructor would allow subclassing MapMode by an arbitrary user class, which is not what would be desirable. So perhaps all that is needed is to declare the static final field in the MapMode class as package-private. That would allow referencing it in the java.nio.channels package. Then add a SharedSecrets mechanism to expose its value to other needed java.base packages and to the additional module that would expose it publicly...
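The visibility concern can be sketched with two stand-in classes. This is a hypothetical illustration only: OpenMode, ClosedMode and RogueMode are invented names and are not the real java.nio.channels.FileChannel.MapMode, whose actual declaration is not shown in this thread. The first class has a protected constructor and so accepts arbitrary subclasses; the second keeps its constructor private and exposes an extra constant through a package-private field.

```java
import java.lang.reflect.Modifier;

// Stand-in with a protected constructor: any user class may extend it
// and thereby mint new "modes".
class OpenMode {
    private final String name;
    protected OpenMode(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

// An arbitrary user-defined mode, possible only because of "protected".
class RogueMode extends OpenMode {
    RogueMode() { super("ROGUE"); }
}

// Stand-in with a private constructor: the set of instances is fixed by
// the class itself. SYNC is package-private, so only code in the same
// package (or code given access by some internal mechanism) can see it.
final class ClosedMode {
    public static final ClosedMode READ_ONLY = new ClosedMode("READ_ONLY");
    static final ClosedMode SYNC = new ClosedMode("SYNC"); // hypothetical extra mode

    private final String name;
    private ClosedMode(String name) { this.name = name; }
    @Override public String toString() { return name; }

    // True when the single declared constructor is private.
    static boolean constructorIsPrivate() {
        return Modifier.isPrivate(
            ClosedMode.class.getDeclaredConstructors()[0].getModifiers());
    }
}
```

Here `new RogueMode()` compiles from any package that can see OpenMode, whereas ClosedMode cannot be instantiated or subclassed outside its own source file.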
Regards, Peter From peter.levart at gmail.com Fri Jan 18 14:28:34 2019 From: peter.levart at gmail.com (Peter Levart) Date: Fri, 18 Jan 2019 15:28:34 +0100 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <09c92b1a-c6da-e16b-bb68-553876e8a6ea@oracle.com> <2a0a385d-81ee-df66-f147-e4dd9aa5b72e@oracle.com> <8b2ab749-20f1-8dd3-3cc7-64db5d45bc7d@redhat.com> <0aae37aa-7797-fde5-63d5-96c8eb961183@oracle.com> <86a1988a-a8d2-b6af-0985-11a94d6d76a5@redhat.com> <69510788-52e6-815b-1ed7-a6f4886d0398@oracle.com> <3e3c4f7d-049e-4aec-c165-f2664e7c98ef@redhat.com> <34cfc530-8517-ac1a-0c04-446dc3dc2436@oracle.com> <21ef0e11-3f3d-e9a4-5dc6-898d4ac18efa@redhat.com> Message-ID: <708555d0-d3e5-2d2c-f69d-16f76a83f66a@gmail.com> On 1/18/19 3:11 PM, Peter Levart wrote: > You meant package-private constructor, right? Protected constructor > would allow subclassing MapMode by arbitrary user class which is not > what would be desirable. ...unless you actually want users to construct their own MapMode(s), like you mentioned is the case with FileChannel.open() and FileAttribute interface. But there this makes sense because the backend (FileSystem) is also pluggable, so users can define their own FileSystem implementations that consume their own FileAttribute(s)... Are you proposing to add an spi for MappedByteBuffer's here? That would be an overkill for this feature, I think... 
Regards, Peter

From martin.doerr at sap.com Fri Jan 18 14:32:45 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 18 Jan 2019 14:32:45 +0000
Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
In-Reply-To: <8b1ca2bdba334f42a3c2b044a557dd8c@sap.com>
References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> <9863276de30643338249ead2a6ac7fe9@sap.com> <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com> <37d9e6f3b2b4400d8963f54d2fe7767f@sap.com> <406db3e3-2ac3-dc16-f384-99a314e62a42@linux.vnet.ibm.com> <180a6c0b-7abe-9d5c-51e6-dffbb23570d3@linux.vnet.ibm.com> <301fd43a-e5b5-d970-7a1a-2458dbaeec36@linux.vnet.ibm.com> <8b1ca2bdba334f42a3c2b044a557dd8c@sap.com>
Message-ID:

Hi Götz,

that's a good proposal. I've moved the common functionality into macroAssembler_ppc. This makes interpreter and stubGenerator code shorter. I've also moved the vector constants computation to stubGenerator such that we only do it when the intrinsics are enabled and the vector version is supported by the processor.

New webrev: http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.02/

@Gustavo: Thanks for testing and confirming the issue (JDK-8216376) is fixed.

Best regards,
Martin

-----Original Message-----
From: Lindenmaier, Goetz
Sent: Friday, 18 January 2019 12:03
To: Doerr, Martin; Gustavo Romero; 'hotspot-compiler-dev at openjdk.java.net'
Subject: RE: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays

Hi Martin,

I had a look at your change. Overall looks good. According to Gustavo's mail a nice improvement!

I think though that the way to select the algorithm is quite messy: In templateInterpreter vpmsumb is checked and the methods are called directly. In stubGenerator, generate_CRC32...() vpmsumb is tested to decide on vector_constants = R2.
and generic generate_CRC_updateBytes is called, which again checks whether verctor_constants == R2. I think generate_CRC_updateBytes() or some other generic function should be located in macroAssembler_ppc and be called from both locations. What do you think? Best regards, Goetz > -----Original Message----- > From: Doerr, Martin > Sent: Donnerstag, 17. Januar 2019 14:18 > To: Gustavo Romero ; 'hotspot-compiler- > dev at openjdk.java.net' ; > Lindenmaier, Goetz > Subject: RE: RFR(M): 8216060: [PPC64] Vector CRC implementation should be > used by interpreter and be faster for short arrays > > Hi, > > the rebased webrev.01 applies on jdk/jdk, now (after JDK-8216376). So the > issue Gustavo had observed does not longer exist. > http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/ > > I have updated copyrights and retested it. > > Best regards, > Martin > > > -----Original Message----- > From: Gustavo Romero > Sent: Montag, 7. Januar 2019 14:52 > To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should be > used by interpreter and be faster for short arrays > > Hi Martin, > > On 01/07/2019 11:49 AM, Doerr, Martin wrote: > > I want to check all places where we use "mr(R1_SP, R21_sender_SP)". > There may be more issues with that. I'll probably handle that in a separate > change and push this CRC change afterwards. > > I see. Thanks for letting me know. > > Best regards, > Gustavo > > > Best regards, > > Martin > > > > > > -----Original Message----- > > From: Gustavo Romero > > Sent: Freitag, 4. 
Januar 2019 19:55 > > To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > > Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should > be used by interpreter and be faster for short arrays > > > > Hi Martin, > > > > On 01/04/2019 02:13 PM, Doerr, Martin wrote: > >> Hi Gustavo, > >> > >> when called from the interpreter (the scenario you observed), R21 is set > before resizing the frame to avoid wasted stack space > (InterpreterMacroAssembler::call_from_interpreter). > > > > Got it. Thanks a lot for the explanations. > > > > I think it doesn't currently matter in practice, but I'm wondering if to be > > consistent we should cut back the stack back earlier also in > > TemplateInterpreterGenerator::generate_CRC32_update_entry()? > > > > diff -r a35f8c35d8c9 > src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp > > --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Fri Jan > 04 10:09:00 2019 +0100 > > +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Fri Jan > 04 13:44:37 2019 -0500 > > @@ -1840,11 +1840,12 @@ > > #endif > > __ lwz(crc, 2*wordSize, argP); // Current crc state, zero extend to 64 > bit to have a clean register. > > > > + // Restore caller sp for c2i case and return. > > + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller > started. > > + > > StubRoutines::ppc64::generate_load_crc_table_addr(_masm, table); > > __ kernel_crc32_singleByte(crc, data, dataLen, table, tmp, true); > > > > - // Restore caller sp for c2i case and return. > > - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the caller > started. > > __ blr(); > > > > // Generate a vanilla native entry as the slow path. > > > > Currently there is no issue probably because generated code is simpler and > does > > no spills. > > > > Best regards, > > Gustavo > > > >> When called from compiled methods, R21 is set by a c2i adapter which > extends the compiled frame by space for arguments (gen_c2i_adapter). 
> >> > >> "mr(R1_SP, R21_sender_SP)" is more error-prone than > "resize_frame_absolute" so I think the latter would be better (though it takes > more registers and instructions), but I don't want to replace that as part of > this CRC change. > >> > >> Best regards, > >> Martin > >> > >> > >> -----Original Message----- > >> From: Gustavo Romero > >> Sent: Freitag, 4. Januar 2019 14:44 > >> To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > >> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation should > be used by interpreter and be faster for short arrays > >> > >> Hi Martin, > >> > >> On 01/04/2019 07:30 AM, Doerr, Martin wrote: > >>> thank you very much for confirming. This makes sense. We use different > frame headers depending on whether the frame is the top Java frame or not > (and on whether it's a debug build or not). Setting R1_SP to sender_SP is a > shortcut for leaf calls which relies on having an unmodified stack until this > point. So the patch fixes the issue. > >> > >> Glad to help! Thanks for the additional information, I was not aware that > the > >> selection of different frame headers could be done at compile time. One > last > >> question only for my education: what exactly advanced (incremented) > R1_SP so it > >> has to be cut back using sender_SP value, i.e. sender_SP tracks the frame > for > >> which function exactly or "who" is the caller exactly here? > >> > >> Thank you. > >> > >> Best regards, > >> Gustavo > >> > >>> New webrev: > >>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.01/ > >>> > >>> Best regards, > >>> Martin > >>> > >>> > >>> -----Original Message----- > >>> From: Gustavo Romero > >>> Sent: Donnerstag, 3. 
Januar 2019 19:36 > >>> To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > >>> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation > should be used by interpreter and be faster for short arrays > >>> > >>> Hi Martin, > >>> > >>> On 01/03/2019 03:34 PM, Doerr, Martin wrote: > >>>> Unfortunately, I can't reproduce the crash. TestCRC32C works stable on > our machine (with fastdbg build). > >>>> I guess that the frameless spills mess up the stack. Can you check if the > patch below helps? > >>> > >>> Thanks for providing a fix so I can try it. > >>> Yes, I confirm the patch below indeed fixes the sigsegv crash when > CRC32C update() method is used. > >>> I also confirm that I don't observe the crash on the fastdebug build, only > on the release build. > >>> It also only affects the Interpreter mode, so passing -Xcomp avoids the > crash on the release build. > >>> > >>> Just as reference, I can reproduce it on the release build with the > following trivial code: > >>> > >>> import java.util.zip.CRC32C; > >>> > >>> class CRC32C_v1 { > >>> public static void main(String[] arg) { > >>> byte[] b = new byte[1024]; > >>> > >>> CRC32C crc32c = new CRC32C(); > >>> crc32c.update(b, 0, b.length); > >>> > >>> System.out.println(crc32c.getValue()); > >>> } > >>> } > >>> > >>> Thanks for fixing the typos. > >>> > >>> > >>> Best regards, > >>> Gustavo > >>> > >>>> Best regards, > >>>> Martin > >>>> > >>>> > >>>> diff -r a33f49d5998c > src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp > >>>> --- a/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp Thu > Jan 03 17:30:03 2019 +0100 > >>>> +++ b/src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp > Thu Jan 03 18:33:16 2019 +0100 > >>>> @@ -1924,6 +1924,9 @@ > >>>> __ addi(data, data, > arrayOopDesc::base_offset_in_bytes(T_BYTE)); > >>>> } > >>>> > >>>> + // Restore caller sp for c2i case. > >>>> + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the > caller started. 
> >>>> + > >>>> StubRoutines::ppc64::generate_load_crc_table_addr(_masm, > table); > >>>> > >>>> if (!VM_Version::has_vpmsumb()) { > >>>> @@ -1933,8 +1936,6 @@ > >>>> __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, > tc0, tc1, tc2, true); > >>>> } > >>>> > >>>> - // Restore caller sp for c2i case and return. > >>>> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the > caller started. > >>>> __ blr(); > >>>> > >>>> // Generate a vanilla native entry as the slow path. > >>>> @@ -2014,6 +2015,9 @@ > >>>> __ addi(data, data, > arrayOopDesc::base_offset_in_bytes(T_BYTE)); > >>>> } > >>>> > >>>> + // Restore caller sp for c2i case. > >>>> + __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the > caller started. > >>>> + > >>>> StubRoutines::ppc64::generate_load_crc32c_table_addr(_masm, > table); > >>>> > >>>> if (!VM_Version::has_vpmsumb()) { > >>>> @@ -2023,8 +2027,6 @@ > >>>> __ kernel_crc32_vpmsum(crc, data, dataLen, table, t0, t1, t2, t3, > tc0, tc1, tc2, false); > >>>> } > >>>> > >>>> - // Restore caller sp for c2i case and return. > >>>> - __ mr(R1_SP, R21_sender_SP); // Cut the stack back to where the > caller started. > >>>> __ blr(); > >>>> > >>>> BLOCK_COMMENT("} CRC32C_update{Bytes|DirectByteBuffer}"); > >>>> > >>>> > >>>> -----Original Message----- > >>>> From: Gustavo Romero > >>>> Sent: Donnerstag, 3. Januar 2019 17:13 > >>>> To: Doerr, Martin ; 'hotspot-compiler- > dev at openjdk.java.net' > >>>> Subject: Re: RFR(M): 8216060: [PPC64] Vector CRC implementation > should be used by interpreter and be faster for short arrays > >>>> > >>>> Hi Martin, > >>>> > >>>> oh that's nice. You removed the 512-byte block constraint and also > wired it up to the Interpreter :) > >>>> > >>>> For the worst case, unaligned 512 byte array, I see the gap to aligned > 512 byte array reduced by about ~5.7x. > >>>> > >>>> On the Interpreter I see an improvement of at least 50% for 1024 bytes. > >>>> > >>>> This is all for the CRC32 class. 
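The aligned/unaligned and short-array cases mentioned above can be cross-checked from plain Java. The following is a minimal sketch, not part of the webrev: the class name `CrcOffsetCheck`, the chosen offsets and lengths, and the fill pattern are all assumptions made for illustration. It compares `java.util.zip.CRC32` and `java.util.zip.CRC32C` against a slow bit-at-a-time reference over many start offsets and over lengths around the 512-byte mark.

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

class CrcOffsetCheck {
    static final int CRC32_POLY = 0xEDB88320;   // reflected IEEE 802.3 polynomial
    static final int CRC32C_POLY = 0x82F63B78;  // reflected Castagnoli polynomial

    // Bit-at-a-time reference CRC; slow, but independent of any intrinsic.
    static long referenceCrc(int poly, byte[] buf, int off, int len) {
        int crc = 0xFFFFFFFF;
        for (int i = off; i < off + len; i++) {
            crc ^= (buf[i] & 0xFF);
            for (int bit = 0; bit < 8; bit++) {
                // Shift right; XOR in the polynomial if the low bit was set.
                crc = (crc >>> 1) ^ (poly & -(crc & 1));
            }
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    // Counts mismatches between the JDK checksum and the reference over
    // varying start offsets (alignment) and lengths (short and long arrays).
    static int countMismatches(boolean castagnoli) {
        byte[] data = new byte[2048];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31 + 7);  // arbitrary deterministic fill
        }
        int poly = castagnoli ? CRC32C_POLY : CRC32_POLY;
        int[] lengths = {0, 1, 15, 16, 17, 255, 511, 512, 513, 1024};
        int mismatches = 0;
        for (int off = 0; off < 32; off++) {
            for (int len : lengths) {
                Checksum sum = castagnoli ? new CRC32C() : new CRC32();
                sum.update(data, off, len);
                if (sum.getValue() != referenceCrc(poly, data, off, len)) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("CRC32 mismatches:  " + countMismatches(false));
        System.out.println("CRC32C mismatches: " + countMismatches(true));
    }
}
```

Running it with the standard `-Xint` option keeps execution in the interpreter, and `-Xcomp` forces compilation, so both code paths discussed in this thread can be exercised with the same class. This only checks results; it says nothing about which implementation the VM actually selected.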
> >>>>
> >>>> On CRC32C I'm getting a SIGSEGV that can be reproduced by running ./test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java.
> >>>>
> >>>> I've uploaded a full log to http://cr.openjdk.java.net/~gromero/logs/crc32c_sigsegv/
> >>>>
> >>>> I'm leaving for lunch and I'll take a closer look when I'm back. But you will probably figure it out before I sit down to appreciate the meal :)
> >>>>
> >>>> Finally, since the change does some cleanup, I wonder if it would be worth fixing the following typos:
> >>>>
> >>>> I think it's Barrett const., not Barret. Probably 'barret' is used in the code as a short version for Barrett, but it should be changed in
> >>>>
> >>>> + // Point to Barret constants
> >>>> + add_const_optimized(cur_const, constants, outer_consts_size + inner_consts_size);
> >>>> +
> >>>>
> >>>> ?
> >>>>
> >>>> s/not/note/ in:
> >>>> cpu/ppc/macroAssembler_ppc.cpp:3977:// A not on the lookup table address(es):
> >>>>
> >>>> d/lives/ in:
> >>>> cpu/ppc/macroAssembler_ppc.cpp:4265: mtvrwz(VCRC, crc); // crc lives lives in VCRC, now
> >>>>
> >>>> Best regards,
> >>>> Gustavo
> >>>>
> >>>> On 01/03/2019 12:17 PM, Doerr, Martin wrote:
> >>>>> Hi,
> >>>>>
> >>>>> the JVM on PPC64 currently does not use the fast vector implementation in the interpreter code.
> >>>>>
> >>>>> In addition, performance is not good for short arrays (unaligned 512 byte arrays or shorter arrays) because the current vector implementation needs at least 512 bytes.
> >>>>>
> >>>>> Bug:
> >>>>>
> >>>>> https://bugs.openjdk.java.net/browse/JDK-8216060
> >>>>>
> >>>>> I have addressed these 2 issues + some cleanup with the following webrev:
> >>>>>
> >>>>> http://cr.openjdk.java.net/~mdoerr/8216060_PPC64_CRC/webrev.00/
> >>>>>
> >>>>> Please review.
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Martin
> >>>>>
> >>>>
> >>>
> >>
> >
>

From goetz.lindenmaier at sap.com Fri Jan 18 14:42:14 2019
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 18 Jan 2019 14:42:14 +0000
Subject: RFR(M): 8216060: [PPC64] Vector CRC implementation should be used by interpreter and be faster for short arrays
In-Reply-To:
References: <1c4646d554954551b73c077fa40f983d@sap.com> <771a457e-6b4d-f73b-e072-703490c9ced5@linux.vnet.ibm.com> <9863276de30643338249ead2a6ac7fe9@sap.com> <452dcb69-189e-700d-5995-582ba13669b9@linux.vnet.ibm.com> <37d9e6f3b2b4400d8963f54d2fe7767f@sap.com> <406db3e3-2ac3-dc16-f384-99a314e62a42@linux.vnet.ibm.com> <180a6c0b-7abe-9d5c-51e6-dffbb23570d3@linux.vnet.ibm.com> <301fd43a-e5b5-d970-7a1a-2458dbaeec36@linux.vnet.ibm.com> <8b1ca2bdba334f42a3c2b044a557dd8c@sap.com>
Message-ID: <3a71eaf686bb4cf48946d668c6cb3868@sap.com>

Hi Martin,

thanks for improving this, looks good now!
Actually, this is much more cleanup than I expected :)

Best regards,
Goetz.

From aph at redhat.com Fri Jan 18 14:56:07 2019
From: aph at redhat.com (Andrew Haley)
Date: Fri, 18 Jan 2019 14:56:07 +0000
Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered
In-Reply-To: <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com>
References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com>
Message-ID:

Hi,

On 1/18/19 9:52 AM, Nick Gasson (Arm Technology China) wrote:
> On 18/01/2019 17:36, Andrew Haley wrote:
>>
>> The patch looks good. However, I don't understand why we aren't using
>> MacroAssembler::cmpxchgptr here. It looks like we should be, and you'd
>> end up with a less complex result.
>
> It's not exactly the same though: MacroAssembler::cmpxchgptr adds a "dmb
> ish" to the failure path which I don't think is required here.

Oh, sorry.
I should have said MacroAssembler::cmpxchg, with a br.eq(cont) afterward.

>>> * Does anyone know what the comment "// Load Compare Value application
>>> register." means? It's present in the PPC and S390 ports too.
>>
>> Probably no-one can remember. We'll have inherited it from x86.
>
> Let's delete it then.

OK.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From gromero at linux.vnet.ibm.com Fri Jan 18 14:57:13 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 18 Jan 2019 12:57:13 -0200
Subject: [11u backport] RFR(S): 8215317: [GRAAL] unit test CheckGraalIntrinsics failed after 8213754
Message-ID: <2ed804b4-85d5-feb7-edab-85d6dee66c74@linux.vnet.ibm.com>

Hi,

Could the following backport to 11u be reviewed, please?

Bug     : https://bugs.openjdk.java.net/browse/JDK-8215317
Change  : http://hg.openjdk.java.net/jdk/jdk/rev/108a161aed93
Backport: http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/

It adds 4 intrinsics to the list in the Graal test CheckGraalIntrinsics.java so that JDK 11u becomes aware of them. Otherwise that test will break once change 8213754 [0] lands in 11u (which will effectively add the 4 intrinsics to PPC64/Hotspot and adapt the corresponding methods to be intrinsified).

The backport includes the intrinsics for JDK 11 or higher, instead of JDK 12 or higher as in the original patch.

This backport was tested on x86_64 with ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus ./test/hotspot/jtreg/compiler/graalunit (with the Graal compiler enabled) and no regressions were observed.

Thank you.
Best regards,
Gustavo

[0] https://bugs.openjdk.java.net/browse/JDK-8213754

From gromero at linux.vnet.ibm.com Fri Jan 18 15:07:20 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 18 Jan 2019 13:07:20 -0200
Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace
Message-ID: <2d4d1747-a83d-5f65-eea3-d982969ae4fd@linux.vnet.ibm.com>

Hi,

Could the following backport to 11u be reviewed, please?

Bug     : https://bugs.openjdk.java.net/browse/JDK-8213754
Change  : http://hg.openjdk.java.net/jdk/jdk/rev/7384e00d5860
Backport: http://cr.openjdk.java.net/~gromero/8213754_jdk11u/v1/

It adds 4 intrinsics that use instructions introduced by POWER9 in order to speed up the methods isDigit, isLowerCase, isUpperCase, and isWhitespace.

The change is mostly PPC64-only, but it does touch shared code, for instance to adapt the methods in question so that they can be properly intrinsified. It also needs an additional change [0], since one Graal test has to be adapted (a separate RFR to backport [0] was sent to [1]).

The change applies almost cleanly: only a small tweak is necessary because the hunk for the ppc.ad file relies on surrounding text that is absent from the 11u code. That absent text belongs to the Superword feature (an unrelated feature), which has not been backported to 11u yet.

This backport was tested on POWER8 and POWER9 and no regressions were observed.

This backport was also tested on x86_64 with ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus ./test/hotspot/jtreg/compiler/graalunit (with the Graal compiler enabled) with change 8215317 [0] applied, and no regressions were observed either.

Thank you.
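For reference, the four methods named in the subject can be sanity-checked from plain Java. This is only an illustrative sketch: the class name `CharClassCount` and the helper `count` are assumptions, and the code checks results on the ASCII range only. It does not show whether an intrinsic is actually used on a given platform.

```java
import java.util.function.IntPredicate;

class CharClassCount {
    // Counts the code points in [0, limit) that the predicate accepts.
    static int count(IntPredicate predicate, int limit) {
        int n = 0;
        for (int cp = 0; cp < limit; cp++) {
            if (predicate.test(cp)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int ascii = 128;  // restrict to ASCII, where the expected sets are well known
        // The lambdas call the int (code point) overloads of the Character methods.
        System.out.println("isDigit:      " + count(cp -> Character.isDigit(cp), ascii));
        System.out.println("isLowerCase:  " + count(cp -> Character.isLowerCase(cp), ascii));
        System.out.println("isUpperCase:  " + count(cp -> Character.isUpperCase(cp), ascii));
        System.out.println("isWhitespace: " + count(cp -> Character.isWhitespace(cp), ascii));
    }
}
```

Comparing the output between a build with and without the backport (or between interpreter and compiled runs) is one cheap way to confirm that behaviour is unchanged.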
Best regards, Gustavo [0] http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/ [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032266.html From Roger.Riggs at oracle.com Fri Jan 18 15:35:24 2019 From: Roger.Riggs at oracle.com (Roger Riggs) Date: Fri, 18 Jan 2019 10:35:24 -0500 Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace In-Reply-To: <2d4d1747-a83d-5f65-eea3-d982969ae4fd@linux.vnet.ibm.com> References: <2d4d1747-a83d-5f65-eea3-d982969ae4fd@linux.vnet.ibm.com> Message-ID: Looks good for the jdk files. Regards, Roger On 01/18/2019 10:07 AM, Gustavo Romero wrote: > Hi, > > Could the following backport to 11u be reviewed, please? > > Bug???? : https://bugs.openjdk.java.net/browse/JDK-8213754 > Change? : http://hg.openjdk.java.net/jdk/jdk/rev/7384e00d5860 > Backport: http://cr.openjdk.java.net/~gromero/8213754_jdk11u/v1/ > > It adds 4 intrinsics that use instructions introduced by POWER9 in > order to > speed up methods isDigit, isLowerCase, isUpperCase, and isWhitespace. > > The change is mostly PPC64-only but it does touch shared code, for > instance, in order to adapt the methods in question to be properly > intrinsified. It also needs an additional change [0], since one Graal > test has to be adapted (a separated RFR to backport [0] was sent to [1]). > > The change applies almost cleanly: only a small tweak is necessary > because > the hunk for ppc.ad file relies on some absent text in the 11u code > around > the change to be applied. That absent text is related to the Superword > feature (a non-related feature), which is not backported yet to 11u. > > This backport was tested on POWER8 and POWER9 and no regressions were > observed. 
> > This backport was also tested on x86_64 with > ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus > ./test/hotspot/jtreg/compiler/graalunit (with Graal compiler enabled) > with > change 8215317 [0] applied and no regressions were observed too. > > Thank you. > > Best regards, > Gustavo > > [0] http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/ > [1] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032266.html > -------------- next part -------------- An HTML attachment was scrubbed... URL: From claes.redestad at oracle.com Fri Jan 18 16:03:45 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Fri, 18 Jan 2019 17:03:45 +0100 Subject: RFR: 8217387: Remove dead develop flag CIFireOOMAt Message-ID: Hi, the develop flag CIFireOOMAt is effectively dead and should be removed. Webrev: http://cr.openjdk.java.net/~redestad/8217387/open.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8217387 Thanks! /Claes From shade at redhat.com Fri Jan 18 16:01:04 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 18 Jan 2019 17:01:04 +0100 Subject: RFR: 8217387: Remove dead develop flag CIFireOOMAt In-Reply-To: References: Message-ID: <03f5e7bf-bb4c-a4bd-c959-1ad3c754f130@redhat.com> On 1/18/19 5:03 PM, Claes Redestad wrote: > the develop flag CIFireOOMAt is effectively dead and should be removed. > > Webrev: http://cr.openjdk.java.net/~redestad/8217387/open.00/ > Bug:??? https://bugs.openjdk.java.net/browse/JDK-8217387 Looks good. There are indeed no "write" usages. -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From claes.redestad at oracle.com Fri Jan 18 16:10:08 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Fri, 18 Jan 2019 17:10:08 +0100 Subject: RFR: 8217387: Remove dead develop flag CIFireOOMAt In-Reply-To: <03f5e7bf-bb4c-a4bd-c959-1ad3c754f130@redhat.com> References: <03f5e7bf-bb4c-a4bd-c959-1ad3c754f130@redhat.com> Message-ID: On 2019-01-18 17:01, Aleksey Shipilev wrote: > Looks good. There are indeed no "write" usages. Thanks! /Claes From lutz.schmidt at sap.com Fri Jan 18 16:05:36 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 18 Jan 2019 16:05:36 +0000 Subject: RFR(S, tedious): 8217250: Optimize CodeHeap Analytics In-Reply-To: <52136751-929b-4976-477d-93282ce0a0d7@oracle.com> References: <2d7b7963-61be-95b8-017b-956f2752c8f3@oracle.com> <9AD26B9E-015A-4BD4-A44F-12DDE2793ED0@sap.com> <4e61a8a5-3c6e-3c4f-0a2c-68d4b8bc2f9f@oracle.com> <52136751-929b-4976-477d-93282ce0a0d7@oracle.com> Message-ID: Thank you, Tobias! As this enhancement will not make it into jdk12, I'll rebase it to jdk/jdk. I expect no conflicts and assume I can then push without further webrev/review. Thanks, Lutz On 18.01.19, 10:49, "Tobias Hartmann" wrote: Hi Lutz, looks good to me too. Best regards, Tobias On 17.01.19 19:39, Vladimir Kozlov wrote: > Looks good > > Thanks, > Vladimir > > On 1/17/19 7:47 AM, Schmidt, Lutz wrote: >> Hi Vladimir & all, >> there is a new webrev available: http://cr.openjdk.java.net/~lucy/webrevs/8217250.01/ >> What's new (in addition to some comments) is the macro >> >> // Flush the buffer contents if the remaining capacity is less >> // than the calculated threshold (256 bytes + capacity/16) >> // That should suffice for all reasonably sized output lines.
>> #define BUFFEREDSTREAM_FLUSH_AUTO(_termString) \ >> BUFFEREDSTREAM_FLUSH_IF(_termString, 256+(_capacity>>4)) >> >> It replaced the previous BUFFEREDSTREAM_FLUSH_IF("string", 512) occurrences. >> Regards, >> Lutz >> >> On 16.01.19, 22:53, "Vladimir Kozlov" wrote: >> >> On 1/16/19 12:37 PM, Schmidt, Lutz wrote: >> > Hi Vladimir, >> > >> > thanks a lot for looking at this so quickly. >> > >> > Sure, I could declare a specialized "BUFFEREDSTREAM_FLUSH_512" for this. The "512" >> originated from the thought "its large enough for a well-behaved line and small enough to save >> some flushes". >> > >> > I was also thinking about a "BUFFEREDSTREAM_FLUSH_AUTO", where the spare space is derived >> from the buffer capacity, maybe something like 10 percent of the capacity, 256 bytes minimum. I >> wasn't sure if that could be categorized as over-engineered. >> Yes, I think BUFFEREDSTREAM_FLUSH_AUTO is better than fixed size. >> Vladimir >> > >> > Your thoughts? >> > >> > Thanks, >> > Lutz >> > >> > On 16.01.19, 19:10, "hotspot-compiler-dev on behalf of Vladimir Kozlov" >> wrote: >> > >> > Hi Lutz, >> > >> > I see that you have only one usage in all cases for: >> > BUFFEREDSTREAM_FLUSH_IF("", 512) >> > >> > Can you simple declare simplified macro for this? >> > >> > Otherwise looks good. >> > >> > Thanks, >> > Vladimir >> > >> > On 1/16/19 6:52 AM, Schmidt, Lutz wrote: >> > > Dear all, >> > > >> > > may I please have reviews for this (semantically) small change. Its purpose is to >> reduce the bufferedStream buffer flushes while printing CodeHeap Analytics. >> > > >> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8217250 >> > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8217250.00/ >> > > >> > > Thank you! 
>> > > Lutz >> > > >> > > >> > >> > >> From martin.doerr at sap.com Fri Jan 18 16:07:53 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 18 Jan 2019 16:07:53 +0000 Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace In-Reply-To: References: <2d4d1747-a83d-5f65-eea3-d982969ae4fd@linux.vnet.ibm.com> Message-ID: Hi Gustavo, hotspot part looks good, too. Best regards, Martin From: Roger Riggs Sent: Freitag, 18. Januar 2019 16:35 To: Gustavo Romero ; hotspot-compiler-dev at openjdk.java.net; Lindenmaier, Goetz ; Doerr, Martin ; vladimir.kozlov at oracle.com Cc: Michihiro Horie Subject: Re: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace Looks good for the jdk files. Regards, Roger On 01/18/2019 10:07 AM, Gustavo Romero wrote: Hi, Could the following backport to 11u be reviewed, please? Bug : https://bugs.openjdk.java.net/browse/JDK-8213754 Change : http://hg.openjdk.java.net/jdk/jdk/rev/7384e00d5860 Backport: http://cr.openjdk.java.net/~gromero/8213754_jdk11u/v1/ It adds 4 intrinsics that use instructions introduced by POWER9 in order to speed up methods isDigit, isLowerCase, isUpperCase, and isWhitespace. The change is mostly PPC64-only but it does touch shared code, for instance, in order to adapt the methods in question to be properly intrinsified. It also needs an additional change [0], since one Graal test has to be adapted (a separated RFR to backport [0] was sent to [1]). The change applies almost cleanly: only a small tweak is necessary because the hunk for ppc.ad file relies on some absent text in the 11u code around the change to be applied. That absent text is related to the Superword feature (a non-related feature), which is not backported yet to 11u. This backport was tested on POWER8 and POWER9 and no regressions were observed. 
This backport was also tested on x86_64 with ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus ./test/hotspot/jtreg/compiler/graalunit (with Graal compiler enabled) with change 8215317 [0] applied and no regressions were observed too. Thank you. Best regards, Gustavo [0] http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/ [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032266.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From claes.redestad at oracle.com Fri Jan 18 16:15:04 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Fri, 18 Jan 2019 17:15:04 +0100 Subject: RFR (trivial): 8217388: Remove develop flag ProfilerPCTickThreshold Message-ID: Hi, this flag does not spark joy. Bug: https://bugs.openjdk.java.net/browse/JDK-8217388 Patch:
diff -r 0a48b128e3d4 src/hotspot/share/runtime/globals.hpp
--- a/src/hotspot/share/runtime/globals.hpp Fri Jan 18 16:49:35 2019 +0100
+++ b/src/hotspot/share/runtime/globals.hpp Fri Jan 18 17:07:38 2019 +0100
@@ -1670,9 +1670,6 @@
   develop(intx, DontYieldALotInterval, 10, \
           "Interval between which yields will be dropped (milliseconds)") \
   \
-  develop(intx, ProfilerPCTickThreshold, 15, \
-          "Number of ticks in a PC buckets to be a hotspot") \
-  \
   notproduct(intx, DeoptimizeALotInterval, 5, \
           "Number of exits until DeoptimizeALot kicks in") \
   \
/Claes From shade at redhat.com Fri Jan 18 16:11:03 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 18 Jan 2019 17:11:03 +0100 Subject: RFR (trivial): 8217388: Remove develop flag ProfilerPCTickThreshold In-Reply-To: References: Message-ID: <181d4585-9d4f-c5ec-8cac-7cf44636fea4@redhat.com> On 1/18/19 5:15 PM, Claes Redestad wrote: > this flag does not spark joy. Which means "there are no uses anywhere at all". > Bug: https://bugs.openjdk.java.net/browse/JDK-8217388 > Patch: > diff -r 0a48b128e3d4 src/hotspot/share/runtime/globals.hpp > --- a/src/hotspot/share/runtime/globals.hpp
Fri Jan 18 16:49:35 2019 +0100
> +++ b/src/hotspot/share/runtime/globals.hpp Fri Jan 18 17:07:38 2019 +0100
> @@ -1670,9 +1670,6 @@
>    develop(intx, DontYieldALotInterval, 10, \
>            "Interval between which yields will be dropped (milliseconds)") \
>    \
> -  develop(intx, ProfilerPCTickThreshold, 15, \
> -          "Number of ticks in a PC buckets to be a hotspot") \
> -  \
>    notproduct(intx, DeoptimizeALotInterval, 5, \
>            "Number of exits until DeoptimizeALot kicks in") \
>    \
Looks good to me. -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From claes.redestad at oracle.com Fri Jan 18 16:21:47 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Fri, 18 Jan 2019 17:21:47 +0100 Subject: RFR (trivial): 8217388: Remove develop flag ProfilerPCTickThreshold In-Reply-To: <181d4585-9d4f-c5ec-8cac-7cf44636fea4@redhat.com> References: <181d4585-9d4f-c5ec-8cac-7cf44636fea4@redhat.com> Message-ID: <4b078f61-39d5-42d6-b8a1-1627b0b8608a@oracle.com> On 2019-01-18 17:11, Aleksey Shipilev wrote: > Looks good to me. Thanks! /Claes From derekw at marvell.com Fri Jan 18 17:29:02 2019 From: derekw at marvell.com (Derek White) Date: Fri, 18 Jan 2019 17:29:02 +0000 Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered Message-ID: > -----Original Message----- > From: aarch64-port-dev On > Behalf Of Andrew Haley > Sent: Friday, January 18, 2019 4:37 AM > To: Nick Gasson (Arm Technology China) ; hotspot-compiler-dev at openjdk.java.net > Cc: nd ; aarch64-port-dev at openjdk.java.net > Subject: [EXT] Re: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive > stack locking optimisation not triggered > ... > The patch looks good.
However, I don't understand why we aren't using > MacroAssembler::cmpxchgptr here. It looks like we should be, and you'd end > up with a less complex result. Uh oh. The original code used cmpxchgptr, but it introduced too many unnecessary branches. So you or Ed changed it to this code, with a (7-8 line) comment "Formerly: __ cmpxchgptr" etc, etc. I thought that comment didn't add much for all that bulk so I asked Nick to rip the comment out! The function now fits on one screen (of sufficient size) though. Getting cmpxchgptr to work without the extra branches would be a better solution if someone has any thoughts in that direction. - Derek From derekw at marvell.com Fri Jan 18 18:14:18 2019 From: derekw at marvell.com (Derek White) Date: Fri, 18 Jan 2019 18:14:18 +0000 Subject: 8217368: AArch64: C2 recursive stack locking optimisation not triggered In-Reply-To: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> Message-ID: Hi Nick, Your changes look good to me. Once again some cleanup suggestions to pre-existing code:
Line 3420: "// Handle existing monitor" -> "// Check for existing monitor"
Line 3471: "// Handle existing monitor." Move to line 3473.
Lines 3437, 3445, 3468, 3485, 3493: Add comment to lines: "// sets result"
This set contains actual code changes, but should be clearer code:
Lines 3483, 3485: "disp_hdr" -> "zr"
Line 3493: cmp(disp_hdr, rscratch1) -> cmp(rscratch1, zr)
Note that having the "sets result" comment here is important, because it's so tempting to merge CMP+BNE -> CBNZ. But that doesn't set the condition flags.
Line 3480: delete mov.
Thanks!
- Derek > -----Original Message----- > From: aarch64-port-dev On > Behalf Of Nick Gasson (Arm Technology China) > Sent: Friday, January 18, 2019 3:40 AM > To: hotspot-compiler-dev at openjdk.java.net compiler dev at openjdk.java.net> > Cc: nd ; aarch64-port-dev at openjdk.java.net > Subject: [EXT] [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack > locking optimisation not triggered > > External Email > > ---------------------------------------------------------------------- > Hi, > > While I was cleaning up the patch for 8216350 I noticed an issue in the > implementation of recursive locking in aarch64_enc_fast_lock: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8217368 > Webrev: http://cr.openjdk.java.net/~ngasson/8217368/webrev.0/ > > First we load the markOop of the object we want to lock and OR it with > markOopDesc::unlocked_value (1). Then we do a CAS to exchange the > address of the box on our thread's stack with the object's header word iff it's > equal to the (markOop | 1) we just computed. If this fails, then we should > check for a recursive lock by comparing > > (~(page size - 1) | 3) & (markOop - SP) == 0 > > Where "markOop" is the current object header word loaded by the failed > CAS. This checks that the lock bits are zero (locked) and the stack address of > the displaced header is within one page of the current SP. > But on AArch64 we actually do this: > > (~(page size - 1) | 3) & ((old markOop | 1) - SP) == 0 > > Where "old markOop | 1" is the compare-to value used for the CAS. This is > always false as the result has at least bit #0 set. This only affects C2, the > C1_MacroAssembler version has the correct test. > > The diff looks big but all it does is swap the usage of registers `tmp' > and `disp_hdr' in the first section so the markOop loaded by the CAS ends up > in disp_hdr and tmp holds the (markOop | 1) compare-to value. > > Ran jtreg, plus jcstress with -XX:+UseLSE and -XX:-UseLSE. 
Also added > another microbenchmark to > micro/org/openjdk/bench/vm/lang/LockUnlock.java as I couldn't find an > existing JMH case that triggered this. > > Without patch: > > Result > "org.openjdk.bench.vm.lang.LockUnlock.testRecursiveSynchronizationNoBias": > 510.781 ±(99.9%) 1.196 ns/op [Average] > (min, avg, max) = (508.769, 510.781, 513.854), stdev = 1.597 > CI (99.9%): [509.585, 511.977] (assumes normal distribution) > > With patch: > > Result > "org.openjdk.bench.vm.lang.LockUnlock.testRecursiveSynchronizationNoBias": > > 197.038 ±(99.9%) 0.096 ns/op [Average] > (min, avg, max) = (196.886, 197.038, 197.296), stdev = 0.128 > CI (99.9%): [196.942, 197.134] (assumes normal distribution) > > Two other minor things: > > * Does anyone know what the comment "// Load Compare Value application > register." means? It's present in the PPC and S390 ports too. > > * The x86 port #ifdef LP64 uses "7 - os::vm_page_size()" as the mask in the > recursive lock test. I think the "7" here is markOopDesc::biased_lock_mask > and is presumably there to prevent a silent mutual exclusion failure if a > markOop with the bias locking bits set ends up the fast_lock path (although > this should never happen). > Should we change markOopDesc::lock_mask_in_place to > markOopDesc::biased_lock_mask_in_place in the AArch64 port too? > > Thanks, > Nick From aph at redhat.com Fri Jan 18 18:15:37 2019 From: aph at redhat.com (Andrew Haley) Date: Fri, 18 Jan 2019 18:15:37 +0000 Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered In-Reply-To: References: Message-ID: <9e7eee2c-7b8d-49b1-d1e1-897346e9b1b8@redhat.com> On 1/18/19 5:29 PM, Derek White wrote: > The original code used cmpxchgptr, but it introduced too many > unnecessary branches. So you Me, I think. > or Ed changed it to this code, with a (7-8 line) comment "Formerly: > __ cmpxchgptr" etc, etc.
I thought that comment didn't add much for > all that bulk so I asked Nick to rip the comment out! > > The function now fits on one screen (of sufficient size) though. > > Getting cmpxchgptr to work without the extra branches would be a > better solution if someone has any thoughts in that direction. There aren't any extra branches if you use MacroAssembler::cmpxchg. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From gromero at linux.vnet.ibm.com Fri Jan 18 18:16:05 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Fri, 18 Jan 2019 16:16:05 -0200 Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace In-Reply-To: References: <2d4d1747-a83d-5f65-eea3-d982969ae4fd@linux.vnet.ibm.com> Message-ID: Hi Roger and Martin, Thanks a lot for the quick reviews. I'll wait for the review of 8215317 and then request the approval to push for both 8215317 and this change. Goetz will kindly sponsor both then. Thank you. Best regards, Gustavo On 01/18/2019 02:07 PM, Doerr, Martin wrote: > Hi Gustavo, > > hotspot part looks good, too. > > Best regards, > > Martin > > *From:*Roger Riggs > *Sent:* Freitag, 18. Januar 2019 16:35 > *To:* Gustavo Romero ; hotspot-compiler-dev at openjdk.java.net; Lindenmaier, Goetz ; Doerr, Martin ; vladimir.kozlov at oracle.com > *Cc:* Michihiro Horie > *Subject:* Re: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace > > Looks good for the jdk files. > > Regards, Roger > > On 01/18/2019 10:07 AM, Gustavo Romero wrote: > > Hi, > > Could the following backport to 11u be reviewed, please? > > Bug : https://bugs.openjdk.java.net/browse/JDK-8213754 > Change
: http://hg.openjdk.java.net/jdk/jdk/rev/7384e00d5860 > Backport: http://cr.openjdk.java.net/~gromero/8213754_jdk11u/v1/ > > It adds 4 intrinsics that use instructions introduced by POWER9 in order to > speed up methods isDigit, isLowerCase, isUpperCase, and isWhitespace. > > The change is mostly PPC64-only but it does touch shared code, for > instance, in order to adapt the methods in question to be properly > intrinsified. It also needs an additional change [0], since one Graal > test has to be adapted (a separated RFR to backport [0] was sent to [1]). > > The change applies almost cleanly: only a small tweak is necessary because > the hunk for ppc.ad file relies on some absent text in the 11u code around > the change to be applied. That absent text is related to the Superword > feature (a non-related feature), which is not backported yet to 11u. > > This backport was tested on POWER8 and POWER9 and no regressions were > observed. > > This backport was also tested on x86_64 with > ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus > ./test/hotspot/jtreg/compiler/graalunit (with Graal compiler enabled) with > change 8215317 [0] applied and no regressions were observed too. > > Thank you. > > Best regards, > Gustavo > > [0] http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/ > [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032266.html > From vladimir.kozlov at oracle.com Fri Jan 18 20:26:19 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 18 Jan 2019 12:26:19 -0800 Subject: [11u backport] RFR(S): 8215317: [GRAAL] unit test CheckGraalIntrinsics failed after 8213754 In-Reply-To: <2ed804b4-85d5-feb7-edab-85d6dee66c74@linux.vnet.ibm.com> References: <2ed804b4-85d5-feb7-edab-85d6dee66c74@linux.vnet.ibm.com> Message-ID: <1cb884f8-34c0-638b-768a-fe5eebd89c49@oracle.com> Looks good. Thanks, Vladimir On 1/18/19 6:57 AM, Gustavo Romero wrote: > Hi, > > Could the following backport to 11u be reviewed, please? 
> > Bug : https://bugs.openjdk.java.net/browse/JDK-8215317 > Change : http://hg.openjdk.java.net/jdk/jdk/rev/108a161aed93 > Backport: http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/ > > It adds 4 intrinsics to the Graal test CheckGraalIntrinsics.java list so > JDK 11u becomes aware of them. Otherwise that test will break once change > 8213754 [0] lands 11u (which will effectively add the 4 intrinsics to > PPC64/Hotspot and adapt the correlated methods to be intrinsified). > > The backport changed the inclusion of the intrinsics for JDK 11 or higher, > instead for JDK 12 or higher (original patch). > > This backport was tested on x86_64 with > ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus > ./test/hotspot/jtreg/compiler/graalunit (with Graal compiler enabled) > and no regressions were observed too. > > Thank you. > > Best regards, > Gustavo > > [0] https://bugs.openjdk.java.net/browse/JDK-8213754 > From andrewluotechnologies at outlook.com Fri Jan 18 22:16:51 2019 From: andrewluotechnologies at outlook.com (Andrew Luo) Date: Fri, 18 Jan 2019 22:16:51 +0000 Subject: Enhancing jaotc to automatically find VS2017 linker Message-ID: Hi, Has there been any plans to enhance jaotc to support automatically finding the link.exe in VS2017? If not, I am interested in contributing some work to support this. I see that in Linker.java (src/jdk.aot/share/classes/jdk.tools.jaotc/src/jdk/tools/jaotc/Linker.java) we find link.exe using the environment variables VS...COMNTOOLS, but since in VS2017 and forward, this is not defined, it seems another approach is necessary. Microsoft suggests that you use vswhere (https://github.com/Microsoft/vswhere, BSD licensed, included with Visual Studio 2017 15.2 and forward) or their COM API to find the latest VS2017 toolset. Anyways, if everyone agrees we should add VS2017 support, there are a few ways to do this (in order of simplest/easiest to most complex): 1.
Check that vswhere exists on the system, if it does, call vswhere (out of process - not sure this is acceptable...) and use that to find the VS2017 link.exe 2. Ship vswhere with the JDK and call it out of process 3. Statically link a copy of vswhere (BSD licensed - is this okay?) into our code and add a JNI stub to call it 4. Call the COM API in a JNI function to get the latest version of VS2017 Personally I prefer (1), but if out-of-process isn't acceptable I'm fine with doing (4) or (3). Let me know if you have any comments/feedback on this proposal. Thanks, -Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug.simon at oracle.com Fri Jan 18 22:20:06 2019 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 18 Jan 2019 23:20:06 +0100 Subject: [12] RFR 8215375: [Graal] jck:vm/jvmti/Exception/excp001/excp00101 fails in Graal as JIT mode and -Xcomp mode Message-ID: <66BBADCE-3072-414F-AA08-3B19D5BC9B55@oracle.com> Please review this fix that makes Graal compiled code post a JVMTI event when throwing an exception. The code to post the event is only compiled in if the relevant JVMTI capabilities are enabled at compile time. The event posting code performs a dynamic check to see if the current thread is interested in exception events before posting an event. Testing: hs-tier6-graal https://bugs.openjdk.java.net/browse/JDK-8215375 http://cr.openjdk.java.net/~dnsimon/8215375 -Doug From vladimir.kozlov at oracle.com Fri Jan 18 22:54:11 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 18 Jan 2019 14:54:11 -0800 Subject: [12] RFR 8215375: [Graal] jck:vm/jvmti/Exception/excp001/excp00101 fails in Graal as JIT mode and -Xcomp mode In-Reply-To: <66BBADCE-3072-414F-AA08-3B19D5BC9B55@oracle.com> References: <66BBADCE-3072-414F-AA08-3B19D5BC9B55@oracle.com> Message-ID: Seems fine. 
Thanks, Vladimir On 1/18/19 2:20 PM, Doug Simon wrote: > Please review this fix that makes Graal compiled code post a JVMTI event when throwing an exception. > The code to post the event is only compiled in if the relevant JVMTI capabilities are enabled at compile time. The event posting code performs a dynamic check to see if the current thread is interested in exception events before posting an event. > > Testing: hs-tier6-graal > > https://bugs.openjdk.java.net/browse/JDK-8215375 > http://cr.openjdk.java.net/~dnsimon/8215375 > > -Doug > From dean.long at oracle.com Fri Jan 18 23:53:29 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 18 Jan 2019 15:53:29 -0800 Subject: 12 RFR(XXS) 8217394: Remove org.graalvm.compiler.debug.test.TimerKeyTest from problem list Message-ID: https://bugs.openjdk.java.net/browse/JDK-8217394 http://cr.openjdk.java.net/~dlong/8217394/webrev/ This should have been included with JDK-8210777, but it was missed. Trivial? dl From dean.long at oracle.com Sat Jan 19 00:10:38 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 18 Jan 2019 16:10:38 -0800 Subject: [12] RFR 8215375: [Graal] jck:vm/jvmti/Exception/excp001/excp00101 fails in Graal as JIT mode and -Xcomp mode In-Reply-To: References: <66BBADCE-3072-414F-AA08-3B19D5BC9B55@oracle.com> Message-ID: <50cdbe1c-a966-e2b5-a3ba-4391838a26fb@oracle.com> Looks good. dl On 1/18/19 2:54 PM, Vladimir Kozlov wrote: > Seems fine. > > Thanks, > Vladimir > > On 1/18/19 2:20 PM, Doug Simon wrote: >> Please review this fix that makes Graal compiled code post a JVMTI >> event when throwing an exception. >> The code to post the event is only compiled in if the relevant JVMTI >> capabilities are enabled at compile time. The event posting code >> performs a dynamic check to see if the current thread is interested >> in exception events before posting an event. 
>> >> Testing: hs-tier6-graal >> >> https://bugs.openjdk.java.net/browse/JDK-8215375 >> http://cr.openjdk.java.net/~dnsimon/8215375 >> >> -Doug >> From vladimir.kozlov at oracle.com Sat Jan 19 00:20:22 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 18 Jan 2019 16:20:22 -0800 Subject: 12 RFR(XXS) 8217394: Remove org.graalvm.compiler.debug.test.TimerKeyTest from problem list In-Reply-To: References: Message-ID: <7f3c685a-7805-6eff-0ee1-20184e0b54bb@oracle.com> Good. Trivial. Vladimir On 1/18/19 3:53 PM, dean.long at oracle.com wrote: > https://bugs.openjdk.java.net/browse/JDK-8217394 > http://cr.openjdk.java.net/~dlong/8217394/webrev/ > > This should have been included with JDK-8210777, but it was missed. Trivial? > > dl From dean.long at oracle.com Sat Jan 19 00:23:03 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 18 Jan 2019 16:23:03 -0800 Subject: 12 RFR(XXS) 8217394: Remove org.graalvm.compiler.debug.test.TimerKeyTest from problem list In-Reply-To: <7f3c685a-7805-6eff-0ee1-20184e0b54bb@oracle.com> References: <7f3c685a-7805-6eff-0ee1-20184e0b54bb@oracle.com> Message-ID: Thanks Vladimir. dl On 1/18/19 4:20 PM, Vladimir Kozlov wrote: > Good. Trivial. > > Vladimir > > On 1/18/19 3:53 PM, dean.long at oracle.com wrote: >> https://bugs.openjdk.java.net/browse/JDK-8217394 >> http://cr.openjdk.java.net/~dlong/8217394/webrev/ >> >> This should have been included with JDK-8210777, but it was missed. >> Trivial? >> >> dl From xxinliu at amazon.com Sat Jan 19 00:39:47 2019 From: xxinliu at amazon.com (Liu, Xin) Date: Sat, 19 Jan 2019 00:39:47 +0000 Subject: Why does call_site_target keep changing for a Nashorn method? In-Reply-To: <186d49e7-daa9-a0db-b0c6-1b9d4ff2adda@oracle.com> References: <616C8E42-4B18-405B-B28A-C9F062EC9B6C@amazon.com> <186d49e7-daa9-a0db-b0c6-1b9d4ff2adda@oracle.com> Message-ID: <837C4B07-9A3F-4459-A625-12F82C9E604F@amazon.com> Hi, Vladimir, Thank you for the response. 
After reading your email and associated RFEs, now I got the background story. I understand the design decision in hotspot. In my case, compiler thread crowds out the app thread because we run application in docker with 1 CPU. Is it good idea that we decay the invocation counts of the methods if they fail due to 'call_site_target value change?' Thanks, --lx On 1/17/19, 2:36 PM, "Vladimir Ivanov" wrote: C1/C2 optimistically inline through CallSite instances even if those are mutable (MutableCallSite/VolatileCallSite). It requires a nmethod dependency and once CallSite target changes, all dependent nmethods should be invalidated. If such change happens during compilation, nmethod installation fails. That's exactly what you observe: the dependency is recorded during inlining, but failed verification during installation. Regarding the observed behavior, it is well-known [1] [2] and was a deliberate choice. As JDK-7087838 [1] states: "The consensus among language runtime implementors is that they want control over switch points (and thus call sites) and so it's their responsibility to handle extensive invalidation of such." So, such pathological behavior is treated as a bug in user code (Nashorn in this particular case). There's an RFE filed [3] to consider alternative options for unstable calls. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-7087838 [2] https://bugs.openjdk.java.net/browse/JDK-7177745 [3] https://bugs.openjdk.java.net/browse/JDK-8147550 On 16/01/2019 14:04, Liu, Xin wrote: > In one of our applications, C1/C2 keeps compiling a Javascript method > generated by Nashorn but the code fails a dependency check right before > installing in the code cache. This is with JDK tip. > > It can't pass "Dependencies::check_call_site_target_value".
> > [C2 Parsing] > > [Validating compilation dependencies] > > witness='jdk/nashorn/internal/runtime/linker/LinkerCallSite' > stamp='1113.578'/> > > It's related to the GWT methodHandle. The 2 mismatched methodhandles > are very similar except for argL3, which is an int[2]. > > Even though arg0-2 are not identical objects, their contents are same.
>
> (gdb) call java_lang_invoke_CallSite::target(call_site)->print()
> java.lang.invoke.BoundMethodHandle$Species_LLLL
> {0x00000000f586ca98} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LLLL'
>  - ---- fields (total size 6 words):
>  - 'customizationCount' 'B' @12 0
>  - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x00000000e21e2878} = (Ljava/lang/Object;Ljdk/nashorn/internal/runtime/Undefined;Ljava/lang/Object;)Ljava/lang/Object; (e21e2878)
>  - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x00000000e1e4a670} => a 'java/lang/invoke/MemberName'{0x00000000e1e4a938} = {method}{0x00007fffa512cb68} 'guard' '(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH' (e1e4a670)
>  - 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0)
>  - final 'argL0' 'Ljava/lang/Object;' @28 a 'java/lang/invoke/BoundMethodHandle$Species_LL'{0x00000000f586c9e8} (f586c9e8)
>  - final 'argL1' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f586ca28} (f586ca28)
>  - final 'argL2' 'Ljava/lang/Object;' @36 a 'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f586ca60} (f586ca60)
>  - final 'argL3' 'Ljava/lang/Object;' @40 [I{0x00000000f586ca10} (f586ca10)
>
> (gdb) call method_handle->print()
> java.lang.invoke.BoundMethodHandle$Species_LLLL
> {0x00000000f6b18500} - klass: 'java/lang/invoke/BoundMethodHandle$Species_LLLL'
>  -
---- fields (total size 6 words):
>  - 'customizationCount' 'B' @12 0
>  - private final 'type' 'Ljava/lang/invoke/MethodType;' @16 a 'java/lang/invoke/MethodType'{0x00000000e21e2878} = (Ljava/lang/Object;Ljdk/nashorn/internal/runtime/Undefined;Ljava/lang/Object;)Ljava/lang/Object; (e21e2878)
>  - final 'form' 'Ljava/lang/invoke/LambdaForm;' @20 a 'java/lang/invoke/LambdaForm'{0x00000000e1e4a670} => a 'java/lang/invoke/MemberName'{0x00000000e1e4a938} = {method}{0x00007fffa512cb68} 'guard' '(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;' in 'java/lang/invoke/LambdaForm$MH' (e1e4a670)
>  - 'asTypeCache' 'Ljava/lang/invoke/MethodHandle;' @24 NULL (0)
>  - final 'argL0' 'Ljava/lang/Object;' @28 a 'java/lang/invoke/BoundMethodHandle$Species_LL'{0x00000000f6b18450} (f6b18450)
>  - final 'argL1' 'Ljava/lang/Object;' @32 a 'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f6b18490} (f6b18490)
>  - final 'argL2' 'Ljava/lang/Object;' @36 a 'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f6b184c8} (f6b184c8)
>  - final 'argL3' 'Ljava/lang/Object;' @40 [I{0x00000000f6b18478} (f6b18478)
>
> My guess is argL3 is counters in java.lang.invoke.MethodHandleImpl.
>
> // Intrinsified by C2. Counters are used during parsing to calculate branch frequencies.
> @LambdaForm.Hidden
> @jdk.internal.HotSpotIntrinsicCandidate
> static boolean profileBoolean(boolean result, int[] counters) {
>     // Profile is int[2] where [0] and [1] correspond to false and true occurrences respectively.
>     int idx = result ? 1 : 0;
>     try {
>         counters[idx] = Math.addExact(counters[idx], 1);
>     } catch (ArithmeticException e) {
>         // Avoid continuous overflow by halving the problematic count.
>         counters[idx] = counters[idx] / 2;
>     }
>     return result;
> }
>
> I am still struggling to understand the source code in java.lang.invoke.*. Could anybody enlighten me why the target of the callsite changes every time here? Is it related to this profiling thing?
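The counter update quoted above can be tried in isolation. The following is only a sketch: it copies the update rule from the quoted profileBoolean into a standalone class (the class name ProfileBooleanSketch is made up for illustration, and the HotSpot-internal annotations are dropped), and it says nothing about how C2 intrinsifies the real method.

```java
class ProfileBooleanSketch {
    // Same update rule as the quoted method: counters[0] counts false
    // results, counters[1] counts true results.
    static boolean profileBoolean(boolean result, int[] counters) {
        int idx = result ? 1 : 0;
        try {
            counters[idx] = Math.addExact(counters[idx], 1);
        } catch (ArithmeticException e) {
            // On int overflow the saturated counter is halved instead of wrapping.
            counters[idx] = counters[idx] / 2;
        }
        return result;
    }

    public static void main(String[] args) {
        // Start the "true" counter at its maximum to force the overflow path.
        int[] counters = {0, Integer.MAX_VALUE};
        profileBoolean(false, counters);
        profileBoolean(true, counters);
        System.out.println(counters[0] + " " + counters[1]); // prints "1 1073741823"
    }
}
```

Each such int[2] is an ordinary heap array, so two handles that carry separate counter arrays are distinct objects even when the counts happen to match, which is consistent with the differing argL3 addresses in the dumps above.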
>
> In the validation log, it has validated the dep "dependency
> type='call_site_target_value' x0='1556' x='1866'" above. Why can't it
> pass afterwards? My guess is that one MH object has been changed by
> another Java thread.
>
> One interesting fact is that the compiler thread can't pass the 22nd dep. My
> intuition is that it goes over an unknown threshold.
>
> The 2nd question is about ciEnv::validate_compile_task_dependencies.
> Why does failure of call_site_target_value_changed not count as a deopt?
>
> The flag _inc_decompile_count_on_failure = false stops the MDO from marking this
> method "not_compileable". C2 doesn't set the flag, so C2 ends up
> compiling it over and over, which makes C2 a CPU hog. Here's the code in
> validate_compile_task_dependencies:
>
>   bool counter_changed = system_dictionary_modification_counter_changed();
>   Dependencies::DepType result =
>     dependencies()->validate_dependencies(_task, counter_changed);
>   if (result != Dependencies::end_marker) {
>     if (result == Dependencies::call_site_target_value) {
>       _inc_decompile_count_on_failure = false;
>       record_failure("call site target change");
>
> Maybe the right thing to do is to count this as a deopt and change the
> deopt limit computation to take into account the size of the method in
> nodes, just as done for abandoning compilation if the graph is too big.
>
> Thanks,
>
> --lx
>

From vladimir.x.ivanov at oracle.com Sat Jan 19 01:05:44 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 18 Jan 2019 17:05:44 -0800
Subject: Why does call_site_target keep changing for a Nashorn method?
In-Reply-To: <837C4B07-9A3F-4459-A625-12F82C9E604F@amazon.com>
References: <616C8E42-4B18-405B-B28A-C9F062EC9B6C@amazon.com> <186d49e7-daa9-a0db-b0c6-1b9d4ff2adda@oracle.com> <837C4B07-9A3F-4459-A625-12F82C9E604F@amazon.com>
Message-ID: <30a97290-71c5-c445-cfaf-f8eda14fdfba@oracle.com>

> Thank you for the response. After reading your email and associated RFEs, now I got the background story.
> I understand the design decision in hotspot.
>
> In my case, the compiler thread crowds out the app thread because we run the application in Docker with 1 CPU.
> Is it a good idea that we decay the invocation counts of the methods if they fail due to 'call_site_target value change'?

Yes, sounds reasonable.

I believe compilation bailed out due to invalidated call_site_target dependency should be treated as if it were a deoptimization with Action_reinterpret, but resetting invocation counts may be too much. So, decaying counters instead sounds reasonable.

Also, it's hard to tell what method to act on: problematic CallSite may be located somewhere deep in inline tree, but only root method is known.

Best regards,
Vladimir Ivanov

> On 1/17/19, 2:36 PM, "Vladimir Ivanov" wrote:
>
> C1/C2 optimistically inline through CallSite instances even if those are
> mutable (MutableCallSite/VolatileCallSite). It requires a nmethod
> dependency and once CallSite target changes, all dependent nmethods
> should be invalidated. If such change happens during compilation,
> nmethod installation fails.
>
> That's exactly what you observe: the dependency is recorded during
> inlining, but failed verification during installation.
>
> Regarding the observed behavior, it is well-known [1] [2] and was a
> deliberate choice. As JDK-7087838 [1] states:
>
> "The consensus among language runtime implementors is that they want
> control over switch points (and thus call sites) and so it's their
> responsibility to handle extensive invalidation of such."
>
> So, such pathological behavior is treated as a bug in user code (Nashorn
> in this particular case).
>
> There's an RFE filed [3] to consider alternative options for unstable
> calls.
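To make the quoted explanation concrete at the Java level, here is a small sketch of a MutableCallSite being retargeted. It uses only the public java.lang.invoke API; the class name CallSiteRetargetSketch and the methods one, two and run are hypothetical. It shows only the language-level effect of setTarget. Whether and when the JIT records or invalidates an nmethod dependency is internal to HotSpot and is not observable from this code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class CallSiteRetargetSketch {
    static int one() { return 1; }
    static int two() { return 2; }

    // Invokes through the call site before and after retargeting it.
    static int[] run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(int.class);
            MethodHandle h1 = lookup.findStatic(CallSiteRetargetSketch.class, "one", type);
            MethodHandle h2 = lookup.findStatic(CallSiteRetargetSketch.class, "two", type);

            MutableCallSite site = new MutableCallSite(h1);
            MethodHandle invoker = site.dynamicInvoker();

            int first = (int) invoker.invokeExact();
            // Changing the target is the kind of event that, per the explanation
            // above, forces compiled code that inlined the old target to be discarded.
            site.setTarget(h2);
            int second = (int) invoker.invokeExact();
            return new int[] { first, second };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1]); // prints "1 2"
    }
}
```

A runtime that calls setTarget frequently on a hot call site would keep triggering that invalidation, which matches the "bug in user code" reading quoted above.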
>
> Best regards,
> Vladimir Ivanov
>
> [1] https://bugs.openjdk.java.net/browse/JDK-7087838
> [2] https://bugs.openjdk.java.net/browse/JDK-7177745
> [3] https://bugs.openjdk.java.net/browse/JDK-8147550
>
> On 16/01/2019 14:04, Liu, Xin wrote:
> > In one of our applications, C1/C2 keeps compiling a JavaScript method
> > generated by Nashorn but the code fails a dependency check right before
> > installing in the code cache. This is with JDK tip.
> >
> > It can't pass "Dependencies::check_call_site_target_value".
> >
> > [C2 Parsing]
> >
> > [Validating compilation dependencies]
> >
> > witness='jdk/nashorn/internal/runtime/linker/LinkerCallSite'
> > stamp='1113.578'/>
> >
> > It's related to the GWT method handle. The 2 mismatched method handles
> > are very similar except for argL3, which is an int[2].
> >
> > Even though arg0-2 are not identical objects, their contents are the same.
> >
> > (gdb) call java_lang_invoke_CallSite::target(call_site)->print()
> > java.lang.invoke.BoundMethodHandle$Species_LLLL
> > {0x00000000f586ca98}-
> > klass:'java/lang/invoke/BoundMethodHandle$Species_LLLL'
> > - ---- fields(total size 6 words):
> > -'customizationCount''B'@12 0
> > - private final'type''Ljava/lang/invoke/MethodType;'@16
> > a'java/lang/invoke/MethodType'{0x00000000e21e2878}=(Ljava/lang/Object;Ljdk/nashorn/internal/runtime/Undefined;Ljava/lang/Object;)Ljava/lang/Object;(e21e2878)
> > - final'form''Ljava/lang/invoke/LambdaForm;'@20
> > a'java/lang/invoke/LambdaForm'{0x00000000e1e4a670}=>a'java/lang/invoke/MemberName'{0x00000000e1e4a938}={method}{0x00007fffa512cb68}'guard''(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;'in'java/lang/invoke/LambdaForm$MH'(e1e4a670)
> > -'asTypeCache''Ljava/lang/invoke/MethodHandle;'@24 NULL(0)
> > - final'argL0''Ljava/lang/Object;'@28
> > a'java/lang/invoke/BoundMethodHandle$Species_LL'{0x00000000f586c9e8}(f586c9e8)
> > - final'argL1''Ljava/lang/Object;'@32
> > a'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f586ca28}(f586ca28)
> > - final'argL2''Ljava/lang/Object;'@36
> > a'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f586ca60}(f586ca60)
> > - final'argL3''Ljava/lang/Object;'@40 [I{0x00000000f586ca10}(f586ca10)
> >
> > (gdb) call method_handle->print()
> > java.lang.invoke.BoundMethodHandle$Species_LLLL
> > {0x00000000f6b18500}-
> > klass:'java/lang/invoke/BoundMethodHandle$Species_LLLL'
> > - ---- fields(total size 6 words):
> > -'customizationCount''B'@12 0
> > - private final'type''Ljava/lang/invoke/MethodType;'@16
> > a'java/lang/invoke/MethodType'{0x00000000e21e2878}=(Ljava/lang/Object;Ljdk/nashorn/internal/runtime/Undefined;Ljava/lang/Object;)Ljava/lang/Object;(e21e2878)
> > - final'form''Ljava/lang/invoke/LambdaForm;'@20
> > a'java/lang/invoke/LambdaForm'{0x00000000e1e4a670}=>a'java/lang/invoke/MemberName'{0x00000000e1e4a938}={method}{0x00007fffa512cb68}'guard''(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;'in'java/lang/invoke/LambdaForm$MH'(e1e4a670)
> > -'asTypeCache''Ljava/lang/invoke/MethodHandle;'@24 NULL(0)
> > - final'argL0''Ljava/lang/Object;'@28
> > a'java/lang/invoke/BoundMethodHandle$Species_LL'{0x00000000f6b18450}(f6b18450)
> > - final'argL1''Ljava/lang/Object;'@32
> > a'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f6b18490}(f6b18490)
> > - final'argL2''Ljava/lang/Object;'@36
> > a'java/lang/invoke/MethodHandleImpl$CountingWrapper'{0x00000000f6b184c8}(f6b184c8)
> > - final'argL3''Ljava/lang/Object;'@40 [I{0x00000000f6b18478}(f6b18478)
> >
> > My guess is that argL3 is the counters array in java.lang.invoke.MethodHandleImpl.
> >
> > // Intrinsified by C2. Counters are used during parsing to calculate
> > // branch frequencies.
> > @LambdaForm.Hidden
> > @jdk.internal.HotSpotIntrinsicCandidate
> > static
> > boolean profileBoolean(boolean result, int[] counters) {
> >     // Profile is int[2] where [0] and [1] correspond to false and true
> >     // occurrences respectively.
> >     int idx = result ? 1 : 0;
> >     try {
> >         counters[idx] = Math.addExact(counters[idx], 1);
> >     } catch (ArithmeticException e) {
> >         // Avoid continuous overflow by halving the problematic count.
> >         counters[idx] = counters[idx] / 2;
> >     }
> >     return result;
> > }
> >
> > I am still struggling to understand the source code in
> > java.lang.invoke.*. Could anybody enlighten me why the target of the
> > call site changes every time here? Is it related to this profiling thing?
> >
> > In the validation log, it has validated the dep "dependency
> > type='call_site_target_value' x0='1556' x='1866'" above. Why can't it
> > pass afterwards? My guess is that one MH object has been changed by
> > another Java thread.
> >
> > One interesting fact is that the compiler thread can't pass the 22nd dep. My
> > intuition is that it goes over an unknown threshold.
> >
> > The 2nd question is about ciEnv::validate_compile_task_dependencies.
> > Why does failure of call_site_target_value_changed not count as a deopt?
> >
> > The flag _inc_decompile_count_on_failure = false stops the MDO from marking this
> > method "not_compileable". C2 doesn't set the flag, so C2 ends up
> > compiling it over and over, which makes C2 a CPU hog. Here's the code in
> > validate_compile_task_dependencies:
> >
> >   bool counter_changed = system_dictionary_modification_counter_changed();
> >   Dependencies::DepType result =
> >     dependencies()->validate_dependencies(_task, counter_changed);
> >   if (result != Dependencies::end_marker) {
> >     if (result == Dependencies::call_site_target_value) {
> >       _inc_decompile_count_on_failure = false;
> >       record_failure("call site target change");
> >
> > Maybe the right thing to do is to count this as a deopt and change the
> > deopt limit computation to take into account the size of the method in
> > nodes, just as done for abandoning compilation if the graph is too big.
> >
> > Thanks,
> >
> > --lx
> >
>

From bsrbnd at gmail.com Sat Jan 19 13:42:24 2019
From: bsrbnd at gmail.com (B. Blaser)
Date: Sat, 19 Jan 2019 14:42:24 +0100
Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions
In-Reply-To: <2f209ec9-e7f9-8da3-64a2-20ac909b4931@redhat.com>
References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com> <81dc5407-4985-883c-83fc-2e1fa2b77e66@redhat.com> <3dd85d2c-f4d8-e360-21a2-68254b3c5e2b@redhat.com> <2f209ec9-e7f9-8da3-64a2-20ac909b4931@redhat.com>
Message-ID:

On Fri, 18 Jan 2019 at 14:37, Roman Kennke wrote:
>
> > On 1/17/19 7:51 PM, B. Blaser wrote:
> >> Here it is on intel xeon with 5*10e9 iterations:
> >> * mov+cmov = 10.94s
> >> * cmov = 10.15s
> >>
> >> Thoughts?
> > > > It looks like there's not much of a performance difference, but it might > > help by freeing a register. OTOH, we'd still need to be sure we weren't > > introducing a regression. We'd have to make sure that implicit null checks > > work. > > I'm pretty sure that null-checks work, in general. I used the cmov > instructions in an experiment that I did with Shenandoah barriers of > which I'm pretty sure would have blown up badly if it wouldn't. One > thing I'm not sure of is: does cmov generate a SIGSEGV on a bad address, > even if the condition is not true? I doubt it, because then we couldn't > use this for other types (long, int, etc). > > I'm more worried about the bottom-type issue that is mentioned in the > comment and by Andrew Dinn, and it would be very helpful if anybody > knows about it and could clarify. Failing that we could dig deeper > and/or do extensive testing? I'm definitely not an expert in this area but does ADLC treat this really differently from a single LoadP / mov? http://hg.openjdk.java.net/jdk/jdk/file/683a112e0e1e/src/hotspot/cpu/x86/x86_64.ad#l5349 Bernard From kim.barrett at oracle.com Sun Jan 20 00:37:43 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Sat, 19 Jan 2019 19:37:43 -0500 Subject: RFR: 8217387: Remove dead develop flag CIFireOOMAt In-Reply-To: References: Message-ID: > On Jan 18, 2019, at 11:03 AM, Claes Redestad wrote: > > Hi, > > the develop flag CIFireOOMAt is effectively dead and should be removed. > > Webrev: http://cr.openjdk.java.net/~redestad/8217387/open.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8217387 > > Thanks! > > /Claes Looks good. From claes.redestad at oracle.com Sun Jan 20 00:41:58 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Sun, 20 Jan 2019 01:41:58 +0100 Subject: RFR: 8217387: Remove dead develop flag CIFireOOMAt In-Reply-To: References: Message-ID: On 2019-01-20 01:37, Kim Barrett wrote: > Looks good. Thanks, Kim! 
/Claes From doug.simon at oracle.com Sun Jan 20 13:54:57 2019 From: doug.simon at oracle.com (Doug Simon) Date: Sun, 20 Jan 2019 14:54:57 +0100 Subject: [12] RFR 8215375: [Graal] jck:vm/jvmti/Exception/excp001/excp00101 fails in Graal as JIT mode and -Xcomp mode In-Reply-To: <50cdbe1c-a966-e2b5-a3ba-4391838a26fb@oracle.com> References: <66BBADCE-3072-414F-AA08-3B19D5BC9B55@oracle.com> <50cdbe1c-a966-e2b5-a3ba-4391838a26fb@oracle.com> Message-ID: <27D7FC5E-2D18-43B9-B68C-5ED92860E83B@oracle.com> Thanks Dean and Vladimir for the review. > On 19 Jan 2019, at 01:10, dean.long at oracle.com wrote: > > Looks good. > > dl > > On 1/18/19 2:54 PM, Vladimir Kozlov wrote: >> Seems fine. >> >> Thanks, >> Vladimir >> >> On 1/18/19 2:20 PM, Doug Simon wrote: >>> Please review this fix that makes Graal compiled code post a JVMTI event when throwing an exception. >>> The code to post the event is only compiled in if the relevant JVMTI capabilities are enabled at compile time. The event posting code performs a dynamic check to see if the current thread is interested in exception events before posting an event. >>> >>> Testing: hs-tier6-graal >>> >>> https://bugs.openjdk.java.net/browse/JDK-8215375 >>> http://cr.openjdk.java.net/~dnsimon/8215375 >>> >>> -Doug >>> > From Nick.Gasson at arm.com Mon Jan 21 06:01:01 2019 From: Nick.Gasson at arm.com (Nick Gasson (Arm Technology China)) Date: Mon, 21 Jan 2019 06:01:01 +0000 Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered In-Reply-To: References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com> Message-ID: <79118967-c5b6-ca5c-7c6b-4adb80a4ed60@arm.com> Hi Andrew, On 18/01/2019 22:56, Andrew Haley wrote: >>> The patch looks good. However, I don't understand why we aren't using >>> MacroAssembler::cmpxchgptr here. It looks like we should be, and you'd >>> end up with a less complex result. 
>>
>> It's not exactly the same though: MacroAssembler::cmpxchgptr adds a "dmb
>> ish" to the failure path which I don't think is required here.
>
> Oh, sorry. I should have said MacroAssembler::cmpxchg, with a
> br.eq(cont) afterward.
>

OK I'll change all three places in aarch64_enc_fast_lock/unlock that do a compare-exchange to use MacroAssembler::cmpxchg.

Thanks,
Nick

From tobias.hartmann at oracle.com Mon Jan 21 07:16:30 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 21 Jan 2019 08:16:30 +0100
Subject: RFR (trivial): 8217388: Remove develop flag ProfilerPCTickThreshold
In-Reply-To:
References:
Message-ID: <4cea098d-687f-bcb1-9eed-632b45bd355c@oracle.com>

Hi Claes,

looks good to me.

Best regards,
Tobias

On 18.01.19 17:15, Claes Redestad wrote:
> Hi,
>
> this flag does not spark joy.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8217388
> Patch:
> diff -r 0a48b128e3d4 src/hotspot/share/runtime/globals.hpp
> --- a/src/hotspot/share/runtime/globals.hpp    Fri Jan 18 16:49:35 2019 +0100
> +++ b/src/hotspot/share/runtime/globals.hpp    Fri Jan 18 17:07:38 2019 +0100
> @@ -1670,9 +1670,6 @@
>    develop(intx, DontYieldALotInterval,    10,                               \
>            "Interval between which yields will be dropped (milliseconds)")   \
>                                                                              \
> -  develop(intx, ProfilerPCTickThreshold,    15,                             \
> -          "Number of ticks in a PC buckets to be a hotspot")                \
> -                                                                            \
>    notproduct(intx, DeoptimizeALotInterval,     5,                           \
>            "Number of exits until DeoptimizeALot kicks in")                  \
>                                                                              \
>
> /Claes

From tobias.hartmann at oracle.com Mon Jan 21 08:21:24 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 21 Jan 2019 09:21:24 +0100
Subject: [12] RFR(S): 8217230: assert(t == t_no_spec) failure in NodeHash::check_no_speculative_types()
Message-ID:

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8217230
http://cr.openjdk.java.net/~thartmann/8217230/webrev.00/

A SafePointNode becomes dead when being cut off from root in Compile::remove_root_to_sfpts_edges() but is not processed by IGVN and therefore remains in the graph. Since it is not reachable by root anymore, it is not processed by Compile::remove_speculative_types and we hit the assert.

The problem was introduced by the fix for JDK-8214862 [1] in JDK 12 b27.

Thanks,
Tobias

[1] https://bugs.openjdk.java.net/browse/JDK-8214862

From aph at redhat.com Mon Jan 21 09:10:31 2019
From: aph at redhat.com (Andrew Haley)
Date: Mon, 21 Jan 2019 09:10:31 +0000
Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered
In-Reply-To: <79118967-c5b6-ca5c-7c6b-4adb80a4ed60@arm.com>
References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com> <79118967-c5b6-ca5c-7c6b-4adb80a4ed60@arm.com>
Message-ID:

On 1/21/19 6:01 AM, Nick Gasson (Arm Technology China) wrote:
> OK I'll change all three places in aarch64_enc_fast_lock/unlock that do
> a compare-exchange to use MacroAssembler::cmpxchg.

If you wish: be aware that if you change anything other than this place there'll be a lot more testing to do, and review will take longer.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From Nick.Gasson at arm.com Mon Jan 21 09:27:47 2019
From: Nick.Gasson at arm.com (Nick Gasson (Arm Technology China))
Date: Mon, 21 Jan 2019 09:27:47 +0000
Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered
In-Reply-To:
References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com> <79118967-c5b6-ca5c-7c6b-4adb80a4ed60@arm.com>
Message-ID:

Hi Andrew,

On 21/01/2019 17:10, Andrew Haley wrote:
> On 1/21/19 6:01 AM, Nick Gasson (Arm Technology China) wrote:
>> OK I'll change all three places in aarch64_enc_fast_lock/unlock that do
>> a compare-exchange to use MacroAssembler::cmpxchg.
>
> If you wish: be aware that if you change anything other than this place there'll
> be a lot more testing to do, and review will take longer.
>

I think it will be confusing for anyone looking at these functions in the future to have one call to cmpxchg and then two copies of essentially the same code inlined a few lines afterwards. IMO we should either change all three for consistency, or stick with the original minimal patch (+ Derek's cleanup suggestions) which should be easier to review.

Thanks,
Nick

From tobias.hartmann at oracle.com Mon Jan 21 09:47:22 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 21 Jan 2019 10:47:22 +0100
Subject: [13] RFR(S): 8217291: Failure of ::realloc() should be handled correctly in adlc/forms.cpp
Message-ID: <984d33e8-1aab-6fd5-9f45-64b4b08421f2@oracle.com>

Hi,

please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8217291
http://cr.openjdk.java.net/~thartmann/8217291/webrev.00/

Similar to the fix for JDK-8212779 [1], I've introduced a wrapper method for re-allocation that handles failures by printing a message and exiting.
Thanks, Tobias [1] http://hg.openjdk.java.net/jdk/jdk/rev/a3aa8d5380d9 From Pengfei.Li at arm.com Mon Jan 21 10:53:47 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Mon, 21 Jan 2019 10:53:47 +0000 Subject: RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics Message-ID: Hi Reviewers, Webrev: http://cr.openjdk.java.net/~pli/rfr/8216259/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8216259 This is a vectorization optimization of AArch64 intrinsic code of Adler-32 checksum. An Adler-32 checksum is obtained by calculating two 16-bit checksums s1 and s2, and then concatenating their bits into a 32-bit integer. Details of the algorithm could be found on Wikipedia at https://en.wikipedia.org/wiki/Adler-32 . In previous Adler-32 intrinsic code written by Edward Nevill, we accumulate the lower and upper halves of the checksum value, s1 and s2, for every 16 bytes in the nmax_loop and by16_loop. In this patch, these accumulation operations are vectorized with NEON instructions in these 2 loops. I tested the correctness of my patch by comparing the checksum results of 5000 byte arrays of 1MB size. Test code and script can be found at [1]. I also tested the performance with and without my patch by a JMH case [2]. The JMH result shows that the performance gets ~2.5x optimized by this. [1] http://cr.openjdk.java.net/~pli/rfr/8216259/Adler32Test.java [2] http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java -- Thanks, Pengfei From goetz.lindenmaier at sap.com Mon Jan 21 10:54:21 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 21 Jan 2019 10:54:21 +0000 Subject: [11u backport] RFR(S): 8215317: [GRAAL] unit test CheckGraalIntrinsics failed after 8213754 In-Reply-To: <2ed804b4-85d5-feb7-edab-85d6dee66c74@linux.vnet.ibm.com> References: <2ed804b4-85d5-feb7-edab-85d6dee66c74@linux.vnet.ibm.com> Message-ID: <0094c5f18c034632bba0123a2bf6cf02@sap.com> Hi, looks good to me. Best regards, Goetz. 
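For readers following the Adler-32 RFR (8216259) above, here is a minimal scalar sketch of the checksum as that message describes it: two 16-bit sums s1 and s2 concatenated into a 32-bit value. This is a plain reference loop, not the NEON-vectorized intrinsic from the webrev. The class name Adler32Sketch is hypothetical, and the modulus 65521 comes from the standard Adler-32 definition rather than from the message. The result is compared against java.util.zip.Adler32.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

public class Adler32Sketch {
    // Largest prime below 2^16, as used by the standard Adler-32 definition.
    private static final int MOD_ADLER = 65521;

    // Scalar reference: s1 is the running byte sum, s2 the running sum of s1.
    static long adler32(byte[] data) {
        int s1 = 1;
        int s2 = 0;
        for (byte b : data) {
            s1 = (s1 + (b & 0xff)) % MOD_ADLER;
            s2 = (s2 + s1) % MOD_ADLER;
        }
        // s2 forms the upper 16 bits and s1 the lower 16 bits.
        return ((long) s2 << 16) | s1;
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        Adler32 reference = new Adler32();
        reference.update(data, 0, data.length);
        System.out.println(Long.toHexString(adler32(data)));        // prints "11e60398"
        System.out.println(adler32(data) == reference.getValue());  // prints "true"
    }
}
```

Taking the modulus on every byte keeps the sketch simple; an optimized version would defer the reduction across a block of bytes, which is the part a vectorized loop can accelerate.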
> -----Original Message----- > From: Gustavo Romero > Sent: Freitag, 18. Januar 2019 15:57 > To: hotspot-compiler-dev at openjdk.java.net; Lindenmaier, Goetz > ; vladimir.kozlov at oracle.com > Subject: [11u backport] RFR(S): 8215317: [GRAAL] unit test > CheckGraalIntrinsics failed after 8213754 > > Hi, > > Could the following backport to 11u be reviewed, please? > > Bug : https://bugs.openjdk.java.net/browse/JDK-8215317 > Change : http://hg.openjdk.java.net/jdk/jdk/rev/108a161aed93 > Backport: http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/ > > It adds 4 intrinsics to the Graal test CheckGraalIntrinsics.java list so > JDK 11u becomes aware of them. Otherwise that test will break once change > 8213754 [0] lands 11u (which will effectively add the 4 intrinsics to > PPC64/Hotspot and adapt the correlated methods to be intrinsified). > > The backport changed the inclusion of the intrinsics for JDK 11 or higher, > instead for JDK 12 or higher (original patch). > > This backport was tested on x86_64 with > ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus > ./test/hotspot/jtreg/compiler/graalunit (with Graal compiler enabled) > and no regressions were observed too. > > Thank you. > > Best regards, > Gustavo > > [0] https://bugs.openjdk.java.net/browse/JDK-8213754 From goetz.lindenmaier at sap.com Mon Jan 21 11:10:20 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Mon, 21 Jan 2019 11:10:20 +0000 Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace In-Reply-To: <2d4d1747-a83d-5f65-eea3-d982969ae4fd@linux.vnet.ibm.com> References: <2d4d1747-a83d-5f65-eea3-d982969ae4fd@linux.vnet.ibm.com> Message-ID: <2ac3e91da61b43dcb2d4e45325202264@sap.com> Hi Gustavo, also this change looks good. Best regards, Goetz. > -----Original Message----- > From: Gustavo Romero > Sent: Freitag, 18. 
Januar 2019 16:07 > To: hotspot-compiler-dev at openjdk.java.net; Lindenmaier, Goetz > ; Doerr, Martin ; > vladimir.kozlov at oracle.com; Roger Riggs > Cc: Michihiro Horie > Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for > isDigit/isLowerCase/isUpperCase/isWhitespace > > Hi, > > Could the following backport to 11u be reviewed, please? > > Bug : https://bugs.openjdk.java.net/browse/JDK-8213754 > Change : http://hg.openjdk.java.net/jdk/jdk/rev/7384e00d5860 > Backport: http://cr.openjdk.java.net/~gromero/8213754_jdk11u/v1/ > > It adds 4 intrinsics that use instructions introduced by POWER9 in order to > speed up methods isDigit, isLowerCase, isUpperCase, and isWhitespace. > > The change is mostly PPC64-only but it does touch shared code, for > instance, in order to adapt the methods in question to be properly > intrinsified. It also needs an additional change [0], since one Graal > test has to be adapted (a separated RFR to backport [0] was sent to [1]). > > The change applies almost cleanly: only a small tweak is necessary because > the hunk for ppc.ad file relies on some absent text in the 11u code around > the change to be applied. That absent text is related to the Superword > feature (a non-related feature), which is not backported yet to 11u. > > This backport was tested on POWER8 and POWER9 and no regressions were > observed. > > This backport was also tested on x86_64 with > ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus > ./test/hotspot/jtreg/compiler/graalunit (with Graal compiler enabled) with > change 8215317 [0] applied and no regressions were observed too. > > Thank you. 
>
> Best regards,
> Gustavo
>
> [0] http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/
> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032266.html

From gromero at linux.vnet.ibm.com Mon Jan 21 11:39:44 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 21 Jan 2019 09:39:44 -0200
Subject: [11u backport] RFR(S): 8215317: [GRAAL] unit test CheckGraalIntrinsics failed after 8213754
In-Reply-To: <1cb884f8-34c0-638b-768a-fe5eebd89c49@oracle.com>
References: <2ed804b4-85d5-feb7-edab-85d6dee66c74@linux.vnet.ibm.com> <1cb884f8-34c0-638b-768a-fe5eebd89c49@oracle.com>
Message-ID:

On 01/18/2019 06:26 PM, Vladimir Kozlov wrote:
> Looks good.

Thanks for the review, Vladimir!

Regards,
Gustavo

> Thanks,
> Vladimir
>
> On 1/18/19 6:57 AM, Gustavo Romero wrote:
>> Hi,
>>
>> Could the following backport to 11u be reviewed, please?
>>
>> Bug     : https://bugs.openjdk.java.net/browse/JDK-8215317
>> Change  : http://hg.openjdk.java.net/jdk/jdk/rev/108a161aed93
>> Backport: http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/
>>
>> It adds 4 intrinsics to the Graal test CheckGraalIntrinsics.java list so
>> JDK 11u becomes aware of them. Otherwise that test will break once change
>> 8213754 [0] lands 11u (which will effectively add the 4 intrinsics to
>> PPC64/Hotspot and adapt the correlated methods to be intrinsified).
>>
>> The backport changed the inclusion of the intrinsics for JDK 11 or higher,
>> instead for JDK 12 or higher (original patch).
>>
>> This backport was tested on x86_64 with
>> ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus
>> ./test/hotspot/jtreg/compiler/graalunit (with Graal compiler enabled)
>> and no regressions were observed too.
>>
>> Thank you.
>>
>> Best regards,
>> Gustavo
>>
>> [0] https://bugs.openjdk.java.net/browse/JDK-8213754
>>
>

From gromero at linux.vnet.ibm.com Mon Jan 21 11:41:23 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 21 Jan 2019 09:41:23 -0200
Subject: [11u backport] RFR(S): 8215317: [GRAAL] unit test CheckGraalIntrinsics failed after 8213754
In-Reply-To: <0094c5f18c034632bba0123a2bf6cf02@sap.com>
References: <2ed804b4-85d5-feb7-edab-85d6dee66c74@linux.vnet.ibm.com> <0094c5f18c034632bba0123a2bf6cf02@sap.com>
Message-ID:

On 01/21/2019 08:54 AM, Lindenmaier, Goetz wrote:
> looks good to me.

Thanks for the review, Goetz!

Regards,
Gustavo

> Best regards,
> Goetz.
>
>> -----Original Message-----
>> From: Gustavo Romero
>> Sent: Friday, 18 January 2019 15:57
>> To: hotspot-compiler-dev at openjdk.java.net; Lindenmaier, Goetz; vladimir.kozlov at oracle.com
>> Subject: [11u backport] RFR(S): 8215317: [GRAAL] unit test
>> CheckGraalIntrinsics failed after 8213754
>>
>> Hi,
>>
>> Could the following backport to 11u be reviewed, please?
>>
>> Bug : https://bugs.openjdk.java.net/browse/JDK-8215317
>> Change : http://hg.openjdk.java.net/jdk/jdk/rev/108a161aed93
>> Backport: http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/
>>
>> It adds 4 intrinsics to the Graal test CheckGraalIntrinsics.java list so
>> JDK 11u becomes aware of them. Otherwise that test will break once change
>> 8213754 [0] lands 11u (which will effectively add the 4 intrinsics to
>> PPC64/Hotspot and adapt the correlated methods to be intrinsified).
>>
>> The backport changed the inclusion of the intrinsics for JDK 11 or higher,
>> instead for JDK 12 or higher (original patch).
>>
>> This backport was tested on x86_64 with
>> ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus
>> ./test/hotspot/jtreg/compiler/graalunit (with Graal compiler enabled)
>> and no regressions were observed too.
>>
>> Thank you.
>> >> Best regards, >> Gustavo >> >> [0] https://bugs.openjdk.java.net/browse/JDK-8213754 > From gromero at linux.vnet.ibm.com Mon Jan 21 11:45:44 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 21 Jan 2019 09:45:44 -0200 Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace In-Reply-To: <2ac3e91da61b43dcb2d4e45325202264@sap.com> References: <2d4d1747-a83d-5f65-eea3-d982969ae4fd@linux.vnet.ibm.com> <2ac3e91da61b43dcb2d4e45325202264@sap.com> Message-ID: <8083b8db-c546-29e8-c83a-f06ebd4e624e@linux.vnet.ibm.com> On 01/21/2019 09:10 AM, Lindenmaier, Goetz wrote: > also this change looks good. Thanks for reviewing it, Goetz! I'll ping once the approvals are ok. Thank you. Regards, Gustavo > Best regards, > Goetz. > >> -----Original Message----- >> From: Gustavo Romero >> Sent: Freitag, 18. Januar 2019 16:07 >> To: hotspot-compiler-dev at openjdk.java.net; Lindenmaier, Goetz >> ; Doerr, Martin ; >> vladimir.kozlov at oracle.com; Roger Riggs >> Cc: Michihiro Horie >> Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for >> isDigit/isLowerCase/isUpperCase/isWhitespace >> >> Hi, >> >> Could the following backport to 11u be reviewed, please? >> >> Bug : https://bugs.openjdk.java.net/browse/JDK-8213754 >> Change : http://hg.openjdk.java.net/jdk/jdk/rev/7384e00d5860 >> Backport: http://cr.openjdk.java.net/~gromero/8213754_jdk11u/v1/ >> >> It adds 4 intrinsics that use instructions introduced by POWER9 in order to >> speed up methods isDigit, isLowerCase, isUpperCase, and isWhitespace. >> >> The change is mostly PPC64-only but it does touch shared code, for >> instance, in order to adapt the methods in question to be properly >> intrinsified. It also needs an additional change [0], since one Graal >> test has to be adapted (a separated RFR to backport [0] was sent to [1]). 
>>
>> The change applies almost cleanly: only a small tweak is necessary because
>> the hunk for ppc.ad file relies on some absent text in the 11u code around
>> the change to be applied. That absent text is related to the Superword
>> feature (a non-related feature), which is not backported yet to 11u.
>>
>> This backport was tested on POWER8 and POWER9 and no regressions were
>> observed.
>>
>> This backport was also tested on x86_64 with
>> ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus
>> ./test/hotspot/jtreg/compiler/graalunit (with Graal compiler enabled) with
>> change 8215317 [0] applied and no regressions were observed too.
>>
>> Thank you.
>>
>> Best regards,
>> Gustavo
>>
>> [0] http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/
>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032266.html
>

From adinn at redhat.com Mon Jan 21 11:55:10 2019
From: adinn at redhat.com (Andrew Dinn)
Date: Mon, 21 Jan 2019 11:55:10 +0000
Subject: RFR: 8216392: Enable cmovP_mem and cmovP_memU instructions
In-Reply-To:
References: <239a5ec9-170d-9d5c-c624-7bd9a6aed699@redhat.com> <81dc5407-4985-883c-83fc-2e1fa2b77e66@redhat.com> <3dd85d2c-f4d8-e360-21a2-68254b3c5e2b@redhat.com> <2f209ec9-e7f9-8da3-64a2-20ac909b4931@redhat.com>
Message-ID: <46696e98-6519-58cb-f517-1aca8ea0ebd5@redhat.com>

On 19/01/2019 13:42, B. Blaser wrote:
> I'm definitely not an expert in this area but does ADLC treat this
> really differently from a single LoadP / mov?
>
> http://hg.openjdk.java.net/jdk/jdk/file/683a112e0e1e/src/hotspot/cpu/x86/x86_64.ad#l5349

You are looking in the wrong place to answer that question. The place where handling of cases might differ is not in the rule file but in the implementation of the bottom_type method in the node classes associated with those rules. That's a tad more difficult to check than meets the eye at first glance.
The implementation of bottom_type for built-in nodes is in the relevant classes defined in the opto tree headers. However for machine node classes it is determined by the code which gets generated when the adlc preprocessor consumes the rules included in the ad file. So, although the rules in question here look uniform the derivation of the relevant bottom types is not guaranteed to be the same. Indeed, that is how it looks to me. The case handling for rules which match CMove does not appear (to me) to be able to deal with memory inputs correctly. Different case handling applies for rules which match LoadP or LoadN and I very much hope (and expect) it generates code which does compute bottom types correctly but I have not checked it. I could probably work out what the differences between the two cases are if I spent the time studying the code but I'm assuming (hoping :-) someone here knows how it works and can avoid the need for me to put in that effort. I would strongly advise against employing these rules without a guarantee -- from someone who understands the code -- that the comment about miscomputation of bottom types is not (no longer?) valid. The rules may have generated valid code in all the cases they have matched against so far. However, it is the nature of pattern-based programming models that unexpected matches can turn up at some point and screw the pooch. Even with existing uses there /might/ be as yet untested compile contexts where re-ordering of instructions based on miscomputed bottom types could manifest. It's vital to correctness that the compiler knows which memory slices instructions are operating on, meaning it's equally as important that these bottom types are computed correctly. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From aph at redhat.com Mon Jan 21 12:21:13 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 21 Jan 2019 12:21:13 +0000 Subject: [aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics In-Reply-To: References: Message-ID: On 1/21/19 10:53 AM, Pengfei Li (Arm Technology China) wrote: > I also tested the performance with and without my patch by a JMH > case [2]. The JMH result shows that the performance gets ~2.5x > optimized by this. Fair enough; it does look like an improvement. However, please show us the actual numbers, especially at small sizes. Also, how much is the Adler32 checksum actually used? Is it something we care about? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Mon Jan 21 12:24:02 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 21 Jan 2019 13:24:02 +0100 Subject: [12] RFR(S): 8217230: assert(t == t_no_spec) failure in NodeHash::check_no_speculative_types() In-Reply-To: References: Message-ID: <87lg3e8ahp.fsf@redhat.com> > http://cr.openjdk.java.net/~thartmann/8217230/webrev.00/ Looks good to me. Roland. From nils.eliasson at oracle.com Mon Jan 21 12:17:50 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 21 Jan 2019 13:17:50 +0100 Subject: [12] RFR(S): 8217230: assert(t == t_no_spec) failure in NodeHash::check_no_speculative_types() In-Reply-To: References: Message-ID: Looks good! // Nils On 2019-01-21 09:21, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8217230 > http://cr.openjdk.java.net/~thartmann/8217230/webrev.00/ > > A SafePointNode becomes dead when being cut off from root in Compile::remove_root_to_sfpts_edges() > but is not processed by IGVN and therefore remains in the graph. 
Since it is not reachable by root > anymore, it is not processed by Compile::remove_speculative_types and we hit the assert. > > The problem was introduced by the fix for JDK-8214862 [1] in JDK 12 b27. > > Thanks, > Tobias > > [1] https://bugs.openjdk.java.net/browse/JDK-8214862 From tobias.hartmann at oracle.com Mon Jan 21 12:25:48 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 21 Jan 2019 13:25:48 +0100 Subject: [12] RFR(S): 8217230: assert(t == t_no_spec) failure in NodeHash::check_no_speculative_types() In-Reply-To: <87lg3e8ahp.fsf@redhat.com> References: <87lg3e8ahp.fsf@redhat.com> Message-ID: <0aba9d71-ce73-049c-5d10-af89d4446b24@oracle.com> Thanks Roland. Best regards, Tobias On 21.01.19 13:24, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~thartmann/8217230/webrev.00/ > > Looks good to me. > > Roland. > From aph at redhat.com Mon Jan 21 12:27:56 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 21 Jan 2019 12:27:56 +0000 Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered In-Reply-To: References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com> <79118967-c5b6-ca5c-7c6b-4adb80a4ed60@arm.com> Message-ID: Hi, On 1/21/19 9:27 AM, Nick Gasson (Arm Technology China) wrote: > On 21/01/2019 17:10, Andrew Haley wrote: >> On 1/21/19 6:01 AM, Nick Gasson (Arm Technology China) wrote: >>> OK I'll change all three places in aarch64_enc_fast_lock/unlock that do >>> a compare-exchange to use MacroAssembler::cmpxchg. >> >> If you wish: be aware that if you change anything other than this place there'll >> be a lot more testing to do, and review will take longer. > > I think it will be confusing for anyone looking at these functions in > the future to have one call to cmpxhg and then two copies of essentially > the same code inlined a few lines afterwards. 
IMO we should either > change all three for consistency, or stick with the original minimal > patch (+ Derek's cleanup suggestions) which should be easier to review. OK, if that's your position: you're writing the patch. Using cmpxchg everywhere will make that rather twisted code much easier to read. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Mon Jan 21 12:27:33 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 21 Jan 2019 13:27:33 +0100 Subject: [12] RFR(S): 8217230: assert(t == t_no_spec) failure in NodeHash::check_no_speculative_types() In-Reply-To: References: Message-ID: <717ae8b0-35d3-cdb9-c6d4-23f3e5bbadde@oracle.com> Thanks Nils. Best regards, Tobias On 21.01.19 13:17, Nils Eliasson wrote: > Looks good! > > // Nils > > On 2019-01-21 09:21, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8217230 >> http://cr.openjdk.java.net/~thartmann/8217230/webrev.00/ >> >> A SafePointNode becomes dead when being cut off from root in Compile::remove_root_to_sfpts_edges() >> but is not processed by IGVN and therefore remains in the graph. Since it is not reachable by root >> anymore, it is not processed by Compile::remove_speculative_types and we hit the assert. >> >> The problem was introduced by the fix for JDK-8214862 [1] in JDK 12 b27. >> >> Thanks, >> Tobias >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8214862 From nils.eliasson at oracle.com Mon Jan 21 12:20:22 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 21 Jan 2019 13:20:22 +0100 Subject: [13] RFR(S): 8217291: Failure of ::realloc() should be handled correctly in adlc/forms.cpp In-Reply-To: <984d33e8-1aab-6fd5-9f45-64b4b08421f2@oracle.com> References: <984d33e8-1aab-6fd5-9f45-64b4b08421f2@oracle.com> Message-ID: <8bf33a6e-3d53-ae39-d301-ea097d14088d@oracle.com> Looks good!
// Nils On 2019-01-21 10:47, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8217291 > http://cr.openjdk.java.net/~thartmann/8217291/webrev.00/ > > Similar to the fix for JDK-8212779 [1], I've introduced a wrapper method for re-allocation that > handles failures by printing a message and exiting. > > Thanks, > Tobias > > [1] http://hg.openjdk.java.net/jdk/jdk/rev/a3aa8d5380d9 From tobias.hartmann at oracle.com Mon Jan 21 12:36:09 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 21 Jan 2019 13:36:09 +0100 Subject: [13] RFR(S): 8217291: Failure of ::realloc() should be handled correctly in adlc/forms.cpp In-Reply-To: <8bf33a6e-3d53-ae39-d301-ea097d14088d@oracle.com> References: <984d33e8-1aab-6fd5-9f45-64b4b08421f2@oracle.com> <8bf33a6e-3d53-ae39-d301-ea097d14088d@oracle.com> Message-ID: Thanks Nils! Best regards, Tobias On 21.01.19 13:20, Nils Eliasson wrote: > Looks good! > > // Nils > > On 2019-01-21 10:47, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8217291 >> http://cr.openjdk.java.net/~thartmann/8217291/webrev.00/ >> >> Similar to the fix for JDK-8212779 [1], I've introduced a wrapper method for re-allocation that >> handles failures by printing a message and exiting. >> >> Thanks, >> Tobias >> >> [1] http://hg.openjdk.java.net/jdk/jdk/rev/a3aa8d5380d9 From doug.simon at oracle.com Mon Jan 21 12:57:02 2019 From: doug.simon at oracle.com (Doug Simon) Date: Mon, 21 Jan 2019 13:57:02 +0100 Subject: RFR: 8217445: [JVMCI] incorrect management of JVMCI compilation failure reason string Message-ID: <6E1B238A-8546-4163-A3E5-D155AF18EB47@oracle.com> The CompileTask::_failure_reason field assumes it is only ever assigned a compile-time constant string value (i.e. never needs to be freed). This is not the case when the value is derived from a JVMCI exception message. 
This patch adds support for managing a C heap allocated value in this field. https://bugs.openjdk.java.net/browse/JDK-8217445 http://cr.openjdk.java.net/~dnsimon/8217445 -Doug From tobias.hartmann at oracle.com Mon Jan 21 13:02:28 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 21 Jan 2019 14:02:28 +0100 Subject: [13] RFR(S): 8217447: Develop flag TraceICs is broken Message-ID: <1690f02c-7452-07ac-4055-94760ea3609c@oracle.com> Hi, please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8217447 http://cr.openjdk.java.net/~thartmann/8217447/webrev.00/ While working on the value type calling convention, I've noticed that -XX:+TraceICs is broken. The problem is that info.cached_metadata() can be NULL for optimized calls (the assert right before even verifies that). I've also removed the ":" from the output. Before: IC at 0x00007f8020ae948b: monomorphic to compiled (rcvr klass) NULL: After: IC at 0x00007f8020ae948b: monomorphic to compiled (rcvr klass = NULL) Thanks, Tobias From aph at redhat.com Mon Jan 21 13:12:29 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 21 Jan 2019 13:12:29 +0000 Subject: [aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics In-Reply-To: References: Message-ID: <8b95459f-4acd-729b-5174-670460b76c58@redhat.com> On 1/21/19 12:21 PM, Andrew Haley wrote: > Also, how much is the Adler32 checksum actually used? Is it > something we care about? ... the ZIP file format uses Adler32, but as far as I remember we're using zlib, an external library, for our zipfile handling (i.e. our jar files.) If we are using an external library then the performance of our intrinsic might not matter at all. Please check. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dmitry.chuyko at bell-sw.com Mon Jan 21 14:11:12 2019 From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko) Date: Mon, 21 Jan 2019 17:11:12 +0300 Subject: [aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics In-Reply-To: <8b95459f-4acd-729b-5174-670460b76c58@redhat.com> References: <8b95459f-4acd-729b-5174-670460b76c58@redhat.com> Message-ID: <7b071ae1-7bf5-9d9a-f5ef-2b5d26d57de3@bell-sw.com> Adler32 may be chosen as HDFS checksum. Hadoop uses 512 byte blocks by default. I see some speedups on Cavium Thunder X (1st gen, TX2 data later) with provided patch: 64 B: 8%, 512 B: 10%, 1 MB: 10%. We considered the following improvements without using vector instructions. Just split loads and break some data dependencies like:
    __ ldr(temp0, Address(__ post(buff, 8)));
    __ ldr(temp1, Address(__ post(buff, 8)));
    __ add(s1, s1, temp0, ext::uxtb);
    __ ubfx(temp2, temp0, 8, 8);
    __ add(s2, s2, s1);
    __ add(s1, s1, temp2);
    __ ubfx(temp3, temp0, 16, 8);
    __ add(s2, s2, s1);
    __ add(s1, s1, temp3);
    __ ubfx(temp2, temp0, 24, 8);
    __ add(s2, s2, s1);
    __ add(s1, s1, temp2);
    __ ubfx(temp3, temp0, 32, 8);
    __ add(s2, s2, s1);
    __ add(s1, s1, temp3);
    __ ubfx(temp2, temp0, 40, 8);
    __ add(s2, s2, s1);
    __ add(s1, s1, temp2);
    __ ubfx(temp3, temp0, 48, 8);
    __ add(s2, s2, s1);
    __ add(s1, s1, temp3);
It shows 23% improvement on TX1 for size=512 but relatively the same performance as baseline on TX2. -Dmitry On 1/21/19 4:12 PM, Andrew Haley wrote: > On 1/21/19 12:21 PM, Andrew Haley wrote: > >> Also, how much is the Adler32 checksum actually used? Is it >> something we care about? > ... the ZIP file format uses Adler32, but as far as I remember we're > using zlib, an external library, for our zipfile handling (i.e. our > jar files.) If we are using an external library then the performance > of our intrinsic might not matter at all. Please check.
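For readers following the Adler32 thread, the checksum being optimized can be sketched in scalar Java as below. This is only an illustration, not the AArch64 stub or the patch under review: the class name Adler32Sketch and its method are hypothetical, and the loop merely mirrors the idea of loading eight bytes at once and extracting each byte with a shift and mask, as the ldr/ubfx sequence quoted above does. The result is compared against java.util.zip.Adler32 from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Hypothetical illustration class; not part of the JDK or of the patch.
final class Adler32Sketch {
    private static final int BASE = 65521; // largest prime below 2^16

    static long adler32(byte[] data) {
        long s1 = 1; // running sum of bytes (starts at 1)
        long s2 = 0; // running sum of the s1 values
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);

        // Main loop: one 64-bit load, then eight byte extractions.
        while (buf.remaining() >= 8) {
            long word = buf.getLong();
            for (int shift = 0; shift < 64; shift += 8) {
                s1 += (word >>> shift) & 0xFF; // byte at this position
                s2 += s1;
            }
            // Deferring the modulo to once per 8 bytes is safe in a long.
            s1 %= BASE;
            s2 %= BASE;
        }

        // Tail: remaining bytes one at a time.
        while (buf.hasRemaining()) {
            s1 = (s1 + (buf.get() & 0xFF)) % BASE;
            s2 = (s2 + s1) % BASE;
        }
        return (s2 << 16) | s1;
    }

    public static void main(String[] args) {
        byte[] data = "An example input longer than eight bytes"
                .getBytes(StandardCharsets.US_ASCII);
        Adler32 reference = new Adler32();
        reference.update(data, 0, data.length);
        System.out.println(adler32(data) == reference.getValue());
    }
}
```

The two sums form a serial dependency chain (each s2 update needs the s1 just computed), which is the dependency the split-load and vector variants discussed in this thread try to loosen.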
> From adinn at redhat.com Mon Jan 21 14:58:50 2019 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 21 Jan 2019 14:58:50 +0000 Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result In-Reply-To: <75d28ca7-9e80-4bd4-11a6-c858048e4380@bell-sw.com> References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net> <07582b62-ccdf-97c8-5bd9-f441b488fa03@bell-sw.com> <79558b49-6375-f0d4-1278-f66a4f470b13@redhat.com> <75d28ca7-9e80-4bd4-11a6-c858048e4380@bell-sw.com> Message-ID: <37a39d6a-6b1d-2d96-9808-9141359114c0@redhat.com> Hello Dmitrij, On 10/01/2019 15:10, Dmitrij Pochepko wrote: > I'll focus on addressing your technical questions about testing this > patch and intrinsic first. > . . . > I referenced this test in initial review request for this intrinsic. It > takes a long time to run, so I did not include it in the webrev. I'm > going to update the webrev to include a subset of this test as jtreg. Ok, thank you for providing full details of the testing regime. If you add the test as a jtreg test then I'm happy for it and your one line fix to be pushed. > Even brute force tests with 100% code coverage don't guarantee 100% > correctness. The search-garbage-after-string test case for "algorithm G" > and StringBuilder::setLength usage is a good catch by Stefan and > Pengfei. And recent webrev addresses this case. I also tested a case > symmetric to Pengfei's case checking that no "garbage" is read before > specified source string [4]. I also am going to include it in the webrev. I am aware of the limits of brute force methods. However, note that in my previous post I set the bar at tests that would inspire confidence in the code not ones that would guarantee correctness. God forbid that we go down the route of formal verification, Grails are hard to come by. The second, extra jtreg test is good. > Indeed it is hard to review complex algorithms.
The Boyer-Moore comments > you referenced were updated as part of the original webrev to describe > changes in algorithm E, which is in macroAssembler_aarch64.cpp. I once > asked to validate the level of comments with you during pow function > review [3]. If this is the level of comments you find reasonable, I'll > be happy to improve it here and elsewhere to this level. Yes, I believe the code generated in the stub needs more documentation. However, it is important to fix what is currently broken quickly. Please raise a separate JIRA for the doc fixes and then submit an algorithm and/or comments in the generator code that explain what the stub is doing. > Once again, this is to address your question around testing for this > intrinsic and patch. We are working on testing and review complex > intrinsics to handle the wider problem of ensuring better quality of > AArch64 intrinsics. We'll follow up in a different email on that. Well, one thing that needs to form part of that discussion is the potential benefit of these patches vs the cost of producing, reviewing and maintaining them. Included in the equation for the benefits is the number of users it will help and the criticality of the problem they face without the patch. On the costs side we need to factor in the effort needed to clearly document complex code compared with the potential cost of someone having to pick it up later and also the potential, even with good documentation, of the resulting code becoming a fly trap for developer and/or maintainer time. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No.
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From adinn at redhat.com Mon Jan 21 16:14:37 2019 From: adinn at redhat.com (Andrew Dinn) Date: Mon, 21 Jan 2019 16:14:37 +0000 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <3df3b5cd-dbc7-7fdd-bfc5-2a54d11127da@redhat.com> <27c9458d-7257-378a-4e3a-bd03402794be@oracle.com> <09c92b1a-c6da-e16b-bb68-553876e8a6ea@oracle.com> <2a0a385d-81ee-df66-f147-e4dd9aa5b72e@oracle.com> <8b2ab749-20f1-8dd3-3cc7-64db5d45bc7d@redhat.com> <0aae37aa-7797-fde5-63d5-96c8eb961183@oracle.com> <86a1988a-a8d2-b6af-0985-11a94d6d76a5@redhat.com> <69510788-52e6-815b-1ed7-a6f4886d0398@oracle.com> <3e3c4f7d-049e-4aec-c165-f2664e7c98ef@redhat.com> <34cfc530-8517-ac1a-0c04-446dc3dc2436@oracle.com> <21ef0e11-3f3d-e9a4-5dc6-898d4ac18efa@redhat.com> Message-ID: Hi Alan, On 18/01/2019 13:32, Alan Bateman wrote: > I had a brief discussion with Brian about this yesterday. He brought up > the same concern about using MBB as it's not the right API for this in > the longer term. So this JEP is very much about a short term/tactical > solution as we've already concluded here. This leads to the question as > to whether this JEP needs to evolve the standard/Java SE API or not. > It's convenient for the implementation of course but we should at least > explore doing this as a JDK-specific feature. I disagree with your characterization of use of MBB as a short term/tactical solution. Despite not being entirely suitable for the task MBB is a de facto standard way for many applications to gain direct access to files of data located on persistent storage. The current proposal is not, as you characterize it, a quick fix to use MBB as a temporary way to access NVM storage until something better comes along. The intention is rather to ensure that the current API caters for a new addition to the persistent memory tier.
The imperative is to allow existing code to employ it now. Of course, a better API may come along for accessing persistent storage, whether that be NVM, flash disk or spinning platter. However, I would hazard that in many cases existing application code and libraries will still want/need to continue to use the MBB API, including cases where that storage can most usefully be NVM. Rewriting application code to use a new API will not always be feasible or cost-effective. Yet, the improved speed of NVM suggests that an API encompassing this new case will be very welcome and may well be cost-effective to adopt. In sum, far from being a stop-gap this proposal should be seen as a step towards completing and maintaining the existing MBB API for emergent tech. > To that end, one approach to explore is allowing the FC.map method > accept map modes beyond those defined by MapMode. There is precedence > for extensibility in this area already, e.g. FC.open allows you to > specify options beyond the standard options specified by the method. It > would require MapMode to define a protected constructor and would > require a bit of plumbing to support MapMode defined in a JDK-specific > module but there are examples to point to. Another approach is aanother > class in a JDK-specific module to define the map method. It would > require the same plumbing under the covers but would avoid touch the FC > spec. I'm not sure what this side-step is supposed to achieve nor how that relates to the concerns over use of MBB (perhaps it doesn't). I'm not really clear what problem you are trying to avoid here by allowing the MapMode enum to be extensible via a protected constructor. If your desire is to avoid adding extra API surface to FileChannel then where would you consider it appropriate to add such a surface. Something is going to have to create and employ the extra enum tags that are currently proposed for addition to MapMode. How is a client application going to reach that something? 
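As a point of reference for this discussion, the existing standard API path (without any of the proposed map modes) looks roughly like the sketch below. It assumes an ordinary file system rather than a DAX mount, and the class and method names (MappedWriteSketch, writeAndReadBack) are hypothetical; only FileChannel.map with MapMode.READ_WRITE and MappedByteBuffer.force(), both present in the JDK today, are used.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration class; not part of the JEP or its webrev.
final class MappedWriteSketch {

    // Maps the file read-write, stores the payload, forces it to the
    // storage device, then reads the file back through the normal file API.
    // The payload must be non-empty, since a zero-length mapping is of no use here.
    static byte[] writeAndReadBack(Path path, byte[] payload) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE,
                StandardOpenOption.CREATE)) {
            // Mapping beyond the current size extends the file.
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_WRITE, 0, payload.length);
            buffer.put(payload);
            // force() asks for the mapped region to be written back.
            buffer.force();
        }
        return Files.readAllBytes(path);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("mbb-sketch", ".bin");
        try {
            byte[] payload = {10, 20, 30, 40};
            byte[] readBack = writeAndReadBack(path, payload);
            System.out.println(Arrays.equals(payload, readBack));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

The question being debated in this thread is where a client would request the persistent variant of this mapping: as an extra MapMode passed to map, or as an option supplied when the channel is opened.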
Perhaps we might benefit from looking at a simple example? Currently, my most basic test program drives the API to create an MBB as follows:
    . . .
    String dir = "/mnt/pmem/test"; // mapSync should work, since fs mount is -o dax
    Path path = new File(dir, "pmemtest").toPath();
    FileChannel fileChannel = (FileChannel) Files
        .newByteChannel(path, EnumSet.of(
            StandardOpenOption.READ,
            StandardOpenOption.WRITE,
            StandardOpenOption.CREATE));
    MappedByteBuffer mappedByteBuffer =
        fileChannel.map(FileChannel.MapMode.READ_WRITE_PERSISTENT, 0, 1024);
    . . .
Could you give a sketch of an alternative way that you see a client operating? One thing I did wonder about was whether we could insert the relevant behavioural switch in the call to Files.newByteChannel rather than the map call? If we passed an ExtendedOpenOption (e.g. ExtendedOpenOption.SYNC) to this call then method newByteChannel could create and return a corresponding variant of FileChannelImpl, say an instance of a subclass called SyncFileChannelImpl. This could behave as per a normal FileChannelImpl apart from adding the MAP_SYNC flag to the mmap call (well, also rejecting PRIVATE maps). Would that be a better way to drive this? Would it address the concerns you raised above? regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From lutz.schmidt at sap.com Mon Jan 21 17:15:52 2019 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 21 Jan 2019 17:15:52 +0000 Subject: RFR(S, tedious): 8217250: Optimize CodeHeap Analytics In-Reply-To: References: <2d7b7963-61be-95b8-017b-956f2752c8f3@oracle.com> <9AD26B9E-015A-4BD4-A44F-12DDE2793ED0@sap.com> <4e61a8a5-3c6e-3c4f-0a2c-68d4b8bc2f9f@oracle.com> <52136751-929b-4976-477d-93282ce0a0d7@oracle.com> Message-ID: Hi all, as said on Friday, I rebased the changeset to jdk/jdk and pushed it.
The pushed version can be found at http://cr.openjdk.java.net/~lucy/webrevs/8217250.02/ It is identical to version 01 which was based on jdk12. Thanks, Lutz ?On 18.01.19, 17:05, "Schmidt, Lutz" wrote: Thank you, Tobias! As this enhancement will not make it into jdk12, I'll rebase it to jdk/jdk. I expect no conflicts and assume I can then push without further webrev/review. Thanks, Lutz On 18.01.19, 10:49, "Tobias Hartmann" wrote: Hi Lutz, looks good to me too. Best regards, Tobias On 17.01.19 19:39, Vladimir Kozlov wrote: > Looks good > > Thanks, > Vladimir > > On 1/17/19 7:47 AM, Schmidt, Lutz wrote: >> Hi Vladimir & all, >> there is a new webrev available: http://cr.openjdk.java.net/~lucy/webrevs/8217250.01/ >> What's new (in addition to some comments) is the macro >> >> // Flush the buffer contents if the remaining capacity is less >> // than the calculated threshold (256 bytes + capacity/16) >> // That should suffice for all reasonably sized output lines. >> #define BUFFEREDSTREAM_FLUSH_AUTO(_termString) \ >> BUFFEREDSTREAM_FLUSH_IF(_termString, 256+(_capacity>>4)) >> >> It replaced the previous BUFFEREDSTREAM_FLUSH_IF("string", 512) occurrences. >> Regards, >> Lutz >> >> On 16.01.19, 22:53, "Vladimir Kozlov" wrote: >> >> On 1/16/19 12:37 PM, Schmidt, Lutz wrote: >> > Hi Vladimir, >> > >> > thanks a lot for looking at this so quickly. >> > >> > Sure, I could declare a specialized "BUFFEREDSTREAM_FLUSH_512" for this. The "512" >> originated from the thought "its large enough for a well-behaved line and small enough to save >> some flushes". >> > >> > I was also thinking about a "BUFFEREDSTREAM_FLUSH_AUTO", where the spare space is derived >> from the buffer capacity, maybe something like 10 percent of the capacity, 256 bytes minimum. I >> wasn't sure if that could be categorized as over-engineered. >> Yes, I think BUFFEREDSTREAM_FLUSH_AUTO is better than fixed size. >> Vladimir >> > >> > Your thoughts? 
>> > >> > Thanks, >> > Lutz >> > >> > On 16.01.19, 19:10, "hotspot-compiler-dev on behalf of Vladimir Kozlov" >> wrote: >> > >> > Hi Lutz, >> > >> > I see that you have only one usage in all cases for: >> > BUFFEREDSTREAM_FLUSH_IF("", 512) >> > >> > Can you simple declare simplified macro for this? >> > >> > Otherwise looks good. >> > >> > Thanks, >> > Vladimir >> > >> > On 1/16/19 6:52 AM, Schmidt, Lutz wrote: >> > > Dear all, >> > > >> > > may I please have reviews for this (semantically) small change. Its purpose is to >> reduce the bufferedStream buffer flushes while printing CodeHeap Analytics. >> > > >> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8217250 >> > > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8217250.00/ >> > > >> > > Thank you! >> > > Lutz >> > > >> > > >> > >> > >> From vladimir.kozlov at oracle.com Mon Jan 21 17:44:55 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 21 Jan 2019 09:44:55 -0800 Subject: RFR: 8217445: [JVMCI] incorrect management of JVMCI compilation failure reason string In-Reply-To: <6E1B238A-8546-4163-A3E5-D155AF18EB47@oracle.com> References: <6E1B238A-8546-4163-A3E5-D155AF18EB47@oracle.com> Message-ID: <65d4fb89-4f30-8ace-f455-ba2393d4832f@oracle.com> Hi Doug, Looks good. Thank you for fixing it. Vladimir On 1/21/19 4:57 AM, Doug Simon wrote: > The CompileTask::_failure_reason field assumes it is only ever assigned a compile-time constant string value (i.e. never needs to be freed). This is not the case when the value is derived from a JVMCI exception message. This patch adds support for managing a C heap allocated value in this field. 
> > https://bugs.openjdk.java.net/browse/JDK-8217445 > http://cr.openjdk.java.net/~dnsimon/8217445 > > -Doug > From vladimir.kozlov at oracle.com Mon Jan 21 17:48:36 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 21 Jan 2019 09:48:36 -0800 Subject: [13] RFR(S): 8217447: Develop flag TraceICs is broken In-Reply-To: <1690f02c-7452-07ac-4055-94760ea3609c@oracle.com> References: <1690f02c-7452-07ac-4055-94760ea3609c@oracle.com> Message-ID: <6d9dfbea-c600-1717-bcad-85f8a9462b32@oracle.com> Looks good. Thanks, Vladimir On 1/21/19 5:02 AM, Tobias Hartmann wrote: > Hi, > > please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8217447 > http://cr.openjdk.java.net/~thartmann/8217447/webrev.00/ > > While working on the value type calling convention, I've noticed that -XX:+TraceICs is broken. The > problem is that info.cached_metadata() can be NULL for optimized calls (the assert right before even > verifies that). > > I've also removed the ":" from the output. > > Before: > IC at 0x00007f8020ae948b: monomorphic to compiled (rcvr klass) NULL: > > After: > IC at 0x00007f8020ae948b: monomorphic to compiled (rcvr klass = NULL) > > Thanks, > Tobias > From aph at redhat.com Mon Jan 21 17:51:38 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 21 Jan 2019 17:51:38 +0000 Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result In-Reply-To: <37a39d6a-6b1d-2d96-9808-9141359114c0@redhat.com> References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net> <07582b62-ccdf-97c8-5bd9-f441b488fa03@bell-sw.com> <79558b49-6375-f0d4-1278-f66a4f470b13@redhat.com> <75d28ca7-9e80-4bd4-11a6-c858048e4380@bell-sw.com> <37a39d6a-6b1d-2d96-9808-9141359114c0@redhat.com> Message-ID: <0ff14be6-f98f-89af-2eea-6eb635d8bd14@redhat.com> On 1/21/19 2:58 PM, Andrew Dinn wrote: >> Once again, this is to address your question around testing for this >> intrinsic and patch. 
We are working on testing and review complex >> intrinsics to handle the wider problem of ensuring better quality of >> AArch64 intrinsics. We'll follow up in a different email on that. > Well, one thing that needs to form part of that discussion is the > potential benefit of these patches vs the cost of producing, reviewing > and maintaining them. Included in the equation for the benefits is the > number of users it will help and the criticality of the problem they > face without the patch. On the costs side we need to factor in the > effort needed to clearly document complex code compared with the > potential cost of someone having to pick it up later and also the > potential, even with good documentation, of the resulting code becoming > a fly trap for developer and/or maintainer time. We do. I was concerned about the complexity of the Boyer-Moore-Horspool algorithm at the time but was persuaded to admit it. These days I'd push back more: the last year or two of the AArch64 project has hardened my attitude.
Rob Pike's 5 Rules of Programming
Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.
Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)
Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
...
More at https://users.ece.utexas.edu/~adnan/pike.html -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.kozlov at oracle.com Mon Jan 21 17:51:18 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 21 Jan 2019 09:51:18 -0800 Subject: [13] RFR(S): 8217291: Failure of ::realloc() should be handled correctly in adlc/forms.cpp In-Reply-To: <8bf33a6e-3d53-ae39-d301-ea097d14088d@oracle.com> References: <984d33e8-1aab-6fd5-9f45-64b4b08421f2@oracle.com> <8bf33a6e-3d53-ae39-d301-ea097d14088d@oracle.com> Message-ID: <19ce3914-23c8-e588-6dc9-36bad3cc2f69@oracle.com> +1 Thanks, Vladimir On 1/21/19 4:20 AM, Nils Eliasson wrote: > Looks good! > > // Nils > > On 2019-01-21 10:47, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8217291 >> http://cr.openjdk.java.net/~thartmann/8217291/webrev.00/ >> >> Similar to the fix for JDK-8212779 [1], I've introduced a wrapper method for re-allocation that >> handles failures by printing a message and exiting. >> >> Thanks, >> Tobias >> >> [1] http://hg.openjdk.java.net/jdk/jdk/rev/a3aa8d5380d9 From martin.doerr at sap.com Mon Jan 21 18:07:13 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 21 Jan 2019 18:07:13 +0000 Subject: RFR(M): 8217459: [PPC64] Cleanup non-vector version of CRC32 Message-ID: Hi, PPC64 currently contains static tables for CRC32/CRC32C calculations. We only need some of them depending on endianness and on whether vector instructions are available or not. We can get rid of quite some code when we generate these constants at startup as we already do for the vector version. In addition, we can save one register in the vector case because we can use one constants pointer for all related constants. Webrev: http://cr.openjdk.java.net/~mdoerr/8217459_ppc64_crc_consts/webrev.00/ Please review. Best regards, Martin -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tobias.hartmann at oracle.com Mon Jan 21 18:15:38 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 21 Jan 2019 19:15:38 +0100 Subject: [13] RFR(S): 8217291: Failure of ::realloc() should be handled correctly in adlc/forms.cpp In-Reply-To: <19ce3914-23c8-e588-6dc9-36bad3cc2f69@oracle.com> References: <984d33e8-1aab-6fd5-9f45-64b4b08421f2@oracle.com> <8bf33a6e-3d53-ae39-d301-ea097d14088d@oracle.com> <19ce3914-23c8-e588-6dc9-36bad3cc2f69@oracle.com> Message-ID: Thanks Vladimir. Best regards, Tobias On 21.01.19 18:51, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir > > On 1/21/19 4:20 AM, Nils Eliasson wrote: >> Looks good! >> >> // Nils >> >> On 2019-01-21 10:47, Tobias Hartmann wrote: >>> Hi, >>> >>> please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8217291 >>> http://cr.openjdk.java.net/~thartmann/8217291/webrev.00/ >>> >>> Similar to the fix for JDK-8212779 [1], I've introduced a wrapper method for re-allocation that >>> handles failures by printing a message and exiting. >>> >>> Thanks, >>> Tobias >>> >>> [1] http://hg.openjdk.java.net/jdk/jdk/rev/a3aa8d5380d9 From tobias.hartmann at oracle.com Mon Jan 21 18:15:21 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 21 Jan 2019 19:15:21 +0100 Subject: [13] RFR(S): 8217447: Develop flag TraceICs is broken In-Reply-To: <6d9dfbea-c600-1717-bcad-85f8a9462b32@oracle.com> References: <1690f02c-7452-07ac-4055-94760ea3609c@oracle.com> <6d9dfbea-c600-1717-bcad-85f8a9462b32@oracle.com> Message-ID: Thanks Vladimir. Best regards, Tobias On 21.01.19 18:48, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 1/21/19 5:02 AM, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8217447 >> http://cr.openjdk.java.net/~thartmann/8217447/webrev.00/ >> >> While working on the value type calling convention, I've noticed that -XX:+TraceICs is broken. 
The
>> problem is that info.cached_metadata() can be NULL for optimized calls (the assert right before even
>> verifies that).
>>
>> I've also removed the ":" from the output.
>>
>> Before:
>> IC at 0x00007f8020ae948b: monomorphic to compiled (rcvr klass) NULL:
>>
>> After:
>> IC at 0x00007f8020ae948b: monomorphic to compiled (rcvr klass = NULL)
>>
>> Thanks,
>> Tobias
>>

From felix.yang at huawei.com Tue Jan 22 01:17:47 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Tue, 22 Jan 2019 01:17:47 +0000
Subject: [RFR] 8217359: C2 compiler triggers SIGSEGV after tranformation in ConvI2LNode::Ideal
In-Reply-To: <32982b31-3a91-58fb-a6b8-b1cd9f7cdb41@oracle.com>
References: <32982b31-3a91-58fb-a6b8-b1cd9f7cdb41@oracle.com>
Message-ID:

Hi,

Thanks for reviewing. The regression test is added.
New webrev: http://cr.openjdk.java.net/~fyang/8217359/webrev.01/
This is committed to the submit repo: http://hg.openjdk.java.net/jdk/submit/rev/7345adfbc913

The email I got shows that it passed the Oracle internal tests:
=================================================
Build Details: 2019-01-21-1210078.felix.yang.source
0 Failed Tests
Mach5 Tasks Results Summary
* EXECUTED_WITH_FAILURE: 0
* NA: 0
* KILLED: 0
* UNABLE_TO_RUN: 0
* PASSED: 76
* FAILED: 0
=================================================

OK to push?

Thanks for your help,
Felix

>
> Hi Felix,
>
> Could you please add the regression test as jtreg test?
>
> Otherwise, the fix looks reasonable to me. Nice analysis!
> > Thanks, > Tobias From fairoz.matte at oracle.com Tue Jan 22 03:35:16 2019 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Mon, 21 Jan 2019 19:35:16 -0800 (PST) Subject: [13] RFR(S): 8209951 : Problematic sparc intrinsic: com.sun.crypto.provider.CipherBlockChaining Message-ID: Hi, Please review the following patch, JBS bug - https://bugs.openjdk.java.net/browse/JDK-8209951 Webrev - http://cr.openjdk.java.net/~fmatte/8209951/webrev.00/ During the call to assembled stub code generate_cipherBlockChaining_decryptAESCrypt_Parallel() there was reference to G6 register used for temporary storage of F50, as G6 is not saved on stack it was resulting in garbage during retrieval. Solution is to use unused local register (L6) for temporary storage and retrieval of F50. Thanks, Fairoz From tobias.hartmann at oracle.com Tue Jan 22 08:00:21 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 22 Jan 2019 09:00:21 +0100 Subject: RFR: 8217445: [JVMCI] incorrect management of JVMCI compilation failure reason string In-Reply-To: <6E1B238A-8546-4163-A3E5-D155AF18EB47@oracle.com> References: <6E1B238A-8546-4163-A3E5-D155AF18EB47@oracle.com> Message-ID: <0998110b-082b-82e3-521b-555af94d5827@oracle.com> Hi Doug, looks good to me too. Best regards, Tobias On 21.01.19 13:57, Doug Simon wrote: > The CompileTask::_failure_reason field assumes it is only ever assigned a compile-time constant string value (i.e. never needs to be freed). This is not the case when the value is derived from a JVMCI exception message. This patch adds support for managing a C heap allocated value in this field. 
> > https://bugs.openjdk.java.net/browse/JDK-8217445 > http://cr.openjdk.java.net/~dnsimon/8217445 > > -Doug > From tobias.hartmann at oracle.com Tue Jan 22 08:04:10 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 22 Jan 2019 09:04:10 +0100 Subject: [RFR] 8217359: C2 compiler triggers SIGSEGV after tranformation in ConvI2LNode::Ideal In-Reply-To: References: <32982b31-3a91-58fb-a6b8-b1cd9f7cdb41@oracle.com> Message-ID: Hi Felix, this looks good to me, thanks for adding the test! A second review would be good. In the meantime, please request approval for integration into JDK 12 according to: http://openjdk.java.net/jeps/3#Fix-Request-Process Thanks, Tobias On 22.01.19 02:17, Yangfei (Felix) wrote: > Hi, > > Thanks for reviewing. The regression test is added. > New webrev: http://cr.openjdk.java.net/~fyang/8217359/webrev.01/ > This is committed to the submit repo: http://hg.openjdk.java.net/jdk/submit/rev/7345adfbc913 > > The email I got shows that it passed the Oralce internal tests: > ================================================= > Build Details: 2019-01-21-1210078.felix.yang.source > 0 Failed Tests > Mach5 Tasks Results Summary > ? EXECUTED_WITH_FAILURE: 0 > ? NA: 0 > ? KILLED: 0 > ? UNABLE_TO_RUN: 0 > ? PASSED: 76 > ? FAILED: 0 > ================================================= > > OK to push? > > Thanks for your help, > Felix > >> >> Hi Felix, >> >> Could you please add the regression test as jtreg test? >> >> Otherwise, the fix looks reasonable to me. Nice analysis! >> >> Thanks, >> Tobias > From tobias.hartmann at oracle.com Tue Jan 22 08:22:16 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 22 Jan 2019 09:22:16 +0100 Subject: [13] RFR(S): 8209951 : Problematic sparc intrinsic: com.sun.crypto.provider.CipherBlockChaining In-Reply-To: References: Message-ID: <9d49a648-50e6-30cd-5705-2997f658d8e8@oracle.com> Hi Fairoz, this looks good to me. 
Thanks, Tobias On 22.01.19 04:35, Fairoz Matte wrote: > Hi, > > Please review the following patch, > JBS bug - https://bugs.openjdk.java.net/browse/JDK-8209951 > Webrev - http://cr.openjdk.java.net/~fmatte/8209951/webrev.00/ > > During the call to assembled stub code generate_cipherBlockChaining_decryptAESCrypt_Parallel() > there was reference to G6 register used for temporary storage of F50, > as G6 is not saved on stack it was resulting in garbage during retrieval. > > Solution is to use unused local register (L6) for temporary storage and retrieval of F50. > > Thanks, > Fairoz > From Nick.Gasson at arm.com Tue Jan 22 09:10:15 2019 From: Nick.Gasson at arm.com (Nick Gasson (Arm Technology China)) Date: Tue, 22 Jan 2019 09:10:15 +0000 Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered In-Reply-To: References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com> <79118967-c5b6-ca5c-7c6b-4adb80a4ed60@arm.com> Message-ID: <62b9e1c3-7c76-c3a2-0a8e-4e3ce4f79d9b@arm.com> Hi, On 21/01/2019 20:27, Andrew Haley wrote: > > OK, if that's your position: you're writing the patch. Using cmpxhg > everywhere will make that rather twisted code much easier to read. > Please see the updated webrev to use cmpxchg in both the lock and unlock functions: http://cr.openjdk.java.net/~ngasson/8217368/webrev.1/ Also includes Derek's cleanup suggestions (although some of them are not applicable now). Testing I've done on this: * Ran jtreg with assertions enabled (+UseLSE) * Ran jcstress with both +UseLSE and -UseLSE * Ran the JMH LockUnlock benchmarks with -UseBiasedLocking to check for performance regressions. 

The directory below contains the generated assembly from each webrev
and current hg tip for this simple method:

http://cr.openjdk.java.net/~ngasson/8217368/generated/

    private Object obj = new Object();
    public int x;

    private void incX() {
        synchronized (obj) {
            x++;
        }
    }

The output of webrev.1 looks OK to me. Any other suggestions of things
to test?

Thanks,
Nick

From dmitry.chuyko at bell-sw.com Tue Jan 22 09:31:26 2019
From: dmitry.chuyko at bell-sw.com (Dmitry Chuyko)
Date: Tue, 22 Jan 2019 12:31:26 +0300
Subject: [aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics
In-Reply-To: <7b071ae1-7bf5-9d9a-f5ef-2b5d26d57de3@bell-sw.com>
References: <8b95459f-4acd-729b-5174-670460b76c58@redhat.com> <7b071ae1-7bf5-9d9a-f5ef-2b5d26d57de3@bell-sw.com>
Message-ID: <5da5933d-aa7f-ad0a-2c60-f3c0e500465a@bell-sw.com>

TX2 data for the patch:

64 B. 1.5x speedup
512 B. 2x speedup
1 MB. 2.2x speedup!

-Dmitry

On 1/21/19 5:11 PM, Dmitry Chuyko wrote:
> Adler32 may be chosen as HDFS checksum. Hadoop uses 512 byte blocks by
> default.
>
> I see some speedups on Cavium Thunder X (1st gen, TX2 data later) with
> provided patch:
>
> 64 B. 8%
> 512 B. 10%
> 1 MB. 10%.
>
>
> .........................
From aph at redhat.com Tue Jan 22 09:36:13 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 22 Jan 2019 09:36:13 +0000 Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered In-Reply-To: <62b9e1c3-7c76-c3a2-0a8e-4e3ce4f79d9b@arm.com> References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com> <79118967-c5b6-ca5c-7c6b-4adb80a4ed60@arm.com> <62b9e1c3-7c76-c3a2-0a8e-4e3ce4f79d9b@arm.com> Message-ID: Hi, On 1/22/19 9:10 AM, Nick Gasson (Arm Technology China) wrote: > > Please see the updated webrev to use cmpxchg in both the lock and unlock > functions: > > http://cr.openjdk.java.net/~ngasson/8217368/webrev.1/ > > Also includes Derek's cleanup suggestions (although some of them are not > applicable now). > > Testing I've done on this: > > * Ran jtreg with assertions enabled (+UseLSE) > > * Ran jcstress with both +UseLSE and -UseLSE > > * Ran the JMH LockUnlock benchmarks with -UseBiasedLocking to check for > performance regressions. > > The directory below contains the the generated assembly from each webrev > and current hg tip for this simple method: > > http://cr.openjdk.java.net/~ngasson/8217368/generated/ Excellent, thanks for that. Otherwise I'd have had to generate these myself. > private Object obj = new Object(); > public int x; > > private void incX() { > synchronized (obj) { > x++; > } > } > > The output of webrev.1 looks OK to me. Any other suggestions of things > to test? That looks right, thanks. It's extremely difficult to test this stuff in practice. Does any of the above stress test recursive locking in the presence of many threads? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From Nick.Gasson at arm.com Tue Jan 22 10:15:34 2019 From: Nick.Gasson at arm.com (Nick Gasson (Arm Technology China)) Date: Tue, 22 Jan 2019 10:15:34 +0000 Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered In-Reply-To: References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com> <79118967-c5b6-ca5c-7c6b-4adb80a4ed60@arm.com> <62b9e1c3-7c76-c3a2-0a8e-4e3ce4f79d9b@arm.com> Message-ID: Hi Andrew On 22/01/2019 17:36, Andrew Haley wrote: > Does any of the above stress test recursive locking in the presence of many threads? > I can't immediately find anything in jcstress that does this (although I might not be looking in the right place). If you do: make test TEST="micro:LockUnlock.testRecursiveSynchronizationNoBias" MICRO="OPTIONS=-t 10" It will run that recursive JMH case with 10 threads. In this case the lock will be inflated and as we don't have a fast-path for recursive-monitor we will call into the runtime for each recursive monitorenter/exit. The JMH test isn't checking for correctness but we at least don't hit any assertions in a fastdebug build. Thanks, Nick From shade at redhat.com Tue Jan 22 10:27:34 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 22 Jan 2019 11:27:34 +0100 Subject: RFR [12] (XS): http://cr.openjdk.java.net/~shade/8217467/webrev.01/ Message-ID: <2cf6bd9e-d73f-c4b3-725d-aba3f5ed08c3@redhat.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8217467 Fix: http://cr.openjdk.java.net/~shade/8217467/webrev.01/ This is found and verified by Shenandoah CTW tests that verifies barrier placement. Base64 intrinsic is new, and only enabled on modern hardware (I think you need AVX512). I'd like to push this fix to jdk12. 
Testing: Shenandoah CTW tests, hotspot tier1 (includes compiler/intrinsics/base64), jdk-submit12 (running) Thanks, -Aleksey -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From shade at redhat.com Tue Jan 22 10:28:45 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 22 Jan 2019 11:28:45 +0100 Subject: RFR [12] 8217467 (XS): Access barriers are missing in C2 intrinsic for Base64 In-Reply-To: <2cf6bd9e-d73f-c4b3-725d-aba3f5ed08c3@redhat.com> References: <2cf6bd9e-d73f-c4b3-725d-aba3f5ed08c3@redhat.com> Message-ID: <99f5f7c8-0747-2cae-f8e5-e7d05358efcf@redhat.com> (correct title) On 1/22/19 11:27 AM, Aleksey Shipilev wrote: > Bug: > https://bugs.openjdk.java.net/browse/JDK-8217467 > > Fix: > http://cr.openjdk.java.net/~shade/8217467/webrev.01/ > > This is found and verified by Shenandoah CTW tests that verifies barrier placement. Base64 intrinsic > is new, and only enabled on modern hardware (I think you need AVX512). I'd like to push this fix to > jdk12. > > Testing: Shenandoah CTW tests, hotspot tier1 (includes compiler/intrinsics/base64), jdk-submit12 > (running) > > Thanks, > -Aleksey > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL:

From Pengfei.Li at arm.com Tue Jan 22 10:32:00 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Tue, 22 Jan 2019 10:32:00 +0000
Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result
In-Reply-To: <75d28ca7-9e80-4bd4-11a6-c858048e4380@bell-sw.com>
References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net> <07582b62-ccdf-97c8-5bd9-f441b488fa03@bell-sw.com> <79558b49-6375-f0d4-1278-f66a4f470b13@redhat.com> <75d28ca7-9e80-4bd4-11a6-c858048e4380@bell-sw.com>
Message-ID:

Hi Dmitrij,

I (not a reviewer) tested your single-line fix and it looks OK to me.

I have also bumped the priority of the JBS issue (https://bugs.openjdk.java.net/browse/JDK-8215792) to P2 and hope the fix can go into JDK 12. (The door for JDK 12 will be closed soon.)

> Indeed it is hard to review complex algorithms. The Boyer-Moore comments
> you referenced were updated as part of the original webrev to describe
> changes in algorithm E, which is in macroAssembler_aarch64.cpp. I once
> asked to validate the level of comments with you during pow function review
> [3]. If this is the level of comments you find reasonable, I'll be happy to
> improve it here and elsewhere to this level.

When I was trying to fix this bug, I found it pretty easy to get lost among the branches in the code. Other engineers from Arm looking at the AArch64 intrinsics have a similar feeling. So I'd also strongly recommend that you write more explanations in comments. In this String.indexOf(str) intrinsic, as there are a lot of paths for different length conditions, I have another suggestion: add the conditions of paths A to G that you wrote in your last email to the comments.

-- Thanks, Pengfei From tobias.hartmann at oracle.com Tue Jan 22 10:33:10 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 22 Jan 2019 11:33:10 +0100 Subject: RFR [12] 8217467 (XS): Access barriers are missing in C2 intrinsic for Base64 In-Reply-To: <99f5f7c8-0747-2cae-f8e5-e7d05358efcf@redhat.com> References: <2cf6bd9e-d73f-c4b3-725d-aba3f5ed08c3@redhat.com> <99f5f7c8-0747-2cae-f8e5-e7d05358efcf@redhat.com> Message-ID: Hi Aleksey, looks good to me. Best regards, Tobias On 22.01.19 11:28, Aleksey Shipilev wrote: > (correct title) > > On 1/22/19 11:27 AM, Aleksey Shipilev wrote: >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8217467 >> >> Fix: >> http://cr.openjdk.java.net/~shade/8217467/webrev.01/ >> >> This is found and verified by Shenandoah CTW tests that verifies barrier placement. Base64 intrinsic >> is new, and only enabled on modern hardware (I think you need AVX512). I'd like to push this fix to >> jdk12. >> >> Testing: Shenandoah CTW tests, hotspot tier1 (includes compiler/intrinsics/base64), jdk-submit12 >> (running) >> >> Thanks, >> -Aleksey >> > > From adinn at redhat.com Tue Jan 22 10:44:56 2019 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 22 Jan 2019 10:44:56 +0000 Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result In-Reply-To: References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net> <07582b62-ccdf-97c8-5bd9-f441b488fa03@bell-sw.com> <79558b49-6375-f0d4-1278-f66a4f470b13@redhat.com> <75d28ca7-9e80-4bd4-11a6-c858048e4380@bell-sw.com> Message-ID: <3950542d-bc5a-3937-27e8-8b48d6f6e875@redhat.com> On 22/01/2019 10:32, Pengfei Li (Arm Technology China) wrote: > I (not a reviewer) tested your single line fix and it looks ok to > me. > > Also I bump the priority of the JBS > (https://bugs.openjdk.java.net/browse/JDK-8215792) to P2 and hope the > fix could be in JDK 12. (The door of JDK 12 will be closed soon) That's not really needed while we are in Rampdown Phase 1. 
However, I agree that P2 is actually appropriate for this bug. The fix can be pushed to the jdk12 repo. However, the bug needs to have its fix version set accordingly (which I have just done). regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From rkennke at redhat.com Tue Jan 22 10:48:50 2019 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 22 Jan 2019 11:48:50 +0100 Subject: RFR [12] 8217467 (XS): Access barriers are missing in C2 intrinsic for Base64 In-Reply-To: <99f5f7c8-0747-2cae-f8e5-e7d05358efcf@redhat.com> References: <2cf6bd9e-d73f-c4b3-725d-aba3f5ed08c3@redhat.com> <99f5f7c8-0747-2cae-f8e5-e7d05358efcf@redhat.com> Message-ID: Looks good. Thanks! Roman > (correct title) > > On 1/22/19 11:27 AM, Aleksey Shipilev wrote: >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8217467 >> >> Fix: >> http://cr.openjdk.java.net/~shade/8217467/webrev.01/ >> >> This is found and verified by Shenandoah CTW tests that verifies barrier placement. Base64 intrinsic >> is new, and only enabled on modern hardware (I think you need AVX512). I'd like to push this fix to >> jdk12. >> >> Testing: Shenandoah CTW tests, hotspot tier1 (includes compiler/intrinsics/base64), jdk-submit12 >> (running) >> >> Thanks, >> -Aleksey >> > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From tobias.hartmann at oracle.com Tue Jan 22 10:54:19 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 22 Jan 2019 11:54:19 +0100 Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result In-Reply-To: <3950542d-bc5a-3937-27e8-8b48d6f6e875@redhat.com> References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net> <07582b62-ccdf-97c8-5bd9-f441b488fa03@bell-sw.com> <79558b49-6375-f0d4-1278-f66a4f470b13@redhat.com> <75d28ca7-9e80-4bd4-11a6-c858048e4380@bell-sw.com> <3950542d-bc5a-3937-27e8-8b48d6f6e875@redhat.com> Message-ID: <59adabc9-3b9f-c96b-3220-bed4998c4e7b@oracle.com> Hi, On 22.01.19 11:44, Andrew Dinn wrote: > That's not really needed while we are in Rampdown Phase 1. However, I > agree that P2 is actually appropriate for this bug. Actually, it *is* required because we are in Rampdown Phase 2 now: https://mail.openjdk.java.net/pipermail/jdk-dev/2019-January/002537.html and therefore only P1 and P2 bugs with approval can be integrated: http://openjdk.java.net/jeps/3 > The fix can be pushed to the jdk12 repo. However, the bug needs to have > its fix version set accordingly (which I have just done). Yes and approval is required! http://openjdk.java.net/jeps/3#Fix-Request-Process Thanks, Tobias From Pengfei.Li at arm.com Tue Jan 22 11:03:20 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Tue, 22 Jan 2019 11:03:20 +0000 Subject: [aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics In-Reply-To: References: Message-ID: Hi Andrew Haley, > Fair enough; it does look like an improvement. However, please show us the > actual numbers, especially at small sizes. Also, how much is the > Adler32 checksum actually used? Is it something we care about? I updated my JMH case (http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java) with some small sizes added. 
Please see the results below.

Before patch:
Benchmark                      (count)  Mode  Cnt   Score   Error  Units
TestAdler32.testAdler32Update       64  avgt   15   0.047 ± 0.001  us/op
TestAdler32.testAdler32Update      128  avgt   15   0.084 ± 0.001  us/op
TestAdler32.testAdler32Update      256  avgt   15   0.157 ± 0.001  us/op
TestAdler32.testAdler32Update      512  avgt   15   0.313 ± 0.001  us/op
TestAdler32.testAdler32Update     1024  avgt   15   0.607 ± 0.002  us/op
TestAdler32.testAdler32Update     2048  avgt   15   1.195 ± 0.003  us/op
TestAdler32.testAdler32Update     4096  avgt   15   2.371 ± 0.005  us/op
TestAdler32.testAdler32Update     8192  avgt   15   4.936 ± 0.018  us/op
TestAdler32.testAdler32Update    16384  avgt   15   9.729 ± 0.116  us/op
TestAdler32.testAdler32Update    32768  avgt   15  19.332 ± 0.081  us/op
TestAdler32.testAdler32Update    65536  avgt   15  38.180 ± 0.098  us/op

After patch:
Benchmark                      (count)  Mode  Cnt   Score   Error  Units
TestAdler32.testAdler32Update       64  avgt   15   0.026 ± 0.001  us/op
TestAdler32.testAdler32Update      128  avgt   15   0.039 ± 0.001  us/op
TestAdler32.testAdler32Update      256  avgt   15   0.067 ± 0.001  us/op
TestAdler32.testAdler32Update      512  avgt   15   0.124 ± 0.001  us/op
TestAdler32.testAdler32Update     1024  avgt   15   0.232 ± 0.001  us/op
TestAdler32.testAdler32Update     2048  avgt   15   0.445 ± 0.001  us/op
TestAdler32.testAdler32Update     4096  avgt   15   0.873 ± 0.002  us/op
TestAdler32.testAdler32Update     8192  avgt   15   1.770 ± 0.010  us/op
TestAdler32.testAdler32Update    16384  avgt   15   3.658 ± 0.101  us/op
TestAdler32.testAdler32Update    32768  avgt   15   7.221 ± 0.043  us/op
TestAdler32.testAdler32Update    65536  avgt   15  14.353 ± 0.035  us/op

Dmitry Chuyko has just said it's used in Hadoop HDFS. I don't know of any other applications, besides zlib, that use Adler-32 either.

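For readers who want to see what is being measured, here is a minimal scalar sketch of the Adler-32 checksum as defined in RFC 1950, checked against the JDK's java.util.zip.Adler32. It is only a reference for the arithmetic, not the intrinsic from the webrev; the class name Adler32Sketch and the sample input are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

// Hypothetical illustration class; not part of the patch under review.
final class Adler32Sketch {
    // Largest prime below 2^16, the modulus used by Adler-32.
    private static final int BASE = 65521;

    // Straightforward byte-at-a-time Adler-32: a is 1 plus the sum of all
    // bytes, b is the running sum of the successive values of a.
    static long adler32(byte[] data) {
        long a = 1;
        long b = 0;
        for (byte value : data) {
            a = (a + (value & 0xFF)) % BASE;
            b = (b + a) % BASE;
        }
        return (b << 16) | a;
    }

    public static void main(String[] args) {
        byte[] input = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        Adler32 jdk = new Adler32();
        jdk.update(input, 0, input.length);
        // Both columns should show the same value.
        System.out.printf("sketch=%08x jdk=%08x%n", adler32(input), jdk.getValue());
    }
}
```

The two running sums are what a vectorized version has to accumulate across several bytes at once, which is presumably why the benefit grows with the input size in the table above.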
-- Thanks, Pengfei From adinn at redhat.com Tue Jan 22 11:10:36 2019 From: adinn at redhat.com (Andrew Dinn) Date: Tue, 22 Jan 2019 11:10:36 +0000 Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result In-Reply-To: <59adabc9-3b9f-c96b-3220-bed4998c4e7b@oracle.com> References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net> <07582b62-ccdf-97c8-5bd9-f441b488fa03@bell-sw.com> <79558b49-6375-f0d4-1278-f66a4f470b13@redhat.com> <75d28ca7-9e80-4bd4-11a6-c858048e4380@bell-sw.com> <3950542d-bc5a-3937-27e8-8b48d6f6e875@redhat.com> <59adabc9-3b9f-c96b-3220-bed4998c4e7b@oracle.com> Message-ID: On 22/01/2019 10:54, Tobias Hartmann wrote: > On 22.01.19 11:44, Andrew Dinn wrote: >> That's not really needed while we are in Rampdown Phase 1. However, I >> agree that P2 is actually appropriate for this bug. > > Actually, it *is* required because we are in Rampdown Phase 2 now: > https://mail.openjdk.java.net/pipermail/jdk-dev/2019-January/002537.html Oops, yes. Sorry. I just found that post in my Trash folder! > and therefore only P1 and P2 bugs with approval can be integrated: > http://openjdk.java.net/jeps/3 > >> The fix can be pushed to the jdk12 repo. However, the bug needs to have >> its fix version set accordingly (which I have just done). > > Yes and approval is required! > http://openjdk.java.net/jeps/3#Fix-Request-Process Hmm, ok. Well, although this is definitely a bug I don't think it is critical as it happens in relatively rare circumstances. So, I think it needs pushing to jdk13 and then backporting to jdk12 after initial release. I have reset the fix version to jdk13. regards, Andrew Dinn ----------- Senior Principal Software Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander

From dmitrij.pochepko at bell-sw.com Tue Jan 22 11:35:28 2019
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Tue, 22 Jan 2019 14:35:28 +0300
Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result
In-Reply-To:
References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net> <07582b62-ccdf-97c8-5bd9-f441b488fa03@bell-sw.com> <79558b49-6375-f0d4-1278-f66a4f470b13@redhat.com> <75d28ca7-9e80-4bd4-11a6-c858048e4380@bell-sw.com> <3950542d-bc5a-3937-27e8-8b48d6f6e875@redhat.com> <59adabc9-3b9f-c96b-3220-bed4998c4e7b@oracle.com>
Message-ID: <39337100-90d6-9f8f-24f3-1a57c3b50cee@bell-sw.com>

On 22/01/2019 2:10 PM, Andrew Dinn wrote:
> On 22/01/2019 10:54, Tobias Hartmann wrote:
>> On 22.01.19 11:44, Andrew Dinn wrote:
>>> That's not really needed while we are in Rampdown Phase 1. However, I
>>> agree that P2 is actually appropriate for this bug.
>> Actually, it *is* required because we are in Rampdown Phase 2 now:
>> https://mail.openjdk.java.net/pipermail/jdk-dev/2019-January/002537.html
> Oops, yes. Sorry. I just found that post in my Trash folder!
>
>> and therefore only P1 and P2 bugs with approval can be integrated:
>> http://openjdk.java.net/jeps/3
>>
>>> The fix can be pushed to the jdk12 repo. However, the bug needs to have
>>> its fix version set accordingly (which I have just done).
>> Yes and approval is required!
>> http://openjdk.java.net/jeps/3#Fix-Request-Process
> Hmm, ok. Well, although this is definitely a bug I don't think it is
> critical as it happens in relatively rare circumstances. So, I think it
> needs pushing to jdk13 and then backporting to jdk12 after initial
> release. I have reset the fix version to jdk13.

I'll send an updated webrev with tests and updated documentation (since I already have it and it doesn't affect the code), hopefully in a few hours after final polishing.

Thanks, Dmitrij > > regards, > > > Andrew Dinn > ----------- > Senior Principal Software Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From felix.yang at huawei.com Tue Jan 22 12:03:14 2019 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Tue, 22 Jan 2019 12:03:14 +0000 Subject: [RFR] 8217359: C2 compiler triggers SIGSEGV after tranformation in ConvI2LNode::Ideal In-Reply-To: References: <32982b31-3a91-58fb-a6b8-b1cd9f7cdb41@oracle.com> Message-ID: Hi, I have updated the JBS accordingly, requesting approval for integration into JDK 12. May I have another reviewer please? Thanks for your help, Felix > Hi Felix, > > this looks good to me, thanks for adding the test! > > A second review would be good. In the meantime, please request approval for > integration into JDK 12 > according to: > http://openjdk.java.net/jeps/3#Fix-Request-Process > > Thanks, > Tobias > > On 22.01.19 02:17, Yangfei (Felix) wrote: > > Hi, > > > > Thanks for reviewing. The regression test is added. > > New webrev: http://cr.openjdk.java.net/~fyang/8217359/webrev.01/ > > This is committed to the submit repo: > http://hg.openjdk.java.net/jdk/submit/rev/7345adfbc913 > > > > The email I got shows that it passed the Oralce internal tests: > > ================================================= > > Build Details: 2019-01-21-1210078.felix.yang.source > > 0 Failed Tests > > Mach5 Tasks Results Summary > > ? EXECUTED_WITH_FAILURE: 0 > > ? NA: 0 > > ? KILLED: 0 > > ? UNABLE_TO_RUN: 0 > > ? PASSED: 76 > > ? FAILED: 0 > > ================================================= > > > > OK to push? > > > > Thanks for your help, > > Felix > > > >> > >> Hi Felix, > >> > >> Could you please add the regression test as jtreg test? > >> > >> Otherwise, the fix looks reasonable to me. Nice analysis! 
> >>
> >> Thanks,
> >> Tobias
> >

From rwestrel at redhat.com Tue Jan 22 14:01:05 2019
From: rwestrel at redhat.com (Roland Westrelin)
Date: Tue, 22 Jan 2019 15:01:05 +0100
Subject: RFR [12] 8217467 (XS): Access barriers are missing in C2 intrinsic for Base64
In-Reply-To: <99f5f7c8-0747-2cae-f8e5-e7d05358efcf@redhat.com>
References: <2cf6bd9e-d73f-c4b3-725d-aba3f5ed08c3@redhat.com> <99f5f7c8-0747-2cae-f8e5-e7d05358efcf@redhat.com>
Message-ID: <874la094gu.fsf@redhat.com>

>> http://cr.openjdk.java.net/~shade/8217467/webrev.01/

Looks good to me too.

Roland.

From claes.redestad at oracle.com Tue Jan 22 16:06:33 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 22 Jan 2019 17:06:33 +0100
Subject: RFR: 8217519: Improve RegMask population count calculation
Message-ID:

Hi,

this patch extracts the population count used in RegMask::Size() into a
utility method in share/utilities/population_count.hpp, and adds a test
that verifies that it produces the same results as the existing lookup
table implementation.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8217519
Webrev: http://cr.openjdk.java.net/~redestad/8217519/open.00/

This reduces instructions retired in RegMask::Size() by 50-60% in some
tests and profiles, which equates to a speedup of C2 by ~5% total. This
improves startup marginally in my tests.

Compiler intrinsics (such as gcc's __builtin_popcount()) would be
appealing, but that actually gives worse performance than this patch (on
current build configurations/setups available to me).

Testing: tier1-3 (ongoing; previous increments of the patch without
the gtest have been thoroughly tested)

Thanks!
/Claes From Alan.Bateman at oracle.com Tue Jan 22 16:12:22 2019 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Tue, 22 Jan 2019 16:12:22 +0000 Subject: RFR: 8207851 JEP Draft: Support ByteBuffer mapped over non-volatile memory In-Reply-To: <708555d0-d3e5-2d2c-f69d-16f76a83f66a@gmail.com> References: <07d4d0fe-3a2a-414c-74ac-69e8527785d9@redhat.com> <09c92b1a-c6da-e16b-bb68-553876e8a6ea@oracle.com> <2a0a385d-81ee-df66-f147-e4dd9aa5b72e@oracle.com> <8b2ab749-20f1-8dd3-3cc7-64db5d45bc7d@redhat.com> <0aae37aa-7797-fde5-63d5-96c8eb961183@oracle.com> <86a1988a-a8d2-b6af-0985-11a94d6d76a5@redhat.com> <69510788-52e6-815b-1ed7-a6f4886d0398@oracle.com> <3e3c4f7d-049e-4aec-c165-f2664e7c98ef@redhat.com> <34cfc530-8517-ac1a-0c04-446dc3dc2436@oracle.com> <21ef0e11-3f3d-e9a4-5dc6-898d4ac18efa@redhat.com> <708555d0-d3e5-2d2c-f69d-16f76a83f66a@gmail.com> Message-ID: <5c8a7e85-bdb4-61f8-54ed-75689d0fcf16@oracle.com> On 18/01/2019 14:28, Peter Levart wrote: > > ...unless you actually want users to construct their own MapMode(s), > like you mentioned is the case with FileChannel.open() and > FileAttribute interface. But there this makes sense because the > backend (FileSystem) is also pluggable, so users can define their own > FileSystem implementations that consume their own FileAttribute(s)... > > Are you proposing to add an spi for MappedByteBuffer's here? That > would be an overkill for this feature, I think... No, we definitely don't want to go there as buffers are closed abstraction. Instead, this is just about allowing the JDK to support additional map modes beyond those specified by FileChannel.map. If you create your own MapMode and call the map method with it then it will be rejected, probably UOE. With the suggestion, a JDK-specific module would define READ_WRITE_SYNC and you could pass that mode to FileChannel.map. There's a bit of plumbing needed make that work but there are examples of this already (e.g. socket options and file open options). -Alan. 
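
As background for the map-mode discussion, the following is a minimal sketch of how a map mode is passed to FileChannel.map today, using only the standard READ_WRITE mode. The READ_WRITE_SYNC mode mentioned above is the thread's proposal and is deliberately not used here; the class name MapModeSketch and the temporary file are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of the JEP under discussion.
final class MapModeSketch {
    // Maps the first four bytes of a temporary file read-write, stores a
    // value through the mapping, forces it to storage and reads it back.
    static int roundTrip(int value) {
        try {
            Path file = Files.createTempFile("mapmode-sketch", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // The mode argument is where a JDK-specific mode would be
                // passed if the proposal in this thread were adopted.
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, Integer.BYTES);
                buffer.putInt(0, value);
                buffer.force();
                return buffer.getInt(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));
    }
}
```

The point of the sketch is only that the mode is an ordinary argument to map, so supporting an additional mode is a matter of the implementation accepting one more value rather than of adding a new buffer type.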

From tobias.hartmann at oracle.com Tue Jan 22 16:23:37 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 22 Jan 2019 17:23:37 +0100
Subject: RFR: 8217519: Improve RegMask population count calculation
In-Reply-To:
References:
Message-ID: <31198307-0db5-dd60-ac55-c0a79c35b064@oracle.com>

Hi Claes,

this looks good to me.

Best regards,
Tobias

On 22.01.19 17:06, Claes Redestad wrote:
> Hi,
>
> this patch extracts the population count used in RegMask::Size() to a
> utility method in share/utilities/population_count.hpp, as well as
> adds a test that verifies this produces the same results as the existing
> lookup table implementation.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8217519
> Webrev: http://cr.openjdk.java.net/~redestad/8217519/open.00/
>
> This reduces instructions retired in RegMask::Size() by 50-60% in some
> tests and profiles, which equates to a speedup of C2 by ~5% total. This
> improves startup marginally in my tests.
>
> Compiler intrinsics (such as gcc's __builtin_popcount()) would be
> appealing, but that actually gives worse performance than this patch (on
> current build configurations/setups available to me).
>
> Testing: tier1-3 (ongoing, previous increments of the patch without
> the gtest have been thoroughly tested)
>
> Thanks!
>
> /Claes

From claes.redestad at oracle.com Tue Jan 22 16:28:49 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 22 Jan 2019 17:28:49 +0100
Subject: RFR: 8217519: Improve RegMask population count calculation
In-Reply-To: <31198307-0db5-dd60-ac55-c0a79c35b064@oracle.com>
References: <31198307-0db5-dd60-ac55-c0a79c35b064@oracle.com>
Message-ID: <5772e468-e607-24a8-895c-58c222bd2b11@oracle.com>

Tobias,

thanks for reviewing (and sanity checking this and a few earlier
versions)!
On 2019-01-22 17:23, Tobias Hartmann wrote: > Hi Claes, > > this looks good to me From vladimir.kozlov at oracle.com Tue Jan 22 16:57:11 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 22 Jan 2019 08:57:11 -0800 Subject: [13] RFR(S): 8209951 : Problematic sparc intrinsic: com.sun.crypto.provider.CipherBlockChaining In-Reply-To: <9d49a648-50e6-30cd-5705-2997f658d8e8@oracle.com> References: <9d49a648-50e6-30cd-5705-2997f658d8e8@oracle.com> Message-ID: <794dcbd2-60e0-e9f4-f9b5-f789e45d373a@oracle.com> Yes, it is good. Thanks, Vladimir On 1/22/19 12:22 AM, Tobias Hartmann wrote: > Hi Fairoz, > > this looks good to me. > > Thanks, > Tobias > > On 22.01.19 04:35, Fairoz Matte wrote: >> Hi, >> >> Please review the following patch, >> JBS bug - https://bugs.openjdk.java.net/browse/JDK-8209951 >> Webrev - http://cr.openjdk.java.net/~fmatte/8209951/webrev.00/ >> >> During the call to assembled stub code generate_cipherBlockChaining_decryptAESCrypt_Parallel() >> there was reference to G6 register used for temporary storage of F50, >> as G6 is not saved on stack it was resulting in garbage during retrieval. >> >> Solution is to use unused local register (L6) for temporary storage and retrieval of F50. >> >> Thanks, >> Fairoz >> From aph at redhat.com Tue Jan 22 16:58:52 2019 From: aph at redhat.com (Andrew Haley) Date: Tue, 22 Jan 2019 16:58:52 +0000 Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered In-Reply-To: References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com> <79118967-c5b6-ca5c-7c6b-4adb80a4ed60@arm.com> <62b9e1c3-7c76-c3a2-0a8e-4e3ce4f79d9b@arm.com> Message-ID: <83dc55db-5e4e-5510-a172-efaeac351593@redhat.com> On 1/22/19 9:36 AM, Andrew Haley wrote: > Please see the updated webrev to use cmpxchg in both the lock and unlock > functions: > > http://cr.openjdk.java.net/~ngasson/8217368/webrev.1/ OK. 
--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From aph at redhat.com Tue Jan 22 17:03:23 2019
From: aph at redhat.com (Andrew Haley)
Date: Tue, 22 Jan 2019 17:03:23 +0000
Subject: [aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics
In-Reply-To:
References:
Message-ID: <771c5094-aacb-d52c-437f-29aaf5f8f01a@redhat.com>

On 1/21/19 10:53 AM, Pengfei Li (Arm Technology China) wrote:
> Webrev: http://cr.openjdk.java.net/~pli/rfr/8216259/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8216259

The patch checks out fine, but there's one thing I'd like you to do.
Please don't repeat this block of code:

    // Below is a vectorized implementation of updating s1 and s2 for 16 bytes.
    // We use b1, b2, ..., b16 to denote the 16 bytes loaded in each iteration.
    // In non-vectorized code, we update s1 and s2 as:
    //   s1 <- s1 + b1
    //   s2 <- s2 + s1
    //   s1 <- s1 + b2
    //   s2 <- s2 + s1
    //   ...
    //   s1 <- s1 + b16
    //   s2 <- s2 + s1
    // Putting above assignments together, we have:
    //   s1_new = s1 + b1 + b2 + ... + b16
    //   s2_new = s2 + (s1 + b1) + (s1 + b1 + b2) + ... + (s1 + b1 + b2 + ... + b16)
    //          = s2 + s1 * 16 + (b1 * 16 + b2 * 15 + ... + b16 * 1)
    //          = s2 + s1 * 16 + (b1, b2, ... b16) dot (16, 15, ... 1)
    __ ld1(vbytes, __ T16B, Address(__ post(buff, 16)));

    // s2 = s2 + s1 * 16
    __ add(s2, s2, s1, Assembler::LSL, 4);

    // vs1acc = b1 + b2 + b3 + ... + b16
    // vs2acc = (b1 * 16) + (b2 * 15) + (b3 * 14) + ... + (b16 * 1)
    __ umullv(vs2acc, __ T8B, vtable, vbytes);
    __ umlalv(vs2acc, __ T16B, vtable, vbytes);
    __ uaddlv(vs1acc, __ T16B, vbytes);
    __ uaddlv(vs2acc, __ T8H, vs2acc);

    // s1 = s1 + vs1acc, s2 = s2 + vs2acc
    __ fmovd(temp0, vs1acc);
    __ fmovd(temp1, vs2acc);
    __ add(s1, s1, temp0);
    __ add(s2, s2, temp1);

    __ subs(count, count, 16);
    __ br(Assembler::HS, L_nmax_loop);

Instead, please put it into a function (e.g. updateBytesCRC32C_inner)
and call it from updateBytesCRC32C. There's no point writing all this
stuff out twice.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From vladimir.kozlov at oracle.com Tue Jan 22 17:04:08 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 22 Jan 2019 09:04:08 -0800
Subject: [RFR] 8217359: C2 compiler triggers SIGSEGV after tranformation in ConvI2LNode::Ideal
In-Reply-To:
References: <32982b31-3a91-58fb-a6b8-b1cd9f7cdb41@oracle.com>
Message-ID: <35e45132-2187-16c8-22fb-17e61a117941@oracle.com>

Changes are good. I approved the fix for jdk12 as HotSpot group lead.

Thanks,
Vladimir

On 1/22/19 4:03 AM, Yangfei (Felix) wrote:
> Hi,
>
> I have updated the JBS accordingly, requesting approval for integration into JDK 12.
> May I have another reviewer please?
>
> Thanks for your help,
> Felix
>
>
>> Hi Felix,
>>
>> this looks good to me, thanks for adding the test!
>>
>> A second review would be good. In the meantime, please request approval for
>> integration into JDK 12
>> according to:
>> http://openjdk.java.net/jeps/3#Fix-Request-Process
>>
>> Thanks,
>> Tobias
>>
>> On 22.01.19 02:17, Yangfei (Felix) wrote:
>>> Hi,
>>>
>>> Thanks for reviewing. The regression test is added.
>>> New webrev: http://cr.openjdk.java.net/~fyang/8217359/webrev.01/
>>> This is committed to the submit repo:
>> http://hg.openjdk.java.net/jdk/submit/rev/7345adfbc913
>>>
>>> The email I got shows that it passed the Oracle internal tests:
>>> =================================================
>>> Build Details: 2019-01-21-1210078.felix.yang.source
>>> 0 Failed Tests
>>> Mach5 Tasks Results Summary
>>> - EXECUTED_WITH_FAILURE: 0
>>> - NA: 0
>>> - KILLED: 0
>>> - UNABLE_TO_RUN: 0
>>> - PASSED: 76
>>> - FAILED: 0
>>> =================================================
>>>
>>> OK to push?
>>>
>>> Thanks for your help,
>>> Felix
>>>
>>>>
>>>> Hi Felix,
>>>>
>>>> Could you please add the regression test as jtreg test?
>>>>
>>>> Otherwise, the fix looks reasonable to me. Nice analysis!
>>>>
>>>> Thanks,
>>>> Tobias
>>>

From vladimir.kozlov at oracle.com Tue Jan 22 17:30:49 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 22 Jan 2019 09:30:49 -0800
Subject: RFR: 8217519: Improve RegMask population count calculation
In-Reply-To: <31198307-0db5-dd60-ac55-c0a79c35b064@oracle.com>
References: <31198307-0db5-dd60-ac55-c0a79c35b064@oracle.com>
Message-ID: <16301529-9bf3-9c92-a15a-251a4cbaa553@oracle.com>

Yes, this is good.

Thanks,
Vladimir

On 1/22/19 8:23 AM, Tobias Hartmann wrote:
> Hi Claes,
>
> this looks good to me.
>
> Best regards,
> Tobias
>
> On 22.01.19 17:06, Claes Redestad wrote:
>> Hi,
>>
>> this patch extracts the population count used in RegMask::Size() to a
>> utility method in share/utilities/population_count.hpp, as well as
>> adds a test that verifies this produces the same results as the existing
>> lookup table implementation.
>>
>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8217519
>> Webrev: http://cr.openjdk.java.net/~redestad/8217519/open.00/
>>
>> This reduces instructions retired in RegMask::Size() by 50-60% in some
>> tests and profiles, which equates to a speedup of C2 by ~5% total. This
>> improves startup marginally in my tests.
>>
>> Compiler intrinsics (such as gcc's __builtin_popcount()) would be
>> appealing, but that actually gives worse performance than this patch (on
>> current build configurations/setups available to me).
>>
>> Testing: tier1-3 (ongoing, previous increments of the patch without
>> the gtest have been thoroughly tested)
>>
>> Thanks!
>>
>> /Claes

From dmitrij.pochepko at bell-sw.com Tue Jan 22 18:35:12 2019
From: dmitrij.pochepko at bell-sw.com (Dmitrij Pochepko)
Date: Tue, 22 Jan 2019 21:35:12 +0300
Subject: RFR(S): 8215792: AArch64: String.indexOf generates incorrect result
In-Reply-To: <39337100-90d6-9f8f-24f3-1a57c3b50cee@bell-sw.com>
References: <32345571546521566@sas2-ce04c18c415c.qloud-c.yandex.net> <07582b62-ccdf-97c8-5bd9-f441b488fa03@bell-sw.com> <79558b49-6375-f0d4-1278-f66a4f470b13@redhat.com> <75d28ca7-9e80-4bd4-11a6-c858048e4380@bell-sw.com> <3950542d-bc5a-3937-27e8-8b48d6f6e875@redhat.com> <59adabc9-3b9f-c96b-3220-bed4998c4e7b@oracle.com> <39337100-90d6-9f8f-24f3-1a57c3b50cee@bell-sw.com>
Message-ID: <56d93deb-252b-38f7-0cda-5b365dc3751e@bell-sw.com>

Hi,

please take a look at webrev.02:
http://cr.openjdk.java.net/~dpochepk/8215792/webrev.02/

webrev.02 has more aarch64 tests and documentation added. Since the tests
are specifically for the aarch64 implementation, I've set a requires tag to
run them on aarch64 only. I ran these tests on a linux-aarch64 machine to
ensure everything is fine and on linux-amd64 to ensure these tests are
filtered out there.

I'm going to add such documentation and tests for other intrinsics as
well as separate issues.

This patch is for jdk_jdk. I think it should be backported then into
jdk12 and jdk11u.

Thanks,
Dmitrij

On 22.01.2019 14:35, Dmitrij Pochepko wrote:
>
> On 22/01/2019 2:10 PM, Andrew Dinn wrote:
>> On 22/01/2019 10:54, Tobias Hartmann wrote:
>>> On 22.01.19 11:44, Andrew Dinn wrote:
>>>> That's not really needed while we are in Rampdown Phase 1.
However, I >>>> agree that P2 is actually appropriate for this bug. >>> Actually, it *is* required because we are in Rampdown Phase 2 now: >>> https://mail.openjdk.java.net/pipermail/jdk-dev/2019-January/002537.html >>> >> Oops, yes. Sorry. I just found that post in my Trash folder! >> >>> and therefore only P1 and P2 bugs with approval can be integrated: >>> http://openjdk.java.net/jeps/3 >>> >>>> The fix can be pushed to the jdk12 repo. However, the bug needs to >>>> have >>>> its fix version set accordingly (which I have just done). >>> Yes and approval is required! >>> http://openjdk.java.net/jeps/3#Fix-Request-Process >> Hmm, ok. Well, although this is definitely a bug I don't think it is >> critical as it happens in relatively rare circumstances. So, I think it >> needs pushing to jdk13 and then backporting to jdk12 after initial >> release. I have reset the fix version to jdk13. > > I'll send updated webrev with tests and updated documentation (since I > already has it and it doesn't affect code) hopefully in a few hours > after final polishing. > > > Thanks, > > Dmitrij > >> >> regards, >> >> >> Andrew Dinn >> ----------- >> Senior Principal Software Engineer >> Red Hat UK Ltd >> Registered in England and Wales under Company Registration No. 03798903 >> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander From derekw at marvell.com Tue Jan 22 18:34:54 2019 From: derekw at marvell.com (Derek White) Date: Tue, 22 Jan 2019 18:34:54 +0000 Subject: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive stack locking optimisation not triggered In-Reply-To: <62b9e1c3-7c76-c3a2-0a8e-4e3ce4f79d9b@arm.com> References: <895ba862-6c8e-486a-2eff-99057d692074@arm.com> <4a09e8b7-9990-aa66-0afb-bf4e41cab831@arm.com> <79118967-c5b6-ca5c-7c6b-4adb80a4ed60@arm.com> <62b9e1c3-7c76-c3a2-0a8e-4e3ce4f79d9b@arm.com> Message-ID: Looks great! 

Thanks Nick,
 - Derek

> -----Original Message-----
> From: aarch64-port-dev On
> Behalf Of Nick Gasson (Arm Technology China)
> Sent: Tuesday, January 22, 2019 4:10 AM
> To: Andrew Haley ; hotspot-compiler-
> dev at openjdk.java.net compiler
> Cc: nd ; aarch64-port-dev at openjdk.java.net
> Subject: [EXT] Re: [aarch64-port-dev ] RFR: 8217368: AArch64: C2 recursive
> stack locking optimisation not triggered
>
> Hi,
>
> On 21/01/2019 20:27, Andrew Haley wrote:
> >
> > OK, if that's your position: you're writing the patch. Using cmpxchg
> > everywhere will make that rather twisted code much easier to read.
> >
>
> Please see the updated webrev to use cmpxchg in both the lock and unlock
> functions:
>
> http://cr.openjdk.java.net/~ngasson/8217368/webrev.1/
>
> Also includes Derek's cleanup suggestions (although some of them are not
> applicable now).
>
> Testing I've done on this:
>
> * Ran jtreg with assertions enabled (+UseLSE)
>
> * Ran jcstress with both +UseLSE and -UseLSE
>
> * Ran the JMH LockUnlock benchmarks with -UseBiasedLocking to check for
> performance regressions.
>
> The directory below contains the generated assembly from each webrev
> and current hg tip for this simple method:
>
> http://cr.openjdk.java.net/~ngasson/8217368/generated/
>
> private Object obj = new Object();
> public int x;
>
> private void incX() {
>     synchronized (obj) {
>         x++;
>     }
> }
>
> The output of webrev.1 looks OK to me. Any other suggestions of things to
> test?
> > Thanks, > Nick From vladimir.x.ivanov at oracle.com Tue Jan 22 19:05:46 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 22 Jan 2019 11:05:46 -0800 Subject: [13] RFR (XS): 8202952: C2: Unexpected dead nodes after matching Message-ID: http://cr.openjdk.java.net/~vlivanov/8202952/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8202952 The crash happens when PhaseCFG encounters a dead MachNode in the graph. The problematic node is a leftover from matching of an instruction with a duplicated memory operand (sarI_mem_CL [1] in that particular case). Address has the following shape [2]: AddP (AddP DecodeN (LShiftL ConvI2L ConI)) ConL It could be subsumed into complex addressing expression, but the constant is too large (doesn't fit into immL32). So, matcher has to compute inner address expression separately and put it into a register. Since memory operand is duplicated, 2 copies are materialized during matching, but as part of ::Expand() one of the copies is eliminated, thus leaving a dead mach node in the IR (for the address expression). The fix is to adjust Matcher::clone_address_expressions() to avoid cloning inner AddP when constant offset is too large. 

Testing: hs-precheckin-comp, hs-tier1, hs-tier2

Best regards,
Vladimir Ivanov

[1] instruct sarI_mem_CL(memory dst, rcx_RegI shift, rFlagsReg cr)
%{
  match(Set dst (StoreI dst (RShiftI (LoadI dst) shift)));


[2]
 o347 AddP === _ o2181 o1768 o1769 [[o349 o371 ]]
    o1768 AddP === _ o2181 o2181 o1765 [[o347 ]]
        o2181 DecodeN === _ o287 [[o1768 o1768 o327 o347 o327 ]] #int[int:>=0]:NotNull:exact *
        o1765 LShiftL === _ o1761 o60 [[o1768 ]]
            o1761 ConvI2L === _ o1741 [[o1765 ]] #long:maxint-51..maxint-48
            o60 ConI === o0 [[o61 o1765 o1434 o2013 o1631 o2017 o1808 60 ]] #int:2
    o1769 ConL === o0 [[o347 ]] #long:-8589932784

From vladimir.kozlov at oracle.com Tue Jan 22 19:56:03 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 22 Jan 2019 11:56:03 -0800
Subject: RFR [12] 8217467 (XS): Access barriers are missing in C2 intrinsic for Base64
In-Reply-To:
References: <2cf6bd9e-d73f-c4b3-725d-aba3f5ed08c3@redhat.com> <99f5f7c8-0747-2cae-f8e5-e7d05358efcf@redhat.com>
Message-ID: <42b2473e-50d4-a438-ad47-f6c3e216ae07@oracle.com>

Yes, changes are good. I approved it for push into JDK 12.

Thanks,
Vladimir

On 1/22/19 2:48 AM, Roman Kennke wrote:
> Looks good. Thanks!
>
> Roman
>
>
>> (correct title)
>>
>> On 1/22/19 11:27 AM, Aleksey Shipilev wrote:
>>> Bug:
>>>   https://bugs.openjdk.java.net/browse/JDK-8217467
>>>
>>> Fix:
>>>   http://cr.openjdk.java.net/~shade/8217467/webrev.01/
>>>
>>> This is found and verified by Shenandoah CTW tests that verifies barrier placement. Base64 intrinsic
>>> is new, and only enabled on modern hardware (I think you need AVX512). I'd like to push this fix to
>>> jdk12.
>>>
>>> Testing: Shenandoah CTW tests, hotspot tier1 (includes compiler/intrinsics/base64), jdk-submit12
>>> (running)
>>>
>>> Thanks,
>>> -Aleksey
>>>
>>
>>
>

From vladimir.kozlov at oracle.com Tue Jan 22 19:54:11 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 22 Jan 2019 11:54:11 -0800
Subject: [13] RFR (XS): 8202952: C2: Unexpected dead nodes after matching
In-Reply-To:
References:
Message-ID: <9bf3ac2c-5881-576c-cc64-917cec246f8f@oracle.com>

The fix is different from what we discussed.
Can you explain how it helps?

Thanks,
Vladimir K

On 1/22/19 11:05 AM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8202952/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8202952
>
> The crash happens when PhaseCFG encounters a dead MachNode in the graph.
> The problematic node is a leftover from matching of an instruction with a duplicated memory operand (sarI_mem_CL [1] in
> that particular case).
>
> Address has the following shape [2]:
>   AddP (AddP DecodeN (LShiftL ConvI2L ConI)) ConL
>
> It could be subsumed into complex addressing expression, but the constant is too large (doesn't fit into immL32). So,
> matcher has to compute inner address expression separately and put it into a register.
>
> Since memory operand is duplicated, 2 copies are materialized during matching, but as part of ::Expand() one of the
> copies is eliminated, thus leaving a dead mach node in the IR (for the address expression).
>
> The fix is to adjust Matcher::clone_address_expressions() to avoid cloning inner AddP when constant offset is too large.
>
> Testing: hs-precheckin-comp, hs-tier1, hs-tier2
>
> Best regards,
> Vladimir Ivanov
>
> [1] instruct sarI_mem_CL(memory dst, rcx_RegI shift, rFlagsReg cr)
> %{
>   match(Set dst (StoreI dst (RShiftI (LoadI dst) shift)));
>
>
> [2]
>  o347 AddP  === _ o2181 o1768 o1769  [[o349 o371 ]]
>     o1768 AddP  === _ o2181 o2181 o1765  [[o347 ]]
>         o2181 DecodeN === _ o287  [[o1768 o1768 o327 o347 o327 ]] #int[int:>=0]:NotNull:exact *
>         o1765 LShiftL === _ o1761 o60  [[o1768 ]]
>             o1761 ConvI2L === _ o1741  [[o1765 ]] #long:maxint-51..maxint-48
>             o60   ConI  === o0  [[o61 o1765 o1434 o2013 o1631 o2017 o1808  60 ]]  #int:2
>     o1769 ConL  === o0  [[o347 ]]  #long:-8589932784

From vladimir.x.ivanov at oracle.com Tue Jan 22 20:08:22 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 22 Jan 2019 12:08:22 -0800
Subject: [13] RFR (XS): 8202952: C2: Unexpected dead nodes after matching
In-Reply-To: <9bf3ac2c-5881-576c-cc64-917cec246f8f@oracle.com>
References: <9bf3ac2c-5881-576c-cc64-917cec246f8f@oracle.com>
Message-ID: <242b6a41-3db6-2911-1045-1e5eb63ba862@oracle.com>

On 22/01/2019 11:54, Vladimir Kozlov wrote:
> The fix is different from what we discussed.
> Can you explain how it helps?

We discussed adding AddP case to _shared_nodes.

Proposed fix achieves similar result with a different approach:

  * Matcher::clone_address_expressions() marks problematic AddP as shared (based on constant value);

  * DFA() doesn't construct duplicated State for inner AddP (since it's marked as shared);

  * Matcher doesn't need to materialize duplicated mach nodes, since it matches inner AddP separately;

Best regards,
Vladimir Ivanov

> On 1/22/19 11:05 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8202952/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8202952
>>
>> The crash happens when PhaseCFG encounters a dead MachNode in the graph.
>> The problematic node is a leftover from matching of an instruction
>> with a duplicated memory operand (sarI_mem_CL [1] in that particular
>> case).
>>
>> Address has the following shape [2]:
>>    AddP (AddP DecodeN (LShiftL ConvI2L ConI)) ConL
>>
>> It could be subsumed into complex addressing expression, but the
>> constant is too large (doesn't fit into immL32). So, matcher has to
>> compute inner address expression separately and put it into a register.
>>
>> Since memory operand is duplicated, 2 copies are materialized during
>> matching, but as part of ::Expand() one of the copies is eliminated,
>> thus leaving a dead mach node in the IR (for the address expression).
>>
>> The fix is to adjust Matcher::clone_address_expressions() to avoid
>> cloning inner AddP when constant offset is too large.
>>
>> Testing: hs-precheckin-comp, hs-tier1, hs-tier2
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] instruct sarI_mem_CL(memory dst, rcx_RegI shift, rFlagsReg cr)
>> %{
>>    match(Set dst (StoreI dst (RShiftI (LoadI dst) shift)));
>>
>>
>> [2]
>>   o347 AddP  === _ o2181 o1768 o1769  [[o349 o371 ]]
>>      o1768 AddP  === _ o2181 o2181 o1765  [[o347 ]]
>>          o2181 DecodeN === _ o287  [[o1768 o1768 o327 o347 o327 ]]
>> #int[int:>=0]:NotNull:exact *
>>          o1765 LShiftL === _ o1761 o60  [[o1768 ]]
>>              o1761 ConvI2L === _ o1741  [[o1765 ]]
>> #long:maxint-51..maxint-48
>>              o60   ConI  === o0  [[o61 o1765 o1434 o2013 o1631 o2017
>> o1808  60 ]]  #int:2
>>      o1769 ConL  === o0  [[o347 ]]  #long:-8589932784

From shade at redhat.com Tue Jan 22 20:29:46 2019
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 22 Jan 2019 21:29:46 +0100
Subject: RFR [12] 8217467 (XS): Access barriers are missing in C2 intrinsic for Base64
In-Reply-To: <42b2473e-50d4-a438-ad47-f6c3e216ae07@oracle.com>
References: <2cf6bd9e-d73f-c4b3-725d-aba3f5ed08c3@redhat.com> <99f5f7c8-0747-2cae-f8e5-e7d05358efcf@redhat.com> <42b2473e-50d4-a438-ad47-f6c3e216ae07@oracle.com>
Message-ID: <7b98400f-d363-6c06-4214-4ad934bd9488@redhat.com>

Thank you, pushed to jdk/jdk12.

-Aleksey

On 1/22/19 8:56 PM, Vladimir Kozlov wrote:
> Yes, changes are good. I approved it for push into JDK 12.
>
> Thanks,
> Vladimir
>
> On 1/22/19 2:48 AM, Roman Kennke wrote:
>> Looks good. Thanks!
>>
>> Roman
>>
>>
>>> (correct title)
>>>
>>> On 1/22/19 11:27 AM, Aleksey Shipilev wrote:
>>>> Bug:
>>>>    https://bugs.openjdk.java.net/browse/JDK-8217467
>>>>
>>>> Fix:
>>>>    http://cr.openjdk.java.net/~shade/8217467/webrev.01/
>>>>
>>>> This is found and verified by Shenandoah CTW tests that verifies barrier placement. Base64
>>>> intrinsic
>>>> is new, and only enabled on modern hardware (I think you need AVX512). I'd like to push this fix to
>>>> jdk12.
>>>>
>>>> Testing: Shenandoah CTW tests, hotspot tier1 (includes compiler/intrinsics/base64), jdk-submit12
>>>> (running)
>>>>
>>>> Thanks,
>>>> -Aleksey
>>>>
>>>
>>>
>>

From vladimir.kozlov at oracle.com Tue Jan 22 20:42:11 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 22 Jan 2019 12:42:11 -0800
Subject: [13] RFR (XS): 8202952: C2: Unexpected dead nodes after matching
In-Reply-To: <242b6a41-3db6-2911-1045-1e5eb63ba862@oracle.com>
References: <9bf3ac2c-5881-576c-cc64-917cec246f8f@oracle.com> <242b6a41-3db6-2911-1045-1e5eb63ba862@oracle.com>
Message-ID:

Got it. Good.

thanks,
Vladimir

On 1/22/19 12:08 PM, Vladimir Ivanov wrote:
>
> On 22/01/2019 11:54, Vladimir Kozlov wrote:
>> The fix is different from what we discussed.
>> Can you explain how it helps?
>
> We discussed adding AddP case to _shared_nodes.
>
> Proposed fix achieves similar result with a different approach:
>
>   * Matcher::clone_address_expressions() marks problematic AddP as shared (based on constant value);
>
>   * DFA() doesn't construct duplicated State for inner AddP (since it's marked as shared);
>
>   * Matcher doesn't need to materialize duplicated mach nodes, since it matches inner AddP separately;
>
> Best regards,
> Vladimir Ivanov
>
>> On 1/22/19 11:05 AM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8202952/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8202952
>>>
>>> The crash happens when PhaseCFG encounters a dead MachNode in the graph.
>>> The problematic node is a leftover from matching of an instruction with a duplicated memory operand (sarI_mem_CL [1]
>>> in that particular case).
>>>
>>> Address has the following shape [2]:
>>>    AddP (AddP DecodeN (LShiftL ConvI2L ConI)) ConL
>>>
>>> It could be subsumed into complex addressing expression, but the constant is too large (doesn't fit into immL32). So,
>>> matcher has to compute inner address expression separately and put it into a register.
>>>
>>> Since memory operand is duplicated, 2 copies are materialized during matching, but as part of ::Expand() one of the
>>> copies is eliminated, thus leaving a dead mach node in the IR (for the address expression).
>>>
>>> The fix is to adjust Matcher::clone_address_expressions() to avoid cloning inner AddP when constant offset is too large.
>>>
>>> Testing: hs-precheckin-comp, hs-tier1, hs-tier2
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] instruct sarI_mem_CL(memory dst, rcx_RegI shift, rFlagsReg cr)
>>> %{
>>>    match(Set dst (StoreI dst (RShiftI (LoadI dst) shift)));
>>>
>>>
>>> [2]
>>>   o347 AddP  === _ o2181 o1768 o1769  [[o349 o371 ]]
>>>      o1768 AddP  === _ o2181 o2181 o1765  [[o347 ]]
>>>          o2181 DecodeN === _ o287  [[o1768 o1768 o327 o347 o327 ]] #int[int:>=0]:NotNull:exact *
>>>          o1765 LShiftL === _ o1761 o60  [[o1768 ]]
>>>              o1761 ConvI2L === _ o1741  [[o1765 ]] #long:maxint-51..maxint-48
>>>              o60   ConI  === o0  [[o61 o1765 o1434 o2013 o1631 o2017 o1808  60 ]]  #int:2
>>>      o1769 ConL  === o0  [[o347 ]]  #long:-8589932784

From gromero at linux.vnet.ibm.com Tue Jan 22 22:53:48 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 22 Jan 2019 20:53:48 -0200
Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace
In-Reply-To: <8083b8db-c546-29e8-c83a-f06ebd4e624e@linux.vnet.ibm.com>
References: <2d4d1747-a83d-5f65-eea3-d982969ae4fd@linux.vnet.ibm.com> <2ac3e91da61b43dcb2d4e45325202264@sap.com> <8083b8db-c546-29e8-c83a-f06ebd4e624e@linux.vnet.ibm.com>
Message-ID: <89eeb1bc-950c-9c9f-f49f-aabae7b6637f@linux.vnet.ibm.com>

Hi Goetz,

On 01/21/2019 09:45 AM, Gustavo Romero wrote:
> On 01/21/2019 09:10 AM, Lindenmaier, Goetz wrote:
>> also this change looks good.
>
> Thanks for reviewing it, Goetz!
>
> I'll ping once the approvals are ok.

This change and JDK-8215317 are approved to be pushed to 11u:

[0] https://bugs.openjdk.java.net/browse/JDK-8215317
[1] https://bugs.openjdk.java.net/browse/JDK-8213754

Could you please push them at the same time to 11u?

Thank you!

Best regards,
Gustavo

> Thank you.
>
> Regards,
> Gustavo
>
>> Best regards,
>>    Goetz.
>>
>>> -----Original Message-----
>>> From: Gustavo Romero
>>> Sent: Friday, 18 January 2019 16:07
>>> To: hotspot-compiler-dev at openjdk.java.net; Lindenmaier, Goetz
>>> ; Doerr, Martin ;
>>> vladimir.kozlov at oracle.com; Roger Riggs
>>> Cc: Michihiro Horie
>>> Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for
>>> isDigit/isLowerCase/isUpperCase/isWhitespace
>>>
>>> Hi,
>>>
>>> Could the following backport to 11u be reviewed, please?
>>>
>>> Bug     : https://bugs.openjdk.java.net/browse/JDK-8213754
>>> Change  : http://hg.openjdk.java.net/jdk/jdk/rev/7384e00d5860
>>> Backport: http://cr.openjdk.java.net/~gromero/8213754_jdk11u/v1/
>>>
>>> It adds 4 intrinsics that use instructions introduced by POWER9 in order to
>>> speed up methods isDigit, isLowerCase, isUpperCase, and isWhitespace.
>>>
>>> The change is mostly PPC64-only but it does touch shared code, for
>>> instance, in order to adapt the methods in question to be properly
>>> intrinsified. It also needs an additional change [0], since one Graal
>>> test has to be adapted (a separated RFR to backport [0] was sent to [1]).
>>>
>>> The change applies almost cleanly: only a small tweak is necessary because
>>> the hunk for ppc.ad file relies on some absent text in the 11u code around
>>> the change to be applied. That absent text is related to the Superword
>>> feature (a non-related feature), which is not backported yet to 11u.
>>>
>>> This backport was tested on POWER8 and POWER9 and no regressions were
>>> observed.
>>>
>>> This backport was also tested on x86_64 with
>>> ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus
>>> ./test/hotspot/jtreg/compiler/graalunit (with Graal compiler enabled) with
>>> change 8215317 [0] applied and no regressions were observed too.
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Gustavo
>>>
>>> [0] http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/
>>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032266.html
>>
>

From vladimir.x.ivanov at oracle.com Wed Jan 23 02:14:32 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Tue, 22 Jan 2019 18:14:32 -0800
Subject: [13] RFR (XS): 8202952: C2: Unexpected dead nodes after matching
In-Reply-To:
References: <9bf3ac2c-5881-576c-cc64-917cec246f8f@oracle.com> <242b6a41-3db6-2911-1045-1e5eb63ba862@oracle.com>
Message-ID:

Thanks, Vladimir.

Best regards,
Vladimir Ivanov

On 22/01/2019 12:42, Vladimir Kozlov wrote:
> Got it. Good.
>
> thanks,
> Vladimir
>
> On 1/22/19 12:08 PM, Vladimir Ivanov wrote:
>>
>> On 22/01/2019 11:54, Vladimir Kozlov wrote:
>>> The fix is different from what we discussed.
>>> Can you explain how it helps?
>>
>> We discussed adding AddP case to _shared_nodes.
>>
>> Proposed fix achieves similar result with a different approach:
>>
>>    * Matcher::clone_address_expressions() marks problematic AddP as
>> shared (based on constant value);
>>
>>    * DFA() doesn't construct duplicated State for inner AddP (since
>> it's marked as shared);
>>
>>    * Matcher doesn't need to materialize duplicated mach nodes, since
>> it matches inner AddP separately;
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>> On 1/22/19 11:05 AM, Vladimir Ivanov wrote:
>>>> http://cr.openjdk.java.net/~vlivanov/8202952/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8202952
>>>>
>>>> The crash happens when PhaseCFG encounters a dead MachNode in the
>>>> graph.
>>>> The problematic node is a leftover from matching of an instruction
>>>> with a duplicated memory operand (sarI_mem_CL [1] in that particular
>>>> case).
>>>>
>>>> Address has the following shape [2]:
>>>>    AddP (AddP DecodeN (LShiftL ConvI2L ConI)) ConL
>>>>
>>>> It could be subsumed into complex addressing expression, but the
>>>> constant is too large (doesn't fit into immL32). So, matcher has to
>>>> compute inner address expression separately and put it into a register.
>>>>
>>>> Since memory operand is duplicated, 2 copies are materialized during
>>>> matching, but as part of ::Expand() one of the copies is eliminated,
>>>> thus leaving a dead mach node in the IR (for the address expression).
>>>>
>>>> The fix is to adjust Matcher::clone_address_expressions() to avoid
>>>> cloning inner AddP when constant offset is too large.
>>>>
>>>> Testing: hs-precheckin-comp, hs-tier1, hs-tier2
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> [1] instruct sarI_mem_CL(memory dst, rcx_RegI shift, rFlagsReg cr)
>>>> %{
>>>>    match(Set dst (StoreI dst (RShiftI (LoadI dst) shift)));
>>>>
>>>>
>>>> [2]
>>>>   o347 AddP  === _ o2181 o1768 o1769  [[o349 o371 ]]
>>>>      o1768 AddP  === _ o2181 o2181 o1765  [[o347 ]]
>>>>          o2181 DecodeN === _ o287  [[o1768 o1768 o327 o347 o327 ]] #int[int:>=0]:NotNull:exact *
>>>>          o1765 LShiftL === _ o1761 o60  [[o1768 ]]
>>>>              o1761 ConvI2L === _ o1741  [[o1765 ]] #long:maxint-51..maxint-48
>>>>              o60   ConI  === o0  [[o61 o1765 o1434 o2013 o1631 o2017 o1808  60 ]]  #int:2
>>>>      o1769 ConL  === o0  [[o347 ]]  #long:-8589932784

From igor.ignatyev at oracle.com Wed Jan 23 02:26:02 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 22 Jan 2019 18:26:02 -0800
Subject: RFR(S) [12] : 8158646 : [jittester] generated tests may not compile by javac
Message-ID: <6D91688A-01A0-46E0-A304-9F39E16F574E@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8158646/webrev.00/index.html
> 64 lines changed: 23 ins; 6 del; 35 mod;

Hi all,

could you please review this small fix for jit-tester?

the bug was caused by TypeList not being fully cleared b/w generation. we only remove classes which starts w/ "Test_", so we don't remove "basic" classes, e.g. Runnable, and don't clean their 'children'. in most cases, this is fine, as each generation will use only its own Test_N_* classes so having Test_M_* (M != N) classes as Runnable's children has no impact besides garbage in memory, however, if we get an error during Test_N generation we will redo generation for the same N, and in such cases, previous children of "basic" classes (read Runnable) cause incompatible types. the fix is to remove "Test_" classes from the children.
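
The cleanup described in the paragraph above can be illustrated with a small sketch. jit-tester itself is written in Java and the webrev is not reproduced here, so this is only a language-neutral illustration in C++; `TypeInfo`, `TypeRegistry`, `is_generated` and `remove_test_types` are hypothetical names, not jit-tester's API.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a registered type: its name plus the names of
// the classes recorded as its children.
struct TypeInfo {
    std::string name;
    std::vector<std::string> children;
};

// Hypothetical registry keyed by type name.
using TypeRegistry = std::map<std::string, TypeInfo>;

// True if the name belongs to a generated class (starts with "Test_").
static bool is_generated(const std::string& name) {
    return name.rfind("Test_", 0) == 0;
}

// Remove generated types from the registry and, for every surviving
// "basic" type (e.g. Runnable), also drop generated names from its
// children list, so a retried generation does not see stale children.
void remove_test_types(TypeRegistry& types) {
    for (auto it = types.begin(); it != types.end();) {
        if (is_generated(it->first)) {
            it = types.erase(it);
        } else {
            auto& kids = it->second.children;
            kids.erase(std::remove_if(kids.begin(), kids.end(), is_generated),
                       kids.end());
            ++it;
        }
    }
}
```

Clearing only the registry entries, without the second branch, would reproduce the situation the message describes: Runnable keeps pointing at Test_N_* children left over from a failed attempt.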
besides the fix for the bug, the patch also includes the following small cleanups:
 - use DIST_JAR var value instead of 'JAR' string constant in makefile
 - change default target testbase dir
 - make sure jaotc is always run w/ -Xmixed regardless of "external" vm flags
 - add -Xcomp to all the generator tests
 - use tmp directory for class files
 - check javac error code
 - optimize getAllParents/getAllChildren to call getAllParents/getAllChildren only if a class hasn't been added yet

webrev: http://cr.openjdk.java.net/~iignatyev//8158646/webrev.00/index.html
JBS: https://bugs.openjdk.java.net/browse/JDK-8158646
testing: generated 1000 tests, all can be compiled and work fine

Thanks,
-- Igor

From fairoz.matte at oracle.com Wed Jan 23 03:20:09 2019
From: fairoz.matte at oracle.com (Fairoz Matte)
Date: Tue, 22 Jan 2019 19:20:09 -0800 (PST)
Subject: [13] RFR(S): 8209951 : Problematic sparc intrinsic: com.sun.crypto.provider.CipherBlockChaining
In-Reply-To: <794dcbd2-60e0-e9f4-f9b5-f789e45d373a@oracle.com>
References: <9d49a648-50e6-30cd-5705-2997f658d8e8@oracle.com> <794dcbd2-60e0-e9f4-f9b5-f789e45d373a@oracle.com>
Message-ID: <323b7338-d507-4850-ab53-4a5295d7b62f@default>

Thanks Tobias and Vladimir for review.

Thanks,
Fairoz

> -----Original Message-----
> From: Vladimir Kozlov
> Sent: Tuesday, January 22, 2019 10:27 PM
> To: Fairoz Matte ; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: [13] RFR(S): 8209951 : Problematic sparc intrinsic:
> com.sun.crypto.provider.CipherBlockChaining
>
> Yes, it is good.
>
> Thanks,
> Vladimir
>
> On 1/22/19 12:22 AM, Tobias Hartmann wrote:
> > Hi Fairoz,
> >
> > this looks good to me.
> > > > Thanks, > > Tobias > > > > On 22.01.19 04:35, Fairoz Matte wrote: > >> Hi, > >> > >> Please review the following patch, > >> JBS bug - https://bugs.openjdk.java.net/browse/JDK-8209951 > >> Webrev - http://cr.openjdk.java.net/~fmatte/8209951/webrev.00/ > >> > >> During the call to assembled stub code > >> generate_cipherBlockChaining_decryptAESCrypt_Parallel() > >> there was reference to G6 register used for temporary storage of F50, > >> as G6 is not saved on stack it was resulting in garbage during retrieval. > >> > >> Solution is to use unused local register (L6) for temporary storage and > retrieval of F50. > >> > >> Thanks, > >> Fairoz > >> From goetz.lindenmaier at sap.com Wed Jan 23 07:19:22 2019 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Wed, 23 Jan 2019 07:19:22 +0000 Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace In-Reply-To: <89eeb1bc-950c-9c9f-f49f-aabae7b6637f@linux.vnet.ibm.com> References: <2d4d1747-a83d-5f65-eea3-d982969ae4fd@linux.vnet.ibm.com> <2ac3e91da61b43dcb2d4e45325202264@sap.com> <8083b8db-c546-29e8-c83a-f06ebd4e624e@linux.vnet.ibm.com> <89eeb1bc-950c-9c9f-f49f-aabae7b6637f@linux.vnet.ibm.com> Message-ID: Done ... Best regards, Goetz. > -----Original Message----- > From: Gustavo Romero > Sent: Dienstag, 22. Januar 2019 23:54 > To: Lindenmaier, Goetz ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for > isDigit/isLowerCase/isUpperCase/isWhitespace > > Hi Goetz, > > On 01/21/2019 09:45 AM, Gustavo Romero wrote: > > On 01/21/2019 09:10 AM, Lindenmaier, Goetz wrote: > >> also this change looks good. > > > > Thanks for reviewing it, Goetz! > > > > I'll ping once the approvals are ok. 
> > This change and JDK-8215317 are approved to be pushed to 11u: > > [0] https://bugs.openjdk.java.net/browse/JDK-8215317 > [1] https://bugs.openjdk.java.net/browse/JDK-8213754 > > Could you please push them at the same time to 11u? > > Thank you! > > Best regards, > Gustavo > > > Thank you. > > > > Regards, > > Gustavo > > > >> Best regards, > >> ?? Goetz. > >> > >>> -----Original Message----- > >>> From: Gustavo Romero > >>> Sent: Freitag, 18. Januar 2019 16:07 > >>> To: hotspot-compiler-dev at openjdk.java.net; Lindenmaier, Goetz > >>> ; Doerr, Martin ; > >>> vladimir.kozlov at oracle.com; Roger Riggs > >>> Cc: Michihiro Horie > >>> Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for > >>> isDigit/isLowerCase/isUpperCase/isWhitespace > >>> > >>> Hi, > >>> > >>> Could the following backport to 11u be reviewed, please? > >>> > >>> Bug???? : https://bugs.openjdk.java.net/browse/JDK-8213754 > >>> Change? : http://hg.openjdk.java.net/jdk/jdk/rev/7384e00d5860 > >>> Backport: http://cr.openjdk.java.net/~gromero/8213754_jdk11u/v1/ > >>> > >>> It adds 4 intrinsics that use instructions introduced by POWER9 in order to > >>> speed up methods isDigit, isLowerCase, isUpperCase, and isWhitespace. > >>> > >>> The change is mostly PPC64-only but it does touch shared code, for > >>> instance, in order to adapt the methods in question to be properly > >>> intrinsified. It also needs an additional change [0], since one Graal > >>> test has to be adapted (a separated RFR to backport [0] was sent to [1]). > >>> > >>> The change applies almost cleanly: only a small tweak is necessary because > >>> the hunk for ppc.ad file relies on some absent text in the 11u code around > >>> the change to be applied. That absent text is related to the Superword > >>> feature (a non-related feature), which is not backported yet to 11u. > >>> > >>> This backport was tested on POWER8 and POWER9 and no regressions > were > >>> observed. 
> >>> > >>> This backport was also tested on x86_64 with > >>> ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus > >>> ./test/hotspot/jtreg/compiler/graalunit (with Graal compiler enabled) with > >>> change 8215317 [0] applied and no regressions were observed too. > >>> > >>> Thank you. > >>> > >>> Best regards, > >>> Gustavo > >>> > >>> [0] http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/ > >>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019- > >>> January/032266.html > >> > > From nils.eliasson at oracle.com Wed Jan 23 09:23:54 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 23 Jan 2019 10:23:54 +0100 Subject: RFR: 8217519: Improve RegMask population count calculation In-Reply-To: References: Message-ID: <2abb40ba-bfc4-6719-e418-1f1d016c57ec@oracle.com> Excellent! Thanks for fixing! // Nils On 2019-01-22 17:06, Claes Redestad wrote: > Hi, > > this patch extract the population count used in RegMask::Size() to a > utility method in share/utilities/population_count.hpp, as well as > adds a test that verifies this produces the same results as the existing > lookup table implementation. > > Bug:??? https://bugs.openjdk.java.net/browse/JDK-8217519 > Webrev: http://cr.openjdk.java.net/~redestad/8217519/open.00/ > > This reduces instructions retired in RegMask::Size() by 50-60% in some > tests and profiles, which equates to a speedup of C2 by ~5% total. This > improves startup marginally in my tests. > > Compiler intrinsics (such as gcc's __builtin_popcount()) would be > appealing, but that actually gives worse performance than this patch (on > current build configurations/setups available to me). > > Testing: tier1-3 (ongoing, previous increments of the patch without > the gtest has been thoroughly tested) > > Thanks! 
> > /Claes From claes.redestad at oracle.com Wed Jan 23 09:36:24 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 23 Jan 2019 10:36:24 +0100 Subject: RFR: 8217519: Improve RegMask population count calculation In-Reply-To: <2abb40ba-bfc4-6719-e418-1f1d016c57ec@oracle.com> References: <2abb40ba-bfc4-6719-e418-1f1d016c57ec@oracle.com> Message-ID: <7b45044d-84d5-74ec-22c2-f5e697582264@oracle.com> Nils, Vladimir, Tobias, thanks for reviewing - pushed. /Claes On 2019-01-23 10:23, Nils Eliasson wrote: > Excellent! > > Thanks for fixing! > > // Nils > > On 2019-01-22 17:06, Claes Redestad wrote: >> Hi, >> >> this patch extract the population count used in RegMask::Size() to a >> utility method in share/utilities/population_count.hpp, as well as >> adds a test that verifies this produces the same results as the existing >> lookup table implementation. >> >> Bug:??? https://bugs.openjdk.java.net/browse/JDK-8217519 >> Webrev: http://cr.openjdk.java.net/~redestad/8217519/open.00/ >> >> This reduces instructions retired in RegMask::Size() by 50-60% in some >> tests and profiles, which equates to a speedup of C2 by ~5% total. This >> improves startup marginally in my tests. >> >> Compiler intrinsics (such as gcc's __builtin_popcount()) would be >> appealing, but that actually gives worse performance than this patch (on >> current build configurations/setups available to me). >> >> Testing: tier1-3 (ongoing, previous increments of the patch without >> the gtest has been thoroughly tested) >> >> Thanks! 
>> >> /Claes From claes.redestad at oracle.com Wed Jan 23 12:00:52 2019 From: claes.redestad at oracle.com (Claes Redestad) Date: Wed, 23 Jan 2019 13:00:52 +0100 Subject: RFR: 8217629: RegMask::find_lowest_bit can reuse count_trailing_zeros utility Message-ID: <673cad2b-7414-393e-3f2d-c44ea68e47d5@oracle.com> Hi, reusing the count_trailing_zeros utility from RegMask is a simple cleanup which may enable optimizations on many platforms, like tzcnt on Intel/AMD, and improves inlining. Bug: https://bugs.openjdk.java.net/browse/JDK-8217629 Webrev: http://cr.openjdk.java.net/~redestad/8217629/open.00/ On my startup tests and profiles this reduces instructions spent in C2s register allocator by ~4%, and ~2% on the total. Testing: tier1-3 Thanks! /Claes From gromero at linux.vnet.ibm.com Wed Jan 23 12:11:21 2019 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 23 Jan 2019 10:11:21 -0200 Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for isDigit/isLowerCase/isUpperCase/isWhitespace In-Reply-To: References: <2d4d1747-a83d-5f65-eea3-d982969ae4fd@linux.vnet.ibm.com> <2ac3e91da61b43dcb2d4e45325202264@sap.com> <8083b8db-c546-29e8-c83a-f06ebd4e624e@linux.vnet.ibm.com> <89eeb1bc-950c-9c9f-f49f-aabae7b6637f@linux.vnet.ibm.com> Message-ID: On 01/23/2019 05:19 AM, Lindenmaier, Goetz wrote: > Done ... Thanks a lot, Goetz! Regards, Gustavo > Best regards, > Goetz. > >> -----Original Message----- >> From: Gustavo Romero >> Sent: Dienstag, 22. Januar 2019 23:54 >> To: Lindenmaier, Goetz ; hotspot-compiler- >> dev at openjdk.java.net >> Subject: Re: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for >> isDigit/isLowerCase/isUpperCase/isWhitespace >> >> Hi Goetz, >> >> On 01/21/2019 09:45 AM, Gustavo Romero wrote: >>> On 01/21/2019 09:10 AM, Lindenmaier, Goetz wrote: >>>> also this change looks good. >>> >>> Thanks for reviewing it, Goetz! >>> >>> I'll ping once the approvals are ok. 
>> >> This change and JDK-8215317 are approved to be pushed to 11u: >> >> [0] https://bugs.openjdk.java.net/browse/JDK-8215317 >> [1] https://bugs.openjdk.java.net/browse/JDK-8213754 >> >> Could you please push them at the same time to 11u? >> >> Thank you! >> >> Best regards, >> Gustavo >> >>> Thank you. >>> >>> Regards, >>> Gustavo >>> >>>> Best regards, >>>> ?? Goetz. >>>> >>>>> -----Original Message----- >>>>> From: Gustavo Romero >>>>> Sent: Freitag, 18. Januar 2019 16:07 >>>>> To: hotspot-compiler-dev at openjdk.java.net; Lindenmaier, Goetz >>>>> ; Doerr, Martin ; >>>>> vladimir.kozlov at oracle.com; Roger Riggs >>>>> Cc: Michihiro Horie >>>>> Subject: [11u backport] RFR(M): 8213754: PPC64: Add Intrinsics for >>>>> isDigit/isLowerCase/isUpperCase/isWhitespace >>>>> >>>>> Hi, >>>>> >>>>> Could the following backport to 11u be reviewed, please? >>>>> >>>>> Bug???? : https://bugs.openjdk.java.net/browse/JDK-8213754 >>>>> Change? : http://hg.openjdk.java.net/jdk/jdk/rev/7384e00d5860 >>>>> Backport: http://cr.openjdk.java.net/~gromero/8213754_jdk11u/v1/ >>>>> >>>>> It adds 4 intrinsics that use instructions introduced by POWER9 in order to >>>>> speed up methods isDigit, isLowerCase, isUpperCase, and isWhitespace. >>>>> >>>>> The change is mostly PPC64-only but it does touch shared code, for >>>>> instance, in order to adapt the methods in question to be properly >>>>> intrinsified. It also needs an additional change [0], since one Graal >>>>> test has to be adapted (a separated RFR to backport [0] was sent to [1]). >>>>> >>>>> The change applies almost cleanly: only a small tweak is necessary because >>>>> the hunk for ppc.ad file relies on some absent text in the 11u code around >>>>> the change to be applied. That absent text is related to the Superword >>>>> feature (a non-related feature), which is not backported yet to 11u. >>>>> >>>>> This backport was tested on POWER8 and POWER9 and no regressions >> were >>>>> observed. 
>>>>> >>>>> This backport was also tested on x86_64 with >>>>> ./test/hotspot/jtreg/compiler/{c1,c2,intrinsics} plus >>>>> ./test/hotspot/jtreg/compiler/graalunit (with Graal compiler enabled) with >>>>> change 8215317 [0] applied and no regressions were observed too. >>>>> >>>>> Thank you. >>>>> >>>>> Best regards, >>>>> Gustavo >>>>> >>>>> [0] http://cr.openjdk.java.net/~gromero/8215317_jdk11u/v1/ >>>>> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019- >>>>> January/032266.html >>>> >>> > From magnus.ihse.bursie at oracle.com Wed Jan 23 12:55:58 2019 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Wed, 23 Jan 2019 13:55:58 +0100 Subject: RFR(M)(round 2): 8215902: Add support for SoftFloat-3e library In-Reply-To: References: <4497ca084b9f48dbb8f6de1aa35c83653fd7acfb.camel@gmail.com> <7f69fc73-1c10-6b68-d657-c9e758d4bf1d@oracle.com> Message-ID: <3f62f15e-ac5f-94d4-9744-c9cef796a3fa@oracle.com> Hi Jakub, On 2019-01-15 17:31, Jakub Van?k wrote: > Hi Magnus and Erik, > > I have added the link to the repository to README and I have removed > the link to the mailing list thread. I have also recreated the GitHub > repository. Now it is a fork of the mentioned repository with two extra > commits containing README and the build scripts. > > New webrev URL: http://cr.openjdk.java.net/~jakvanek/8215902/webrev.04/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8215902 Sorry for the late reply. This looks very good! Thank you for fixing this, including rebasing the github repo. I'm not sure if you've gotten reviews from the hotspot team for the hotspot source changes, but from a build perspective, this is good to go. /Magnus > > Regards, > > Jakub > > On 2019-01-15 at 15:05 +0100, Magnus Ihse Bursie wrote: >> On 2018-12-25 16:19, Jakub Van?k wrote: >>> Hi, >>> >>> please review this webrev. 
It is a successor of the softfloat-3 >>> [patch] >>> thread (first email >>> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2018-November/031311.html >>> ) >>> >>> Changes since the last patch (v6): >>> >>> - renamed --with-softloat* to --with-sflt* (it is more compact and >>> it >>> corresponds to the old --with-sflt-lib=... option) >>> >>> - license is now obtained via --with-sflt-license switch (so it is >>> not >>> included in OpenJDK source tree) >>> >>> - updated documentation (slight rewording, added the license >>> option) >>> >>> - checks for default --with/--without behavior are in place again >>> (I forgot them when I changed the way the library is detected) >>> >>> - added a simple testcase - I found a disrepancy between softfloat >>> and >>> system function behavior. When a float with bits 0x003FFFFF is >>> added to 0x00000001, the correct result is 0x00400000, but the >>> default software floating point implementation returns >>> 0x00000000. >>> However I'm not sure where to put this test - now it is in >>> test/hotspot/jtreg/compiler/floatingpoint. >>> >>> - comments in code refer to CR 6757269 and newly JDK-8215902 too. >>> >>> I have created a repository with SoftFloat-3e with build >>> configuration >>> specifically for OpenJDK on armel: >>> https://github.com/ev3dev-lang-java/softfloat-openjdk >>> >>> I can add a link to it to the documentation. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8215902 >>> Webrev: http://cr.openjdk.java.net/~jakvanek/8215902/webrev.02/ >> Hi Jakub, >> >> In general this looks good. >> >> Some comments: >> >> I agree with Erik that you can add a link to your github project; >> compiling SoftFloat is outside the scope of the OpenJDK build >> instructions, but it can sure be helpful to lower the bar for users >> wanting to do that. Just one question: any particular reason you >> didn't >> create your github repo by forking the official >> https://github.com/ucb-bar/berkeley-softfloat-3? 
That way, it would >> have >> been easy for users to see that you were not adding any malicious or >> suspicious code to the original SoftFloat distribution. >> >> On the other hand, I think the link to >> > http://mail.openjdk.java.net/pipermail/aarch32-port-dev/2016-November/000611.html >> >> is unnecessary and just creates clutter in the documentation. Please >> remove it. >> >> /Magnus >>> CI build: >>> https://ci.adoptopenjdk.net/view/ev3dev/job/openjdk12_build_ev3_linux/67/ >>> >>> Cheers, >>> >>> Jakub >>> >> From jamsheed.c.m at oracle.com Wed Jan 23 14:08:57 2019 From: jamsheed.c.m at oracle.com (Jamsheed) Date: Wed, 23 Jan 2019 19:38:57 +0530 Subject: [12] RFR: 8213825: assert(false) failed: Non-balanced monitor enter/exit! Likely JNI locking Message-ID: Hi, Request for review bug: https://bugs.openjdk.java.net/browse/JDK-8213825 webrev: http://cr.openjdk.java.net/~jcm/8213825/webrev.00/index.html Bug & Fix Desc: if markword load has sfpt as control i/p(i.e synchronizations near a safepoint), it skips sfpt assuming sfptOp wouldn't write to markword memory fix: not to skip sfpt for markword loads. tests: hs-tier1-5,? hs-precheckin-comp Best regards, Jamsheed From nils.eliasson at oracle.com Wed Jan 23 15:16:00 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Wed, 23 Jan 2019 16:16:00 +0100 Subject: RFR: 8217629: RegMask::find_lowest_bit can reuse count_trailing_zeros utility In-Reply-To: <673cad2b-7414-393e-3f2d-c44ea68e47d5@oracle.com> References: <673cad2b-7414-393e-3f2d-c44ea68e47d5@oracle.com> Message-ID: Hi Claes, Looks great! Consider it trivial. / Nils On 2019-01-23 13:00, Claes Redestad wrote: > Hi, > > reusing the count_trailing_zeros utility from RegMask is a simple > cleanup which may enable optimizations on many platforms, like tzcnt > on Intel/AMD, and improves inlining. > > Bug:??? 
https://bugs.openjdk.java.net/browse/JDK-8217629 > Webrev: http://cr.openjdk.java.net/~redestad/8217629/open.00/ > > On my startup tests and profiles this reduces instructions spent in C2s > register allocator by ~4%, and ~2% on the total. > > Testing: tier1-3 > > Thanks! > > /Claes From tobias.hartmann at oracle.com Wed Jan 23 15:28:23 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 23 Jan 2019 16:28:23 +0100 Subject: RFR: 8217629: RegMask::find_lowest_bit can reuse count_trailing_zeros utility In-Reply-To: <673cad2b-7414-393e-3f2d-c44ea68e47d5@oracle.com> References: <673cad2b-7414-393e-3f2d-c44ea68e47d5@oracle.com> Message-ID: Hi Claes, looks good to me too. Best regards, Tobias On 23.01.19 13:00, Claes Redestad wrote: > Hi, > > reusing the count_trailing_zeros utility from RegMask is a simple > cleanup which may enable optimizations on many platforms, like tzcnt > on Intel/AMD, and improves inlining. > > Bug:??? https://bugs.openjdk.java.net/browse/JDK-8217629 > Webrev: http://cr.openjdk.java.net/~redestad/8217629/open.00/ > > On my startup tests and profiles this reduces instructions spent in C2s > register allocator by ~4%, and ~2% on the total. > > Testing: tier1-3 > > Thanks! > > /Claes From shade at redhat.com Wed Jan 23 16:50:00 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 23 Jan 2019 17:50:00 +0100 Subject: RFR (S) 8217639: Minimal and Zero builds fail after JDK-8217519 (Improve RegMask population count calculation) Message-ID: <3bde3396-02a4-4b16-4fc5-257f67a34211@redhat.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8217639 Reason: New test references "extern uint8_t byte[] bitsInByte", and that is defined in libadt/vectset.cpp, which is not compiled when C2 is disabled in Minimal and Zero VM builds. I was first considering to enabled libadt build when C2 is disabled, but the more straight-forward fix would be to give the test its own golden data to test against. 
This would also implicitly test for accidental bugs in bitsInByte matrix in production code.

Fix:

diff -r c96f9aa1f3d8 -r 29037fc5194d test/hotspot/gtest/utilities/test_population_count.cpp
--- a/test/hotspot/gtest/utilities/test_population_count.cpp Wed Jan 23 13:16:16 2019 +0000
+++ b/test/hotspot/gtest/utilities/test_population_count.cpp Wed Jan 23 17:04:25 2019 +0100
@@ -29,18 +29,35 @@
 #include "utilities/globalDefinitions.hpp"
 #include "unittest.hpp"

+uint8_t test_popcnt_bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = {
+        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
+        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
+        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
+        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
+        4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
+};

 TEST(population_count, sparse) {
-  extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE];
   // Step through the entire input range from a random starting point,
   // verify population_count return values against the lookup table
   // approach used historically
   uint32_t step = 4711;
   for (uint32_t value = os::random() % step; value < UINT_MAX - step; value += step) {
-    uint32_t lookup = bitsInByte[(value >> 24) & 0xff] +
-                      bitsInByte[(value >> 16) & 0xff] +
-                      bitsInByte[(value >> 8)  & 0xff] +
-                      bitsInByte[ value        & 0xff];
+    uint32_t lookup = test_popcnt_bitsInByte[(value >> 24) & 0xff] +
+                      test_popcnt_bitsInByte[(value >> 16) & 0xff] +
+                      test_popcnt_bitsInByte[(value >> 8)  & 0xff] +
+                      test_popcnt_bitsInByte[ value        & 0xff];

     EXPECT_EQ(lookup, population_count(value))
         << "value = " << value;

Testing: Linux x86_64 {server,zero,minimal} build and gtest:population_count

-Aleksey

From vladimir.kozlov at oracle.com Wed Jan 23 16:57:41 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 23 Jan 2019 08:57:41 -0800
Subject: RFR (S) 8217639: Minimal and Zero builds fail after JDK-8217519 (Improve RegMask population count calculation)
In-Reply-To: <3bde3396-02a4-4b16-4fc5-257f67a34211@redhat.com>
References: <3bde3396-02a4-4b16-4fc5-257f67a34211@redhat.com>
Message-ID: <925d42a1-460d-7a64-d872-603f42375337@oracle.com>

Good. I think it is trivial.

thanks,
Vladimir

On 1/23/19 8:50 AM, Aleksey Shipilev wrote:
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8217639
>
> Reason: New test references "extern uint8_t byte[] bitsInByte", and that is defined in
> libadt/vectset.cpp, which is not compiled when C2 is disabled in Minimal and Zero VM builds. I was
> first considering to enabled libadt build when C2 is disabled, but the more straight-forward fix
> would be to give the test its own golden data to test against. This would also implicitly test for
> accidental bugs in bitsInByte matrix in production code.
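
The test idea in the quoted message (check a fast population count against independent reference data over a strided sweep) can be sketched in isolation as follows. This is a sketch under assumptions: the branch-free variant below is the well-known SWAR bit trick, chosen only for illustration, since the thread does not show what population_count.hpp actually contains. The reference counts each byte bit by bit instead of using a literal table, and every function name here is hypothetical.

```cpp
#include <cstdint>

// Assumed fast implementation (classic SWAR popcount); not taken from the webrev.
inline uint32_t popcount_swar(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                  // 2-bit partial sums
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  // 8-bit partial sums
    return (x * 0x01010101u) >> 24;                    // add the four bytes
}

// Independent reference: count the set bits of one byte, one bit at a time.
inline uint32_t popcount_byte_ref(uint8_t b) {
    uint32_t n = 0;
    for (; b != 0; b >>= 1) {
        n += b & 1u;
    }
    return n;
}

// Reference for a 32-bit value, split per byte like the historical lookup.
inline uint32_t popcount_lookup(uint32_t value) {
    return popcount_byte_ref((value >> 24) & 0xff) +
           popcount_byte_ref((value >> 16) & 0xff) +
           popcount_byte_ref((value >> 8)  & 0xff) +
           popcount_byte_ref( value        & 0xff);
}

// Sweep the 32-bit range from `start` in increments of `step` (step must be
// non-zero) and report whether both implementations agree everywhere.
bool popcount_matches_reference(uint32_t start, uint32_t step) {
    for (uint32_t value = start; value < UINT32_MAX - step; value += step) {
        if (popcount_swar(value) != popcount_lookup(value)) {
            return false;
        }
    }
    return true;
}
```

Keeping the reference local to the test, as the fix does with its own table, means the check does not depend on a symbol that only exists when C2 is built.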
>
> Fix:
>
> diff -r c96f9aa1f3d8 -r 29037fc5194d test/hotspot/gtest/utilities/test_population_count.cpp
> --- a/test/hotspot/gtest/utilities/test_population_count.cpp Wed Jan 23 13:16:16 2019 +0000
> +++ b/test/hotspot/gtest/utilities/test_population_count.cpp Wed Jan 23 17:04:25 2019 +0100
> @@ -29,18 +29,35 @@
>  #include "utilities/globalDefinitions.hpp"
>  #include "unittest.hpp"
>
> +uint8_t test_popcnt_bitsInByte[BITS_IN_BYTE_ARRAY_SIZE] = {
> +        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
> +        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
> +        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
> +        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> +        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
> +        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> +        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> +        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> +        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
> +        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> +        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> +        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> +        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> +        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> +        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> +        4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
> +};
>
>  TEST(population_count, sparse) {
> -  extern uint8_t bitsInByte[BITS_IN_BYTE_ARRAY_SIZE];
>    // Step through the entire input range from a random starting point,
>    // verify population_count return values against the lookup table
>    // approach used historically
>    uint32_t step = 4711;
>    for (uint32_t value = os::random() % step; value < UINT_MAX - step; value += step) {
> -    uint32_t lookup = bitsInByte[(value >> 24) & 0xff] +
> -                      bitsInByte[(value >> 16) & 0xff] +
> -                      bitsInByte[(value >> 8)  & 0xff] +
> -                      bitsInByte[ value        & 0xff];
> +    uint32_t lookup = test_popcnt_bitsInByte[(value >> 24) & 0xff] +
> +                      test_popcnt_bitsInByte[(value >> 16) & 0xff] +
> +                      test_popcnt_bitsInByte[(value >> 8)  & 0xff] +
> +                      test_popcnt_bitsInByte[ value        & 0xff];
>
>      EXPECT_EQ(lookup, population_count(value))
>          << "value = " << value;
>
> Testing: Linux x86_64 {server,zero,minimal} build and gtest:population_count
>
> -Aleksey
>

From vladimir.kozlov at oracle.com Wed Jan 23 17:11:18 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 23 Jan 2019 09:11:18 -0800
Subject: [12] RFR: 8213825: assert(false) failed: Non-balanced monitor enter/exit! Likely JNI locking
In-Reply-To:
References:
Message-ID:

Hi Jamsheed,

Fix is good. I approved it for JDK 12 push.

Thanks,
Vladimir

On 1/23/19 6:08 AM, Jamsheed wrote:
> Hi,
>
> Request for review
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8213825
>
> webrev: http://cr.openjdk.java.net/~jcm/8213825/webrev.00/index.html
>
> Bug & Fix Desc:
>
> if markword load has sfpt as control i/p(i.e synchronizations near a safepoint), it skips sfpt assuming sfptOp wouldn't
> write to markword memory
> fix: not to skip sfpt for markword loads.
>
> tests: hs-tier1-5, hs-precheckin-comp
>
> Best regards,
>
> Jamsheed
>

From vladimir.kozlov at oracle.com Wed Jan 23 17:24:45 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 23 Jan 2019 09:24:45 -0800
Subject: RFR(S) [12] : 8158646 : [jittester] generated tests may not compile by javac
In-Reply-To: <6D91688A-01A0-46E0-A304-9F39E16F574E@oracle.com>
References: <6D91688A-01A0-46E0-A304-9F39E16F574E@oracle.com>
Message-ID: <87177e44-0b60-c385-afb7-eeedd3d29829@oracle.com>

On 1/22/19 6:26 PM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8158646/webrev.00/index.html
>> 64 lines changed: 23 ins; 6 del; 35 mod;
>
> Hi all,
>
> could you please review this small fix for jit-tester?
>
> the bug was caused by TypeList not being fully cleared b/w generation. we only remove classes which starts w/ "Test_", so we don't remove "basic" classes, e.g. Runnable, and don't clean their 'children'.
> in most cases, this is fine, as each generation will use only its own Test_N_* classes so having Test_M_* (M != N) classes as Runnable's children has no impact besides garbage in memory, however, if we get an error during Test_N generation we will redo generation for the same N, and in such cases, previous children of "basic" classes (read Runnable) cause incompatible types. the fix is to remove "Test_" classes from the children.

ok

>
> besides the fix for the bug, the patch also includes the following small cleanups:
> - use DIST_JAR var value instead of 'JAR' string constant in makefile

ok

> - change default target testbase dir

ok

> - make sure jaotc is always run w/ -Xmixed regardless of "external" vm flags

Also would be nice to add -ea -esa too if they are not used yet.

> - add -Xcomp to all the generator tests

why you need -Xcomp?

> - use tmp directory for class files

Will it work on Windows which has issues with tmp dir? There was discussion about it recently.

> - check javac error code

ok

> - optimize getAllParents/getAllChildren to call getAllParents/getAllChildren only if a class hasn't been added yet

ok

Thanks,
Vladimir

>
> webrev: http://cr.openjdk.java.net/~iignatyev//8158646/webrev.00/index.html
> JBS: https://bugs.openjdk.java.net/browse/JDK-8158646
> testing: generated 1000 tests, all can be compiled and work fine
>
> Thanks,
> -- Igor
>

From derekw at marvell.com Wed Jan 23 17:27:34 2019
From: derekw at marvell.com (Derek White)
Date: Wed, 23 Jan 2019 17:27:34 +0000
Subject: Changes to Bellsoft/Marvell method of developing intrinsics
Message-ID:

AArch64 Community,

First I should describe the relationship between myself, Marvell, and Bellsoft. I'm the JVM team lead at Marvell/Cavium, and we work as a virtual team with Bellsoft to help port, analyze, and optimize the aarch64 port of OpenJDK (as well as Hadoop, etc). Bellsoft also contributes to OpenJDK independently.

Andrew Dinn has brought up several good points on testing, code quality, and when and where code complexity should be spent in the aarch64 port. I'll describe my general thoughts on code complexity, what Bellsoft does generally for testing before check-ins, as well as describe what we will be doing for new and existing complex intrinsics code.

Intrinsics are a category of code that can handle more complexity than usual because the complexity is quite local. A developer can generally ignore the details hiding in the implementation unless actively reviewing or enhancing the intrinsic. But while pockets of complexity are OK, black holes of complexity are not. The effort to understand an intrinsic must be substantially less than the effort to develop it. The nature of intrinsics also makes them easier to test in isolation, but the testing has to be sufficient. And I agree that the performance gain of each intrinsic has to justify the work developing and supporting it.

Bellsoft's current testing process, before sending a patch for review, is developing testing specific to the patch itself and testing for regressions with JCK and relevant jtreg tests. If the patch is in shared code, it undergoes testing on Linux x86, ARM, AARCH64, Windows, Mac, Solaris x86 and SPARC.

Obviously this has not been sufficient to prevent bugs in the more complex intrinsics we've implemented for aarch64 - even with the stellar code review provided by the community. And the effort required to review the intrinsics has been too high.

Because of this we will change how we develop patches for complex intrinsics. Before sending the code out for public review, we intend to:

 * Use an additional "red-team" developer to focus on finding the weak points in the code and develop tests that ensure code coverage testing, test case coverage, etc. This is in addition to the normal testing and test development that the initiating developer is expected to do.
 * The "red-team" developer will also suggest changes for code clarity and code documentation, and will document the test strategy (what cases are tested, what tests cover what code, how to run tests).
 * We will include all tests developed as part of the patch, even if some modes may not be practical to run regularly as jtreg tests (for example if some tests take excessive time). This will allow later enhancements or fixes to the intrinsic to go through at least as thorough testing as the original.

By breaking the patch development task into two roles we expect to end up with better code quality and make the reviewing task easier.

Note that this is the process that we will be using. We don't expect the rest of the community to adopt this, or if they did, agree on exactly how complex a "complex intrinsic" needs to be to warrant this approach.

We will also begin back-reviewing existing complex intrinsics. If other members of the community are interested in working on this we can coordinate to ensure coverage.

Please let me know if you have any comments on this plan.

Thanks,
 - Derek

From igor.ignatyev at oracle.com Wed Jan 23 17:36:55 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 23 Jan 2019 09:36:55 -0800
Subject: RFR(S) [12] : 8216180 : [AOT] compiler/intrinsics/bigInteger/TestMulAdd.java crashed with AOT enabled
Message-ID: <28BD4C78-3A3F-4B36-8D93-B9F520B08E34@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html
> 32 lines changed: 32 ins; 0 del; 0 mod;

Hi all,

could you please review this small patch which excludes the TestMulAdd test from execution if java.base is AOT-compiled? the test disables some intrinsics, and if it's run w/ an AOT'ed java.base where these intrinsics are enabled (which is the most common, if not the only, case) we get a crash.
the fix introduces new @requires value -- vm.aot.modules which contains comma-separated list of AOT'ed modules and use it to skip this test if java.base is one of them. webrev: http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html JBS: https://bugs.openjdk.java.net/browse/JDK-8216180 testing: compiler/intrinsics/bigInteger tests on linux-x64 w/ JTREG=AOT_MODULES=java.base, TEST_OPTS_AOT_MODULES=java.base and w/o any extra make args Thanks, -- Igor From igor.ignatyev at oracle.com Wed Jan 23 17:46:24 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 23 Jan 2019 09:46:24 -0800 Subject: RFR(S) [12] : 8158646 : [jittester] generated tests may not compile by javac In-Reply-To: <87177e44-0b60-c385-afb7-eeedd3d29829@oracle.com> References: <6D91688A-01A0-46E0-A304-9F39E16F574E@oracle.com> <87177e44-0b60-c385-afb7-eeedd3d29829@oracle.com> Message-ID: <3A45E5A7-1351-47FD-8633-065CF639BF2A@oracle.com> Hi Vladimir, thanks for your review! >> - make sure jaotc is always run w/ X-mixed regardless of "external" vm flags > Also would be nice to add -ea -esa too if they are not used yet. -Xmixed was added to "speed-up" compilation in case external flags has Xcomp. from my point of view, it's better if '-ea -esa' are provided during test runs, as in some cases you might want to run jaotc w/o them. >> - add -Xcomp to all the generator tests > why you need -Xcomp? b/c jit-tester is supposed to compare results of interpreted execution (saved in .gold.* files) w/ the result from compilers, so generated tests must be run w/ Xcomp, otherwise we will comparing one results from interpreter w/ the results from interpreter. >> - use tmp directory for class files > Will it work on Windows which has issues with tmp dir? There was discussion about it recently. we use tmp dir only in the test generator which can be run on any platform, it doesn't have to be run on the same host/platform as actual test execution. 
in fact the preferred usage model of jit-tester is to pre-generate test corpus and reuse it. Thanks, -- Igor > On Jan 23, 2019, at 9:24 AM, Vladimir Kozlov wrote: > > On 1/22/19 6:26 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8158646/webrev.00/index.html >>> 64 lines changed: 23 ins; 6 del; 35 mod; >> Hi all, >> could you please review this small fix for jit-tester? >> the bug was caused by TypeList not being fully cleared b/w generation. we only remove classes which starts w/ "Test_", so we don't remove "basic" classes, e.g. Runnable, and don't clean their 'children'. in most cases, this is fine, as each generation will use only its own Test_N_* classes so having Test_M_* (M != N) classes as Runnable's children has no impact besides garbage in memory, however, if we get an error during Test_N generation we will redo generation for the same N, and in such cases, previous children of "basic" classes (read Runnable) cause incompatible types. the fix is to remove "Test_" classes from the children. > > ok > >> besides the fix for the bug, the patch also include the following small clean ups: >> - use DIST_JAR var value instead of 'JAR' string constant in makefile > > ok > >> - change default target testbase dir > > ok > >> - make sure jaotc is always run w/ X-mixed regardless of "external" vm flags > > Also would be nice to add -ea -esa too if they are not used yet. > >> - add -Xcomp to all the generator tests > > why you need -Xcomp? > >> - use tmp directory for class files > > Will it work on Windows which has issues with tmp dir? There was discussion about it recently. 
> >> - check javac error code > > ok > >> - optimize getAllParents/getAllChildren to call getAllParents/getAllChildren only if a class hasn't been added yet > > ok > > Thanks, > Vladimir > >> webrev: http://cr.openjdk.java.net/~iignatyev//8158646/webrev.00/index.html >> JBS: https://bugs.openjdk.java.net/browse/JDK-8158646 >> testing: generated 1000 tests, all can be compiled and work fine >> Thanks, >> -- Igor From vladimir.kozlov at oracle.com Wed Jan 23 18:28:33 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Jan 2019 10:28:33 -0800 Subject: RFR(S) [12] : 8216180 : [AOT] compiler/intrinsics/bigInteger/TestMulAdd.java crashed with AOT enabled In-Reply-To: <28BD4C78-3A3F-4B36-8D93-B9F520B08E34@oracle.com> References: <28BD4C78-3A3F-4B36-8D93-B9F520B08E34@oracle.com> Message-ID: I assume tests don't use -XX:AOTLibrary= flag but load them from default location in JDK. Right? Can we instead skip such tests if any AOT library is loaded? We can check it with PrintAOT or new output or new WB API. Relying on an env variable is not robust, I think. Thanks, Vladimir On 1/23/19 9:36 AM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html >> 32 lines changed: 32 ins; 0 del; 0 mod; > > Hi all, > > could you please review this small patch which exclude TestMulAdd test from execution if java.base is AOT'ed compiled? > > the test disables some intrinsics, and if it's run w/ AOT'ed java.base there these intrinsics are enabled (which is the most common, if not the only, case) we get crash. the fix introduces new @requires value -- vm.aot.modules which contains comma-separated list of AOT'ed modules and use it to skip this test if java.base is one of them.
> > webrev: http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html > JBS: https://bugs.openjdk.java.net/browse/JDK-8216180 > testing: compiler/intrinsics/bigInteger tests on linux-x64 w/ JTREG=AOT_MODULES=java.base, TEST_OPTS_AOT_MODULES=java.base and w/o any extra make args > > Thanks, > -- Igor > From shade at redhat.com Wed Jan 23 18:32:06 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 23 Jan 2019 19:32:06 +0100 Subject: RFR (S) 8217639: Minimal and Zero builds fail after JDK-8217519 (Improve RegMask population count calculation) In-Reply-To: <925d42a1-460d-7a64-d872-603f42375337@oracle.com> References: <3bde3396-02a4-4b16-4fc5-257f67a34211@redhat.com> <925d42a1-460d-7a64-d872-603f42375337@oracle.com> Message-ID: <1ea00c5e-b01f-76a2-8034-3d9c59058e90@redhat.com> On 1/23/19 5:57 PM, Vladimir Kozlov wrote: > Good. I think it is trivial. Thanks, I think so too. Pushed. -Aleksey From vladimir.kozlov at oracle.com Wed Jan 23 18:36:19 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Jan 2019 10:36:19 -0800 Subject: RFR(S) [12] : 8158646 : [jittester] generated tests may not compile by javac In-Reply-To: <3A45E5A7-1351-47FD-8633-065CF639BF2A@oracle.com> References: <6D91688A-01A0-46E0-A304-9F39E16F574E@oracle.com> <87177e44-0b60-c385-afb7-eeedd3d29829@oracle.com> <3A45E5A7-1351-47FD-8633-065CF639BF2A@oracle.com> Message-ID: <8e89ff21-d3d4-3f95-ef54-1aeb55e2aeac@oracle.com> On 1/23/19 9:46 AM, Igor Ignatyev wrote: > Hi Vladimir, > > thanks for your review! > >>> - make sure jaotc is always run w/ X-mixed regardless of "external" vm flags >> Also would be nice to add -ea -esa too if they are not used yet. > -Xmixed was added to "speed-up" compilation in case external flags has Xcomp.
from my point of view, it's better if '-ea -esa' are provided during test runs, as in some cases you might want to run jaotc w/o them. I got it about -Xmixed. I put aot tests to noxcomp group for our CI testing. To always use '-ea -esa' with jaotc during testing is good I think. We have them by default in our hs-comp testing tasks definitions. I thought to have them here is also good if this testing does not use flags from task definitions. > >>> - add -Xcomp to all the generator tests >> why you need -Xcomp? > b/c jit-tester is supposed to compare results of interpreted execution (saved in .gold.* files) w/ the result from compilers, so generated tests must be run w/ Xcomp, otherwise we will comparing one results from interpreter w/ the results from interpreter. Got it. > >>> - use tmp directory for class files >> Will it work on Windows which has issues with tmp dir? There was discussion about it recently. > we use tmp dir only in the test generator which can be run on any platform, it doesn't have to be run on the same host/platform as actual test execution. in fact the preferred usage model of jit-tester is to pre-generate test corpus and reuse it. Okay. thanks, Vladimir > > Thanks, > -- Igor > >> On Jan 23, 2019, at 9:24 AM, Vladimir Kozlov wrote: >> >> On 1/22/19 6:26 PM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8158646/webrev.00/index.html >>>> 64 lines changed: 23 ins; 6 del; 35 mod; >>> Hi all, >>> could you please review this small fix for jit-tester? >>> the bug was caused by TypeList not being fully cleared b/w generation. we only remove classes which starts w/ "Test_", so we don't remove "basic" classes, e.g. Runnable, and don't clean their 'children'. 
in most cases, this is fine, as each generation will use only its own Test_N_* classes so having Test_M_* (M != N) classes as Runnable's children has no impact besides garbage in memory, however, if we get an error during Test_N generation we will redo generation for the same N, and in such cases, previous children of "basic" classes (read Runnable) cause incompatible types. the fix is to remove "Test_" classes from the children. >> >> ok >> >>> besides the fix for the bug, the patch also include the following small clean ups: >>> - use DIST_JAR var value instead of 'JAR' string constant in makefile >> >> ok >> >>> - change default target testbase dir >> >> ok >> >>> - make sure jaotc is always run w/ X-mixed regardless of "external" vm flags >> >> Also would be nice to add -ea -esa too if they are not used yet. >> >>> - add -Xcomp to all the generator tests >> >> why you need -Xcomp? >> >>> - use tmp directory for class files >> >> Will it work on Windows which has issues with tmp dir? There was discussion about it recently. >> >>> - check javac error code >> >> ok >> >>> - optimize getAllParents/getAllChildren to call getAllParents/getAllChildren only if a class hasn't been added yet >> >> ok >> >> Thanks, >> Vladimir >> >>> webrev: http://cr.openjdk.java.net/~iignatyev//8158646/webrev.00/index.html >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8158646 >>> testing: generated 1000 tests, all can be compiled and work fine >>> Thanks, >>> -- Igor > From igor.ignatyev at oracle.com Wed Jan 23 18:34:59 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 23 Jan 2019 10:34:59 -0800 Subject: RFR(S) [12] : 8216180 : [AOT] compiler/intrinsics/bigInteger/TestMulAdd.java crashed with AOT enabled In-Reply-To: References: <28BD4C78-3A3F-4B36-8D93-B9F520B08E34@oracle.com> Message-ID: > On Jan 23, 2019, at 10:28 AM, Vladimir Kozlov wrote: > > I assume tests don't use -XX:AOTLibrary= flag but load them from default location in JDK. Right? 
that's correct, the runs where the test fails used libraries from the default location. > Can we instead skip such tests if any AOT library is loaded? We can check it with PrintAOT or new ouptu or new WB API. > Relying on env variable is not robust I think. these env variables are part of run-test "official" contract, so I believe it's safe to use them. the only problem I see w/ such approach is runs w/ jdk-images which include AOT'ed modules in them, but there are no such images, and w/ current state of AOT, they aren't actually possible. however, if you have strong objections, I can look into other ways to retrieve this information. -- Igor > > Thanks, > Vladimir > > On 1/23/19 9:36 AM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html >>> 32 lines changed: 32 ins; 0 del; 0 mod; >> Hi all, >> could you please review this small patch which exclude TestMulAdd test from execution if java.base is AOT'ed compiled? >> the test disables some intrinsics, and if it's run w/ AOT'ed java.base there these intrinsics are enabled (which is the most common, if not the only, case) we get crash. the fix introduces new @requires value -- vm.aot.modules which contains comma-separated list of AOT'ed modules and use it to skip this test if java.base is one of them. >> webrev: http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html >> JBS: https://bugs.openjdk.java.net/browse/JDK-8216180 >> testing: compiler/intrinsics/bigInteger tests on linux-x64 w/ JTREG=AOT_MODULES=java.base, TEST_OPTS_AOT_MODULES=java.base and w/o any extra make args >> Thanks, >> -- Igor From dean.long at oracle.com Wed Jan 23 20:24:30 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 23 Jan 2019 12:24:30 -0800 Subject: [12] RFR: 8213825: assert(false) failed: Non-balanced monitor enter/exit! Likely JNI locking In-Reply-To: References: Message-ID: <257d62c0-c0f9-394e-1cbb-0f33b3a1d365@oracle.com> Looks good to me too.
Nice job tracking this down, Jamsheed! dl On 1/23/19 9:11 AM, Vladimir Kozlov wrote: > Hi Jamsheed, > > Fix is good. I approved it for JDK 12 push. > > Thanks, > Vladimir > > On 1/23/19 6:08 AM, Jamsheed wrote: >> Hi, >> >> Request for review >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8213825 >> >> webrev: http://cr.openjdk.java.net/~jcm/8213825/webrev.00/index.html >> >> Bug & Fix Desc: >> >> if markword load has sfpt as control i/p(i.e synchronizations near a >> safepoint), it skips sfpt assuming sfptOp wouldn't write to markword >> memory >> fix: not to skip sfpt for markword loads. >> >> tests: hs-tier1-5, hs-precheckin-comp >> >> Best regards, >> >> Jamsheed >> From igor.ignatyev at oracle.com Wed Jan 23 22:12:49 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 23 Jan 2019 14:12:49 -0800 Subject: RFR(S) [12] : 8158646 : [jittester] generated tests may not compile by javac In-Reply-To: <8e89ff21-d3d4-3f95-ef54-1aeb55e2aeac@oracle.com> References: <6D91688A-01A0-46E0-A304-9F39E16F574E@oracle.com> <87177e44-0b60-c385-afb7-eeedd3d29829@oracle.com> <3A45E5A7-1351-47FD-8633-065CF639BF2A@oracle.com> <8e89ff21-d3d4-3f95-ef54-1aeb55e2aeac@oracle.com> Message-ID: Vladimir, we can always specify '-ea -esa' in our task definitions if we want, but baking them into generated tests will affect all executions of these tests, and seems to be inadequate. as testing jaotc tool isn't the goal of these tests, I'd prefer not to add more jaotc-specific than necessary. what do you think? -- Igor > On Jan 23, 2019, at 10:36 AM, Vladimir Kozlov wrote: > > On 1/23/19 9:46 AM, Igor Ignatyev wrote: >> Hi Vladimir, >> thanks for your review! >>>> - make sure jaotc is always run w/ X-mixed regardless of "external" vm flags >>> Also would be nice to add -ea -esa too if they are not used yet. >> -Xmixed was added to "speed-up" compilation in case external flags has Xcomp.
from my point of view, it's better if '-ea -esa' are provided during test runs, as in some cases you might want to run jaotc w/o them. > > I got it about -Xmixed. I put aot tests to noxcomp group for our CI testing. > > To always use '-ea -esa' with jaotc during testing is good I think. We have them by default in our hs-comp testing tasks definitions. I thought to have them here is also good if this testing does not use flags from task definitions. >>>> - add -Xcomp to all the generator tests >>> why you need -Xcomp? >> b/c jit-tester is supposed to compare results of interpreted execution (saved in .gold.* files) w/ the result from compilers, so generated tests must be run w/ Xcomp, otherwise we will comparing one results from interpreter w/ the results from interpreter. > > Got it. > >>>> - use tmp directory for class files >>> Will it work on Windows which has issues with tmp dir? There was discussion about it recently. >> we use tmp dir only in the test generator which can be run on any platform, it doesn't have to be run on the same host/platform as actual test execution. in fact the preferred usage model of jit-tester is to pre-generate test corpus and reuse it. > > Okay. > > thanks, > Vladimir > >> Thanks, >> -- Igor >>> On Jan 23, 2019, at 9:24 AM, Vladimir Kozlov wrote: >>> >>> On 1/22/19 6:26 PM, Igor Ignatyev wrote: >>>> http://cr.openjdk.java.net/~iignatyev//8158646/webrev.00/index.html >>>>> 64 lines changed: 23 ins; 6 del; 35 mod; >>>> Hi all, >>>> could you please review this small fix for jit-tester? >>>> the bug was caused by TypeList not being fully cleared b/w generation. we only remove classes which starts w/ "Test_", so we don't remove "basic" classes, e.g. Runnable, and don't clean their 'children'. 
in most cases, this is fine, as each generation will use only its own Test_N_* classes so having Test_M_* (M != N) classes as Runnable's children has no impact besides garbage in memory, however, if we get an error during Test_N generation we will redo generation for the same N, and in such cases, previous children of "basic" classes (read Runnable) cause incompatible types. the fix is to remove "Test_" classes from the children. >>> >>> ok >>> >>>> besides the fix for the bug, the patch also include the following small clean ups: >>>> - use DIST_JAR var value instead of 'JAR' string constant in makefile >>> >>> ok >>> >>>> - change default target testbase dir >>> >>> ok >>> >>>> - make sure jaotc is always run w/ X-mixed regardless of "external" vm flags >>> >>> Also would be nice to add -ea -esa too if they are not used yet. >>> >>>> - add -Xcomp to all the generator tests >>> >>> why you need -Xcomp? >>> >>>> - use tmp directory for class files >>> >>> Will it work on Windows which has issues with tmp dir? There was discussion about it recently. >>> >>>> - check javac error code >>> >>> ok >>> >>>> - optimize getAllParents/getAllChildren to call getAllParents/getAllChildren only if a class hasn't been added yet >>> >>> ok >>> >>> Thanks, >>> Vladimir >>> >>>> webrev: http://cr.openjdk.java.net/~iignatyev//8158646/webrev.00/index.html >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8158646 >>>> testing: generated 1000 tests, all can be compiled and work fine >>>> Thanks, >>>> -- Igor From igor.ignatyev at oracle.com Wed Jan 23 22:13:18 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 23 Jan 2019 14:13:18 -0800 Subject: RFR(S) [12] : 8216180 : [AOT] compiler/intrinsics/bigInteger/TestMulAdd.java crashed with AOT enabled In-Reply-To: References: <28BD4C78-3A3F-4B36-8D93-B9F520B08E34@oracle.com> Message-ID: <74C74200-60AD-4C99-913B-A06752EBE965@oracle.com> Vladimir, I gave it a bit more thought, and am inclined to agree that relying on env.
variables is indeed fragile. so I've decided to go w/ a new WB method --http://cr.openjdk.java.net/~iignatyev//8216180/webrev.01/index.html (testing is in-progress) Thanks, -- Igor > On Jan 23, 2019, at 10:34 AM, Igor Ignatyev wrote: > > > >> On Jan 23, 2019, at 10:28 AM, Vladimir Kozlov wrote: >> >> I assume tests don't use -XX:AOTLibrary= flag but load them from default location in JDK. Right? > that's correct, the runs where the test fails used libraries from the default location. > >> Can we instead skip such tests if any AOT library is loaded? We can check it with PrintAOT or new ouptu or new WB API. >> Relying on env variable is not robust I think. > > these env variables are part of run-test "official" contract, so I believe it's safe to use them. the only problem I see w/ such approach is runs w/ jdk-images which include AOT'ed modules in them, but there are no such images, and w/ current state of AOT, they aren't actually possible however if you have strong objections, I can look into other ways to retrieve this information. > > -- Igor >> >> Thanks, >> Vladimir >> >> On 1/23/19 9:36 AM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html >>>> 32 lines changed: 32 ins; 0 del; 0 mod; >>> Hi all, >>> could you please review this small patch which exclude TestMulAdd test from execution if java.base is AOT'ed compiled? >>> the test disables some intrinsics, and if it's run w/ AOT'ed java.base there these intrinsics are enabled (which is the most common, if not the only, case) we get crash. the fix introduces new @requires value -- vm.aot.modules which contains comma-separated list of AOT'ed modules and use it to skip this test if java.base is one of them. 
>>> webrev: http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8216180
>>> testing: compiler/intrinsics/bigInteger tests on linux-x64 w/ JTREG=AOT_MODULES=java.base, TEST_OPTS_AOT_MODULES=java.base and w/o any extra make args
>>> Thanks,
>>> -- Igor
>

From gromero at linux.vnet.ibm.com Wed Jan 23 22:17:59 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 23 Jan 2019 20:17:59 -0200
Subject: RFR(M): 8217459: [PPC64] Cleanup non-vector version of CRC32
In-Reply-To:
References:
Message-ID: <243b17be-e1a3-7b68-1e72-9a114552860c@linux.vnet.ibm.com>

Hi Martin,

On 01/21/2019 04:07 PM, Doerr, Martin wrote:
> PPC64 currently contains static tables for CRC32/CRC32C calculations. We only need some of them depending on Endianess and on whether vector instructions are available or not.
> We can get rid of quite some code when we generate these constants at startup as we already do for the vector version.
> In addition, we can save one register in the vector case because we can use one constants pointer for all related constants.
> Webrev:
> http://cr.openjdk.java.net/~mdoerr/8217459_ppc64_crc_consts/webrev.00/

Thanks for the clean-up. Change looks good!

It's good to see fold_8bit_crc32 and kernel_crc32_1byte going away (I just noted them recently so I missed both in my previous clean-up). And also the static table simplification.

I tested the change with different array sizes and byte values with and without vpmsum in the CPU, i.e. has_vpmsumb() = false, and found no issues.

Only a nit: should we update the following comment and replace 'timesXtoThe32' by something better, maybe 'table'? That name doesn't look very meaningful in the current context and seems to be taken from the native code for java.util.zip.CRC32:

    /**
     * uint32_t crc;
     * timesXtoThe32[crc & 0xFF] ^ (crc >> 8);
     */
    void MacroAssembler::fold_byte_crc32(Register crc, Register val, Register table, Register tmp) {

Best regards,
Gustavo

From vladimir.kozlov at oracle.com Wed Jan 23 22:32:52 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 23 Jan 2019 14:32:52 -0800
Subject: RFR(S) [12] : 8158646 : [jittester] generated tests may not compile by javac
In-Reply-To:
References: <6D91688A-01A0-46E0-A304-9F39E16F574E@oracle.com> <87177e44-0b60-c385-afb7-eeedd3d29829@oracle.com> <3A45E5A7-1351-47FD-8633-065CF639BF2A@oracle.com> <8e89ff21-d3d4-3f95-ef54-1aeb55e2aeac@oracle.com>
Message-ID: <29ab8f7c-9301-d001-bb6f-7537322abda4@oracle.com>

On 1/23/19 2:12 PM, Igor Ignatyev wrote:
> Vladimir,
>
> we can always specify '-ea -esa' in our task definitions if we want, but baking them into generated tests will affect all executions of these tests, and seems to be inadequate. as testing jaotc tool isn't the goal of these tests, I'd prefer not to add more jaotc-specific than necessary. what do you think?

I only suggested to add these flags to command line in AotTestGeneratorsFactory.java where jaotc is used. It may help debug intermittent failures if there are issues with AOTed code. But I am fine if these tests run in Mach5 with these flags set in task definition when jaotc is used.

Vladimir

>
> -- Igor
>
>> On Jan 23, 2019, at 10:36 AM, Vladimir Kozlov wrote:
>>
>> On 1/23/19 9:46 AM, Igor Ignatyev wrote:
>>> Hi Vladimir,
>>> thanks for your review!
>>>>> - make sure jaotc is always run w/ X-mixed regardless of "external" vm flags
>>>> Also would be nice to add -ea -esa too if they are not used yet.
>>> -Xmixed was added to "speed-up" compilation in case external flags has Xcomp.
from my point of view, it's better if '-ea -esa' are provided during test runs, as in some cases you might want to run jaotc w/o them. >> >> I got it about -Xmixed. I put aot tests to noxcomp group for our CI testing. >> >> To always use '-ea -esa' with jaotc during testing is good I think. We have them by default in our hs-comp testing tasks definitions. I thought to have them here is also good if this testing does not use flags from task definitions. >>>>> - add -Xcomp to all the generator tests >>>> why you need -Xcomp? >>> b/c jit-tester is supposed to compare results of interpreted execution (saved in .gold.* files) w/ the result from compilers, so generated tests must be run w/ Xcomp, otherwise we will comparing one results from interpreter w/ the results from interpreter. >> >> Got it. >> >>>>> - use tmp directory for class files >>>> Will it work on Windows which has issues with tmp dir? There was discussion about it recently. >>> we use tmp dir only in the test generator which can be run on any platform, it doesn't have to be run on the same host/platform as actual test execution. in fact the preferred usage model of jit-tester is to pre-generate test corpus and reuse it. >> >> Okay. >> >> thanks, >> Vladimir >> >>> Thanks, >>> -- Igor >>>> On Jan 23, 2019, at 9:24 AM, Vladimir Kozlov wrote: >>>> >>>> On 1/22/19 6:26 PM, Igor Ignatyev wrote: >>>>> http://cr.openjdk.java.net/~iignatyev//8158646/webrev.00/index.html >>>>>> 64 lines changed: 23 ins; 6 del; 35 mod; >>>>> Hi all, >>>>> could you please review this small fix for jit-tester? >>>>> the bug was caused by TypeList not being fully cleared b/w generation. we only remove classes which starts w/ "Test_", so we don't remove "basic" classes, e.g. Runnable, and don't clean their 'children'. 
in most cases, this is fine, as each generation will use only its own Test_N_* classes so having Test_M_* (M != N) classes as Runnable's children has no impact besides garbage in memory, however, if we get an error during Test_N generation we will redo generation for the same N, and in such cases, previous children of "basic" classes (read Runnable) cause incompatible types. the fix is to remove "Test_" classes from the children. >>>> >>>> ok >>>> >>>>> besides the fix for the bug, the patch also include the following small clean ups: >>>>> - use DIST_JAR var value instead of 'JAR' string constant in makefile >>>> >>>> ok >>>> >>>>> - change default target testbase dir >>>> >>>> ok >>>> >>>>> - make sure jaotc is always run w/ X-mixed regardless of "external" vm flags >>>> >>>> Also would be nice to add -ea -esa too if they are not used yet. >>>> >>>>> - add -Xcomp to all the generator tests >>>> >>>> why you need -Xcomp? >>>> >>>>> - use tmp directory for class files >>>> >>>> Will it work on Windows which has issues with tmp dir? There was discussion about it recently. 
>>>> >>>>> - check javac error code >>>> >>>> ok >>>> >>>>> - optimize getAllParents/getAllChildren to call getAllParents/getAllChildren only if a class hasn't been added yet >>>> >>>> ok >>>> >>>> Thanks, >>>> Vladimir >>>> >>>>> webrev: http://cr.openjdk.java.net/~iignatyev//8158646/webrev.00/index.html >>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8158646 >>>>> testing: generated 1000 tests, all can be compiled and work fine >>>>> Thanks, >>>>> -- Igor > From vladimir.kozlov at oracle.com Wed Jan 23 22:41:10 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Jan 2019 14:41:10 -0800 Subject: RFR(S) [12] : 8216180 : [AOT] compiler/intrinsics/bigInteger/TestMulAdd.java crashed with AOT enabled In-Reply-To: <74C74200-60AD-4C99-913B-A06752EBE965@oracle.com> References: <28BD4C78-3A3F-4B36-8D93-B9F520B08E34@oracle.com> <74C74200-60AD-4C99-913B-A06752EBE965@oracle.com> Message-ID: <793023c7-61d9-8bf1-09f9-f046ea7c4d36@oracle.com> It should be AOTLoader::heaps_count(). Otherwise it is very good. Thanks, Vladimir On 1/23/19 2:13 PM, Igor Ignatyev wrote: > Vladimir, > > I gave it a bit more thoughts, and am inclining to agree that replying on env. variables is indeed fragile. so I've > decided to go w/ a new WB method --http://cr.openjdk.java.net/~iignatyev//8216180/webrev.01/index.html > > (testing is in-progress) > > Thanks, > -- Igor > >> On Jan 23, 2019, at 10:34 AM, Igor Ignatyev > wrote: >> >> >> >>> On Jan 23, 2019, at 10:28 AM, Vladimir Kozlov > wrote: >>> >>> I assume tests don't use -XX:AOTLibrary= flag but load them from default location in JDK. Right? >> that's correct, the runs where the test fails used libraries from the default location. >> >>> Can we instead skip such tests if any AOT library is loaded? We can check it with PrintAOT or new ouptu or new WB API. >>> Relying on env variable is not robust I think. >> >> these env variables are part of run-test "official" contract, so I believe it's safe to use them. 
the only problem I >> see w/ such approach is runs w/ jdk-images which include AOT'ed modules in them, but there are no such images, and w/ >> current state of AOT, they aren't actually possible however if you have strong objections, I can look into other ways >> to retrieve this information. >> >> -- Igor >>> >>> Thanks, >>> Vladimir >>> >>> On 1/23/19 9:36 AM, Igor Ignatyev wrote: >>>> http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html >>>>> 32 lines changed: 32 ins; 0 del; 0 mod; >>>> Hi all, >>>> could you please review this small patch which exclude TestMulAdd test from execution if java.base is AOT'ed compiled? >>>> the test disables some intrinsics, and if it's run w/ AOT'ed java.base there these intrinsics are enabled (which is >>>> the most common, if not the only, case) we get crash. the fix introduces new @requires value -- vm.aot.modules which >>>> contains comma-separated list of AOT'ed modules and use it to skip this test if java.base is one of them. >>>> webrev: http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8216180 >>>> testing: compiler/intrinsics/bigInteger tests on linux-x64 w/ JTREG=AOT_MODULES=java.base, >>>> TEST_OPTS_AOT_MODULES=java.base and w/o any extra make args >>>> Thanks, >>>> -- Igor >> > From igor.ignatyev at oracle.com Wed Jan 23 22:44:27 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 23 Jan 2019 14:44:27 -0800 Subject: RFR(S) [12] : 8216180 : [AOT] compiler/intrinsics/bigInteger/TestMulAdd.java crashed with AOT enabled In-Reply-To: <793023c7-61d9-8bf1-09f9-f046ea7c4d36@oracle.com> References: <28BD4C78-3A3F-4B36-8D93-B9F520B08E34@oracle.com> <74C74200-60AD-4C99-913B-A06752EBE965@oracle.com> <793023c7-61d9-8bf1-09f9-f046ea7c4d36@oracle.com> Message-ID: you meant libraries_count, right? -- Igor > On Jan 23, 2019, at 2:41 PM, Vladimir Kozlov wrote: > > It should be AOTLoader::heaps_count(). Otherwise it is very good. 
> > Thanks, > Vladimir > > On 1/23/19 2:13 PM, Igor Ignatyev wrote: >> Vladimir, >> I gave it a bit more thoughts, and am inclining to agree that replying on env. variables is indeed fragile. so I've decided to go w/ a new WB method --http://cr.openjdk.java.net/~iignatyev//8216180/webrev.01/index.html >> (testing is in-progress) >> Thanks, >> -- Igor >>> On Jan 23, 2019, at 10:34 AM, Igor Ignatyev > wrote: >>> >>> >>> >>>> On Jan 23, 2019, at 10:28 AM, Vladimir Kozlov > wrote: >>>> >>>> I assume tests don't use -XX:AOTLibrary= flag but load them from default location in JDK. Right? >>> that's correct, the runs where the test fails used libraries from the default location. >>> >>>> Can we instead skip such tests if any AOT library is loaded? We can check it with PrintAOT or new ouptu or new WB API. >>>> Relying on env variable is not robust I think. >>> >>> these env variables are part of run-test "official" contract, so I believe it's safe to use them. the only problem I see w/ such approach is runs w/ jdk-images which include AOT'ed modules in them, but there are no such images, and w/ current state of AOT, they aren't actually possible however if you have strong objections, I can look into other ways to retrieve this information. >>> >>> -- Igor >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 1/23/19 9:36 AM, Igor Ignatyev wrote: >>>>> http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html >>>>>> 32 lines changed: 32 ins; 0 del; 0 mod; >>>>> Hi all, >>>>> could you please review this small patch which exclude TestMulAdd test from execution if java.base is AOT'ed compiled? >>>>> the test disables some intrinsics, and if it's run w/ AOT'ed java.base there these intrinsics are enabled (which is the most common, if not the only, case) we get crash. the fix introduces new @requires value -- vm.aot.modules which contains comma-separated list of AOT'ed modules and use it to skip this test if java.base is one of them. 
>>>>> webrev: http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html
>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8216180
>>>>> testing: compiler/intrinsics/bigInteger tests on linux-x64 w/ JTREG=AOT_MODULES=java.base, TEST_OPTS_AOT_MODULES=java.base and w/o any extra make args
>>>>> Thanks,
>>>>> -- Igor
>>>

From vladimir.kozlov at oracle.com Wed Jan 23 22:51:19 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 23 Jan 2019 14:51:19 -0800
Subject: RFR(S) [12] : 8216180 : [AOT] compiler/intrinsics/bigInteger/TestMulAdd.java crashed with AOT enabled
In-Reply-To:
References: <28BD4C78-3A3F-4B36-8D93-B9F520B08E34@oracle.com> <74C74200-60AD-4C99-913B-A06752EBE965@oracle.com> <793023c7-61d9-8bf1-09f9-f046ea7c4d36@oracle.com>
Message-ID: <666b08d3-6074-833f-4f6f-e3db6d488f5b@oracle.com>

No, heaps_count(). Some libraries could be invalid (AOT compilation config was different, for example) and are not used:

http://hg.openjdk.java.net/jdk/jdk/file/e3ed96060992/src/hotspot/share/aot/aotLoader.cpp#l190

Maybe we should just check the UseAOT flag? If no AOT libraries are loaded, or they are invalid, UseAOT will be set to false.

Vladimir

On 1/23/19 2:44 PM, Igor Ignatyev wrote:
> you meant libraries_count, right?
>
> -- Igor
>
>> On Jan 23, 2019, at 2:41 PM, Vladimir Kozlov wrote:
>>
>> It should be AOTLoader::heaps_count(). Otherwise it is very good.
>>
>> Thanks,
>> Vladimir
>>
>> On 1/23/19 2:13 PM, Igor Ignatyev wrote:
>>> Vladimir,
>>> I gave it a bit more thoughts, and am inclining to agree that relying on env. variables is indeed fragile. so I've decided to go w/ a new WB method -- http://cr.openjdk.java.net/~iignatyev//8216180/webrev.01/index.html
>>> (testing is in-progress)
>>> Thanks,
>>> -- Igor
>>>> On Jan 23, 2019, at 10:34 AM, Igor Ignatyev wrote:
>>>>
>>>>
>>>>
>>>>> On Jan 23, 2019, at 10:28 AM, Vladimir Kozlov wrote:
>>>>>
>>>>> I assume tests don't use -XX:AOTLibrary= flag but load them from default location in JDK. Right?
>>>> that's correct, the runs where the test fails used libraries from the default location. >>>> >>>>> Can we instead skip such tests if any AOT library is loaded? We can check it with PrintAOT or new ouptu or new WB API. >>>>> Relying on env variable is not robust I think. >>>> >>>> these env variables are part of run-test "official" contract, so I believe it's safe to use them. the only problem I see w/ such approach is runs w/ jdk-images which include AOT'ed modules in them, but there are no such images, and w/ current state of AOT, they aren't actually possible however if you have strong objections, I can look into other ways to retrieve this information. >>>> >>>> -- Igor >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 1/23/19 9:36 AM, Igor Ignatyev wrote: >>>>>> http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html >>>>>>> 32 lines changed: 32 ins; 0 del; 0 mod; >>>>>> Hi all, >>>>>> could you please review this small patch which exclude TestMulAdd test from execution if java.base is AOT'ed compiled? >>>>>> the test disables some intrinsics, and if it's run w/ AOT'ed java.base there these intrinsics are enabled (which is the most common, if not the only, case) we get crash. the fix introduces new @requires value -- vm.aot.modules which contains comma-separated list of AOT'ed modules and use it to skip this test if java.base is one of them. 
>>>>>> webrev: http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8216180 >>>>>> testing: compiler/intrinsics/bigInteger tests on linux-x64 w/ JTREG=AOT_MODULES=java.base, TEST_OPTS_AOT_MODULES=java.base and w/o any extra make args >>>>>> Thanks, >>>>>> -- Igor >>>> > From igor.ignatyev at oracle.com Thu Jan 24 00:07:46 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 23 Jan 2019 16:07:46 -0800 Subject: RFR(S) [12] : 8216180 : [AOT] compiler/intrinsics/bigInteger/TestMulAdd.java crashed with AOT enabled In-Reply-To: <666b08d3-6074-833f-4f6f-e3db6d488f5b@oracle.com> References: <28BD4C78-3A3F-4B36-8D93-B9F520B08E34@oracle.com> <74C74200-60AD-4C99-913B-A06752EBE965@oracle.com> <793023c7-61d9-8bf1-09f9-f046ea7c4d36@oracle.com> <666b08d3-6074-833f-4f6f-e3db6d488f5b@oracle.com> Message-ID: <6E46BBB8-2DE4-43E7-A698-7F67F564A27E@oracle.com> UseAOT will be changed to false, only if UseAOT wasn't specified in the command line, so we can't use it reliably to determine if there are any loaded AOT libraries. I've changed WB_AotLibrariesCount to use AOTLoader::heaps_count, retested the fix locally, it works fine. testing it in mach5. Thanks, -- Igor > On Jan 23, 2019, at 2:51 PM, Vladimir Kozlov wrote: > > No, heaps_count(). Some libraries could be invalid (AOT compilation config was different, for example) and are not used: > > http://hg.openjdk.java.net/jdk/jdk/file/e3ed96060992/src/hotspot/share/aot/aotLoader.cpp#l190 > > May be we should just check UseAOT flag? If no AOT libraries are loaded ot they are invalid UseAOT will be set to false. > > Vladimir > > On 1/23/19 2:44 PM, Igor Ignatyev wrote: >> you meant libraries_count, right? >> -- Igor >>> On Jan 23, 2019, at 2:41 PM, Vladimir Kozlov wrote: >>> >>> It should be AOTLoader::heaps_count(). Otherwise it is very good. 
>>> >>> Thanks, >>> Vladimir >>> >>> On 1/23/19 2:13 PM, Igor Ignatyev wrote: >>>> Vladimir, >>>> I gave it a bit more thoughts, and am inclining to agree that replying on env. variables is indeed fragile. so I've decided to go w/ a new WB method --http://cr.openjdk.java.net/~iignatyev//8216180/webrev.01/index.html >>>> (testing is in-progress) >>>> Thanks, >>>> -- Igor >>>>> On Jan 23, 2019, at 10:34 AM, Igor Ignatyev > wrote: >>>>> >>>>> >>>>> >>>>>> On Jan 23, 2019, at 10:28 AM, Vladimir Kozlov > wrote: >>>>>> >>>>>> I assume tests don't use -XX:AOTLibrary= flag but load them from default location in JDK. Right? >>>>> that's correct, the runs where the test fails used libraries from the default location. >>>>> >>>>>> Can we instead skip such tests if any AOT library is loaded? We can check it with PrintAOT or new ouptu or new WB API. >>>>>> Relying on env variable is not robust I think. >>>>> >>>>> these env variables are part of run-test "official" contract, so I believe it's safe to use them. the only problem I see w/ such approach is runs w/ jdk-images which include AOT'ed modules in them, but there are no such images, and w/ current state of AOT, they aren't actually possible however if you have strong objections, I can look into other ways to retrieve this information. >>>>> >>>>> -- Igor >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 1/23/19 9:36 AM, Igor Ignatyev wrote: >>>>>>> http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html >>>>>>>> 32 lines changed: 32 ins; 0 del; 0 mod; >>>>>>> Hi all, >>>>>>> could you please review this small patch which exclude TestMulAdd test from execution if java.base is AOT'ed compiled? >>>>>>> the test disables some intrinsics, and if it's run w/ AOT'ed java.base there these intrinsics are enabled (which is the most common, if not the only, case) we get crash. 
the fix introduces new @requires value -- vm.aot.modules which contains comma-separated list of AOT'ed modules and use it to skip this test if java.base is one of them.
>>>>>>> webrev: http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html
>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8216180
>>>>>>> testing: compiler/intrinsics/bigInteger tests on linux-x64 w/ JTREG=AOT_MODULES=java.base, TEST_OPTS_AOT_MODULES=java.base and w/o any extra make args
>>>>>>> Thanks,
>>>>>>> -- Igor
>>>>>

From igor.veresov at oracle.com Thu Jan 24 00:15:02 2019
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 23 Jan 2019 16:15:02 -0800
Subject: [12] RFR(XS) 8217678: [AOT] jck Math/IncrementExact and Math/DecrementExact tests fail when test classes are AOTed
Message-ID: <0C581623-E5D1-4009-8B1B-E21023DE408A@oracle.com>

When fixing JDK-8196568 I must've thought that a folding of an exact math node would produce a deopt. But obviously it doesn't.

Webrev: http://cr.openjdk.java.net/~iveresov/8217678/webrev.00/
JBS: https://bugs.openjdk.java.net/browse/JDK-8217678

Please review and approve.

Thanks!
igor

From vladimir.kozlov at oracle.com Thu Jan 24 00:16:29 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 23 Jan 2019 16:16:29 -0800
Subject: RFR(S) [12] : 8216180 : [AOT] compiler/intrinsics/bigInteger/TestMulAdd.java crashed with AOT enabled
In-Reply-To: <6E46BBB8-2DE4-43E7-A698-7F67F564A27E@oracle.com>
References: <28BD4C78-3A3F-4B36-8D93-B9F520B08E34@oracle.com> <74C74200-60AD-4C99-913B-A06752EBE965@oracle.com> <793023c7-61d9-8bf1-09f9-f046ea7c4d36@oracle.com> <666b08d3-6074-833f-4f6f-e3db6d488f5b@oracle.com> <6E46BBB8-2DE4-43E7-A698-7F67F564A27E@oracle.com>
Message-ID:

On 1/23/19 4:07 PM, Igor Ignatyev wrote:
> UseAOT will be changed to false, only if UseAOT wasn't specified in the command line, so we can't use it reliably to determine if there are any loaded AOT libraries.

Okay.
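The difference between the three candidate checks discussed above (number of libraries found, number of valid heaps, and the UseAOT flag) can be sketched with a toy Java model. This is not HotSpot code: the AotLibrary class and the helpers librariesCount, heapsCount and useAotFlag are hypothetical names made up for illustration, and the flag rule only mirrors what the messages above describe (the flag is reset to false only when it was not given on the command line).

```java
import java.util.Arrays;
import java.util.List;

public class AotDetectionSketch {
    /** Hypothetical stand-in for an AOT library found at startup. */
    public static final class AotLibrary {
        final String name;
        final boolean valid; // false if, e.g., it was built with a different config

        public AotLibrary(String name, boolean valid) {
            this.name = name;
            this.valid = valid;
        }
    }

    /** Counts every library that was found, whether it is usable or not. */
    public static int librariesCount(List<AotLibrary> libs) {
        return libs.size();
    }

    /** Counts only the libraries that passed validation and are actually used. */
    public static int heapsCount(List<AotLibrary> libs) {
        int count = 0;
        for (AotLibrary lib : libs) {
            if (lib.valid) {
                count++;
            }
        }
        return count;
    }

    /** Toy model of the flag: it stays on if given explicitly, even with nothing loaded. */
    public static boolean useAotFlag(boolean enabledOnCommandLine, List<AotLibrary> libs) {
        return enabledOnCommandLine || heapsCount(libs) > 0;
    }

    public static void main(String[] args) {
        // One library is present, but it is invalid and therefore unused.
        List<AotLibrary> libs = Arrays.asList(new AotLibrary("libjava.base.so", false));
        System.out.println(librariesCount(libs));   // 1: over-reports
        System.out.println(heapsCount(libs));       // 0: matches what is really in use
        System.out.println(useAotFlag(true, libs)); // true: flag alone is misleading
    }
}
```

In this model only the heaps count says "no AOT code is in use" for the invalid-library case, which is the reason given above for preferring it.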
> > I've changed WB_AotLibrariesCount to use AOTLoader::heaps_count, retested the fix locally, it works fine. testing it in mach5. Good. Thanks, Vladimir > > Thanks, > -- Igor > >> On Jan 23, 2019, at 2:51 PM, Vladimir Kozlov wrote: >> >> No, heaps_count(). Some libraries could be invalid (AOT compilation config was different, for example) and are not used: >> >> http://hg.openjdk.java.net/jdk/jdk/file/e3ed96060992/src/hotspot/share/aot/aotLoader.cpp#l190 >> >> May be we should just check UseAOT flag? If no AOT libraries are loaded ot they are invalid UseAOT will be set to false. >> >> Vladimir >> >> On 1/23/19 2:44 PM, Igor Ignatyev wrote: >>> you meant libraries_count, right? >>> -- Igor >>>> On Jan 23, 2019, at 2:41 PM, Vladimir Kozlov wrote: >>>> >>>> It should be AOTLoader::heaps_count(). Otherwise it is very good. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 1/23/19 2:13 PM, Igor Ignatyev wrote: >>>>> Vladimir, >>>>> I gave it a bit more thoughts, and am inclining to agree that replying on env. variables is indeed fragile. so I've decided to go w/ a new WB method --http://cr.openjdk.java.net/~iignatyev//8216180/webrev.01/index.html >>>>> (testing is in-progress) >>>>> Thanks, >>>>> -- Igor >>>>>> On Jan 23, 2019, at 10:34 AM, Igor Ignatyev > wrote: >>>>>> >>>>>> >>>>>> >>>>>>> On Jan 23, 2019, at 10:28 AM, Vladimir Kozlov > wrote: >>>>>>> >>>>>>> I assume tests don't use -XX:AOTLibrary= flag but load them from default location in JDK. Right? >>>>>> that's correct, the runs where the test fails used libraries from the default location. >>>>>> >>>>>>> Can we instead skip such tests if any AOT library is loaded? We can check it with PrintAOT or new ouptu or new WB API. >>>>>>> Relying on env variable is not robust I think. >>>>>> >>>>>> these env variables are part of run-test "official" contract, so I believe it's safe to use them. 
the only problem I see w/ such approach is runs w/ jdk-images which include AOT'ed modules in them, but there are no such images, and w/ current state of AOT, they aren't actually possible however if you have strong objections, I can look into other ways to retrieve this information. >>>>>> >>>>>> -- Igor >>>>>>> >>>>>>> Thanks, >>>>>>> Vladimir >>>>>>> >>>>>>> On 1/23/19 9:36 AM, Igor Ignatyev wrote: >>>>>>>> http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html >>>>>>>>> 32 lines changed: 32 ins; 0 del; 0 mod; >>>>>>>> Hi all, >>>>>>>> could you please review this small patch which exclude TestMulAdd test from execution if java.base is AOT'ed compiled? >>>>>>>> the test disables some intrinsics, and if it's run w/ AOT'ed java.base there these intrinsics are enabled (which is the most common, if not the only, case) we get crash. the fix introduces new @requires value -- vm.aot.modules which contains comma-separated list of AOT'ed modules and use it to skip this test if java.base is one of them. >>>>>>>> webrev: http://cr.openjdk.java.net/~iignatyev//8216180/webrev.00/index.html >>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8216180 >>>>>>>> testing: compiler/intrinsics/bigInteger tests on linux-x64 w/ JTREG=AOT_MODULES=java.base, TEST_OPTS_AOT_MODULES=java.base and w/o any extra make args >>>>>>>> Thanks, >>>>>>>> -- Igor >>>>>> > From vladimir.kozlov at oracle.com Thu Jan 24 00:20:04 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Jan 2019 16:20:04 -0800 Subject: [12] RFR(XS) 8217678: [AOT] jck Math/IncrementExact and Math/DecrementExact tests fail when test classes are AOTed In-Reply-To: <0C581623-E5D1-4009-8B1B-E21023DE408A@oracle.com> References: <0C581623-E5D1-4009-8B1B-E21023DE408A@oracle.com> Message-ID: <779a9eb8-a5d8-bd52-43e7-fa3382c58faf@oracle.com> Good. 
Please file push request since it has to be pushed into JDK 12:
http://openjdk.java.net/jeps/3#Fix-Request-Process

Thanks,
Vladimir

On 1/23/19 4:15 PM, Igor Veresov wrote:
> When fixing JDK-8196568 I must've thought that a folding of an exact math node would produce a deopt. But obviously it doesn't.
> Webrev: http://cr.openjdk.java.net/~iveresov/8217678/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8217678
>
>
> Please review and approve.
>
> Thanks!
> igor
>
>
>

From igor.veresov at oracle.com Thu Jan 24 00:33:06 2019
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 23 Jan 2019 16:33:06 -0800
Subject: [12] RFR(XS) 8217678: [AOT] jck Math/IncrementExact and Math/DecrementExact tests fail when test classes are AOTed
In-Reply-To: <779a9eb8-a5d8-bd52-43e7-fa3382c58faf@oracle.com>
References: <0C581623-E5D1-4009-8B1B-E21023DE408A@oracle.com> <779a9eb8-a5d8-bd52-43e7-fa3382c58faf@oracle.com>
Message-ID: <8BA6416C-9FED-4D57-A757-6D4B681BC986@oracle.com>

Thanks for the review. I've added the "Fix Request" to the JBS issue.

igor

> On Jan 23, 2019, at 4:20 PM, Vladimir Kozlov wrote:
>
> Good.
>
> Please file push request since it has to be pushed into JDK 12:
> http://openjdk.java.net/jeps/3#Fix-Request-Process
>
> Thanks,
> Vladimir
>
> On 1/23/19 4:15 PM, Igor Veresov wrote:
>> When fixing JDK-8196568 I must've thought that a folding of an exact math node would produce a deopt. But obviously it doesn't.
>> Webrev: http://cr.openjdk.java.net/~iveresov/8217678/webrev.00/
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8217678
>> Please review and approve.
>> Thanks!
>> igor

From vladimir.kozlov at oracle.com Thu Jan 24 00:35:17 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 23 Jan 2019 16:35:17 -0800
Subject: [12] RFR(XS) 8217678: [AOT] jck Math/IncrementExact and Math/DecrementExact tests fail when test classes are AOTed
In-Reply-To: <8BA6416C-9FED-4D57-A757-6D4B681BC986@oracle.com>
References: <0C581623-E5D1-4009-8B1B-E21023DE408A@oracle.com> <779a9eb8-a5d8-bd52-43e7-fa3382c58faf@oracle.com> <8BA6416C-9FED-4D57-A757-6D4B681BC986@oracle.com>
Message-ID:

Approved.

Vladimir

On 1/23/19 4:33 PM, Igor Veresov wrote:
> Thanks for the review. I've added the "Fix Request" to the JBS issue.
>
> igor
>
>
>
>> On Jan 23, 2019, at 4:20 PM, Vladimir Kozlov wrote:
>>
>> Good.
>>
>> Please file push request since it has to be pushed into JDK 12:
>> http://openjdk.java.net/jeps/3#Fix-Request-Process
>>
>> Thanks,
>> Vladimir
>>
>> On 1/23/19 4:15 PM, Igor Veresov wrote:
>>> When fixing JDK-8196568 I must've thought that a folding of an exact math node would produce a deopt. But obviously
>>> it doesn't.
>>> Webrev: http://cr.openjdk.java.net/~iveresov/8217678/webrev.00/
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8217678
>>> Please review and approve.
>>> Thanks!
>>> igor
>

From igor.ignatyev at oracle.com Thu Jan 24 01:08:19 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 23 Jan 2019 17:08:19 -0800
Subject: RFR(T) [12] : 8167276 : jvmci/compilerToVM/MaterializeVirtualObjectTest.java fails with -XX:-EliminateAllocations
Message-ID:

http://cr.openjdk.java.net/~iignatyev//8167276/webrev.02/index.html
> 8 lines changed: 5 ins; 0 del; 3 mod;

Hi all,

could you please review this tiny patch which excludes MaterializeVirtualObjectTest test from runs w/ disabled EliminateAllocations?

webrev: http://cr.openjdk.java.net/~iignatyev//8167276/webrev.02/index.html
JBS: https://bugs.openjdk.java.net/browse/JDK-8167276
testing: the test w/ -XX:-EliminateAllocations, -XX:+EliminateAllocations and w/o any extra flags

Thanks,
-- Igor

From igor.ignatyev at oracle.com Thu Jan 24 01:10:36 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 23 Jan 2019 17:10:36 -0800
Subject: RFR(T)[12]: 8150757 : [TESTBUG] compiler/ciReplay/TestVM.sh and compiler/ciReplay/TestVM_no_comp_level.sh fail when no compilations are happening
Message-ID:

http://cr.openjdk.java.net/~iignatyev//8150757/webrev.00/index.html
> 7 lines changed: 5 ins; 0 del; 2 mod;

Hi all,

could you please review this tiny fix for compiler/ciReplay/ tests? These tests try to crash the JVM by running '-Xcomp -XX:CICrashAt=1 -version', but if they are run w/ AOT'ed java.base, there is nothing else to compile in '-version', so the JVM doesn't crash and the tests fail. The fix replaces usage of -version w/ a class w/ an empty main method, so crashes will happen w/ or w/o AOT'ed java.base.

webrev: http://cr.openjdk.java.net/~iignatyev//8150757/webrev.00/index.html
JBS: https://bugs.openjdk.java.net/browse/JDK-8150757
testing: compiler/ciReplay/ tests w/ and w/o AOT'ed java.base

Thanks,
-- Igor

From felix.yang at huawei.com Thu Jan 24 01:22:47 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Thu, 24 Jan 2019 01:22:47 +0000
Subject: [RFR] 8217359: C2 compiler triggers SIGSEGV after tranformation in ConvI2LNode::Ideal
In-Reply-To: <35e45132-2187-16c8-22fb-17e61a117941@oracle.com>
References: <32982b31-3a91-58fb-a6b8-b1cd9f7cdb41@oracle.com> <35e45132-2187-16c8-22fb-17e61a117941@oracle.com>
Message-ID:

Thanks Tobias and Vladimir.
This is pushed as:
http://hg.openjdk.java.net/jdk/jdk/rev/44f41693631f
http://hg.openjdk.java.net/jdk/jdk12/rev/44f41693631f

Felix

>
> Changes are good.
>
> I approved the fix for jdk12 as HotSpot group lead.
>
> Thanks,
> Vladimir
>
>
> On 1/22/19 4:03 AM, Yangfei (Felix) wrote:
> > Hi,
> >
> > I have updated the JBS accordingly, requesting approval for integration into JDK 12.
> > May I have another reviewer please?
> >
> > Thanks for your help,
> > Felix
> >
> >
> >> Hi Felix,
> >>
> >> this looks good to me, thanks for adding the test!
> >>
> >> A second review would be good. In the meantime, please request approval for
> >> integration into JDK 12
> >> according to:
> >> http://openjdk.java.net/jeps/3#Fix-Request-Process
> >>
> >> Thanks,
> >> Tobias
> >>
> >> On 22.01.19 02:17, Yangfei (Felix) wrote:
> >>> Hi,
> >>>
> >>> Thanks for reviewing. The regression test is added.
> >>> New webrev: http://cr.openjdk.java.net/~fyang/8217359/webrev.01/
> >>> This is committed to the submit repo: http://hg.openjdk.java.net/jdk/submit/rev/7345adfbc913
> >>>
> >>> The email I got shows that it passed the Oracle internal tests:
> >>> =================================================
> >>> Build Details: 2019-01-21-1210078.felix.yang.source
> >>> 0 Failed Tests
> >>> Mach5 Tasks Results Summary
> >>> * EXECUTED_WITH_FAILURE: 0
> >>> * NA: 0
> >>> * KILLED: 0
> >>> * UNABLE_TO_RUN: 0
> >>> * PASSED: 76
> >>> * FAILED: 0
> >>> =================================================
> >>>
> >>> OK to push?
> >>>
> >>> Thanks for your help,
> >>> Felix
> >>>
> >>>>
> >>>> Hi Felix,
> >>>>
> >>>> Could you please add the regression test as jtreg test?
> >>>>
> >>>> Otherwise, the fix looks reasonable to me. Nice analysis!
> >>>>
> >>>> Thanks,
> >>>> Tobias
> >>>

From vladimir.kozlov at oracle.com Thu Jan 24 01:39:13 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 23 Jan 2019 17:39:13 -0800
Subject: RFR(T) [12] : 8167276 : jvmci/compilerToVM/MaterializeVirtualObjectTest.java fails with -XX:-EliminateAllocations
In-Reply-To:
References:
Message-ID: <305920e9-f487-fe22-4955-2142cc6c5430@oracle.com>

Good.
Thanks, Vladimir On 1/23/19 5:08 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8167276/webrev.02/index.html >> 8 lines changed: 5 ins; 0 del; 3 mod; > > Hi all, > > could you please review this tiny patch which excludes MaterializeVirtualObjectTest test from runs w/ disabled EliminateAllocations? > > webrev: http://cr.openjdk.java.net/~iignatyev//8167276/webrev.02/index.html > JBS: https://bugs.openjdk.java.net/browse/JDK-8167276 > testing: the test w/ -XX:-EliminateAllocations, XX:+EliminateAllocations and w/o any extra flags > > Thanks, > -- Igor > From vladimir.kozlov at oracle.com Thu Jan 24 01:41:51 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Jan 2019 17:41:51 -0800 Subject: RFR(T)[12]: 8150757 : [TESTBUG] compiler/ciReplay/TestVM.sh and compiler/ciReplay/TestVM_no_comp_level.sh fail when no compilations are happening In-Reply-To: References: Message-ID: <96ea1b63-0bca-3fef-3cd6-89a749dddbbd@oracle.com> Looks good. Thanks, Vladimir On 1/23/19 5:10 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8150757/webrev.00/index.html >> 7 lines changed: 5 ins; 0 del; 2 mod; > > Hi all, > > could you please review this tiny fix for compiler/ciReplay/ tests? these tests try to crash JVM by running '-Xcomp -XX:CICrashAt=1 -version', but if they are run w/ AOT'ed java.base, there is nothing else to compile in '-version', so JVM doesn't crash and the tests fail. the fix replaces usage of -version w/ a class w/ empty main method, so crashes will happen w/ or w/o AOT'ed java.base. 
>
> webrev: http://cr.openjdk.java.net/~iignatyev//8150757/webrev.00/index.html
> JBS: https://bugs.openjdk.java.net/browse/JDK-8150757
> testing: compiler/ciReplay/ tests w/ and w/o AOT'ed java.base
>
> Thanks,
> -- Igor
>

From felix.yang at huawei.com Thu Jan 24 01:57:16 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Thu, 24 Jan 2019 01:57:16 +0000
Subject: [13] RFR (XS): 8202952: C2: Unexpected dead nodes after matching
In-Reply-To:
References: <9bf3ac2c-5881-576c-cc64-917cec246f8f@oracle.com> <242b6a41-3db6-2911-1045-1e5eb63ba862@oracle.com>
Message-ID:

Hi,

Since JDK 12 has the same issue, will this fix be integrated into this repo?
BTW: I have another simple test case that also triggers the bug. I have put the test on the JBS.

Thanks,
Felix

>
> Thanks, Vladimir.
>
> Best regards,
> Vladimir Ivanov
>
> On 22/01/2019 12:42, Vladimir Kozlov wrote:
> > Got it. Good.
> >
> > thanks,
> > Vladimir
> >
> > On 1/22/19 12:08 PM, Vladimir Ivanov wrote:
> >>
> >> On 22/01/2019 11:54, Vladimir Kozlov wrote:
> >>> The fix is different from what we discussed.
> >>> Can you explain how it helps?
> >>
> >> We discussed adding AddP case to _shared_nodes.
> >>
> >> Proposed fix achieves similar result with a different approach:
> >>
> >>    * Matcher::clone_address_expressions() marks problematic AddP as
> >> shared (based on constant value);
> >>
> >>    * DFA() doesn't construct duplicated State for inner AddP (since
> >> it's marked as shared);
> >>
> >>    * Matcher doesn't need to materialize duplicated mach nodes, since
> >> it matches inner AddP separately;
> >>
> >> Best regards,
> >> Vladimir Ivanov
> >>
> >>> On 1/22/19 11:05 AM, Vladimir Ivanov wrote:
> >>>> http://cr.openjdk.java.net/~vlivanov/8202952/webrev.00/
> >>>> https://bugs.openjdk.java.net/browse/JDK-8202952
> >>>>
> >>>> The crash happens when PhaseCFG encounters a dead MachNode in the
> >>>> graph.
> >>>> The problematic node is a leftover from matching of an instruction
> >>>> with a duplicated memory operand (sarI_mem_CL [1] in that particular
> >>>> case).
> >>>>
> >>>> Address has the following shape [2]:
> >>>>    AddP (AddP DecodeN (LShiftL ConvI2L ConI)) ConL
> >>>>
> >>>> It could be subsumed into complex addressing expression, but the
> >>>> constant is too large (doesn't fit into immL32). So, matcher has to
> >>>> compute inner address expression separately and put it into a register.
> >>>>
> >>>> Since memory operand is duplicated, 2 copies are materialized during
> >>>> matching, but as part of ::Expand() one of the copies is eliminated,
> >>>> thus leaving a dead mach node in the IR (for the address expression).
> >>>>
> >>>> The fix is to adjust Matcher::clone_address_expressions() to avoid
> >>>> cloning inner AddP when constant offset is too large.
> >>>>
> >>>> Testing: hs-precheckin-comp, hs-tier1, hs-tier2
> >>>>
> >>>> Best regards,
> >>>> Vladimir Ivanov
> >>>>
> >>>> [1] instruct sarI_mem_CL(memory dst, rcx_RegI shift, rFlagsReg cr)
> >>>> %{
> >>>>    match(Set dst (StoreI dst (RShiftI (LoadI dst) shift)));
> >>>>
> >>>>
> >>>> [2]
> >>>>   o347 AddP  === _ o2181 o1768 o1769  [[o349 o371 ]]
> >>>>      o1768 AddP  === _ o2181 o2181 o1765  [[o347 ]]
> >>>>          o2181 DecodeN === _ o287  [[o1768 o1768 o327 o347 o327 ]] #int[int:>=0]:NotNull:exact *
> >>>>          o1765 LShiftL === _ o1761 o60  [[o1768 ]]
> >>>>              o1761 ConvI2L === _ o1741  [[o1765 ]] #long:maxint-51..maxint-48
> >>>>              o60   ConI  === o0  [[o61 o1765 o1434 o2013 o1631 o2017 o1808  60 ]]  #int:2
> >>>>      o1769 ConL  === o0  [[o347 ]]  #long:-8589932784

From vladimir.x.ivanov at oracle.com Thu Jan 24 02:58:50 2019
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 23 Jan 2019 18:58:50 -0800
Subject: [13] RFR (XS): 8202952: C2: Unexpected dead nodes after matching
In-Reply-To:
References: <9bf3ac2c-5881-576c-cc64-917cec246f8f@oracle.com> <242b6a41-3db6-2911-1045-1e5eb63ba862@oracle.com>
Message-ID: <85ed4b30-c83f-53b4-3f9d-59f53f0d71e2@oracle.com>

> Since JDK 12 has the same issue, will this fix be integrated into this repo?

I don't plan to integrate it into jdk12.

My reasoning is:

  (1) it's a long-standing bug (from day 1 on x64?) with very low likelihood of exposure
    * was found only recently using fuzzers
    * no similar crashes reported before

  (2) JDK 12 is in RDP2 phase and is open only for P1-P2 bug fixes

Though the bug technically meets RDP2 criteria, I don't see it as a critical issue for the release in a late development phase.

Best regards,
Vladimir Ivanov

>> On 22/01/2019 12:42, Vladimir Kozlov wrote:
>>> Got it. Good.
>>>
>>> thanks,
>>> Vladimir
>>>
>>> On 1/22/19 12:08 PM, Vladimir Ivanov wrote:
>>>>
>>>> On 22/01/2019 11:54, Vladimir Kozlov wrote:
>>>>> The fix is different from what we discussed.
>>>>> Can you explain how it helps?
>>>>
>>>> We discussed adding AddP case to _shared_nodes.
>>>>
>>>> Proposed fix achieves similar result with a different approach:
>>>>
>>>>    * Matcher::clone_address_expressions() marks problematic AddP as
>>>> shared (based on constant value);
>>>>
>>>>    * DFA() doesn't construct duplicated State for inner AddP (since
>>>> it's marked as shared);
>>>>
>>>>    * Matcher doesn't need to materialize duplicated mach nodes, since
>>>> it matches inner AddP separately;
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>>> On 1/22/19 11:05 AM, Vladimir Ivanov wrote:
>>>>>> http://cr.openjdk.java.net/~vlivanov/8202952/webrev.00/
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8202952
>>>>>>
>>>>>> The crash happens when PhaseCFG encounters a dead MachNode in the
>>>>>> graph.
>>>>>> The problematic node is a leftover from matching of an instruction
>>>>>> with a duplicated memory operand (sarI_mem_CL [1] in that particular
>>>>>> case).
>>>>>>
>>>>>> Address has the following shape [2]:
>>>>>>    AddP (AddP DecodeN (LShiftL ConvI2L ConI)) ConL
>>>>>>
>>>>>> It could be subsumed into complex addressing expression, but the
>>>>>> constant is too large (doesn't fit into immL32). So, matcher has to
>>>>>> compute inner address expression separately and put it into a register.
>>>>>>
>>>>>> Since memory operand is duplicated, 2 copies are materialized during
>>>>>> matching, but as part of ::Expand() one of the copies is eliminated,
>>>>>> thus leaving a dead mach node in the IR (for the address expression).
>>>>>>
>>>>>> The fix is to adjust Matcher::clone_address_expressions() to avoid
>>>>>> cloning inner AddP when constant offset is too large.
>>>>>>
>>>>>> Testing: hs-precheckin-comp, hs-tier1, hs-tier2
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir Ivanov
>>>>>>
>>>>>> [1] instruct sarI_mem_CL(memory dst, rcx_RegI shift, rFlagsReg cr)
>>>>>> %{
>>>>>>    match(Set dst (StoreI dst (RShiftI (LoadI dst) shift)));
>>>>>>
>>>>>>
>>>>>> [2]
>>>>>>   o347 AddP  === _ o2181 o1768 o1769  [[o349 o371 ]]
>>>>>>      o1768 AddP  === _ o2181 o2181 o1765  [[o347 ]]
>>>>>>          o2181 DecodeN === _ o287  [[o1768 o1768 o327 o347 o327 ]] #int[int:>=0]:NotNull:exact *
>>>>>>          o1765 LShiftL === _ o1761 o60  [[o1768 ]]
>>>>>>              o1761 ConvI2L === _ o1741  [[o1765 ]] #long:maxint-51..maxint-48
>>>>>>              o60   ConI  === o0  [[o61 o1765 o1434 o2013 o1631 o2017 o1808  60 ]]  #int:2
>>>>>>      o1769 ConL  === o0  [[o347 ]]  #long:-8589932784

From felix.yang at huawei.com Thu Jan 24 03:21:07 2019
From: felix.yang at huawei.com (Yangfei (Felix))
Date: Thu, 24 Jan 2019 03:21:07 +0000
Subject: [13] RFR (XS): 8202952: C2: Unexpected dead nodes after matching
In-Reply-To: <85ed4b30-c83f-53b4-3f9d-59f53f0d71e2@oracle.com>
References: <9bf3ac2c-5881-576c-cc64-917cec246f8f@oracle.com> <242b6a41-3db6-2911-1045-1e5eb63ba862@oracle.com> <85ed4b30-c83f-53b4-3f9d-59f53f0d71e2@oracle.com>
Message-ID:

That sounds reasonable to me.
BTW: The test I updated on the JBS is also reduced from a fuzzer test.

Thanks,
Felix

>
>
> > Since JDK 12 has the same issue, will this fix be integrated into this repo?
>
> I don't plan to integrate it into jdk12.
>
> My reasoning is:
>
> (1) it's a long-standing bug (from day 1 on x64?) with very low
> likelihood of exposure
> * was found only recently using fuzzers
> * no similar crashes reported before
>
> (2) JDK 12 is in RDP2 phase and is open only for P1-P2 bug fixes
>
> Though the bug technically meets RDP2 criteria, I don't see it as a
> critical issue for the release in a late development phase.
>
> Best regards,
> Vladimir Ivanov

From igor.ignatyev at oracle.com Thu Jan 24 04:03:59 2019
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 23 Jan 2019 20:03:59 -0800
Subject: RFR(T) [12] : 8217699 : add java/util/concurrent/CountDownLatch/Basic.java to ProblemList-Xcomp
Message-ID: <8A057C09-9FDB-49E8-A319-DD29FE98174E@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8217699/webrev.00/index.html
> 2 lines changed: 1 ins; 0 del; 1 mod;

Hi all,

could you please review this trivial fix which puts java/util/concurrent/CountDownLatch/Basic.java tests into ProblemList-Xcomp?
webrev: http://cr.openjdk.java.net/~iignatyev//8217699/webrev.00/index.html JBS: https://bugs.openjdk.java.net/browse/JDK-8217699 Thanks, -- Igor From vladimir.kozlov at oracle.com Thu Jan 24 04:23:18 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 23 Jan 2019 20:23:18 -0800 Subject: RFR(T) [12] : 8217699 : add java/util/concurrent/CountDownLatch/Basic.java to ProblemList-Xcomp In-Reply-To: <8A057C09-9FDB-49E8-A319-DD29FE98174E@oracle.com> References: <8A057C09-9FDB-49E8-A319-DD29FE98174E@oracle.com> Message-ID: <54EBFBCF-1CD1-4921-A314-E59C9C7FC009@oracle.com> Good. Thanks Vladimir > On Jan 23, 2019, at 8:03 PM, Igor Ignatyev wrote: > > http://cr.openjdk.java.net/~iignatyev//8217699/webrev.00/index.html >> 2 lines changed: 1 ins; 0 del; 1 mod; > > Hi all, > could you please review this trivial fix which puts java/util/concurrent/CountDownLatch/Basic.java tests into ProblemList-Xcomp? > > webrev: http://cr.openjdk.java.net/~iignatyev//8217699/webrev.00/index.html > JBS: https://bugs.openjdk.java.net/browse/JDK-8217699 > > Thanks, > -- Igor From dean.long at oracle.com Thu Jan 24 05:13:07 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 23 Jan 2019 21:13:07 -0800 Subject: RFR(T) [12] : 8167276 : jvmci/compilerToVM/MaterializeVirtualObjectTest.java fails with -XX:-EliminateAllocations In-Reply-To: References: Message-ID: Looks good. dl On 1/23/19 5:08 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8167276/webrev.02/index.html >> 8 lines changed: 5 ins; 0 del; 3 mod; > Hi all, > > could you please review this tiny patch which excludes MaterializeVirtualObjectTest test from runs w/ disabled EliminateAllocations? 
> > webrev: http://cr.openjdk.java.net/~iignatyev//8167276/webrev.02/index.html > JBS: https://bugs.openjdk.java.net/browse/JDK-8167276 > testing: the test w/ -XX:-EliminateAllocations, XX:+EliminateAllocations and w/o any extra flags > > Thanks, > -- Igor From dean.long at oracle.com Thu Jan 24 05:14:32 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 23 Jan 2019 21:14:32 -0800 Subject: RFR(T)[12]: 8150757 : [TESTBUG] compiler/ciReplay/TestVM.sh and compiler/ciReplay/TestVM_no_comp_level.sh fail when no compilations are happening In-Reply-To: <96ea1b63-0bca-3fef-3cd6-89a749dddbbd@oracle.com> References: <96ea1b63-0bca-3fef-3cd6-89a749dddbbd@oracle.com> Message-ID: +1 dl On 1/23/19 5:41 PM, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 1/23/19 5:10 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8150757/webrev.00/index.html >>> 7 lines changed: 5 ins; 0 del; 2 mod; >> >> Hi all, >> >> could you please review this tiny fix for compiler/ciReplay/ tests? >> these tests try to crash JVM by running '-Xcomp -XX:CICrashAt=1 >> -version', but if they are run w/ AOT'ed java.base, there is nothing >> else to compile in '-version', so JVM doesn't crash and the tests >> fail. the fix replaces usage of -version w/ a class w/ empty main >> method, so crashes will happen w/ or w/o AOT'ed java.base. >> >> webrev: >> http://cr.openjdk.java.net/~iignatyev//8150757/webrev.00/index.html >> JBS: https://bugs.openjdk.java.net/browse/JDK-8150757 >> testing: compiler/ciReplay/ tests w/ and w/o AOT'ed java.base >> >> Thanks, >> -- Igor >> From jamsheed.c.m at oracle.com Thu Jan 24 06:09:25 2019 From: jamsheed.c.m at oracle.com (Jamsheed) Date: Thu, 24 Jan 2019 11:39:25 +0530 Subject: [12] RFR: 8213825: assert(false) failed: Non-balanced monitor enter/exit! Likely JNI locking In-Reply-To: References: Message-ID: Hi Vladimir, Thanks a lot for the review and approval to push in 12. 
Best regards,
Jamsheed

On 1/23/19 10:41 PM, Vladimir Kozlov wrote:
> Hi Jamsheed,
>
> Fix is good. I approved it for JDK 12 push.
>
> Thanks,
> Vladimir
>
> On 1/23/19 6:08 AM, Jamsheed wrote:
>> Hi,
>>
>> Request for review
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8213825
>>
>> webrev: http://cr.openjdk.java.net/~jcm/8213825/webrev.00/index.html
>>
>> Bug & Fix Desc:
>>
>> if markword load has sfpt as control i/p(i.e synchronizations near a
>> safepoint), it skips sfpt assuming sfptOp wouldn't write to markword
>> memory
>> fix: not to skip sfpt for markword loads.
>>
>> tests: hs-tier1-5, hs-precheckin-comp
>>
>> Best regards,
>>
>> Jamsheed
>>

From jamsheed.c.m at oracle.com Thu Jan 24 06:15:26 2019
From: jamsheed.c.m at oracle.com (Jamsheed)
Date: Thu, 24 Jan 2019 11:45:26 +0530
Subject: [12] RFR: 8213825: assert(false) failed: Non-balanced monitor enter/exit! Likely JNI locking
In-Reply-To: <257d62c0-c0f9-394e-1cbb-0f33b3a1d365@oracle.com>
References: <257d62c0-c0f9-394e-1cbb-0f33b3a1d365@oracle.com>
Message-ID:

Thanks a lot for the review, Dean.

Best regards,
Jamsheed

On 1/24/19 1:54 AM, dean.long at oracle.com wrote:
> Looks good to me too. Nice job tracking this down, Jamsheed!
>
> dl
>
> On 1/23/19 9:11 AM, Vladimir Kozlov wrote:
>> Hi Jamsheed,
>>
>> Fix is good. I approved it for JDK 12 push.
>>
>> Thanks,
>> Vladimir
>>
>> On 1/23/19 6:08 AM, Jamsheed wrote:
>>> Hi,
>>>
>>> Request for review
>>>
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8213825
>>>
>>> webrev: http://cr.openjdk.java.net/~jcm/8213825/webrev.00/index.html
>>>
>>> Bug & Fix Desc:
>>>
>>> if markword load has sfpt as control i/p(i.e synchronizations near a
>>> safepoint), it skips sfpt assuming sfptOp wouldn't write to markword
>>> memory
>>> fix: not to skip sfpt for markword loads.
>>>
>>> tests: hs-tier1-5,
hs-precheckin-comp >>> >>> Best regards, >>> >>> Jamsheed >>> > From fairoz.matte at oracle.com Thu Jan 24 07:14:03 2019 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Wed, 23 Jan 2019 23:14:03 -0800 (PST) Subject: [13] RFR(S): 8209951 : Problematic sparc intrinsic: com.sun.crypto.provider.CipherBlockChaining In-Reply-To: <323b7338-d507-4850-ab53-4a5295d7b62f@default> References: <9d49a648-50e6-30cd-5705-2997f658d8e8@oracle.com> <794dcbd2-60e0-e9f4-f9b5-f789e45d373a@oracle.com> <323b7338-d507-4850-ab53-4a5295d7b62f@default> Message-ID: Hi, This crash is very random and to exercise AES stability adding a unit testcase. Thanks Sean Coffey for bringing this into my notice. I have updated webrev and kindly review http://cr.openjdk.java.net/~fmatte/8209951/webrev.01/ Note: Crash is only observed on JDK 8 with Sparc Solaris 10 machine after 3_000+ iterations. In the test case there is loop for 5_000 iterations and running in -Xbatch making it more predictable. Thanks, Fairoz > -----Original Message----- > From: Fairoz Matte > Sent: Wednesday, January 23, 2019 8:50 AM > To: Vladimir Kozlov ; hotspot-compiler- > dev at openjdk.java.net > Subject: RE: [13] RFR(S): 8209951 : Problematic sparc intrinsic: > com.sun.crypto.provider.CipherBlockChaining > > Thanks Tobias and Vladimir for review. > > Thanks, > Fairoz > > > -----Original Message----- > > From: Vladimir Kozlov > > Sent: Tuesday, January 22, 2019 10:27 PM > > To: Fairoz Matte ; hotspot-compiler- > > dev at openjdk.java.net > > Subject: Re: [13] RFR(S): 8209951 : Problematic sparc intrinsic: > > com.sun.crypto.provider.CipherBlockChaining > > > > Yes, it is good. > > > > Thanks, > > Vladimir > > > > On 1/22/19 12:22 AM, Tobias Hartmann wrote: > > > Hi Fairoz, > > > > > > this looks good to me. 
> > > > > > Thanks, > > > Tobias > > > > > > On 22.01.19 04:35, Fairoz Matte wrote: > > >> Hi, > > >> > > >> Please review the following patch, > > >> JBS bug - https://bugs.openjdk.java.net/browse/JDK-8209951 > > >> Webrev - http://cr.openjdk.java.net/~fmatte/8209951/webrev.00/ > > >> > > >> During the call to assembled stub code > > >> generate_cipherBlockChaining_decryptAESCrypt_Parallel() > > >> there was reference to G6 register used for temporary storage of > > >> F50, as G6 is not saved on stack it was resulting in garbage during > retrieval. > > >> > > >> Solution is to use unused local register (L6) for temporary storage > > >> and > > retrieval of F50. > > >> > > >> Thanks, > > >> Fairoz > > >> From tobias.hartmann at oracle.com Thu Jan 24 08:15:07 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 24 Jan 2019 09:15:07 +0100 Subject: [13] RFR(S): 8209951 : Problematic sparc intrinsic: com.sun.crypto.provider.CipherBlockChaining In-Reply-To: References: <9d49a648-50e6-30cd-5705-2997f658d8e8@oracle.com> <794dcbd2-60e0-e9f4-f9b5-f789e45d373a@oracle.com> <323b7338-d507-4850-ab53-4a5295d7b62f@default> Message-ID: Hi Fairoz, still looks good to me but please fix the indentation in the test (lines 56-60, 122). No new webrev required. Thanks, Tobias On 24.01.19 08:14, Fairoz Matte wrote: > Hi, > > This crash is very random and to exercise AES stability adding a unit testcase. > Thanks Sean Coffey for bringing this into my notice. > > I have updated webrev and kindly review > http://cr.openjdk.java.net/~fmatte/8209951/webrev.01/ > > Note: Crash is only observed on JDK 8 with Sparc Solaris 10 machine after 3_000+ iterations. > In the test case there is loop for 5_000 iterations and running in -Xbatch making it more > predictable. 
> > Thanks, > Fairoz > >> -----Original Message----- >> From: Fairoz Matte >> Sent: Wednesday, January 23, 2019 8:50 AM >> To: Vladimir Kozlov ; hotspot-compiler- >> dev at openjdk.java.net >> Subject: RE: [13] RFR(S): 8209951 : Problematic sparc intrinsic: >> com.sun.crypto.provider.CipherBlockChaining >> >> Thanks Tobias and Vladimir for review. >> >> Thanks, >> Fairoz >> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Tuesday, January 22, 2019 10:27 PM >>> To: Fairoz Matte ; hotspot-compiler- >>> dev at openjdk.java.net >>> Subject: Re: [13] RFR(S): 8209951 : Problematic sparc intrinsic: >>> com.sun.crypto.provider.CipherBlockChaining >>> >>> Yes, it is good. >>> >>> Thanks, >>> Vladimir >>> >>> On 1/22/19 12:22 AM, Tobias Hartmann wrote: >>>> Hi Fairoz, >>>> >>>> this looks good to me. >>>> >>>> Thanks, >>>> Tobias >>>> >>>> On 22.01.19 04:35, Fairoz Matte wrote: >>>>> Hi, >>>>> >>>>> Please review the following patch, >>>>> JBS bug - https://bugs.openjdk.java.net/browse/JDK-8209951 >>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8209951/webrev.00/ >>>>> >>>>> During the call to assembled stub code >>>>> generate_cipherBlockChaining_decryptAESCrypt_Parallel() >>>>> there was reference to G6 register used for temporary storage of >>>>> F50, as G6 is not saved on stack it was resulting in garbage during >> retrieval. >>>>> >>>>> Solution is to use unused local register (L6) for temporary storage >>>>> and >>> retrieval of F50. 
>>>>> >>>>> Thanks, >>>>> Fairoz >>>>> From fairoz.matte at oracle.com Thu Jan 24 08:26:32 2019 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Thu, 24 Jan 2019 00:26:32 -0800 (PST) Subject: [13] RFR(S): 8209951 : Problematic sparc intrinsic: com.sun.crypto.provider.CipherBlockChaining In-Reply-To: References: <9d49a648-50e6-30cd-5705-2997f658d8e8@oracle.com> <794dcbd2-60e0-e9f4-f9b5-f789e45d373a@oracle.com> <323b7338-d507-4850-ab53-4a5295d7b62f@default> Message-ID: <5c604932-29e1-4d34-bf34-1dae31a6c6c4@default> Thanks Tobias, I have adjusted indentation. Thanks, Fairoz > -----Original Message----- > From: Tobias Hartmann > Sent: Thursday, January 24, 2019 1:45 PM > To: Fairoz Matte ; Vladimir Kozlov > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: [13] RFR(S): 8209951 : Problematic sparc intrinsic: > com.sun.crypto.provider.CipherBlockChaining > > Hi Fairoz, > > still looks good to me but please fix the indentation in the test (lines 56-60, > 122). > No new webrev required. > > Thanks, > Tobias > > On 24.01.19 08:14, Fairoz Matte wrote: > > Hi, > > > > This crash is very random and to exercise AES stability adding a unit > testcase. > > Thanks Sean Coffey for bringing this into my notice. > > > > I have updated webrev and kindly review > > http://cr.openjdk.java.net/~fmatte/8209951/webrev.01/ > > > > Note: Crash is only observed on JDK 8 with Sparc Solaris 10 machine after > 3_000+ iterations. > > In the test case there is loop for 5_000 iterations and running in > > -Xbatch making it more predictable. > > > > Thanks, > > Fairoz > > > >> -----Original Message----- > >> From: Fairoz Matte > >> Sent: Wednesday, January 23, 2019 8:50 AM > >> To: Vladimir Kozlov ; hotspot-compiler- > >> dev at openjdk.java.net > >> Subject: RE: [13] RFR(S): 8209951 : Problematic sparc intrinsic: > >> com.sun.crypto.provider.CipherBlockChaining > >> > >> Thanks Tobias and Vladimir for review. 
> >> > >> Thanks, > >> Fairoz > >> > >>> -----Original Message----- > >>> From: Vladimir Kozlov > >>> Sent: Tuesday, January 22, 2019 10:27 PM > >>> To: Fairoz Matte ; hotspot-compiler- > >>> dev at openjdk.java.net > >>> Subject: Re: [13] RFR(S): 8209951 : Problematic sparc intrinsic: > >>> com.sun.crypto.provider.CipherBlockChaining > >>> > >>> Yes, it is good. > >>> > >>> Thanks, > >>> Vladimir > >>> > >>> On 1/22/19 12:22 AM, Tobias Hartmann wrote: > >>>> Hi Fairoz, > >>>> > >>>> this looks good to me. > >>>> > >>>> Thanks, > >>>> Tobias > >>>> > >>>> On 22.01.19 04:35, Fairoz Matte wrote: > >>>>> Hi, > >>>>> > >>>>> Please review the following patch, JBS bug - > >>>>> https://bugs.openjdk.java.net/browse/JDK-8209951 > >>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8209951/webrev.00/ > >>>>> > >>>>> During the call to assembled stub code > >>>>> generate_cipherBlockChaining_decryptAESCrypt_Parallel() > >>>>> there was reference to G6 register used for temporary storage of > >>>>> F50, as G6 is not saved on stack it was resulting in garbage > >>>>> during > >> retrieval. > >>>>> > >>>>> Solution is to use unused local register (L6) for temporary > >>>>> storage and > >>> retrieval of F50. > >>>>> > >>>>> Thanks, > >>>>> Fairoz > >>>>> From rwestrel at redhat.com Thu Jan 24 09:43:12 2019 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 24 Jan 2019 10:43:12 +0100 Subject: RFR(S): 8215483: Off heap memory accesses should be vectorized In-Reply-To: <877eg6gaqk.fsf@redhat.com> References: <877eg6gaqk.fsf@redhat.com> Message-ID: <878sza75n3.fsf@redhat.com> > http://cr.openjdk.java.net/~roland/8215483/webrev.00/ Anyone for that one? Roland. 
From claes.redestad at oracle.com Thu Jan 24 09:58:37 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 24 Jan 2019 10:58:37 +0100
Subject: RFR(T): 8217716: Remove dead code in PhaseChaitin
Message-ID: <9437526b-41b2-a486-f3bc-fcfd6aaf082f@oracle.com>

Hi,

various methods and fields in PhaseChaitin and friends are unused and
should be removed.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8217716
Webrev: http://cr.openjdk.java.net/~redestad/8217716/open.00/

At least one of the unused methods (PhaseChaitin::Pre_Simplify)
lingers in product builds, so cleaning this up marginally improves
static footprint (-4Kb).

Testing: tier1+2

Thanks!

/Claes

From martin.doerr at sap.com Thu Jan 24 10:02:37 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 24 Jan 2019 10:02:37 +0000
Subject: RFR(M): 8217459: [PPC64] Cleanup non-vector version of CRC32
In-Reply-To: <243b17be-e1a3-7b68-1e72-9a114552860c@linux.vnet.ibm.com>
References: <243b17be-e1a3-7b68-1e72-9a114552860c@linux.vnet.ibm.com>
Message-ID:

Hi Gustavo,

thank you for reviewing and testing.

Seems like many comments were taken from java.util.zip.CRC32. I guess it was intended to refer to it.
I think it's not bad to have it this way because it makes it easier to compare both implementations.
Maybe Lutz can comment on this and say whether he would like to keep it this way.

Best regards,
Martin

-----Original Message-----
From: Gustavo Romero
Sent: Wednesday, 23 January 2019 23:18
To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net'
Cc: Lindenmaier, Goetz
Subject: Re: RFR(M): 8217459: [PPC64] Cleanup non-vector version of CRC32

Hi Martin,

On 01/21/2019 04:07 PM, Doerr, Martin wrote:
> PPC64 currently contains static tables for CRC32/CRC32C calculations. We only need some of them depending on Endianness and on whether vector instructions are available or not.
> We can get rid of quite some code when we generate these constants at startup as we already do for the vector version.
> In addition, we can save one register in the vector case because we can use one constants pointer for all related constants.
> Webrev:
> http://cr.openjdk.java.net/~mdoerr/8217459_ppc64_crc_consts/webrev.00/

Thanks for the clean-up. Change looks good!

It's good to see fold_8bit_crc32 and kernel_crc32_1byte going away (I just
noted them recently so I missed both in my previous clean-up). And also
the static table simplification.

I tested the change with different array sizes and byte values with and
without vpmsum in the CPU, i.e. has_vpmsumb() = false, and found no issues.

Only a nit: should we update the following comment and replace 'timesXtoThe32'
by something better, maybe 'table'? That name doesn't look very meaningful in the
current context and seems taken from the native code for java.util.zip.CRC32:

/**
 * uint32_t crc;
 * timesXtoThe32[crc & 0xFF] ^ (crc >> 8);
 */
void MacroAssembler::fold_byte_crc32(Register crc, Register val, Register table, Register tmp) {

Best regards,
Gustavo

From tobias.hartmann at oracle.com Thu Jan 24 10:03:41 2019
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 24 Jan 2019 11:03:41 +0100
Subject: RFR(T): 8217716: Remove dead code in PhaseChaitin
In-Reply-To: <9437526b-41b2-a486-f3bc-fcfd6aaf082f@oracle.com>
References: <9437526b-41b2-a486-f3bc-fcfd6aaf082f@oracle.com>
Message-ID: <1daad406-ec37-f852-a08b-fd5c96349152@oracle.com>

Hi Claes,

looks good and trivial.

Best regards,
Tobias

On 24.01.19 10:58, Claes Redestad wrote:
> Hi,
>
> various methods and fields in PhaseChaitin and friends are unused and
> should be removed.
>
> Bug:    https://bugs.openjdk.java.net/browse/JDK-8217716
> Webrev: http://cr.openjdk.java.net/~redestad/8217716/open.00/
>
> At least one of the unused methods (PhaseChaitin::Pre_Simplify)
> lingers in product builds, so cleaning this up marginally improves
> static footprint (-4Kb).
>
> Testing: tier1+2
>
> Thanks!
>
> /Claes

From claes.redestad at oracle.com Thu Jan 24 10:04:56 2019
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 24 Jan 2019 11:04:56 +0100
Subject: RFR(T): 8217716: Remove dead code in PhaseChaitin
In-Reply-To: <1daad406-ec37-f852-a08b-fd5c96349152@oracle.com>
References: <9437526b-41b2-a486-f3bc-fcfd6aaf082f@oracle.com>
 <1daad406-ec37-f852-a08b-fd5c96349152@oracle.com>
Message-ID: <7f7c5785-319f-4700-c248-6f681393fa47@oracle.com>

On 2019-01-24 11:03, Tobias Hartmann wrote:
> Hi Claes,
>
> looks good and trivial.

Thanks, Tobias!

/Claes

From Pengfei.Li at arm.com Thu Jan 24 10:29:50 2019
From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China))
Date: Thu, 24 Jan 2019 10:29:50 +0000
Subject: [aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics
In-Reply-To: <771c5094-aacb-d52c-437f-29aaf5f8f01a@redhat.com>
References: <771c5094-aacb-d52c-437f-29aaf5f8f01a@redhat.com>
Message-ID:

Hi Andrew Haley,

> Instead, please put it into a function (e.g. updateBytesCRC32C_inner) and call
> it from updateBytesCRC32C. There's no point writing all this stuff out twice.

I uploaded a new webrev. Is it what you want?
http://cr.openjdk.java.net/~pli/rfr/8216259/webrev.01/

--
Thanks,
Pengfei

From lutz.schmidt at sap.com Thu Jan 24 11:11:37 2019
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Thu, 24 Jan 2019 11:11:37 +0000
Subject: RFR(M): 8217459: [PPC64] Cleanup non-vector version of CRC32
In-Reply-To:
References: <243b17be-e1a3-7b68-1e72-9a114552860c@linux.vnet.ibm.com>
Message-ID: <51A38D8D-15B8-47FA-AF07-5F4F8D1E0C94@sap.com>

Gustavo, Martin,

I agree, that comment appears somewhat disconnected from the code.
I'm really not sure if it will help a lot in the future to have a
grep string that helps finding the related code in java.util.zip.CRC32.

In short: change it to something meaningful in the local context.

Thanks,
Lutz

On 24.01.19, 11:02, "Doerr, Martin" wrote:

Hi Gustavo,

thank you for reviewing and testing.
Seems like many comments were taken from java.util.zip.CRC32. I guess it was intended to refer to it. I think it's not bad to have it this way because it makes it easier to compare both implementations. Maybe Lutz can comment on this and if he would like to keep it this way. Best regards, Martin -----Original Message----- From: Gustavo Romero Sent: Mittwoch, 23. Januar 2019 23:18 To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' Cc: Lindenmaier, Goetz Subject: Re: RFR(M): 8217459: [PPC64] Cleanup non-vector version of CRC32 Hi Martin, On 01/21/2019 04:07 PM, Doerr, Martin wrote: > PPC64 currently contains static tables for CRC32/CRC32C calculations. We only need some of them depending on Endianess and on whether vector instructions are available or not. > We can get rid of quite some code when we generate these constants at startup as we already do for the vector version. > In addition, we can save one register in the vector case because we can use one constants pointer for all related constants. > Webrev: > http://cr.openjdk.java.net/~mdoerr/8217459_ppc64_crc_consts/webrev.00/ Thanks for the clean-up. Change looks good! It's good to see fold_8bit_crc32 and kernel_crc32_1byte going away (I just noted them recently so I missed both in my previous clean-up). And also the static table simplification. I tested the change with different array sizes and byte values with and without vpmsum in the CPU, i.e. has_vpmsumb() = false, and found no issues. Only a nit: should we update the following comment and replace 'timesXtoThe32' by something better, maybe 'table'? 
That name doesn't look very meaningful in the current context and seems taken from the native code for java.util.zip.CRC32:

/**
 * uint32_t crc;
 * timesXtoThe32[crc & 0xFF] ^ (crc >> 8);
 */
void MacroAssembler::fold_byte_crc32(Register crc, Register val, Register table, Register tmp) {

Best regards,
Gustavo

From martin.doerr at sap.com Thu Jan 24 12:11:37 2019
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 24 Jan 2019 12:11:37 +0000
Subject: RFR(M): 8217459: [PPC64] Cleanup non-vector version of CRC32
In-Reply-To: <51A38D8D-15B8-47FA-AF07-5F4F8D1E0C94@sap.com>
References: <243b17be-e1a3-7b68-1e72-9a114552860c@linux.vnet.ibm.com>
 <51A38D8D-15B8-47FA-AF07-5F4F8D1E0C94@sap.com>
Message-ID:

Hi Lutz and Gustavo,

that's fine. Removed the comments which refer to java.util.zip.CRC32 stuff.

And while reading through the comments, I found out that kernel_crc32_singleByte is not useful (since we have the ...Reg version). So I just removed it and replaced its only usage by better code (TemplateInterpreterGenerator::generate_CRC32_update_entry).

New webrev:
http://cr.openjdk.java.net/~mdoerr/8217459_ppc64_crc_consts/webrev.01/

Best regards,
Martin

-----Original Message-----
From: Schmidt, Lutz
Sent: Thursday, 24 January 2019 12:12
To: Doerr, Martin ; Gustavo Romero ; 'hotspot-compiler-dev at openjdk.java.net'
Cc: Lindenmaier, Goetz
Subject: Re: RFR(M): 8217459: [PPC64] Cleanup non-vector version of CRC32

Gustavo, Martin,

I agree, that comment appears somewhat disconnected from the code.
I'm really not sure if it will help a lot in the future to have a
grep string that helps finding the related code in java.util.zip.CRC32.

In short: change it to something meaningful in the local context.

Thanks,
Lutz

On 24.01.19, 11:02, "Doerr, Martin" wrote:

Hi Gustavo,

thank you for reviewing and testing.

Seems like many comments were taken from java.util.zip.CRC32. I guess it was intended to refer to it.
I think it's not bad to have it this way because it makes it easier to compare both implementations. Maybe Lutz can comment on this and if he would like to keep it this way. Best regards, Martin -----Original Message----- From: Gustavo Romero Sent: Mittwoch, 23. Januar 2019 23:18 To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' Cc: Lindenmaier, Goetz Subject: Re: RFR(M): 8217459: [PPC64] Cleanup non-vector version of CRC32 Hi Martin, On 01/21/2019 04:07 PM, Doerr, Martin wrote: > PPC64 currently contains static tables for CRC32/CRC32C calculations. We only need some of them depending on Endianess and on whether vector instructions are available or not. > We can get rid of quite some code when we generate these constants at startup as we already do for the vector version. > In addition, we can save one register in the vector case because we can use one constants pointer for all related constants. > Webrev: > http://cr.openjdk.java.net/~mdoerr/8217459_ppc64_crc_consts/webrev.00/ Thanks for the clean-up. Change looks good! It's good to see fold_8bit_crc32 and kernel_crc32_1byte going away (I just noted them recently so I missed both in my previous clean-up). And also the static table simplification. I tested the change with different array sizes and byte values with and without vpmsum in the CPU, i.e. has_vpmsumb() = false, and found no issues. Only a nit: should we update the following comment and replace 'timesXtoThe32' by something better, maybe 'table'? 
That name doesn't look much meaningful in the current context and seems taken from the native code for java.util.zip.CRC32: 3902 /** 3903 * uint32_t crc; 3904 * timesXtoThe32[crc & 0xFF] ^ (crc >> 8); 3905 */ 3906 void MacroAssembler::fold_byte_crc32(Register crc, Register val, Register table, Register tmp) { Best regards, Gustavo From aph at redhat.com Thu Jan 24 12:32:55 2019 From: aph at redhat.com (Andrew Haley) Date: Thu, 24 Jan 2019 12:32:55 +0000 Subject: [aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics In-Reply-To: References: <771c5094-aacb-d52c-437f-29aaf5f8f01a@redhat.com> Message-ID: <7e8244c4-ba51-b434-425e-4db9f92fa500@redhat.com> On 1/24/19 10:29 AM, Pengfei Li (Arm Technology China) wrote: > Hi Andrew Haley, > >> Instead, please put it into a function (e.g. updateBytesCRC32C_inner) and call >> it from updateBytesCRC32C. There's no point writing all this stuff out twice. > > I uploaded a new webrev. Is it what you want? > http://cr.openjdk.java.net/~pli/rfr/8216259/webrev.01/ Yes, thank you. Ningsheng, once you have commit access to OpenJDK, will you please push this? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From nils.eliasson at oracle.com Thu Jan 24 12:39:20 2019 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Thu, 24 Jan 2019 13:39:20 +0100 Subject: RFR(S): 8215483: Off heap memory accesses should be vectorized In-Reply-To: <878sza75n3.fsf@redhat.com> References: <877eg6gaqk.fsf@redhat.com> <878sza75n3.fsf@redhat.com> Message-ID: Hi Roland, Looks good! Thanks for fixing! // Nils On 2019-01-24 10:43, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~roland/8215483/webrev.00/ > Anyone for that one? > > Roland. 
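
For readers following the 8217459 thread: the comment quoted in that review, timesXtoThe32[crc & 0xFF] ^ (crc >> 8), is the usual table-driven CRC-32 byte fold. Below is a minimal Java sketch of that fold, assuming the standard reflected polynomial 0xEDB88320 (the one java.util.zip.CRC32 computes). The class and method names are hypothetical, and this is an illustration only, not the PPC64 stub code from the webrev. It also builds the table at startup rather than storing it statically, which is the general idea behind the clean-up.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative sketch only; not HotSpot code. All names here are hypothetical.
public class Crc32FoldSketch {
    // Reflected CRC-32 polynomial (assumption: the zlib / java.util.zip.CRC32 one).
    private static final int POLY = 0xEDB88320;

    // 256-entry lookup table, generated once at class initialization.
    private static final int[] TABLE = buildTable();

    static int[] buildTable() {
        int[] table = new int[256];
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int k = 0; k < 8; k++) {
                // Shift right by one bit; apply the polynomial if a 1 bit fell out.
                c = (c & 1) != 0 ? (c >>> 1) ^ POLY : c >>> 1;
            }
            table[i] = c;
        }
        return table;
    }

    // The fold from the quoted comment: table[crc & 0xFF] ^ (crc >> 8).
    // Java needs the unsigned shift (>>>) because int is signed.
    static int foldByte(int crc) {
        return TABLE[crc & 0xFF] ^ (crc >>> 8);
    }

    // Mix one input byte into the running CRC, then fold.
    static int updateByte(int crc, int b) {
        return foldByte(crc ^ (b & 0xFF));
    }

    // Full CRC-32 over a byte array, with the usual pre- and post-inversion.
    static int crc32(byte[] data) {
        int crc = ~0;
        for (byte b : data) {
            crc = updateByte(crc, b);
        }
        return ~crc;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        CRC32 reference = new CRC32();
        reference.update(data, 0, data.length);
        long sketch = crc32(data) & 0xFFFFFFFFL;
        System.out.printf("sketch=%08x reference=%08x%n", sketch, reference.getValue());
    }
}
```

Running main prints the sketch's value next to the java.util.zip.CRC32 result for the same input, which is a cheap way to check any renamed or regenerated table against the library implementation.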
From gromero at linux.vnet.ibm.com Thu Jan 24 14:17:04 2019
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Thu, 24 Jan 2019 12:17:04 -0200
Subject: RFR(M): 8217459: [PPC64] Cleanup non-vector version of CRC32
In-Reply-To:
References: <243b17be-e1a3-7b68-1e72-9a114552860c@linux.vnet.ibm.com>
 <51A38D8D-15B8-47FA-AF07-5F4F8D1E0C94@sap.com>
Message-ID:

Hi Martin,

On 01/24/2019 10:11 AM, Doerr, Martin wrote:
> Hi Lutz and Gustavo,
>
> that's fine. Removed the comments which refer to java.util.zip.CRC32 stuff.
>
> And while reading through the comments, I found out that kernel_crc32_singleByte is not useful (since we have the ...Reg version). So I just removed it and replaced its only usage by better code (TemplateInterpreterGenerator::generate_CRC32_update_entry).
>
> New webrev:
> http://cr.openjdk.java.net/~mdoerr/8217459_ppc64_crc_consts/webrev.01/

Thanks for the updated webrev. The additional clean-up makes the code
easier to read/follow too. LGTM.

Best regards,
Gustavo

> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Schmidt, Lutz
> Sent: Thursday, 24 January 2019 12:12
> To: Doerr, Martin ; Gustavo Romero ; 'hotspot-compiler-dev at openjdk.java.net'
> Cc: Lindenmaier, Goetz
> Subject: Re: RFR(M): 8217459: [PPC64] Cleanup non-vector version of CRC32
>
> Gustavo, Martin,
>
> I agree, that comment appears somewhat disconnected from the code.
> I'm really not sure if it will help a lot in the future to have a
> grep string that helps finding the related code in java.util.zip.CRC32.
>
> In short: change it to something meaningful in the local context.
>
> Thanks,
> Lutz
>
> On 24.01.19, 11:02, "Doerr, Martin" wrote:
>
> Hi Gustavo,
>
> thank you for reviewing and testing.
>
> Seems like many comments were taken from java.util.zip.CRC32. I guess it was intended to refer to it.
> I think it's not bad to have it this way because it makes it easier to compare both implementations.
> Maybe Lutz can comment on this and if he would like to keep it this way. > > Best regards, > Martin > > > -----Original Message----- > From: Gustavo Romero > Sent: Mittwoch, 23. Januar 2019 23:18 > To: Doerr, Martin ; 'hotspot-compiler-dev at openjdk.java.net' > Cc: Lindenmaier, Goetz > Subject: Re: RFR(M): 8217459: [PPC64] Cleanup non-vector version of CRC32 > > Hi Martin, > > On 01/21/2019 04:07 PM, Doerr, Martin wrote: > > PPC64 currently contains static tables for CRC32/CRC32C calculations. We only need some of them depending on Endianess and on whether vector instructions are available or not. > > We can get rid of quite some code when we generate these constants at startup as we already do for the vector version. > > In addition, we can save one register in the vector case because we can use one constants pointer for all related constants. > > Webrev: > > http://cr.openjdk.java.net/~mdoerr/8217459_ppc64_crc_consts/webrev.00/ > > Thanks for the clean-up. Change looks good! > > It's good to see fold_8bit_crc32 and kernel_crc32_1byte going away (I just > noted them recently so I missed both in my previous clean-up). And also > the static table simplification. > > I tested the change with different array sizes and byte values with and > without vpmsum in the CPU, i.e. has_vpmsumb() = false, and found no issues. > > Only a nit: should we update the following comment and replace 'timesXtoThe32' > by something better, maybe 'table'? 
That name doesn't look very meaningful in the
> current context and seems taken from the native code for java.util.zip.CRC32:
>
> /**
>  * uint32_t crc;
>  * timesXtoThe32[crc & 0xFF] ^ (crc >> 8);
>  */
> void MacroAssembler::fold_byte_crc32(Register crc, Register val, Register table, Register tmp) {
>
>
> Best regards,
> Gustavo
>
>
>

From Ningsheng.Jian at arm.com Thu Jan 24 09:34:22 2019
From: Ningsheng.Jian at arm.com (Ningsheng Jian (Arm Technology China))
Date: Thu, 24 Jan 2019 09:34:22 +0000
Subject: [aarch64-port-dev ] Changes to Bellsoft/Marvell method of developing intrinsics
In-Reply-To:
References:
Message-ID:

Hi Derek,

>
> We will also begin back-reviewing existing complex intrinsics. If other members
> of the community are interested in working on this we can coordinate to ensure
> coverage.
>

We (Arm) are happy to co-work on this and Pengfei has just started to investigate some existing complex string intrinsics.

Thanks,
Ningsheng

From aph at redhat.com Thu Jan 24 16:51:45 2019
From: aph at redhat.com (Andrew Haley)
Date: Thu, 24 Jan 2019 16:51:45 +0000
Subject: Changes to Bellsoft/Marvell method of developing intrinsics
In-Reply-To:
References:
Message-ID:

On 1/23/19 5:27 PM, Derek White wrote:
> Because of this we will change how we develop patches for complex
> intrinsics. Before sending the code out for public review, we intend
> to:
>
> * Use an additional "red-team" developer to focus on finding the
> weak points in the code and develop tests that ensure code coverage
> testing, test case coverage, etc. This is in addition to the normal
> testing and test development that the initiating developer is
> expected to do.
> * The "red-team" developer will also suggest changes for code
> clarity and code documentation, and will document the test strategy
> (what cases are tested, what tests cover what code, how to run
> tests).
> * We will include all tests developed as part of the patch, even
> if some modes may not be practical to run regularly as jtreg tests
> (for example if some tests take excessive time). This will allow
> later enhancements or fixes to the intrinsic to go through at least
> as thorough testing as the original.

Thank you for that.

I would like to add one thing: before doing anything you should openly
discuss whether a change should be made at all. We need to know the
potential gains, the maintenance costs, and what the alternatives are.

For example, it may well be possible to write intrinsics in C++ with a
little vector code that will perform nearly as well as hand-carved
assembly language. These will be much cheaper to write, easier to
maintain, and more reliable, for all the usual reasons to do with
high-level languages.

We may decide that we're not going to do an optimization even though
it will make some operation 10% faster because it's too risky. It's
only worth making changes if they really are justified by a
significant improvement on real-world workloads.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd.
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From vladimir.kozlov at oracle.com Thu Jan 24 17:40:24 2019
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 24 Jan 2019 09:40:24 -0800
Subject: [13] RFR(S): 8209951 : Problematic sparc intrinsic: com.sun.crypto.provider.CipherBlockChaining
In-Reply-To:
References: <9d49a648-50e6-30cd-5705-2997f658d8e8@oracle.com>
 <794dcbd2-60e0-e9f4-f9b5-f789e45d373a@oracle.com>
 <323b7338-d507-4850-ab53-4a5295d7b62f@default>
Message-ID: <5cd79149-c15c-b6de-7feb-9640b7b455f6@oracle.com>

Good.

This intrinsic is used only after the method becomes hot and is compiled by the C2 JIT.
You need a lot of iterations to trigger C2 compilation - the C2 compile threshold is 10_000.
The test should iterate at least that much (not 5_000).

How long does the test run with 5_000 iterations?
May be it is not practical to run 10_000 if it takes 30 min :( Thanks, Vladimir On 1/23/19 11:14 PM, Fairoz Matte wrote: > Hi, > > This crash is very random and to exercise AES stability adding a unit testcase. > Thanks Sean Coffey for bringing this into my notice. > > I have updated webrev and kindly review > http://cr.openjdk.java.net/~fmatte/8209951/webrev.01/ > > Note: Crash is only observed on JDK 8 with Sparc Solaris 10 machine after 3_000+ iterations. > In the test case there is loop for 5_000 iterations and running in -Xbatch making it more > predictable. > > Thanks, > Fairoz > >> -----Original Message----- >> From: Fairoz Matte >> Sent: Wednesday, January 23, 2019 8:50 AM >> To: Vladimir Kozlov ; hotspot-compiler- >> dev at openjdk.java.net >> Subject: RE: [13] RFR(S): 8209951 : Problematic sparc intrinsic: >> com.sun.crypto.provider.CipherBlockChaining >> >> Thanks Tobias and Vladimir for review. >> >> Thanks, >> Fairoz >> >>> -----Original Message----- >>> From: Vladimir Kozlov >>> Sent: Tuesday, January 22, 2019 10:27 PM >>> To: Fairoz Matte ; hotspot-compiler- >>> dev at openjdk.java.net >>> Subject: Re: [13] RFR(S): 8209951 : Problematic sparc intrinsic: >>> com.sun.crypto.provider.CipherBlockChaining >>> >>> Yes, it is good. >>> >>> Thanks, >>> Vladimir >>> >>> On 1/22/19 12:22 AM, Tobias Hartmann wrote: >>>> Hi Fairoz, >>>> >>>> this looks good to me. >>>> >>>> Thanks, >>>> Tobias >>>> >>>> On 22.01.19 04:35, Fairoz Matte wrote: >>>>> Hi, >>>>> >>>>> Please review the following patch, >>>>> JBS bug - https://bugs.openjdk.java.net/browse/JDK-8209951 >>>>> Webrev - http://cr.openjdk.java.net/~fmatte/8209951/webrev.00/ >>>>> >>>>> During the call to assembled stub code >>>>> generate_cipherBlockChaining_decryptAESCrypt_Parallel() >>>>> there was reference to G6 register used for temporary storage of >>>>> F50, as G6 is not saved on stack it was resulting in garbage during >> retrieval. 
>>>>> >>>>> Solution is to use unused local register (L6) for temporary storage >>>>> and >>> retrieval of F50. >>>>> >>>>> Thanks, >>>>> Fairoz >>>>> From vladimir.kozlov at oracle.com Thu Jan 24 17:50:53 2019 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 24 Jan 2019 09:50:53 -0800 Subject: RFR(S): 8215483: Off heap memory accesses should be vectorized In-Reply-To: References: <877eg6gaqk.fsf@redhat.com> <878sza75n3.fsf@redhat.com> Message-ID: <18b6c29f-3b1d-7970-c14c-e020f8d86c98@oracle.com> Looks good to me too. But it would be nice to have changes explanation in RFE. Why it helps vectorize off heap memory accesses? Thanks, Vladimir On 1/24/19 4:39 AM, Nils Eliasson wrote: > Hi Roland, > > Looks good! > > Thanks for fixing! > > // Nils > > On 2019-01-24 10:43, Roland Westrelin wrote: >>> http://cr.openjdk.java.net/~roland/8215483/webrev.00/ >> Anyone for that one? >> >> Roland. From andrewluotechnologies at outlook.com Thu Jan 24 19:48:01 2019 From: andrewluotechnologies at outlook.com (Andrew Luo) Date: Thu, 24 Jan 2019 19:48:01 +0000 Subject: Enhancing jaotc to automatically find VS2017 linker In-Reply-To: References: Message-ID: Just wanted to check in again on this in case my email got missed over the long weekend (in the US). Let me know if I've sent this to the wrong mailing list... Anyways, after looking into it more myself though, it seems like out-of-process isn't that unusual given that we execute link.exe out of process anyways. Thanks, -Andrew From: hotspot-compiler-dev On Behalf Of Andrew Luo Sent: Friday, January 18, 2019 2:17 PM To: hotspot-compiler-dev at openjdk.java.net Subject: Enhancing jaotc to automatically find VS2017 linker Hi, Has there been any plans to enhance jaotc to support automatically finding the link.exe in VS2017? If not, I am interested in contributing some work to support this. 
I see that in Linker.java (src/jdk.aot/share/classes/jdk.tools.jaotc/src/jdk/tools/jaotc/Linker.java) we find link.exe using the environment variables VS...COMNTOOLS, but since in VS2017 and forward, this is not defined, it seems another approach is necessary. Microsoft suggests that you use vswhere (https://github.com/Microsoft/vswhere, BSD licensed, included with Visual Studio 2017 15.2 and forward) or their COM API to find the latest VS2017 toolset. Anyways, if everyone agrees we should add VS2017 support, there are a few ways to do this (in order of simplest/easiest to most complex): 1. Check that vswhere exists on the system, if it does, call vswhere (out of process - not sure this is acceptable...) and use that to find the VS2017 link.exe 2. Ship vswhere with the JDK and call it out of process 3. Statically link a copy of vswhere (BSD licensed - is this okay?) into our code and add a JNI stub to call it 4. Call the COM API in a JNI function to get the latest version of VS2017 Personally I prefer (1), but if out-of-process isn't acceptable I'm fine with doing (4) or (3). Let me know if you have any comments/feedback on this proposal. Thanks, -Andrew From linuxtardis at gmail.com Thu Jan 24 23:17:38 2019 From: linuxtardis at gmail.com (Jakub Vaněk) Date: Fri, 25 Jan 2019 00:17:38 +0100 Subject: RFR(M)(round 2): 8215902: Add support for SoftFloat-3e library In-Reply-To: <3f62f15e-ac5f-94d4-9744-c9cef796a3fa@oracle.com> References: <4497ca084b9f48dbb8f6de1aa35c83653fd7acfb.camel@gmail.com> <7f69fc73-1c10-6b68-d657-c9e758d4bf1d@oracle.com> <3f62f15e-ac5f-94d4-9744-c9cef796a3fa@oracle.com> Message-ID: <12e1cb109842f145edf23b4ea5ef591395188de9.camel@gmail.com> Hi Magnus, thanks for the review! I haven't received a review for the hotspot source changes yet, so I will have to wait.
Regards, Jakub On 2019-01-23 at 13:55 +0100, Magnus Ihse Bursie wrote: > Hi Jakub, > > On 2019-01-15 17:31, Jakub Vaněk wrote: > > Hi Magnus and Erik, > > > > I have added the link to the repository to README and I have > > removed > > the link to the mailing list thread. I have also recreated the > > GitHub > > repository. Now it is a fork of the mentioned repository with two > > extra > > commits containing README and the build scripts. > > > > New webrev URL: > > http://cr.openjdk.java.net/~jakvanek/8215902/webrev.04/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8215902 > > Sorry for the late reply. > > This looks very good! Thank you for fixing this, including rebasing > the > github repo. > > I'm not sure if you've gotten reviews from the hotspot team for the > hotspot source changes, but from a build perspective, this is good to > go. > > /Magnus > > > > Regards, > > > > Jakub > > > > On 2019-01-15 at 15:05 +0100, Magnus Ihse Bursie wrote: > > > On 2018-12-25 16:19, Jakub Vaněk wrote: > > > > Hi, > > > > > > > > please review this webrev. It is a successor of the softfloat-3 > > > > [patch] > > > > thread (first email > > > > > > > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2018-November/031311.html > > > > ) > > > > > > > > Changes since the last patch (v6): > > > > > > > > - renamed --with-softloat* to --with-sflt* (it is more compact > > > > and > > > > it > > > > corresponds to the old --with-sflt-lib=...
option) > > > > > > > > - license is now obtained via --with-sflt-license switch (so it > > > > is > > > > not > > > > included in OpenJDK source tree) > > > > > > > > - updated documentation (slight rewording, added the license > > > > option) > > > > > > > > - checks for default --with/--without behavior are in place > > > > again > > > > (I forgot them when I changed the way the library is > > > > detected) > > > > > > > > - added a simple testcase - I found a disrepancy between > > > > softfloat > > > > and > > > > system function behavior. When a float with bits 0x003FFFFF > > > > is > > > > added to 0x00000001, the correct result is 0x00400000, but > > > > the > > > > default software floating point implementation returns > > > > 0x00000000. > > > > However I'm not sure where to put this test - now it is in > > > > test/hotspot/jtreg/compiler/floatingpoint. > > > > > > > > - comments in code refer to CR 6757269 and newly JDK-8215902 > > > > too. > > > > > > > > I have created a repository with SoftFloat-3e with build > > > > configuration > > > > specifically for OpenJDK on armel: > > > > https://github.com/ev3dev-lang-java/softfloat-openjdk > > > > > > > > I can add a link to it to the documentation. > > > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8215902 > > > > Webrev: http://cr.openjdk.java.net/~jakvanek/8215902/webrev.02/ > > > > > > Hi Jakub, > > > > > > In general this looks good. > > > > > > Some comments: > > > > > > I agree with Erik that you can add a link to your github project; > > > compiling SoftFloat is outside the scope of the OpenJDK build > > > instructions, but it can sure be helpful to lower the bar for > > > users > > > wanting to do that. Just one question: any particular reason you > > > didn't > > > create your github repo by forking the official > > > https://github.com/ucb-bar/berkeley-softfloat-3? 
That way, it > > > would > > > have > > > been easy for users to see that you were not adding any malicious > > > or > > > suspicious code to the original SoftFloat distribution. > > > > > > On the other hand, I think the link to > > > > > > > http://mail.openjdk.java.net/pipermail/aarch32-port-dev/2016-November/000611.html > > > > > > is unnecessary and just creates clutter in the documentation. > > > Please > > > remove it. > > > > > > /Magnus > > > > CI build: > > > > https://ci.adoptopenjdk.net/view/ev3dev/job/openjdk12_build_ev3_linux/67/ > > > > > > > > Cheers, > > > > > > > > Jakub > > > > > > From vladimir.x.ivanov at oracle.com Fri Jan 25 01:24:15 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 24 Jan 2019 17:24:15 -0800 Subject: [13] RFR (S): 8217760: C2: Missing symbolic info on a call from intrinsics when invoked through MethodHandle Message-ID: <7b65363c-25cf-9153-8606-1618241ad50b@oracle.com> http://cr.openjdk.java.net/~vlivanov/8217760/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8217760 If an intrinsic is called through MethodHandle and it contains a call, then it crashes at the call site during resolution due to inconsistent symbolic info: bytecode refers to method handle linker (MH::linkTo*), but the call invokes some concrete method (result of inlining through the linker). The fix is to explicitly attach symbolic info to the call using the machinery introduced by JDK-8072008 [1]. 
Testing: hs-precheckin-comp, hs-tier1, hs-tier2, hs-tier3 Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8072008 From Ningsheng.Jian at arm.com Fri Jan 25 01:24:16 2019 From: Ningsheng.Jian at arm.com (Ningsheng Jian (Arm Technology China)) Date: Fri, 25 Jan 2019 01:24:16 +0000 Subject: [aarch64-port-dev ] RFR(S): 8216259: AArch64: Vectorize Adler32 intrinsics In-Reply-To: <7e8244c4-ba51-b434-425e-4db9f92fa500@redhat.com> References: <771c5094-aacb-d52c-437f-29aaf5f8f01a@redhat.com> <7e8244c4-ba51-b434-425e-4db9f92fa500@redhat.com> Message-ID: <73dec208-e876-570c-446d-bf9b12303d37@arm.com> Hi Andrew, On 01/24/2019 08:32 PM, Andrew Haley wrote: > On 1/24/19 10:29 AM, Pengfei Li (Arm Technology China) wrote: >> Hi Andrew Haley, >> >>> Instead, please put it into a function (e.g. updateBytesCRC32C_inner) and call >>> it from updateBytesCRC32C. There's no point writing all this stuff out twice. >> >> I uploaded a new webrev. Is it what you want? >> http://cr.openjdk.java.net/~pli/rfr/8216259/webrev.01/ > > Yes, thank you. Ningsheng, once you have commit access to OpenJDK, will > you please push this? > Sure! Thank you! Regards, Ningsheng From igor.veresov at oracle.com Fri Jan 25 01:26:37 2019 From: igor.veresov at oracle.com (Igor Veresov) Date: Thu, 24 Jan 2019 17:26:37 -0800 Subject: Enhancing jaotc to automatically find VS2017 linker In-Reply-To: References: Message-ID: I think (1) sounds reasonable. Bob, what do you think? igor > On Jan 24, 2019, at 11:48 AM, Andrew Luo wrote: > > Just wanted to check in again on this in case my email got missed over the long weekend (in the US). Let me know if I've sent this to the wrong mailing list... > > Anyways, after looking into it more myself though, it seems like out-of-process isn't that unusual given that we execute link.exe out of process anyways.
> > Thanks, > > -Andrew > > From: hotspot-compiler-dev On Behalf Of Andrew Luo > Sent: Friday, January 18, 2019 2:17 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: Enhancing jaotc to automatically find VS2017 linker > > Hi, > > Has there been any plans to enhance jaotc to support automatically finding the link.exe in VS2017? If not, I am interested in contributing some work to support this. > > I see that in Linker.java (src/jdk.aot/share/classes/jdk.tools.jaotc/src/jdk/tools/jaotc/Linker.java) we find link.exe using the environment variables VS...COMNTOOLS, but since in VS2017 and forward, this is not defined, it seems another approach is necessary. Microsoft suggests that you use vswhere (https://github.com/Microsoft/vswhere, BSD licensed, included with Visual Studio 2017 15.2 and forward) or their COM API to find the latest VS2017 toolset. > > Anyways, if everyone agrees we should add VS2017 support, there are a few ways to do this (in order of simplest/easiest to most complex): > > 1. Check that vswhere exists on the system, if it does, call vswhere (out of process - not sure this is acceptable...) and use that to find the VS2017 link.exe > 2. Ship vswhere with the JDK and call it out of process > 3. Statically link a copy of vswhere (BSD licensed - is this okay?) into our code and add a JNI stub to call it > 4. Call the COM API in a JNI function to get the latest version of VS2017 > > Personally I prefer (1), but if out-of-process isn't acceptable I'm fine with doing (4) or (3). > > Let me know if you have any comments/feedback on this proposal. > > Thanks, > > -Andrew
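Option (1) in the quoted list could look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual jaotc Linker.java change: the class and method names are hypothetical, the vswhere location and arguments follow what the vswhere project documents for locating the VC tools, and the VC\Tools\MSVC\<version>\bin\Hostx64\x64 layout is assumed to match a default VS2017 install.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, named for illustration only.
final class VsWhereLinkerLocator {

    // vswhere is installed by the VS installer under "Program Files (x86)".
    static Path defaultVswherePath(String programFilesX86) {
        return Paths.get(programFilesX86, "Microsoft Visual Studio", "Installer", "vswhere.exe");
    }

    // Ask for the install path of the latest instance that has the VC x86/x64 tools.
    static List<String> vswhereCommand(Path vswhere) {
        return Arrays.asList(vswhere.toString(),
                "-latest",
                "-products", "*",
                "-requires", "Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
                "-property", "installationPath");
    }

    // Assumed layout: <install>\VC\Tools\MSVC\<version>\bin\Hostx64\x64\link.exe
    static Path linkExePath(Path installDir, String toolsVersion) {
        return installDir.resolve(Paths.get("VC", "Tools", "MSVC", toolsVersion.trim(),
                "bin", "Hostx64", "x64", "link.exe"));
    }

    // Returns the linker path, or empty if vswhere or the toolset is not present.
    static Optional<Path> findLinker() throws IOException, InterruptedException {
        String programFiles = System.getenv("ProgramFiles(x86)");
        if (programFiles == null) {
            return Optional.empty();
        }
        Path vswhere = defaultVswherePath(programFiles);
        if (!Files.isExecutable(vswhere)) {
            return Optional.empty();
        }
        Process p = new ProcessBuilder(vswhereCommand(vswhere))
                .redirectErrorStream(true)
                .start();
        String installDir = null;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            // Drain all output so the child cannot block; keep the first non-empty line.
            while ((line = r.readLine()) != null) {
                if (installDir == null && !line.trim().isEmpty()) {
                    installDir = line.trim();
                }
            }
        }
        if (p.waitFor() != 0 || installDir == null) {
            return Optional.empty();
        }
        Path root = Paths.get(installDir);
        // The default toolset version is recorded in this text file.
        Path versionFile = root.resolve(Paths.get("VC", "Auxiliary", "Build",
                "Microsoft.VCToolsVersion.default.txt"));
        if (!Files.isReadable(versionFile)) {
            return Optional.empty();
        }
        String version = new String(Files.readAllBytes(versionFile), StandardCharsets.UTF_8);
        Path link = linkExePath(root, version);
        return Files.isRegularFile(link) ? Optional.of(link) : Optional.empty();
    }
}
```

A caller would presumably try the existing VS...COMNTOOLS lookup first and fall back to VsWhereLinkerLocator.findLinker() only when that finds nothing, so behaviour with older Visual Studio versions stays unchanged.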
From vladimir.x.ivanov at oracle.com Fri Jan 25 01:34:03 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 24 Jan 2019 17:34:03 -0800 Subject: [13] RFR (XS): 8191998: C2: inlining through MH linkers drops speculative part of argument types Message-ID: <2c266be3-d068-86ad-a521-3682faa17043@oracle.com> http://cr.openjdk.java.net/~vlivanov/8191998/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8191998 CallGenerator::for_method_handle_inline() casts MH linker (MH::linkTo*) arguments before attempting inlining. If any argument has a speculative type attached, it is lost and can't be used later. The patch preserves speculative part while sharpening the type (if needed) based on static information from the MemberName instance. Testing: hs-precheckin-comp, hs-tier1, hs-tier2. Best regards, Vladimir Ivanov From vladimir.x.ivanov at oracle.com Fri Jan 25 01:56:39 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 24 Jan 2019 17:56:39 -0800 Subject: [13] RFR (S): 8192001: C2: inlining through dispatching MH linkers ignores speculative type of the receiver Message-ID: http://cr.openjdk.java.net/~vlivanov/8192001/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8192001 When inlining through MethodHandle calls, C2 can improve inlining decisions by taking speculative types into account (availability of type information is addressed by JDK-8191998 [1]). There's no profiling performed at method handle linker call sites (MethodHandle::linkTo*), but type info can flow from other sources. As an example, consider the following case: class A { void m() { ... } } class B extends A { void m() { ... } } MH = LOOKUP.findVirtual(A.class, "m", ...); void test(A o) throws Throwable { MH.invokeExact(o); } test(new B()); Before (no inlining): 251 12 !b TestMH::test (21 bytes) ...
@ 16 TestMH1$B1::m (1 bytes) inline (hot) \-> TypeProfile (-1/6701 counts) = TestMH1$B1 Testing: hs-precheckin-comp, hs-tier1, hs-tier2. Best regards, Vladimir Ivanov [1] https://bugs.openjdk.java.net/browse/JDK-8191998 From fairoz.matte at oracle.com Fri Jan 25 02:31:09 2019 From: fairoz.matte at oracle.com (Fa