From HORII at jp.ibm.com Sat Apr 1 10:33:50 2017
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Sat, 1 Apr 2017 19:33:50 +0900
Subject: ppc64le changes for jdk8u

Dear all,

I apologize in advance that this mail mixes several topics. I may need to split it into several mails for ease of discussion.

I would like to introduce changes to jdk8u for ppc64le, which Michi and I created. We are attempting to publish a research document about performance evaluation of POWER systems based on these changes. Under the license of OpenJDK, which is based on the GPL, we would like to open these changes.

Some are only for jdk8u. Some are under discussion. Please give us your comments if you are interested in our changes. All changesets are generated against jdk8u152-b01.

1. Enabled ppc64le as the default os.arch
When we built jdk8u on ppc64le with the latest jdk8u repository, the default os.arch was ppc64. This change sets ppc64le as the default value of os.arch.
http://cr.openjdk.java.net/~horii/jdk8u_support_ppc64le/webrev/
http://cr.openjdk.java.net/~horii/jdk8u_support_ppc64le/hotspot/
http://cr.openjdk.java.net/~horii/jdk8u_support_ppc64le/jdk/

2. Backport of CRC32 intrinsics (JDK-8131048, JDK-8164920)
CRC32 intrinsics were implemented and enhanced in jdk9 for ppc64le. We backported these optimizations to jdk8u. There are a few changes in shared code. I am now trying to eliminate them.
http://cr.openjdk.java.net/~horii/8166784/webrev.00/

3. Elimination of memory fences before and after CAS
We are discussing the elimination of memory fences in ParallelGC under JDK-8154736. We found the same patterns in CMS (and will probably be able to find them in G1). We provided changes for jdk9 in the previous discussion, with much support from the community. The changes below are for ParallelGC and CMS in jdk8u.
http://cr.openjdk.java.net/~horii/jdk8u_gcopt/webrev/

4. Volatile access optimization
In the current implementation, sync is called for each volatile read and lwsync for each volatile write. With this optimization, isync is called for each volatile read, and lwsync and sync are called for each volatile write. I am now writing a proof to validate this implementation.
http://cr.openjdk.java.net/~horii/jdk8u_volatilereadopt/webrev/

5. Tiered Compilation
Tiered Compilation will be introduced in jdk9 for ppc64le. We backported Tiered Compilation to jdk8u. The following URLs include all of the above changes because we backported this feature after applying them. Michi is now working to isolate the dependencies among them.
http://cr.openjdk.java.net/~horii/jdk8u_support_tiered_ppc64le/webrev/
http://cr.openjdk.java.net/~horii/jdk8u_support_tiered_ppc64le/hotspot/webrev/
http://cr.openjdk.java.net/~horii/jdk8u_support_tiered_ppc64le/jdk/webrev/

The above includes many changes in shared code. We want to add tiered compilation only for ppc64le. Michi is now minimizing the changes in shared code. Here is the current status.
http://cr.openjdk.java.net/~horii/c1backport/webrev.01/

6. Some bypasses in JCL
We experimentally implemented a cache for java.util.regex.Pattern, an efficient String.format path for a typical date format, and a TreeMap optimization, to work around typical mistakes in unoptimized Java code. I think they do not guarantee conformance to the specifications.
http://cr.openjdk.java.net/~horii/jdk8u_jcl_cheat/webrev/

I tested 5 and 6 on Ubuntu 16.04 as follows.

$ sudo apt-get install openjdk-8-jdk openjdk-8-dbg mercurial zip bzip2 unzip tar curl libnuma-dev libasound2-dev libxtst-dev libfreetype6-dev libxrender-dev libcups2-dev -y
$ hg clone http://hg.openjdk.java.net/jdk8u/jdk8u
$ cd jdk8u
$ sh ./get_source.sh
$ curl -o jdk8u.patch http://cr.openjdk.java.net/~horii/jdk8u_support_tiered_ppc64le/webrev/jdk8u-dev.patch
$ patch -p1 < jdk8u.patch
$ cd hotspot
$ curl -o hotspot.patch http://cr.openjdk.java.net/~horii/jdk8u_support_tiered_ppc64le/hotspot/webrev/hotspot.patch
$ patch -p1 < hotspot.patch
$ curl -o jdk1.patch http://cr.openjdk.java.net/~horii/jdk8u_support_tiered_ppc64le/jdk/webrev/jdk.patch
$ curl -o jdk2.patch http://cr.openjdk.java.net/~horii/jdk8u_jcl_cheat/webrev/jdk.patch
$ patch -p1 < jdk1.patch
$ patch -p1 < jdk2.patch
$ cd ../
$ sh ./configure --with-freetype-include=/usr/include/freetype2/ --with-freetype-lib=/usr/lib/powerpc64le-linux-gnu/
$ make all

Regards,
Hiroshi

From martin.doerr at sap.com Tue Apr 4 06:18:55 2017
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 4 Apr 2017 06:18:55 +0000
Subject: ppc64le changes for jdk8u
Message-ID: <56dbbce3cae0402dbd0e75392b1b260a@sap.com>

Hi Hiroshi,

thanks for sharing your work and for providing all the webrevs. Here are my comments:

1. I think this one makes sense. Having the same platform name "ppc64le" in jdk8 as in jdk9 sounds good to me.

2. and especially 5. are major feature backport changes. I'm not against doing that, but it would need to be done carefully as jdk8 is pretty much stabilized.

3. I still like the idea of optimizing forwarding in ParallelGC by getting rid of the expensive heavy-weight sync instructions. I think this change should be discussed in jdk10 and if we can get it reviewed and pushed, it should be possible to backport it to 9u and possibly to 8u.

4. Changing only C2 makes it inconsistent and hence incorrect. Making support_IRIW_for_not_multiple_copy_atomic_cpu switchable on PPC64 would do the job. I think this would be nice to have for experiments, especially in jdk10 wrt. JEP 188: Java Memory Model Update [1].
Note that the line in Parse::do_exits() (parse1.cpp) should be changed from:
PPC64_ONLY(wrote_volatile() ||)
to:
PPC64_ONLY((support_IRIW_for_not_multiple_copy_atomic_cpu && wrote_volatile()) ||)
Otherwise, you'll get redundant barriers (lwsync).

6. Interesting. It would also be interesting to see performance measurement results.

Thanks and best regards,
Martin

[1] http://openjdk.java.net/jeps/188

From HORII at jp.ibm.com Thu Apr 6 16:04:20 2017
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Fri, 7 Apr 2017 01:04:20 +0900
Subject: ppc64le changes for jdk8u
In-Reply-To: <56dbbce3cae0402dbd0e75392b1b260a@sap.com>
References: <56dbbce3cae0402dbd0e75392b1b260a@sap.com>

Hi Martin,

Thank you for your comments.

> 1. I think this one makes sense. Having the same platform name
> "ppc64le" in jdk8 as in jdk9 sounds good to me.

Thanks. We would like to request a change for this.

> 2. and especially 5. are major feature backport changes.
> I'm not against doing that, but it would need to be done carefully as jdk8
> is pretty much stabilized.

We would like to ask for a backport of 8144019.

> 3. I still like the idea of optimizing forwarding in ParallelGC by
> getting rid of the expensive heavy-weight sync instructions. I think
> this change should be discussed in jdk10 and if we can get it
> reviewed and pushed, it should be possible to backport it to 9u and
> possibly to 8u.

Thanks. We would like to start the discussion for jdk10 first.

> 4. Changing only C2 makes it inconsistent and hence incorrect.

Right. I need to take care of C1 as well.

> Making support_IRIW_for_not_multiple_copy_atomic_cpu switchable on
> PPC64 would do the job. I think this would be nice to have for
> experiments, especially in jdk10

I agree. We will start the discussion for jdk10 first.

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

From HORIE at jp.ibm.com Fri Apr 7 05:49:55 2017
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Fri, 7 Apr 2017 14:49:55 +0900
Subject: Optimizing byte reverse code for int value

Dear all,

Would you please review our change for JDK10 on ppc64?
Issue: https://bugs.openjdk.java.net/browse/JDK-8178294
Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.00/

This change adds two conversion rules for code that reverses 4 contiguous bytes into an int value.
The first conversion rule finds the pattern below and emits a single lwz instruction instead.

Original:
lbz r14,19(r12)
lbz r11,17(r12)
lbz r10,18(r12)
lbz r9,16(r12)
extsb r14,r14
rlwinm r10,r10,16,0,15
rlwinm r14,r14,24,0,7
add r14,r10,r14
rlwinm r11,r11,8,0,23
add r12,r11,r9
add r14,r12,r14

Optimization with the first conversion rule:
lwz r14,16(r12)

The second conversion rule finds the pattern below and emits only an lfs instruction.

Original:
lbz r14,19(r12)
lbz r11,17(r12)
lbz r10,18(r12)
lbz r9,16(r12)
extsb r14,r14
rlwinm r10,r10,16,0,15
rlwinm r14,r14,24,0,7
add r14,r10,r14
rlwinm r11,r11,8,0,23
add r12,r11,r9
add r14,r12,r14
stw r14,156(r1)
lfs f12,156(r1)

Optimization with the second conversion rule:
lfs f12,156(r1)

Our motivation comes from a performance bottleneck in the byte-reversing code of Apache ORC on the Tez framework, shown below.
https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/impl/SerializationUtils.java
We believe this kind of procedure is typical in Java.

public float readFloat(InputStream in) throws IOException {
  readFully(in, readBuffer, 0, 4);
  int val = (((readBuffer[0] & 0xff) << 0)
      + ((readBuffer[1] & 0xff) << 8)
      + ((readBuffer[2] & 0xff) << 16)
      + ((readBuffer[3] & 0xff) << 24));
  return Float.intBitsToFloat(val);
}

With our change, we observed a 5% performance improvement in a micro benchmark.
(See attached file: ReadFloatTest.java)

Best regards,
--
Michihiro,
IBM Research - Tokyo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ReadFloatTest.java
Type: application/octet-stream
Size: 3532 bytes

From martin.doerr at sap.com Fri Apr 7 07:12:34 2017
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 7 Apr 2017 07:12:34 +0000
Subject: Optimizing byte reverse code for int value
Message-ID: <4d45658f669541379b167df2ca6d6e2b@sap.com>

Hi Michihiro,

thanks for providing the webrev. I appreciate improvements for this bottleneck.

After a first look, it looks good to me for ppc64le. But I think it would break big endian platforms. I suggest replacing the use of loadI by endianness specific code (which could possibly use lwbrx on big endian).

Best regards,
Martin
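
[Editorial note] The endianness concern in this exchange can be illustrated in plain Java. The following is a minimal sketch, not code from the webrev; the class and method names are hypothetical, and ByteBuffer is used only to model what a plain 4-byte load would return on a host of a given byte order. It shows that the shift-and-add pattern from readFloat always produces the little-endian interpretation of the four bytes, so a plain load matches it only on a little-endian host, while a big-endian host needs a byte-reversed load (modeled here with Integer.reverseBytes).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianLoadSketch {
    // Shift-and-add pattern as in readFloat: byte 0 is the least significant byte.
    static int manualLittleEndian(byte[] b, int off) {
        return ((b[off] & 0xff) << 0)
             + ((b[off + 1] & 0xff) << 8)
             + ((b[off + 2] & 0xff) << 16)
             + ((b[off + 3] & 0xff) << 24);
    }

    // Models a plain 4-byte load on a host with the given byte order (assumption:
    // this stands in for lwz; it is not how HotSpot implements the rule).
    static int plainLoad(byte[] b, int off, ByteOrder hostOrder) {
        return ByteBuffer.wrap(b).order(hostOrder).getInt(off);
    }

    public static void main(String[] args) {
        byte[] buf = {0x00, 0x00, (byte) 0x80, 0x3f}; // 1.0f stored little-endian
        int expected = manualLittleEndian(buf, 0);
        int le = plainLoad(buf, 0, ByteOrder.LITTLE_ENDIAN);
        int be = plainLoad(buf, 0, ByteOrder.BIG_ENDIAN);

        System.out.println(expected == le);                       // true
        System.out.println(expected == be);                       // false
        System.out.println(expected == Integer.reverseBytes(be)); // true
        System.out.println(Float.intBitsToFloat(expected));       // 1.0
    }
}
```

Under these assumptions, the plain load is a valid replacement only where the host byte order is little-endian, which matches the suggestion to use a byte-reversing load on big-endian platforms.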

From gustavo.scalet at eldorado.org.br Fri Apr 7 14:14:59 2017
From: gustavo.scalet at eldorado.org.br (Gustavo Serra Scalet)
Date: Fri, 7 Apr 2017 14:14:59 +0000
Subject: [PPC][Hotspot] Apparently unrelated SharedRuntime::c_calling_convention call fails when implementing new intrinsics

Hi,

We implemented the MulAdd intrinsic on PPC on JDK 9 and now we're backporting it to 8, but I'm facing an exception which I assume is a bug elsewhere:

# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/sharedRuntime_ppc.cpp:737
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/gut/jdk8u-dev/hotspot/src/cpu/ppc/vm/sharedRuntime_ppc.cpp:737), pid=7631, tid=0x00003fff3454f1a0
# guarantee(i > 0 && sig_bt[i-1] == T_LONG) failed: argument of type (bt) should have been promoted to type (T_LONG,bt) for bt in {T_BOOLEAN, T_CHAR, T_BYTE, T_SHORT, T_INT}
#
# JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0-internal-debug-gut_2017_04_05_11_00-b00)
# Java VM: OpenJDK 64-Bit Server VM (25.71-b00-debug mixed mode linux-ppc64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/gut/hs/hotspot/src/cpu/ppc/vm/tst/hs_err_pid7631.log
#
# Compiler replay data is saved as:
# /home/gut/hs/hotspot/src/cpu/ppc/vm/tst/replay_pid7631.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
Current thread is 70365327192480
Dumping core ...
Aborted

Please take a look at the diff[1] for the new muladd and a test case[2] in Java, which takes an argument to repeat the main loop. Setting it to a high value such as 1234 is enough to JIT the code and run the intrinsic.

I also noticed that this check in hotspot/src/cpu/ppc/vm/sharedRuntime_ppc.cpp does not exist in JDK9 due to a changeset[3] that was not backported. But that didn't stop the X64 MulAdd intrinsic from working as it is. As I implemented one with the same interface, I don't understand why it's happening now...

Thanks in advance

[1] https://gist.github.com/gut/3d5f7984ef3114113b224853867bc906#file-add-muladd-ppc-diff
[2] https://gist.github.com/gut/3d5f7984ef3114113b224853867bc906#file-testmuladd-java
[3] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/d7f63963925f#l3.7

From HORII at jp.ibm.com Fri Apr 7 17:51:09 2017
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Sat, 8 Apr 2017 02:51:09 +0900
Subject: Optimizing byte reverse code for int value

> I suggest replacing the use of loadI by endianness specific code
> (which could possibly use lwbrx on big endian).

I believe that lwbrx is necessary for BE.

> Surely the source code needs fixing. It could be:
>
> public float readFloat(InputStream in) throws IOException {
>   readFully(in, aByteBuffer, 0, 4);
>   int val = aByteBuffer.getInt(0);
>
>   return Float.intBitsToFloat(val);
> }

In my understanding, ByteBuffer.getInt() does a similar thing.
I guess that an application would not use ByteBuffer only for calling getInt().

Heap-X-Buffer.java.template
public int getInt() {
    return Bits.getInt(this, ix(nextGetIndex(4)), bigEndian);
}

Bits.java
static int getInt(ByteBuffer bb, int bi, boolean bigEndian) {
    return bigEndian ? getIntB(bb, bi) : getIntL(bb, bi) ;
}

static int getIntL(ByteBuffer bb, int bi) {
    return makeInt(bb._get(bi + 3),
                   bb._get(bi + 2),
                   bb._get(bi + 1),
                   bb._get(bi    ));
}

static private int makeInt(byte b3, byte b2, byte b1, byte b0) {
    return (((b3       ) << 24) |
            ((b2 & 0xff) << 16) |
            ((b1 & 0xff) <<  8) |
            ((b0 & 0xff)      ));
}

Heap-X-Buffer.java.template
byte _get(int i) {                          // package-private
    return hb[i];
}

Direct-X-Buffer.java.template
byte _get(int i) {                          // package-private
    return unsafe.getByte(address + i);
}

Regards,
Hiroshi

Andrew Haley wrote on 2017/04/07 17:36:13:

> From: Andrew Haley
> To: Michihiro Horie/Japan/IBM at IBMJP, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
> Cc: Hiroshi H Horii/Japan/IBM at IBMJP, volker.simonis at sap.com
> Date: 2017/04/07 17:37
> Subject: Re: Optimizing byte reverse code for int value
>
> On 07/04/17 06:49, Michihiro Horie wrote:
> >
> > Would you please review our change for JDK10 on ppc64?
> > Issue: https://bugs.openjdk.java.net/browse/JDK-8178294
> > Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.00/
> >
> > This change adds two conversion rules of reversing contiguous 4 bytes for
> > int value.
> > The first conversion rule finds a pattern below and emits a lwz instruction
> > instead.
>
> Surely the source code needs fixing. It could be:
>
> public float readFloat(InputStream in) throws IOException {
>   readFully(in, aByteBuffer, 0, 4);
>   int val = aByteBuffer.getInt(0);
>
>   return Float.intBitsToFloat(val);
> }
>
> Then there would be no need for a special ppc64 pattern.
>
> Andrew.
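
[Editorial note] As a sketch of the ByteBuffer alternative suggested in this message, the snippet below compares it with the shift-and-add version. It is an illustration under assumptions, not code from ORC or from the webrev: the class and method names are hypothetical, and wrapping the existing byte[] is only one way the suggestion might be applied. Since a ByteBuffer is big-endian by default and readFloat treats byte 0 as the least significant byte, the buffer has to be switched to ByteOrder.LITTLE_ENDIAN to return the same value.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferReadSketch {
    // Shift-and-add version, following the readFloat method quoted in this thread.
    static float readFloatManual(byte[] readBuffer) {
        int val = ((readBuffer[0] & 0xff) << 0)
                + ((readBuffer[1] & 0xff) << 8)
                + ((readBuffer[2] & 0xff) << 16)
                + ((readBuffer[3] & 0xff) << 24);
        return Float.intBitsToFloat(val);
    }

    // ByteBuffer version. The explicit LITTLE_ENDIAN order is needed because
    // ByteBuffer defaults to BIG_ENDIAN.
    static float readFloatViaByteBuffer(byte[] readBuffer) {
        int val = ByteBuffer.wrap(readBuffer)
                            .order(ByteOrder.LITTLE_ENDIAN)
                            .getInt(0);
        return Float.intBitsToFloat(val);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x00, 0x20, (byte) 0xC0}; // -2.5f stored little-endian

        System.out.println(readFloatManual(bytes));        // -2.5
        System.out.println(readFloatViaByteBuffer(bytes)); // -2.5

        // With the default (big-endian) order the same bytes give a different int.
        int defaultOrder = ByteBuffer.wrap(bytes).getInt(0);
        System.out.println(defaultOrder == Float.floatToRawIntBits(-2.5f)); // false
    }
}
```

Whether this form compiles to a single load depends on the JDK version and on whether the underlying access is intrinsified, which is the point debated in the follow-up messages.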
URL: From HORII at jp.ibm.com Fri Apr 7 18:13:16 2017 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Sat, 8 Apr 2017 03:13:16 +0900 Subject: Optimizing byte reverse code for int value In-Reply-To: <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> References: <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> Message-ID: > > I guess that application does not use ByteBuffer only for calling getInt(). > > > > Heap-X-Buffer.java.template > > public int getInt() { > > return Bits.getInt(this, ix(nextGetIndex(4)), bigEndian); > > } > > This is old code. In JDK9 it looks like > > public int getInt() { > return unsafe.getIntUnaligned(hb, > byteOffset(nextGetIndex(4)), bigEndian); > } > > Unsafe.getIntUnaligned is a HotSpot intrinsic. Sorry. You are right. I checked jdk8u... jdk8u doesn't have Unsafe.getIntUnaligned. This change will improve jdk8u. Regards, Hiroshi Andrew Haley wrote on 2017/04/08 02:58:45: > From: Andrew Haley > To: Hiroshi H Horii/Japan/IBM at IBMJP > Cc: hotspot-dev at openjdk.java.net, Michihiro Horie/Japan/IBM at IBMJP, > ppc-aix-port-dev at openjdk.java.net, volker.simonis at sap.com > Date: 2017/04/08 02:59 > Subject: Re: Optimizing byte reverse code for int value > > On 07/04/17 18:51, Hiroshi H Horii wrote: > >> I suggest replacing the use of loadI by endianness specific code > >> (which could possibly use lwbrx on big endian). > > > > I believe that lwbrx is necessary for BE. > > > >> Surely the source code needs fixing. It could be: > >> > >> public float readFloat(InputStream in) throws IOException { > >> readFully(in, aByteBuffer, 0, 4); > >> int val = aByteBuffer.getInt(0); > >> > >> return Float.intBitsToFloat(val); > >> } > > > > In my understanding, ByteBuffer.getInt() does the similar thing. > > It doesn't. > > > I guess that application does not use ByteBuffer only for calling getInt(). > > > > Heap-X-Buffer.java.template > > public int getInt() { > > return Bits.getInt(this, ix(nextGetIndex(4)), bigEndian); > > } > > This is old code. 
In JDK9 it looks like > > public int getInt() { > return unsafe.getIntUnaligned(hb, > byteOffset(nextGetIndex(4)), bigEndian); > } > > Unsafe.getIntUnaligned is a HotSpot intrinsic. > > Bits.java is not used for this. > > Andrew. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.stuefe at gmail.com Sat Apr 8 08:02:10 2017 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Sat, 8 Apr 2017 10:02:10 +0200 Subject: ppc64le changes for jdk8u In-Reply-To: References: <56dbbce3cae0402dbd0e75392b1b260a@sap.com> Message-ID: Hi Hiroshi, Martin, Would the normal way of doing this not to do the changes for jdk 10 and, once they have proven to be stable, to backport them to 9 and 8? 8 is a stable release and 9 is closed for anything but p2 bugfixes now; both are not ideal for this kind of development. Also, I think the ppc-aix is too narrow a mailing list for these changes; maybe ask hotspot-dev? Kind Regards, Thomas On Thu, Apr 6, 2017 at 6:04 PM, Hiroshi H Horii wrote: > Hi Martin, > > Thank you for your comments. > > > 1. I think this one makes sense. Having the same platform name > > "ppc64le" in jdk8 as in jdk9 sounds good to me. > > Thanks. We would like to request a change for this. > > > 2. and especially 5. are major feature backport changes. I'm not > > against doing that, but it would need to be done carefully as jdk8 > > is pretty much stabilized. > > We would like ask a backport of 8144019. > > > 3. I still like the idea of optimizing forwarding in ParallelGC by > > getting rid of the expensive heavy-weight sync instructions. I think > > this change should be discussed in jdk10 and if we can get it > > reviewed and pushed, it should be possible to backport it to 9u and > > possibly to 8u. > > Thanks. We would like to start discussion for jdk10, first. > > > 4. Changing only C2 makes it inconsistent and hence incorrect. > > Right. I need to care C1 also. 
>
> > Making support_IRIW_for_not_multiple_copy_atomic_cpu switchable on
> > PPC64 would do the job. I think this would be nice to have for
> > experiments, especially in jdk10
>
> I agree. We will start discussion for jdk10, first.
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> "Doerr, Martin" wrote on 2017/04/04 15:18:55:
>
> > From: "Doerr, Martin"
> > To: Hiroshi H Horii/Japan/IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net"
> > Cc: Michihiro Horie/Japan/IBM at IBMJP
> > Date: 2017/04/04 15:20
> > Subject: RE: ppc64le changes for jdk8u
> >
> > Hi Hiroshi,
> >
> > thanks for sharing your work and for providing all the webrevs. Here
> > are my comments:
> >
> > 1. I think this one makes sense. Having the same platform name
> > "ppc64le" in jdk8 as in jdk9 sounds good to me.
> >
> > 2. and especially 5. are major feature backport changes. I'm not
> > against doing that, but it would need to be done carefully as jdk8
> > is pretty much stabilized.
> >
> > 3. I still like the idea of optimizing forwarding in ParallelGC by
> > getting rid of the expensive heavy-weight sync instructions. I think
> > this change should be discussed in jdk10 and if we can get it
> > reviewed and pushed, it should be possible to backport it to 9u and
> > possibly to 8u.
> >
> > 4. Changing only C2 makes it inconsistent and hence incorrect.
> > Making support_IRIW_for_not_multiple_copy_atomic_cpu switchable on
> > PPC64 would do the job. I think this would be nice to have for
> > experiments, especially in jdk10 wrt. JEP 188: Java Memory Model Update [1].
> > Note that the line in Parse::do_exits() (parse1.cpp) should be changed from:
> >   PPC64_ONLY(wrote_volatile() ||)
> > to:
> >   PPC64_ONLY((support_IRIW_for_not_multiple_copy_atomic_cpu && wrote_volatile()) ||)
> > Otherwise, you'll get redundant barriers (lwsync).
> >
> > 6. Interesting. Would also be interesting to see performance
> > measurement results.
> >
> > Thanks and best regards,
> > Martin
> >
> > [1] http://openjdk.java.net/jeps/188

From martin.doerr at sap.com Mon Apr 10 09:46:04 2017
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 10 Apr 2017 09:46:04 +0000
Subject: [PPC][Hotspot] Aparently unrelated SharedRuntime::c_calling_convention call fails when implementing new intrinsics
In-Reply-To:
References:
Message-ID: <03ec681e790f4484a1830de8d475e0ae@sap.com>

Hi Gustavo,

before change "8086069: Adapt runtime calls to recent intrinsics to pass ints as long", it was required to convert int to long arguments for stub calls as well.
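
As an illustration of the int-to-long promotion described above, here is a minimal Java sketch that models a 64-bit argument register with a `long`. It is not HotSpot code: the class `IntAsLongSketch` and all of its methods are hypothetical names chosen for this example, and the "stale high word" is an assumed stand-in for whatever bits happen to sit in the upper half of a register when an int is passed without promotion.

```java
// Illustrative model only: a Java long stands in for a 64-bit PPC64 register.
final class IntAsLongSketch {

    // Sign-extends an int to 64 bits, which is what an int-to-long conversion expresses.
    static long promoted(int value) {
        return (long) value;
    }

    // Models a register whose low word holds `value` while the high word holds stale bits.
    static long unpromoted(int value, int staleHighWord) {
        return ((long) staleHighWord << 32) | (value & 0xFFFFFFFFL);
    }

    // A callee that trusts all 64 bits sees the expected value only after promotion.
    static boolean calleeSeesExpected(long register, int expected) {
        return register == (long) expected;
    }

    // A callee that reads only the low 32 bits is unaffected by the high word.
    static int lowWord(long register) {
        return (int) register;
    }

    public static void main(String[] args) {
        int len = -5;
        long ok = promoted(len);
        long stale = unpromoted(len, 0x1234);
        System.out.println(calleeSeesExpected(ok, len));    // promoted value matches
        System.out.println(calleeSeesExpected(stale, len)); // stale high word breaks the match
        System.out.println(lowWord(stale) == len);          // low word is still correct
    }
}
```

The sketch only shows the arithmetic of sign extension; which calls actually require the promotion is decided by the platform's calling convention, as discussed in this thread.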

This could be done in library_call by:
  Node* call = NULL;
  if (CCallingConventionRequiresIntsAsLongs) {
    Node* xlen_I2L = ConvI2L(xlen);
    Node* ylen_I2L = ConvI2L(ylen);
    Node* zlen_I2L = ConvI2L(zlen);
    call = make_runtime_call(RC_LEAF|RC_NO_FP,
                             OptoRuntime::multiplyToLen_Type(),
                             stubAddr, stubName, TypePtr::BOTTOM,
                             x_start, xlen_I2L XTOP, y_start, ylen_I2L XTOP, z_start, zlen_I2L XTOP);
  }
  else {
    call = make_runtime_call(RC_LEAF|RC_NO_FP,
                             OptoRuntime::multiplyToLen_Type(),
                             stubAddr, stubName, TypePtr::BOTTOM,
                             x_start, xlen, y_start, ylen, z_start, zlen);
  }

In the current jdk9 code, stub calls are no longer performed according to the C calling convention (which requires int to long conversion on PPC64). The current stub code is designed to ignore the high 32 bits.
Hence, the requirement for conversion only exists for real C calls, but no longer for stubs.

Best regards,
Martin


-----Original Message-----
From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet
Sent: Freitag, 7. April 2017 16:15
To: jdk8u-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
Subject: [PPC][Hotspot] Aparently unrelated SharedRuntime::c_calling_convention call fails when implementing new intrinsics

Hi,

We implemented the MulAdd intrinsic on PPC on JDK 9 and now we're backporting it to 8 but I'm facing an exception which I assume it's a bug elsewhere:
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/sharedRuntime_ppc.cpp:737
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/gut/jdk8u-dev/hotspot/src/cpu/ppc/vm/sharedRuntime_ppc.cpp:737), pid=7631, tid=0x00003fff3454f1a0
# guarantee(i > 0 && sig_bt[i-1] == T_LONG) failed: argument of type (bt) should have been promoted to type (T_LONG,bt) for bt in {T_BOOLEAN, T_CHAR, T_BYTE, T_SHORT, T_INT}
#
# JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0-internal-debug-gut_2017_04_05_11_00-b00)
# Java VM: OpenJDK 64-Bit Server VM (25.71-b00-debug mixed mode linux-ppc64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/gut/hs/hotspot/src/cpu/ppc/vm/tst/hs_err_pid7631.log
#
# Compiler replay data is saved as:
# /home/gut/hs/hotspot/src/cpu/ppc/vm/tst/replay_pid7631.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
Current thread is 70365327192480
Dumping core ...
Aborted

Please take a look at the diff[1] for the new muladd and a test case[2] in java, which has an argument to repeat the main loop. Setting it with a high value such as 1234 is enough to jit the code and run the intrinsic.

I also noticed that this check in hotspot/src/cpu/ppc/vm/sharedRuntime_ppc.cpp does not exist in JDK9 due to a changeset[3] that was not backported.
But that didn't stop X64 MulAdd intrinsics to work as it is. As I implemented one with the same interface, I don't understand why it's happening now...

Thanks in advance

[1] https://gist.github.com/gut/3d5f7984ef3114113b224853867bc906#file-add-muladd-ppc-diff
[2] https://gist.github.com/gut/3d5f7984ef3114113b224853867bc906#file-testmuladd-java
[3] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/d7f63963925f#l3.7

From gustavo.scalet at eldorado.org.br Mon Apr 10 11:53:01 2017
From: gustavo.scalet at eldorado.org.br (Gustavo Serra Scalet)
Date: Mon, 10 Apr 2017 11:53:01 +0000
Subject: [PPC][Hotspot] Aparently unrelated SharedRuntime::c_calling_convention call fails when implementing new intrinsics
In-Reply-To: <03ec681e790f4484a1830de8d475e0ae@sap.com>
References: <03ec681e790f4484a1830de8d475e0ae@sap.com>
Message-ID: <3e0cc41ee0f146759d60034d6ca33555@serv030.corp.eldorado.org.br>

Hello Martin,

Thanks for explaining that! I'll perform these conversions on JDK8 and see how it goes.

From gromero at linux.vnet.ibm.com Mon Apr 10 13:08:25 2017
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 10 Apr 2017 10:08:25 -0300
Subject: ppc64le changes for jdk8u
In-Reply-To:
References:
Message-ID: <58EB83C9.4090800@linux.vnet.ibm.com>

Hi Thomas,

Based on what I've discussed with David and Sean [1, 2] my current understanding is that once changes like that proposed by Hiroshi and Michi are on 10 it's possible to request an adhoc / per case request to backport it to 8u without the need to get the changes on 9 before.

Regards,
Gustavo

[1] http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2017-March/002951.html
[2] http://mail.openjdk.java.net/pipermail/jdk8u-dev/2017-March/006512.html

From thomas.stuefe at gmail.com Mon Apr 10 13:16:56 2017
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 10 Apr 2017 15:16:56 +0200
Subject: ppc64le changes for jdk8u
In-Reply-To: <58EB83C9.4090800@linux.vnet.ibm.com>
References: <58EB83C9.4090800@linux.vnet.ibm.com>
Message-ID:

Hi Gustavo,

ah, thanks for clarifying!

Kind Regards, Thomas

From gustavo.scalet at eldorado.org.br Mon Apr 10 13:58:39 2017
From: gustavo.scalet at eldorado.org.br (Gustavo Serra Scalet)
Date: Mon, 10 Apr 2017 13:58:39 +0000
Subject: [PPC][Hotspot] Aparently unrelated SharedRuntime::c_calling_convention call fails when implementing new intrinsics
In-Reply-To: <3e0cc41ee0f146759d60034d6ca33555@serv030.corp.eldorado.org.br>
References: <03ec681e790f4484a1830de8d475e0ae@sap.com> <3e0cc41ee0f146759d60034d6ca33555@serv030.corp.eldorado.org.br>
Message-ID: <880ad44e043e4fd7a6ccc2166e6b0d71@serv030.corp.eldorado.org.br>

Wait, there is still something missing I didn't understand: Why would then this kind of stub work on X64?

As I understood, I'd need to perform this change on hotspot/src/share/vm/opto/library_call.cpp , which is an arch-independent file. Wouldn't that be a drawback for other archs?

Thanks

From martin.doerr at sap.com Mon Apr 10 14:22:08 2017
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 10 Apr 2017 14:22:08 +0000
Subject: [PPC][Hotspot] Aparently unrelated SharedRuntime::c_calling_convention call fails when implementing new intrinsics
In-Reply-To: <880ad44e043e4fd7a6ccc2166e6b0d71@serv030.corp.eldorado.org.br>
References: <03ec681e790f4484a1830de8d475e0ae@sap.com> <3e0cc41ee0f146759d60034d6ca33555@serv030.corp.eldorado.org.br> <880ad44e043e4fd7a6ccc2166e6b0d71@serv030.corp.eldorado.org.br>
Message-ID: <20c35e7b97144fea881e9e4ecf3a77a3@sap.com>

Hi Gustavo,

CCallingConventionRequiresIntsAsLongs is only true on PPC64 in jdk8u.
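
To illustrate why a check placed in shared code need not affect other architectures when the flag is true on only one platform, the following Java sketch builds a call signature for three (pointer, length) pairs under both settings. It is a simplified model, not HotSpot code: `StubSignatureSketch`, `Slot`, and `argumentSlots` are hypothetical names, and the two-slot representation of a long (a `LONG` slot followed by a `HALF` placeholder) is stated here as an assumption about how the compiler's type tuples are laid out.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a stub signature that depends on a per-platform flag.
final class StubSignatureSketch {

    // Hypothetical slot kinds; HALF marks the second slot occupied by a long.
    enum Slot { PTR, INT, LONG, HALF }

    // Builds the argument slots for three (pointer, length) pairs.
    static List<Slot> argumentSlots(boolean intsAsLongs) {
        List<Slot> slots = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            slots.add(Slot.PTR);
            if (intsAsLongs) {
                // Length is widened, so it takes a long slot plus its placeholder.
                slots.add(Slot.LONG);
                slots.add(Slot.HALF);
            } else {
                // Length stays a plain int; the signature is unchanged.
                slots.add(Slot.INT);
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(argumentSlots(true));
        System.out.println(argumentSlots(false));
    }
}
```

With the flag false, the generated signature is the same as it would be without the check, which is the sense in which platforms that do not set the flag are left untouched.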

I think runtime.cpp would also need a change in OptoRuntime::multiplyToLen_Type():
  if (CCallingConventionRequiresIntsAsLongs) {
    fields[argp++] = TypePtr::NOTNULL;    // x
    fields[argp++] = TypeLong::LONG;      // xlen
    fields[argp++] = TypeLong::HALF;      // placeholder
    fields[argp++] = TypePtr::NOTNULL;    // y
    fields[argp++] = TypeLong::LONG;      // ylen
    fields[argp++] = TypeLong::HALF;      // placeholder
    fields[argp++] = TypePtr::NOTNULL;    // z
    fields[argp++] = TypeLong::LONG;      // zlen
    fields[argp++] = TypeLong::HALF;      // placeholder
  } else {
    fields[argp++] = TypePtr::NOTNULL;    // x
    fields[argp++] = TypeInt::INT;        // xlen
    fields[argp++] = TypePtr::NOTNULL;    // y
    fields[argp++] = TypeInt::INT;        // ylen
    fields[argp++] = TypePtr::NOTNULL;    // z
    fields[argp++] = TypeInt::INT;        // zlen
  }

I'm not saying that I like this code, but that's how we had used it in 8.

Int to long conversion is needed as long as the stub call convention is not relaxed (as in jdk-8086069).

Best regards,
Martin


-----Original Message-----
From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br]
Sent: Montag, 10. April 2017 15:59
To: Doerr, Martin ; jdk8u-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
Subject: RE: [PPC][Hotspot] Aparently unrelated SharedRuntime::c_calling_convention call fails when implementing new intrinsics

Wait, there is still something missing I didn't understand: Why would then this kind of stub work on X64?

As I understood, I'd need to perform this change on hotspot/src/share/vm/opto/library_call.cpp , which is an arch-independent file. Wouldn't that be a drawback for other archs?

Thanks

From gustavo.scalet at eldorado.org.br Mon Apr 10 19:47:18 2017
From: gustavo.scalet at eldorado.org.br (Gustavo Serra Scalet)
Date: Mon, 10 Apr 2017 19:47:18 +0000
Subject: [PPC][Hotspot] Aparently unrelated SharedRuntime::c_calling_convention call fails when implementing new intrinsics
In-Reply-To: <20c35e7b97144fea881e9e4ecf3a77a3@sap.com>
References: <03ec681e790f4484a1830de8d475e0ae@sap.com> <3e0cc41ee0f146759d60034d6ca33555@serv030.corp.eldorado.org.br> <880ad44e043e4fd7a6ccc2166e6b0d71@serv030.corp.eldorado.org.br> <20c35e7b97144fea881e9e4ecf3a77a3@sap.com>
Message-ID: <6699343c2d054db98cef28af9272a60d@serv030.corp.eldorado.org.br>

Hello Martin,

Just a short feedback: you got it all right! I could build the backport after changing what you pointed out and it's now working.

Thanks a lot once again.

> -----Original Message-----
> From: Doerr, Martin [mailto:martin.doerr at sap.com]
> Sent: segunda-feira, 10 de abril de 2017 11:22
> To: Gustavo Serra Scalet ; jdk8u-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> Subject: RE: [PPC][Hotspot] Aparently unrelated
> SharedRuntime::c_calling_convention call fails when implementing new
> intrinsics
>
> Hi Gustavo,
>
> CCallingConventionRequiresIntsAsLongs is only true on PPC64 in jdk8u.
> > I think runtime.cpp would also need a change in > OptoRuntime::multiplyToLen_Type(): > if (CCallingConventionRequiresIntsAsLongs) { > fields[argp++] = TypePtr::NOTNULL; // x > fields[argp++] = TypeLong::LONG; // xlen > fields[argp++] = TypeLong::HALF; // placeholder > fields[argp++] = TypePtr::NOTNULL; // y > fields[argp++] = TypeLong::LONG; // ylen > fields[argp++] = TypeLong::HALF; // placeholder > fields[argp++] = TypePtr::NOTNULL; // z > fields[argp++] = TypeLong::LONG; // zlen > fields[argp++] = TypeLong::HALF; // placeholder > } else { > fields[argp++] = TypePtr::NOTNULL; // x > fields[argp++] = TypeInt::INT; // xlen > fields[argp++] = TypePtr::NOTNULL; // y > fields[argp++] = TypeInt::INT; // ylen > fields[argp++] = TypePtr::NOTNULL; // z > fields[argp++] = TypeInt::INT; // zlen > } > > I'm not saying that I like this code, but that's how we had used it in > 8. > > Int to long conversion is needed as long as the stub call convention is > not relaxed (as in jdk-8086069). > > Best regards, > Martin > > > -----Original Message----- > From: Gustavo Serra Scalet [mailto:gustavo.scalet at eldorado.org.br] > Sent: Montag, 10. April 2017 15:59 > To: Doerr, Martin ; jdk8u-dev at openjdk.java.net; > ppc-aix-port-dev at openjdk.java.net > Subject: RE: [PPC][Hotspot] Aparently unrelated > SharedRuntime::c_calling_convention call fails when implementing new > intrinsics > > Wait, there is still something missing I didn't understand: Why would > then this kind of stub work on X64? > > As I understood, I'd need to perform this change on > hotspot/src/share/vm/opto/library_call.cpp , which is an arch- > independent file. Wouldn't that be a drawback for other archs? 
>
> Thanks
>
> > -----Original Message-----
> > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet
> > Sent: Monday, April 10, 2017 08:53
> > To: Doerr, Martin; jdk8u-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> > Subject: RE: [PPC][Hotspot] Aparently unrelated SharedRuntime::c_calling_convention call fails when implementing new intrinsics
> >
> > Hello Martin,
> >
> > Thanks for explaining that! I'll perform these conversions on JDK8 and
> > see how it goes.
> >
> > > -----Original Message-----
> > > From: Doerr, Martin [mailto:martin.doerr at sap.com]
> > > Sent: Monday, April 10, 2017 06:46
> > > To: Gustavo Serra Scalet; jdk8u-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> > > Subject: RE: [PPC][Hotspot] Aparently unrelated SharedRuntime::c_calling_convention call fails when implementing new intrinsics
> > >
> > > Hi Gustavo,
> > >
> > > before change "8086069: Adapt runtime calls to recent intrinsics to
> > > pass ints as long", it was required to convert int to long arguments
> > > for stub calls as well.
> > >
> > > This could be done in library_call by:
> > >   Node* call = NULL;
> > >   if (CCallingConventionRequiresIntsAsLongs) {
> > >     Node* xlen_I2L = ConvI2L(xlen);
> > >     Node* ylen_I2L = ConvI2L(ylen);
> > >     Node* zlen_I2L = ConvI2L(zlen);
> > >     call = make_runtime_call(RC_LEAF|RC_NO_FP,
> > >                              OptoRuntime::multiplyToLen_Type(),
> > >                              stubAddr, stubName, TypePtr::BOTTOM,
> > >                              x_start, xlen_I2L XTOP, y_start,
> > >                              ylen_I2L XTOP, z_start, zlen_I2L XTOP);
> > >   }
> > >   else {
> > >     call = make_runtime_call(RC_LEAF|RC_NO_FP,
> > >                              OptoRuntime::multiplyToLen_Type(),
> > >                              stubAddr, stubName, TypePtr::BOTTOM,
> > >                              x_start, xlen, y_start, ylen,
> > >                              z_start, zlen);
> > >   }
> > >
> > > In the current jdk9 code, stub calls are no longer performed
> > > according to the C calling convention (which requires int to long
> > > conversion on PPC64). The current stub code is designed to ignore
> > > the high 32 bits.
> > > Hence, the requirement for conversion only exists for real C calls,
> > > but no longer for stubs.
> > >
> > > Best regards,
> > > Martin
> > >
> > >
> > > -----Original Message-----
> > > From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net] On Behalf Of Gustavo Serra Scalet
> > > Sent: Friday, April 7, 2017 16:15
> > > To: jdk8u-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net
> > > Subject: [PPC][Hotspot] Aparently unrelated SharedRuntime::c_calling_convention call fails when implementing new intrinsics
> > >
> > > Hi,
> > >
> > > We implemented the MulAdd intrinsic on PPC on JDK 9 and now we're
> > > backporting it to 8 but I'm facing an exception which I assume it's
> > > a bug elsewhere:
> > > # To suppress the following error report, specify this argument
> > > # after -XX: or in .hotspotrc: SuppressErrorAt=/sharedRuntime_ppc.cpp:737
> > > #
> > > # A fatal error has been detected by the Java Runtime Environment:
> > > #
> > > # Internal Error (/home/gut/jdk8u-dev/hotspot/src/cpu/ppc/vm/sharedRuntime_ppc.cpp:737), pid=7631, tid=0x00003fff3454f1a0
> > > # guarantee(i > 0 && sig_bt[i-1] == T_LONG) failed: argument of type (bt) should have been promoted to type (T_LONG,bt) for bt in {T_BOOLEAN, T_CHAR, T_BYTE, T_SHORT, T_INT}
> > > #
> > > # JRE version: OpenJDK Runtime Environment (8.0) (build 1.8.0-internal-debug-gut_2017_04_05_11_00-b00)
> > > # Java VM: OpenJDK 64-Bit Server VM (25.71-b00-debug mixed mode linux-ppc64 compressed oops)
> > > # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> > > #
> > > # An error report file with more information is saved as:
> > > # /home/gut/hs/hotspot/src/cpu/ppc/vm/tst/hs_err_pid7631.log
> > > #
> > > # Compiler replay data is saved as:
> > > # /home/gut/hs/hotspot/src/cpu/ppc/vm/tst/replay_pid7631.log
> > > #
> > > # If you would like to submit a bug report, please visit:
> > > # http://bugreport.java.com/bugreport/crash.jsp
> > > #
> > > Current thread is 70365327192480
> > > Dumping core ...
> > > Aborted
> > >
> > > Please take a look at the diff[1] for the new muladd and a test
> > > case[2] in java, which has an argument to repeat the main loop.
> > > Setting it with a high value such as 1234 is enough to jit the code
> > > and run the intrinsic.
> > >
> > > I also noticed that this check in
> > > hotspot/src/cpu/ppc/vm/sharedRuntime_ppc.cpp does not exist in JDK9
> > > due to a changeset[3] that was not backported. But that didn't stop
> > > X64 MulAdd intrinsics to work as it is. As I implemented one with
> > > the same interface, I don't understand why it's happening now...
> > >
> > > Thanks in advance
> > >
> > > [1] https://gist.github.com/gut/3d5f7984ef3114113b224853867bc906#file-add-muladd-ppc-diff
> > > [2] https://gist.github.com/gut/3d5f7984ef3114113b224853867bc906#file-testmuladd-java
> > > [3] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/d7f63963925f#l3.7

From HORIE at jp.ibm.com Tue Apr 11 10:36:39 2017
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 11 Apr 2017 10:36:39 +0000
Subject: Optimizing byte reverse code for int value
In-Reply-To: <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com>
References: <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com>,
Message-ID:

An HTML attachment was scrubbed...
URL:

From martin.doerr at sap.com Tue Apr 11 13:43:04 2017
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 11 Apr 2017 13:43:04 +0000
Subject: Optimizing byte reverse code for int value
In-Reply-To: <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com>
References: <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com>
Message-ID:

Hi Andrew,

thank you for your helpful comments. I fully agree with you.

In addition, I noticed that we don't have match rules which exploit byte reverse load/store instructions on PPC64. SPARC already has them:
match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
match(Set dst (StoreI dst (ReverseBytesI/L/US/S src)));

I think we should add them for jdk10. They should be used when the platform endianness doesn't match the bigEndian parameter in Unsafe methods.
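
The equivalence that such match rules rely on can be sketched in plain C++. This is a minimal illustration, not HotSpot code, and all function names are hypothetical: a byte-by-byte big-endian read gives the same value as one native 4-byte load followed by a byte reverse whenever the host endianness differs from the data's, which is the case a fused byte-reverse load would cover.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value stored most-significant-byte first, one byte at
// a time, using the usual shift-and-or pattern.
std::uint32_t read_be32_bytewise(const unsigned char* p) {
  return (static_cast<std::uint32_t>(p[0]) << 24) |
         (static_cast<std::uint32_t>(p[1]) << 16) |
         (static_cast<std::uint32_t>(p[2]) << 8) |
         static_cast<std::uint32_t>(p[3]);
}

// Portable byte reversal of a 32-bit value.
std::uint32_t reverse_bytes32(std::uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

// True when the host stores the least significant byte first.
bool host_is_little_endian() {
  const std::uint32_t probe = 1;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Same result as read_be32_bytewise, expressed as one native 4-byte load
// plus a byte reverse when host and data endianness differ.
std::uint32_t read_be32_load_and_reverse(const unsigned char* p) {
  assert(p != nullptr);
  std::uint32_t native = 0;
  std::memcpy(&native, p, sizeof native);  // memcpy keeps unaligned reads well-defined
  return host_is_little_endian() ? reverse_bytes32(native) : native;
}
```

On a host whose endianness already matches the data, the reverse step disappears and only the plain load remains, which mirrors the condition stated above for when the rules should be used.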

Best regards,
Martin

-----Original Message-----
From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Andrew Haley
Sent: Tuesday, April 11, 2017 13:02
To: Michihiro Horie
Cc: Simonis, Volker; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; Hiroshi H Horii
Subject: Re: Optimizing byte reverse code for int value

On 11/04/17 11:36, Michihiro Horie wrote:
> Thank you very much for letting us know Unsafe.getIntUnaligned is available in
> JDK9. I do agree we should fix Java source code.
> We think our byte-reverse optimization would still work on jdk8u as Hiroshi
> mentioned. Would you agree on this point?

I do, but I do not agree that this patch should necessarily be done in the PowerPC-specific back end. Have you considered it as a generic optimization for all processors?

Andrew.

From HORIE at jp.ibm.com Tue Apr 11 14:55:26 2017
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 11 Apr 2017 14:55:26 +0000
Subject: Optimizing byte reverse code for int value
In-Reply-To:
References: , <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From martin.doerr at sap.com Tue Apr 11 15:12:50 2017
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 11 Apr 2017 15:12:50 +0000
Subject: Optimizing byte reverse code for int value
In-Reply-To:
References: , <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com>
Message-ID: <622fa3e77da546dfb5155a1e4afacd7c@sap.com>

Hi Michihiro,

thanks for the quick reply. I think Andrew's idea is to optimize in the shared code instead of the platform backends. I haven't thought about where this could be done.

Or would it be possible to backport jdk (especially Unsafe) changes? If the required changes are small enough and we don't have to touch any public interface, this might be an option, too.

We'll appreciate if you take care of the new match rules for PPC64. Thanks a lot.

Best regards,
Martin

From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Tuesday, April 11, 2017 16:55
To: Doerr, Martin
Cc: aph at redhat.com; Hiroshi H Horii; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
Subject: RE: Optimizing byte reverse code for int value

Andrew, Martin,

Thanks a lot for your helpful feedback.

>Have you considered it as a generic optimization for all processors?
We would support all processors for our byte-reverse optimization to make it generic. Currently, I just finished adding match rules for little endian and big endian on PPC64, and am testing it in AIX.

>In addition, I noticed that we don't have match rules which exploit byte reverse load/store instructions on PPC64.
We would like to handle adding match rules for byte reverse load/store instructions on PPC64 for JDK10 if you would not mind. Would it be fine with you?

Best regards,
--
Michihiro,
IBM Research - Tokyo

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From HORIE at jp.ibm.com Tue Apr 11 15:35:08 2017
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 11 Apr 2017 15:35:08 +0000
Subject: Optimizing byte reverse code for int value
In-Reply-To: <622fa3e77da546dfb5155a1e4afacd7c@sap.com>
References: <622fa3e77da546dfb5155a1e4afacd7c@sap.com>, , <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From bruno.rosa at eldorado.org.br Wed Apr 12 19:23:08 2017
From: bruno.rosa at eldorado.org.br (Bruno Alexandre Rosa)
Date: Wed, 12 Apr 2017 19:23:08 +0000
Subject: Backport performance enhancement to jdk8
Message-ID: <8da7082df9714e698f2928a36a963665@serv030.corp.eldorado.org.br>

Hi, everyone,

Recently our team implemented the missing mulAdd and squareToLen intrinsics on ppc64. Our initial approach does not use vector instructions, but we are also looking into that.

Running SPECjvm2008's crypto.rsa benchmark on jdk9 we got around 4% performance gain. Backporting it to jdk8 we noticed a more remarkable ~56% gain.

We think that this gain justifies a backport. However, we are also aware that patches are backported mainly when they address bugfixes.
Would this be acceptable for the community?

Regards,
Bruno Rosa

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From HORIE at jp.ibm.com Fri Apr 21 16:18:27 2017
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Fri, 21 Apr 2017 16:18:27 +0000
Subject: Optimizing byte reverse code for int value
In-Reply-To: <622fa3e77da546dfb5155a1e4afacd7c@sap.com>
References: <622fa3e77da546dfb5155a1e4afacd7c@sap.com>, , <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From HORIE at jp.ibm.com Fri Apr 21 18:52:49 2017
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Sat, 22 Apr 2017 03:52:49 +0900
Subject: Optimizing byte reverse code for int value
In-Reply-To:
References: , <622fa3e77da546dfb5155a1e4afacd7c@sap.com>, , <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com>
Message-ID:

I noticed my silly mistake: the previous change does not work in big endian, although I ran jtreg. This time I also checked with my micro benchmark ReadFloatTest.java.

Would you review the following newest change?
Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.02/

(See attached file: ReadFloatTest.java)

Best regards,
--
Michihiro,
IBM Research - Tokyo

----- Original message -----
From: Michihiro Horie/Japan/IBM
To: martin.doerr at sap.com
Cc: aph at redhat.com, Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, volker.simonis at sap.com, Gustavo Bueno Romero/Brazil/IBM at IBMBR
Subject: RE: Optimizing byte reverse code for int value
Date: Sat, Apr 22, 2017 1:18 AM

Would you review the following change for jdk8?
Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.01/

Our byte-reverse optimization now works in shared code. I tested it with jtreg on x86, ppc64, and ppc64le.

Best regards,
--
Michihiro,
IBM Research - Tokyo

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ReadFloatTest.java
Type: application/octet-stream
Size: 3602 bytes
Desc: not available
URL:

From martin.doerr at sap.com Mon Apr 24 09:10:20 2017
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 24 Apr 2017 09:10:20 +0000
Subject: Optimizing byte reverse code for int value
In-Reply-To:
References: <622fa3e77da546dfb5155a1e4afacd7c@sap.com>, , <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com>
Message-ID: <2e13a32b56cd4d9f89758f4042602e9a@sap.com>

Hi Michihiro,

please note that I'm not a jdk8u reviewer. However, I have taken a quick look and I have the following concerns:
1. I think it's incorrect for Big Endian.
2. The pattern can also match for an unaligned 4 byte address which would break platforms like SPARC.
3. I couldn't see checks for shift amount and masks.

Best regards,
Martin

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From HORIE at jp.ibm.com Wed Apr 26 03:09:33 2017
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Wed, 26 Apr 2017 12:09:33 +0900
Subject: Optimizing byte reverse code for int value
In-Reply-To: <2e13a32b56cd4d9f89758f4042602e9a@sap.com>
References: <622fa3e77da546dfb5155a1e4afacd7c@sap.com>, , <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com> <2e13a32b56cd4d9f89758f4042602e9a@sap.com>
Message-ID:

Martin,

Thanks a lot for your comments. I fixed my code.
Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.05/

Best regards,
--
Michihiro,
IBM Research - Tokyo

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
URL:

From martin.doerr at sap.com Wed Apr 26 09:02:49 2017
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 26 Apr 2017 09:02:49 +0000
Subject: Optimizing byte reverse code for int value
In-Reply-To:
References: <622fa3e77da546dfb5155a1e4afacd7c@sap.com>, , <362a21f4-277c-c3f3-f7f0-08b55c8b2b0b@redhat.com> <89abbea5-9998-2e4d-62d3-e1f3e9bbd1d5@redhat.com> <2e13a32b56cd4d9f89758f4042602e9a@sap.com>
Message-ID: <174bf72968b5473cb3757a4f1c125bf7@sap.com>

Hi Michihiro,

this looks better, now. Just a few comments:
- I think "UseUnalignedAccesses" should be used instead of #ifdef SPARC. Other platforms can also be affected.
- In theory, I think that an ordered load may get matched which would get replaced by an unordered one. I guess this would probably never occur, but I think such changes should be absolutely bullet proof :)

Besides that, it looks correct to me.
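
The review points raised in this thread (consecutive bytes, exact shift amounts and masks, alignment, and ordered loads) can be collected into one guard. The following is only a sketch of that idea: `ByteTerm`, `can_fuse_into_word_load`, and the little-endian term layout are assumptions made for illustration and do not come from the webrev, and the `use_unaligned_accesses` parameter merely stands in for the UseUnalignedAccesses flag mentioned above.

```cpp
#include <cstdint>

// Hypothetical description of one byte-sized term in an expression such
// as (b0 & 0xFF) | ((b1 & 0xFF) << 8) | ... ; not a HotSpot structure.
struct ByteTerm {
  int offset;          // byte offset of the load from the base address
  int shift;           // left shift applied to the loaded byte
  std::uint32_t mask;  // mask applied to the loaded byte before shifting
  bool ordered;        // true if the load carries memory-ordering semantics
};

// Returns true only if the four terms describe a plain little-endian
// 32-bit read that may be replaced by a single 4-byte load.
bool can_fuse_into_word_load(const ByteTerm (&terms)[4],
                             int known_alignment,
                             bool use_unaligned_accesses) {
  for (int i = 0; i < 4; ++i) {
    const ByteTerm& t = terms[i];
    if (t.offset != i) return false;     // bytes must be consecutive
    if (t.shift != 8 * i) return false;  // shift amounts must be 0, 8, 16, 24
    if (t.mask != 0xFFu) return false;   // each byte must be masked to 8 bits
    if (t.ordered) return false;         // never drop ordering semantics
  }
  // A 4-byte load is safe if the address is known to be 4-byte aligned
  // or the platform tolerates unaligned accesses.
  return known_alignment >= 4 || use_unaligned_accesses;
}
```

Rejecting the pattern whenever any single check fails keeps the transformation conservative: a missed optimization costs a few instructions, while a wrongly fused load could fault or lose ordering.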
@Andrew: Do you think this is the right way to do it and is there a chance to get it in jdk8u? Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Mittwoch, 26. April 2017 05:10 To: Doerr, Martin Cc: aph at redhat.com; Gustavo Bueno Romero ; Hiroshi H Horii ; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker Subject: RE: Optimizing byte reverse code for int value Martin, Thanks a lot for your comments. I fixed my code. Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.05/ Best regards, -- Michihiro, IBM Research - Tokyo [Inactive hide details for "Doerr, Martin" ---2017/04/24 18:11:29---Hi Michihiro, please note that I'm not a jdk8u reviewer.]"Doerr, Martin" ---2017/04/24 18:11:29---Hi Michihiro, please note that I'm not a jdk8u reviewer. From: "Doerr, Martin" > To: Michihiro Horie/Japan/IBM at IBMJP Cc: "aph at redhat.com" >, Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-dev at openjdk.java.net" >, "ppc-aix-port-dev at openjdk.java.net" >, "Simonis, Volker" >, Gustavo Bueno Romero > Date: 2017/04/24 18:11 Subject: RE: Optimizing byte reverse code for int value ________________________________ Hi Michihiro, please note that I'm not a jdk8u reviewer. However, I have taken a quick look and I have the following concerns: 1. I think it's incorrect for Big Endian. 2. The pattern can also match for an unaligned 4 byte address which would break platforms like SPARC. 3. I couldn't see checks for shift amount and masks. Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Freitag, 21. April 2017 18:18 To: Doerr, Martin > Cc: aph at redhat.com; Hiroshi H Horii >; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker >; Gustavo Bueno Romero > Subject: RE: Optimizing byte reverse code for int value Would you review following change for jdk8? 
Webrev: http://cr.openjdk.java.net/~horii/8178294/webrev.01/ Our byte-reverse optimization now works in shared code. I tested it with jtreg on x86, ppc64, and ppc64le. Best regards, -- Michihiro, IBM Research - Tokyo ----- Original message ----- From: "Doerr, Martin" > To: Michihiro Horie/Japan/IBM at IBMJP Cc: "aph at redhat.com" >, Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-dev at openjdk.java.net" >, "ppc-aix-port-dev at openjdk.java.net" >, "Simonis, Volker" > Subject: RE: Optimizing byte reverse code for int value Date: Wed, Apr 12, 2017 12:13 AM Hi Michihiro, thanks for the quick reply. I think Andrew's idea is to optimize in the shared code instead of the platform backends. I haven't thought about where this could be done. Or would it be possible to backport jdk (especially Unsafe) changes? If the required changes are small enough and we don't have to touch any public interface, this might be an option, too. We'll appreciate if you take care of the new match rules for PPC64. Thanks a lot. Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Dienstag, 11. April 2017 16:55 To: Doerr, Martin > Cc: aph at redhat.com; Hiroshi H Horii >; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker > Subject: RE: Optimizing byte reverse code for int value Andrew, Martin, Thanks a lot for your helpful feedback. >Have you considered it as a generic optimization for all processors? We would support all processors for our byte-reverse optimization to make it generic. Currently, I just finished adding match rules for little endian and big endian on PPC64, and am testing it in AIX. >In addition, I noticed that we don't have match rules which exploit byte reverse load/store instructions on PPC64. We would like to handle adding match rules for byte reverse load/store instructions on PPC64 for JDK10 if you would not mind. Would it be fine with you? 

Best regards,
--
Michihiro,
IBM Research - Tokyo

----- Original message -----
From: "Doerr, Martin"
To: Andrew Haley, Michihiro Horie/Japan/IBM at IBMJP
Cc: "Simonis, Volker", "ppc-aix-port-dev at openjdk.java.net", "hotspot-dev at openjdk.java.net", Hiroshi H Horii/Japan/IBM at IBMJP
Subject: RE: Optimizing byte reverse code for int value
Date: Tue, Apr 11, 2017 10:44 PM

Hi Andrew,

thank you for your helpful comments. I fully agree with you.

In addition, I noticed that we don't have match rules which exploit byte reverse load/store instructions on PPC64. SPARC already has them:
match(Set dst (ReverseBytesI/L/US/S (LoadI src)));
match(Set dst (StoreI dst (ReverseBytesI/L/US/S src)));

I think we should add them for jdk10. They should be used when the platform endianness doesn't match the bigEndian parameter in Unsafe methods.

Best regards,
Martin

-----Original Message-----
From: hotspot-dev [mailto:hotspot-dev-bounces at openjdk.java.net] On Behalf Of Andrew Haley
Sent: Tuesday, 11 April 2017 13:02
To: Michihiro Horie
Cc: Simonis, Volker; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; Hiroshi H Horii
Subject: Re: Optimizing byte reverse code for int value

On 11/04/17 11:36, Michihiro Horie wrote:
> Thank you very much for letting us know Unsafe.getIntUnaligned is available in
> JDK9. I do agree we should fix Java source code.
> We think our byte-reverse optimization would still work on jdk8u as Hiroshi
> mentioned. Would you agree on this point?

I do, but I do not agree that this patch should necessarily be done in the PowerPC-specific back end. Have you considered it as a generic optimization for all processors?

Andrew.
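
For readers following the byte-reverse thread above, here is a minimal Java sketch of the kind of hand-written shift-and-mask code that the discussion is about. It is not taken from any of the webrevs; the class and method names are hypothetical, and the assumption that the optimization targets an idiom of this shape is ours. It only shows that such code computes the same value as `Integer.reverseBytes`, and why the exact shift amounts, masks, and byte order (Martin's concerns 1 and 3) determine the result.

```java
public class ByteReverseSketch {

    // Hypothetical hand-written byte reversal of a 32-bit value.
    // Every shift amount and mask matters: changing any one of them
    // yields something that is no longer a plain byte reversal.
    public static int reverseBytesManual(int x) {
        return (x << 24)
             | ((x & 0x0000ff00) << 8)
             | ((x >>> 8) & 0x0000ff00)
             | (x >>> 24);
    }

    // Assemble an int from four bytes, least significant byte first.
    // The offset is arbitrary, so the access need not be 4-byte aligned.
    public static int loadIntLittleEndian(byte[] buf, int off) {
        return (buf[off] & 0xff)
             | ((buf[off + 1] & 0xff) << 8)
             | ((buf[off + 2] & 0xff) << 16)
             | ((buf[off + 3] & 0xff) << 24);
    }

    // Assemble an int from the same four bytes, most significant byte first.
    public static int loadIntBigEndian(byte[] buf, int off) {
        return ((buf[off] & 0xff) << 24)
             | ((buf[off + 1] & 0xff) << 16)
             | ((buf[off + 2] & 0xff) << 8)
             | (buf[off + 3] & 0xff);
    }

    public static void main(String[] args) {
        int x = 0x11223344;
        // The manual version agrees with the library method.
        System.out.println(reverseBytesManual(x) == Integer.reverseBytes(x));

        byte[] buf = {0x44, 0x33, 0x22, 0x11};
        // Reading in one byte order equals reading in the other order
        // followed by a byte reversal.
        System.out.println(
            loadIntLittleEndian(buf, 0) == Integer.reverseBytes(loadIntBigEndian(buf, 0)));
    }
}
```

Whether a given platform can replace such a sequence with a single byte-reversing load depends on its endianness and alignment rules, which is the point the reviewers raise above.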

From thomas.stuefe at gmail.com Wed Apr 26 14:03:28 2017
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Wed, 26 Apr 2017 16:03:28 +0200
Subject: RFR(xs, jdk10, aix-only): 8171504: [aix] On AIX, -XXaltjvm= option is ignored
Message-ID:

Hi all,

may I please have reviews for this small change. It adapts the -XXaltjvm handling to match all other platforms.

(Note that this patch is not intended to improve the code; I copied the code from Linux almost verbatim. Improvements were made in separate changes, see e.g. JDK-8171508 ("os::jvm_path -XXaltjvm processing error after 8066474") and JDK-8173828 ("realpath is unsafe").)

Issue: https://bugs.openjdk.java.net/browse/JDK-8171504

Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8171504-aix-xxaltjvm-path-is-ignored/webrev.00/webrev/

Thank you!

Kind Regards, Thomas

From christoph.langer at sap.com Thu Apr 27 16:39:45 2017
From: christoph.langer at sap.com (Langer, Christoph)
Date: Thu, 27 Apr 2017 16:39:45 +0000
Subject: RFR(xs, jdk10, aix-only): 8171504: [aix] On AIX, -XXaltjvm= option is ignored
In-Reply-To:
References:
Message-ID: <7b179216a3014caeaaf6dabc33b3687b@sap.com>

Hi Thomas,

to me this AIX-only change looks good - the only difference from the Linux implementation is the realpath call. Reviewed.

Best regards
Christoph

> -----Original Message-----
> From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Thomas Stüfe
> Sent: Wednesday, 26 April 2017 16:03
> To: ppc-aix-port-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: RFR(xs, jdk10, aix-only): 8171504: [aix] On AIX, -XXaltjvm= option is ignored
>
> Hi all,
>
> may I please have reviews for this small change. It adapts the -XXaltjvm
> handling to match all other platforms.
>
> (Note that this patch is not intended to improve the code; I copied the
> code from Linux almost verbatim. Improvements were made in separate
> changes, see e.g. JDK-8171508 ("os::jvm_path -XXaltjvm processing error
> after 8066474") and JDK-8173828 ("realpath is unsafe").)
> Issue: https://bugs.openjdk.java.net/browse/JDK-8171504
>
> Webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8171504-aix-xxaltjvm-path-is-ignored/webrev.00/webrev/
>
> Thank you!
>
> Kind Regards, Thomas

From thomas.stuefe at gmail.com Thu Apr 27 17:16:50 2017
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 27 Apr 2017 19:16:50 +0200
Subject: RFR(xs, jdk10, aix-only): 8171504: [aix] On AIX, -XXaltjvm= option is ignored
In-Reply-To: <7b179216a3014caeaaf6dabc33b3687b@sap.com>
References: <7b179216a3014caeaaf6dabc33b3687b@sap.com>
Message-ID:

Thank you Christoph!

..Thomas

On Thu, Apr 27, 2017 at 6:39 PM, Langer, Christoph wrote:

> Hi Thomas,
>
> to me this AIX-only change looks good - the only difference from the Linux
> implementation is the realpath call. Reviewed.
>
> Best regards
> Christoph
>
> > -----Original Message-----
> > From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Thomas Stüfe
> > Sent: Wednesday, 26 April 2017 16:03
> > To: ppc-aix-port-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> > Subject: RFR(xs, jdk10, aix-only): 8171504: [aix] On AIX, -XXaltjvm= option is ignored
> >
> > Hi all,
> >
> > may I please have reviews for this small change. It adapts the -XXaltjvm
> > handling to match all other platforms.
> >
> > (Note that this patch is not intended to improve the code; I copied the
> > code from Linux almost verbatim. Improvements were made in separate
> > changes, see e.g. JDK-8171508 ("os::jvm_path -XXaltjvm processing error
> > after 8066474") and JDK-8173828 ("realpath is unsafe").)
> > Issue: https://bugs.openjdk.java.net/browse/JDK-8171504
> >
> > Webrev:
> > http://cr.openjdk.java.net/~stuefe/webrevs/8171504-aix-xxaltjvm-path-is-ignored/webrev.00/webrev/
> >
> > Thank you!
> >
> > Kind Regards, Thomas
>