From gromero at linux.vnet.ibm.com Fri Apr 1 20:36:03 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 1 Apr 2016 17:36:03 -0300
Subject: PPC64 VSX load/store instructions in stubs
Message-ID: <56FEDBB3.5030106@linux.vnet.ibm.com>

Hi Martin, Hi Volker

Currently VSX load/store instructions are not being used in PPC64 stubs,
particularly in arraycopy stubs inside generate_arraycopy_stubs() like,
but not limited to, generate_disjoint_{byte,short,int,long}_copy.

We can speed up mass copy using VSX (Vector-Scalar Extension) load/store
instructions on processors >= POWER8, the same way it's already done for
libc memcpy().

This is an initial patch just for jshort_disjoint_arraycopy() VSX vector
load/store:

http://81.de.7a9f.ip4.static.sl-reverse.com/202539/webrev

What are your thoughts on that? Is there any impediment to using VSX
instructions in OpenJDK at the moment?

Thank you.

Best regards,
Gustavo

From martin.doerr at sap.com Tue Apr 5 14:13:49 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 5 Apr 2016 14:13:49 +0000
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <56FEDBB3.5030106@linux.vnet.ibm.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
Message-ID:

Hi Gustavo,

I think such changes are appreciated if they improve performance.

I think VSX instructions can be used as long as we don't violate the ABI
(only use volatile registers).
If you add tests for availability of instructions to vm_version_ppc,
please also add them to the feature-string in VM_Version::initialize()
(add a "%s" and the name of the instruction).

We can assist in getting such changes pushed into hs-comp.

Thanks for working on it.

Best regards,
Martin

-----Original Message-----
From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
Sent: Friday, April 1, 2016 22:36
To: Doerr, Martin; Simonis, Volker; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net
Cc: brenohl at br.ibm.com
Subject: PPC64 VSX load/store instructions in stubs

Hi Martin, Hi Volker

Currently VSX load/store instructions are not being used in PPC64 stubs,
particularly in arraycopy stubs inside generate_arraycopy_stubs() like,
but not limited to, generate_disjoint_{byte,short,int,long}_copy.

We can speed up mass copy using VSX (Vector-Scalar Extension) load/store
instruction in processors >= POWER8, the same way it's already done for
libc memcpy().

This is an initial patch just for jshort_disjoint_arraycopy() VSX vector
load/store:

http://81.de.7a9f.ip4.static.sl-reverse.com/202539/webrev

What are your thoughts on that? Is there any impediment to use VSX
instructions in OpenJDK at the moment?

Thank you.

Best regards,
Gustavo

From volker.simonis at gmail.com Tue Apr 5 17:23:54 2016
From: volker.simonis at gmail.com (Volker Simonis)
Date: Tue, 5 Apr 2016 19:23:54 +0200
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <56FEDBB3.5030106@linux.vnet.ibm.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
Message-ID:

Hi Gustavo,

thanks a lot for your contribution.

Can you please describe whether you've run benchmarks and which
performance improvements you saw?

With your change, if we're running on Power 8, we will only use the fast
path for arrays with at least 32 elements. For smaller arrays, we will
fall back to copying only 2 elements at a time, which will be slower than
the initial version, which copied 4 at a time in that case.

Did you verify your changes on both little and big endian?

And what about unaligned memory accesses? As far as I read,
lxvd2x/stxvd2x still work, but may be slower. I saw there also exist
instructions for aligned load/stores. Would it make sense
(performance-wise) to use them for the cases where we can be sure that
we have aligned memory accesses?
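
To make the small-array concern concrete, here is a minimal sketch in
portable C++ of a tiered copy that keeps a 4-element step between the
32-element fast path and the single-element tail. It is not the stub
code from the webrev: the name copy_shorts_tiered is hypothetical, and
fixed-size memcpy calls stand in for the lxvd2x/stxvd2x pairs, purely as
an assumption for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a disjoint jshort copy; not HotSpot code.
// Tiers: 32 elements per iteration (four 16-byte blocks, the amount four
// vector load/store pairs would move), then 4 elements (one 8-byte word),
// then single elements for the remaining tail.
void copy_shorts_tiered(const std::int16_t* src, std::int16_t* dst,
                        std::size_t count) {
    std::size_t i = 0;

    // Wide path: entered only while at least 32 elements remain.
    for (; i + 32 <= count; i += 32) {
        for (std::size_t block = 0; block < 4; ++block) {
            // A fixed-size memcpy tolerates unaligned addresses and lets
            // the compiler choose the actual load/store instructions.
            std::memcpy(dst + i + block * 8, src + i + block * 8, 16);
        }
    }

    // Medium path: 4 elements (8 bytes) at a time, so arrays shorter
    // than 32 elements do not fall straight to a narrower loop.
    for (; i + 4 <= count; i += 4) {
        std::memcpy(dst + i, src + i, 8);
    }

    // Tail: at most 3 elements are left here.
    for (; i < count; ++i) {
        dst[i] = src[i];
    }
}
```

Whether the middle tier pays off on real hardware would still have to
be measured; the sketch only shows the control flow being asked about.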

Thank you and best regards,
Volker

From thomas.stuefe at gmail.com Thu Apr 7 10:14:22 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 7 Apr 2016 12:14:22 +0200
Subject: RFR(xs): 8153727: AIX jdk build broken after 8145174
Message-ID:

Hi all,

please review this tiny build fix for AIX. Thank you!

Webrev:
http://cr.openjdk.java.net/~stuefe/webrevs/8153727-fix-aixbuild-after-8145174/webrev.00/webrev/

Bug: https://bugs.openjdk.java.net/browse/JDK-8153727

Kind Regards, Thomas

From Sergey.Bylokhov at oracle.com Thu Apr 7 10:18:48 2016
From: Sergey.Bylokhov at oracle.com (Sergey Bylokhov)
Date: Thu, 7 Apr 2016 13:18:48 +0300
Subject: RFR(xs): 8153727: AIX jdk build broken after 8145174
In-Reply-To:
References:
Message-ID: <57063408.9050206@oracle.com>

Looks fine.

--
Best regards, Sergey.

From volker.simonis at gmail.com Thu Apr 7 10:27:02 2016
From: volker.simonis at gmail.com (Volker Simonis)
Date: Thu, 7 Apr 2016 12:27:02 +0200
Subject: RFR(xs): 8153727: AIX jdk build broken after 8145174
In-Reply-To:
References:
Message-ID:

Hi Thomas,

thanks for doing this fix. It looks good.

I've also forwarded your request to build-dev as this is a build change.

Just one question: do we pass the new test
test/java/awt/SplashScreen/MultiResolutionSplash/unix/UnixMultiResolutionSplashTest.java
which came in with 8145174 on AIX or do we have to fix it as well?

Regards,
Volker

From thomas.stuefe at gmail.com Thu Apr 7 10:40:09 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 7 Apr 2016 12:40:09 +0200
Subject: RFR(xs): 8153727: AIX jdk build broken after 8145174
In-Reply-To:
References:
Message-ID:

Hi Volker,

On Thu, Apr 7, 2016 at 12:27 PM, Volker Simonis wrote:

> Just one question: do we pass the new test
>
> test/java/awt/SplashScreen/MultiResolutionSplash/unix/UnixMultiResolutionSplashTest.java
> which came in with 8145174 on AIX or do we have to fix it as well?

Sorry, cannot answer right now, because my jtreg on AIX seems broken.
Let's wait for the nightly test results.

Regards, Thomas

From erik.joelsson at oracle.com Thu Apr 7 10:48:26 2016
From: erik.joelsson at oracle.com (Erik Joelsson)
Date: Thu, 7 Apr 2016 12:48:26 +0200
Subject: RFR(xs): 8153727: AIX jdk build broken after 8145174
In-Reply-To:
References:
Message-ID: <57063AFA.2070806@oracle.com>

Looks good to me.

/Erik

From thomas.stuefe at gmail.com Thu Apr 7 10:57:47 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 7 Apr 2016 12:57:47 +0200
Subject: RFR(xs): 8153727: AIX jdk build broken after 8145174
In-Reply-To: <57063408.9050206@oracle.com>
References: <57063408.9050206@oracle.com>
Message-ID:

Thank you!

On Thu, Apr 7, 2016 at 12:18 PM, Sergey Bylokhov wrote:

> Looks fine.

From thomas.stuefe at gmail.com Thu Apr 7 10:57:58 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 7 Apr 2016 12:57:58 +0200
Subject: RFR(xs): 8153727: AIX jdk build broken after 8145174
In-Reply-To: <57063AFA.2070806@oracle.com>
References: <57063AFA.2070806@oracle.com>
Message-ID:

Thanks all!

On Thu, Apr 7, 2016 at 12:48 PM, Erik Joelsson wrote:

> Looks good to me.
>
> /Erik

From HORII at jp.ibm.com Fri Apr 8 10:53:48 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Fri, 8 Apr 2016 10:53:48 +0000
Subject: enhancement of cmpxchg and copy_to_survivor for ppc64
Message-ID: <201604081054.u38AsJb4014954@d19av07.sagamino.japan.ibm.com>

Dear all:

Can I please request reviews for the following change?
This change was created for JDK 9 and ppc64.

Description:
This change adds options to compare-and-exchange for the POWER
architecture. As described in atomic_linux_ppc.inline.hpp, the current
implementation of cmpxchg is fence_cmpxchg_acquire. This implementation
is useful for general purposes because the two sync calls, one before
and one after cmpxchg, keep memory consistent. However, they sometimes
cause overhead because sync instructions are very expensive in the
current POWER chip design.

With this change, callers can explicitly specify whether to run the
fence and the acquire through two additional bool parameters. Because
their default values are "true", it is not necessary to modify existing
cmpxchg calls.

In addition, with the new parameters of cmpxchg, this change improves
performance of copy_to_survivor in the parallel GC. copy_to_survivor
changes forward pointers by using cmpxchg. This operation doesn't
require any sync instructions, in my understanding. A pointer is changed
at most once in a GC and when cmpxchg fails, the latest pointer is
available for the caller.

When I evaluated SPECjbb2013 (slightly customized because obsolete
grizzly doesn't support the new version format of Java 9), pause time of
young GC was reduced by 10% to 20%.

Summary of source code changes:

* src/share/vm/runtime/atomic.hpp
* src/share/vm/runtime/atomic.cpp
* src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
  - Add two arguments, fence and acquire, to cmpxchg only for PPC64.
    Though cmpxchg in atomic_linux_ppc.inline.hpp has some branches,
    they are folded away when it is inlined into callers.

* src/share/vm/oops/oop.inline.hpp
  - Changed cas_set_mark to call cmpxchg without fence and acquire.
    cas_set_mark is called only by cas_forward_to, which is called only
    by copy_to_survivor_space and oop_promotion_failed in
    psPromotionManager.

Code change:
Please see the attached diff file, which was generated with "hg diff -g"
under the latest hotspot directory.

Passed test:
SPECjbb2013 (customized)

* I believe some other cmpxchg calls can be optimized by dropping the
  fence or the acquire, because two sync calls are more conservative
  than the Java memory model requires.

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ppc64_cmpxchg_opt.diff
Type: application/octet-stream
Size: 8837 bytes
Desc: not available
URL:

From mikael.vidstedt at oracle.com Fri Apr 8 22:33:33 2016
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Fri, 8 Apr 2016 15:33:33 -0700
Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub
Message-ID: <570831BD.7080005@oracle.com>

Please review:

Bug: https://bugs.openjdk.java.net/browse/JDK-8153892
Webrev:
http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/

* Note: this is patch 2 in a set of 3 all aiming to clean up and unify
the unsafe memory getters/setters, along with the handling of unsafe
access errors. The other two issues are:

https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle unsafe access
error as an asynchronous exception
https://bugs.openjdk.java.net/browse/JDK-8150921 - Update Unsafe
getters/setters to use double-register variants

* Summary (copied from the bug description)

In certain cases, such as accessing a region of a memory mapped file
which has been truncated on unix-style operating systems, a SIGBUS
signal will be raised and the VM will process it in the signal handler.

How the signal is processed differs depending on the operating system
and/or CPU architecture, with two major alternatives:

* "stubless"

Do the necessary thread state updates directly in the signal handler,
and modify the context so that the signal handler returns to the place
where the execution should continue

* Using a stub

Update the context so that when the signal handler returns the thread
will continue execution in a generated stub, which in turn will call
some native code in the VM to update the thread state and figure out
where execution should continue. The stub will then jump to that new
place.

It should be noted that the work of updating the thread state is very
small - it's setting a flag or two in the thread structure, and figuring
out where the next instruction starts. It should also be noted that the
generated stubs today are broken, because they do not preserve all the
live registers over the call into the VM. There are two ways to address
this:

* Preserve all the necessary registers

This would mean implementing, in macro assembly, the necessary logic for
preserving all the live registers, including, but not limited to,
floating point registers, flag registers, etc. It quickly becomes
obvious that this is platform specific and error prone.

* Leverage the fact that the operating system already does this as part
of the signal handling

Do the necessary work in the signal handler instead, removing the need
for the stub altogether

As mentioned, on some platforms the latter model is already in use. It
is dramatically easier and all platforms should be updated to do it the
same way.

* Testing

Just as mentioned in the RFR for JDK-8153890, a new test was developed
to test this code path:

http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java

In fact, it was when running this test that I found the register
preservation issue. JPRT also passes. Much like JDK-8153890 I wanted to
get some feedback on this before running additional tests.

Cheers,
Mikael

From david.holmes at oracle.com Mon Apr 11 00:57:47 2016
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 11 Apr 2016 10:57:47 +1000
Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub
In-Reply-To: <570831BD.7080005@oracle.com>
References: <570831BD.7080005@oracle.com>
Message-ID: <570AF68B.9090707@oracle.com>

Hi Mikael,

I think we need to be able to answer the question as to why the stubbed
and stubless forms of this code exist to ensure that converting all
platforms to the same form is appropriate.

I'm still going through this but my initial reaction is to wonder why we
don't use the same form of handle_unsafe_access on all platforms and
always pass in npc? (That seems to be the only difference in code that
otherwise seems platform independent.)

Thanks,
David

From thomas.stuefe at gmail.com Mon Apr 11 09:03:19 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 11 Apr 2016 11:03:19 +0200
Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub
In-Reply-To: <570831BD.7080005@oracle.com>
References: <570831BD.7080005@oracle.com>
Message-ID:

Hi Mikael,

just a question, should the new stubless functions not live someplace
else instead of in stubRoutines_ ? After all, they are not stub routines
anymore.

Kind Regards, Thomas

From thomas.stuefe at gmail.com Mon Apr 11 09:05:46 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Mon, 11 Apr 2016 11:05:46 +0200
Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub
In-Reply-To: <570AF68B.9090707@oracle.com>
References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com>
Message-ID:

Hi David,

On Mon, Apr 11, 2016 at 2:57 AM, David Holmes wrote:

> Hi Mikael,
>
> I think we need to be able to answer the question as to why the stubbed
> and stubless forms of this code exist to ensure that converting all
> platforms to the same form is appropriate.
>
> I'm still going through this but my initial reaction is to wonder why we
> don't use the same form of handle_unsafe_access on all platforms and always
> pass in npc? (That seems to be the only difference in code that otherwise
> seems platform independent.)

On Solaris we get the npc for free in the signal ucontext. On x86 it has
to be calculated. But yes, this could be moved out of the handle
functions and just passed in.

I also saw that we apparently miss handling for ppc. No one seemed to
miss it until now, but it may make sense to add handling anyway.

Kind Regards, Thomas

URL: From thomas.stuefe at gmail.com Mon Apr 11 09:15:09 2016 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 11 Apr 2016 11:15:09 +0200 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> Message-ID: On Mon, Apr 11, 2016 at 11:05 AM, Thomas St?fe wrote: > Hi David, > > On Mon, Apr 11, 2016 at 2:57 AM, David Holmes > wrote: > >> Hi Mikael, >> >> I think we need to be able to answer the question as to why the stubbed >> and stubless forms of this code exist to ensure that converting all >> platforms to the same form is appropriate. >> >> I'm still going through this but my initial reaction is to wonder why we >> don't use the same form of handle_unsafe_access on all platforms and always >> pass in npc? (That seems to be the only difference in code that otherwise >> seems platform independent.) >> > > On Solaris we get the npc for free in the signal ucontext. On x86 it has > to be calculated. But yes, this could be moved out of the handle functions > and just passed in. > > I also saw that we apparently miss handling for ppc. No one seemed to miss > it until now, but it may make sense to add handling anyway. > > Oh, we do not miss it. Volker just showed me that it is done directly in the signal handlers for AIX and Linux ppc. > Kind Regards, Thomas > > >> >> Thanks, >> David >> >> >> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote: >> >>> >>> Please review: >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892 >>> Webrev: >>> >>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ >>> >>> >>> * Note: this is patch 2 in a set of 3 all aiming to clean up and unify >>> the unsafe memory getters/setters, along with the handling of unsafe >>> access errors. 
The other two issues are: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle unsafe access >>> error as an asynchronous exception >>> https://bugs.openjdk.java.net/browse/JDK-8150921 - Update Unsafe >>> getters/setters to use double-register variants >>> >>> >>> * Summary (copied from the bug description) >>> >>> >>> In certain cases, such as accessing a region of a memory mapped file >>> which has been truncated on unix-style operating systems, a SIGBUS >>> signal will be raised and the VM will process it in the signal handler. >>> >>> How the signal is processed differs depending on the operating system >>> and/or CPU architecture, with two major alternatives: >>> >>> * "stubless" >>> >>> Do the necessary thread state updates directly in the signal handler, >>> and modify the context so that the signal handler returns to the place >>> where the execution should continue >>> >>> * Using a stub >>> >>> Update the context so that when the signal handler returns the thread >>> will continue execution in a generated stub, which in turn will call >>> some native code in the VM to update the thread state and figure out >>> where execution should continue. The stub will then jump to that new >>> place. >>> >>> >>> It should be noted that the work of updating the thread state is very >>> small - it's setting a flag or two in the thread structure, and figures >>> out where the next instruction starts. It should also be noted that the >>> generated stubs today are broken, because they do not preserve all the >>> live registers over the call into the VM. There are two ways to address >>> this: >>> >>> * Preserve all the necessary registers >>> >>> This would mean implementing, in macro assembly, the necessary logic for >>> preserving all the live registers, including, but not limited to, >>> floating point registers, flag registers, etc. It quickly becomes >>> obvious that this platform specific and error prone. 
>>> >>> * Leverage the fact that the operating system already does this as part >>> of the signal handling >>> >>> Do the necessary work in the signal handler instead, removing the need >>> for the stub altogether >>> >>> As mentioned, on some platforms the latter model is already in use. It >>> is dramatically easier and all platforms should be updated to do it the >>> same way. >>> >>> >>> * Testing >>> >>> Just as mentioned in the RFR for JDK-8153890, a new test was developed >>> to test this code path: >>> >>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java >>> >>> In fact, it was when running this test I found the register preservation >>> issue. JPRT also passes. Much like JDK-8153890 I wanted to get some >>> feedback on this before running additional tests. >>> >>> >>> Cheers, >>> Mikael >>> >>> > From mikael.vidstedt at oracle.com Mon Apr 11 17:44:32 2016 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Mon, 11 Apr 2016 10:44:32 -0700 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: References: <570831BD.7080005@oracle.com> Message-ID: <570BE280.6060301@oracle.com> Yes, I asked myself the same thing when I started moving things around. It may be more appropriate to put it in sharedRuntime instead. Does anybody else have an opinion on where it should go? Cheers, Mikael On 4/11/2016 2:03 AM, Thomas Stüfe wrote: > Hi Mikael, > > just a question, should the new stubless functions not live someplace > else instead of in stubRoutines_ ? After all, they are not stub > routines anymore.
> > Kind Regards, Thomas > > On Sat, Apr 9, 2016 at 12:33 AM, Mikael Vidstedt > > wrote: > > > Please review: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153892 > Webrev: > http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ > > > * Note: this is patch 2 in a set of 3 all aiming to clean up and > unify the unsafe memory getters/setters, along with the handling > of unsafe access errors. The other two issues are: > > https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle unsafe > access error as an asynchronous exception > https://bugs.openjdk.java.net/browse/JDK-8150921 - Update Unsafe > getters/setters to use double-register variants > > > * Summary (copied from the bug description) > > > In certain cases, such as accessing a region of a memory mapped > file which has been truncated on unix-style operating systems, a > SIGBUS signal will be raised and the VM will process it in the > signal handler. > > How the signal is processed differs depending on the operating > system and/or CPU architecture, with two major alternatives: > > * "stubless" > > Do the necessary thread state updates directly in the signal > handler, and modify the context so that the signal handler returns > to the place where the execution should continue > > * Using a stub > > Update the context so that when the signal handler returns the > thread will continue execution in a generated stub, which in turn > will call some native code in the VM to update the thread state > and figure out where execution should continue. The stub will then > jump to that new place. > > > It should be noted that the work of updating the thread state is > very small - it's setting a flag or two in the thread structure, > and figures out where the next instruction starts. It should also > be noted that the generated stubs today are broken, because they > do not preserve all the live registers over the call into the VM. 
> There are two ways to address this: > > * Preserve all the necessary registers > > This would mean implementing, in macro assembly, the necessary > logic for preserving all the live registers, including, but not > limited to, floating point registers, flag registers, etc. It > quickly becomes obvious that this is platform specific and error prone. > > * Leverage the fact that the operating system already does this as > part of the signal handling > > Do the necessary work in the signal handler instead, removing the > need for the stub altogether > > As mentioned, on some platforms the latter model is already in > use. It is dramatically easier and all platforms should be updated > to do it the same way. > > > * Testing > > Just as mentioned in the RFR for JDK-8153890, a new test was > developed to test this code path: > > http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java > > > In fact, it was when running this test I found the register > preservation issue. JPRT also passes. Much like JDK-8153890 I > wanted to get some feedback on this before running additional tests. > > > Cheers, > Mikael > > From christian.thalinger at oracle.com Mon Apr 11 17:59:42 2016 From: christian.thalinger at oracle.com (Christian Thalinger) Date: Mon, 11 Apr 2016 07:59:42 -1000 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> Message-ID: [This should be on hotspot-runtime-dev. BCC'ing hotspot-compiler-dev.] > On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii wrote: > > Dear all: > > Can I please request reviews for the following change? > This change was created for JDK 9 and ppc64. > > Description: > This change adds options of compare-and-exchange for POWER architecture.
> As described in atomic_linux_ppc.inline.hpp, the current implementation of > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for > general purposes because twice calls of sync before and after cmpxchg will > keep consistency. However, they sometimes cause overheads because > sync instructions are very expensive in the current POWER chip design. > With this change, callers can explicitly specify to run fence and acquire with > two additional bool parameters. Because their default values are "true", > it is not necessary to modify existing cmpxchg calls. > > In addition, with the new parameters of cmpxchg, this change improves > performance of copy_to_survivor in the parallel GC. > copy_to_survivor changes forward pointers by using cmpxchg. This > operation doesn't require any sync instructions, in my understanding. > A pointer is changed at most once in a GC and when cmpxchg fails, > the latest pointer is available for the caller. > > When I evaluated SPECjbb2013 (slightly customized because obsolete grizzly > doesn't support new version format of Java 9), pause time of young GC was > reduced from 10% to 20%. > > Summary of source code changes: > > * src/share/vm/runtime/atomic.hpp > * src/share/vm/runtime/atomic.cpp > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > - Add two arguments of fence and acquire to cmpxchg only for PPC64. > Though cmpxchg in atomic_linux_ppc.inline.hpp has some branches, > they are reduced while inlining to callers. > > * src/share/vm/oops/oop.inline.hpp > - Changed cas_set_mark to call cmpxchg without fence and acquire. > cas_set_mark is called only by cas_forward_to that is called only by > copy_to_survivor_space and oop_promotion_failed in > psPromotionManager. > > Code change: > > Please see an attached diff file that was generated with "hg diff -g" > under the latest hotspot directory. 
> > Passed test: > SPECjbb2013 (customized) > > * I believe some other cmpxchg will be optimized by reducing fence > or acquire because twice calls of sync are too conservative to implement > Java memory model. > > > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. > IBM Research - Tokyo > -------------- next part -------------- A non-text attachment was scrubbed... Name: ppc64_cmpxchg_opt.diff Type: application/octet-stream Size: 8837 bytes Desc: not available URL: From charlie.hunt at oracle.com Mon Apr 11 18:19:30 2016 From: charlie.hunt at oracle.com (charlie hunt) Date: Mon, 11 Apr 2016 13:19:30 -0500 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> Message-ID: <72593D96-F446-4A40-83AB-644E2B3A33EA@oracle.com> FYI, SPECjbb2013 is obsolete in favor of SPECjbb2015. SPECjbb2015 should run fine with JDK 9 in the default configuration with grizzly as the transport. I have run it on JDK 9 SPARC and JDK 9 x86/x64 platforms. hths, charlie > On Apr 11, 2016, at 12:59 PM, Christian Thalinger wrote: > > [This should be on hotspot-runtime-dev. BCC'ing hotspot-compiler-dev.] > >> On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii wrote: >> >> Dear all: >> >> Can I please request reviews for the following change? >> This change was created for JDK 9 and ppc64. >> >> Description: >> This change adds options of compare-and-exchange for POWER architecture. >> As described in atomic_linux_ppc.inline.hpp, the current implementation of >> cmpxchg is fence_cmpxchg_acquire. This implementation is useful for >> general purposes because twice calls of sync before and after cmpxchg will >> keep consistency.
However, they sometimes cause overheads because >> sync instructions are very expensive in the current POWER chip design. >> With this change, callers can explicitly specify to run fence and acquire with >> two additional bool parameters. Because their default values are "true", >> it is not necessary to modify existing cmpxchg calls. >> >> In addition, with the new parameters of cmpxchg, this change improves >> performance of copy_to_survivor in the parallel GC. >> copy_to_survivor changes forward pointers by using cmpxchg. This >> operation doesn't require any sync instructions, in my understanding. >> A pointer is changed at most once in a GC and when cmpxchg fails, >> the latest pointer is available for the caller. >> >> When I evaluated SPECjbb2013 (slightly customized because obsolete grizzly >> doesn't support new version format of Java 9), pause time of young GC was >> reduced from 10% to 20%. >> >> Summary of source code changes: >> >> * src/share/vm/runtime/atomic.hpp >> * src/share/vm/runtime/atomic.cpp >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> - Add two arguments of fence and acquire to cmpxchg only for PPC64. >> Though cmpxchg in atomic_linux_ppc.inline.hpp has some branches, >> they are reduced while inlining to callers. >> >> * src/share/vm/oops/oop.inline.hpp >> - Changed cas_set_mark to call cmpxchg without fence and acquire. >> cas_set_mark is called only by cas_forward_to that is called only by >> copy_to_survivor_space and oop_promotion_failed in >> psPromotionManager. >> >> Code change: >> >> Please see an attached diff file that was generated with "hg diff -g" >> under the latest hotspot directory. >> >> Passed test: >> SPECjbb2013 (customized) >> >> * I believe some other cmpxchg will be optimized by reducing fence >> or acquire because twice calls of sync are too conservative to implement >> Java memory model. >> >> >> >> Regards, >> Hiroshi >> ----------------------- >> Hiroshi Horii, Ph.D. 
>> IBM Research - Tokyo >> > From mikael.vidstedt at oracle.com Mon Apr 11 18:52:39 2016 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Mon, 11 Apr 2016 11:52:39 -0700 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> Message-ID: <570BF276.5000000@oracle.com> David/Thomas, Thanks for the early feedback. Some comments below. On 4/11/2016 2:15 AM, Thomas Stüfe wrote: > > > On Mon, Apr 11, 2016 at 11:05 AM, Thomas Stüfe > > wrote: > > Hi David, > > On Mon, Apr 11, 2016 at 2:57 AM, David Holmes > > wrote: > > Hi Mikael, > > I think we need to be able to answer the question as to why > the stubbed and stubless forms of this code exist to ensure > that converting all platforms to the same form is appropriate. > > I'm still going through this but my initial reaction is to > wonder why we don't use the same form of handle_unsafe_access > on all platforms and always pass in npc? (That seems to be the > only difference in code that otherwise seems platform > independent.) > Yes, in effect what the handler is supposed to do is call t->set_pending_unsafe_access_error() and update the context so that when the thread eventually starts executing again it will start on the next instruction, following the one that faulted. Given how trivial the handler is I can see no reason to make it go through a stub, and several reasons for handling it directly in the signal handler instead. Maybe I'm missing something, let me know if you think of anything! > > On Solaris we get the npc for free in the signal ucontext. On x86 > it has to be calculated. But yes, this could be moved out of the > handle functions and just passed in. > Since the function is called from multiple different places it seems appropriate to have a dedicated function for it, even though it's "just" doing those two things.
It also means it can be shared across the different operating systems, within the same CPU architecture. As noted, SPARC is indeed the odd man out which needs to take the additional "npc" argument, but I really don't think that's a big issue in the grand scheme of things. > > I also saw that we apparently miss handling for ppc. No one seemed > to miss it until now, but it may make sense to add handling anyway. > > > Oh, we do not miss it. Volker just showed me that it is done directly > in the signal handlers for AIX and Linux ppc. Exactly. AIX/ppc, linux/ppc and linux/aarch64 all handle it directly in the signal handler. Cheers, Mikael > Kind Regards, Thomas > > > Thanks, > David > > > On 9/04/2016 8:33 AM, Mikael Vidstedt wrote: > > > Please review: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153892 > Webrev: > http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ > > > > * Note: this is patch 2 in a set of 3 all aiming to clean > up and unify > the unsafe memory getters/setters, along with the handling > of unsafe > access errors. The other two issues are: > > https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle > unsafe access > error as an asynchronous exception > https://bugs.openjdk.java.net/browse/JDK-8150921 - Update > Unsafe > getters/setters to use double-register variants > > > * Summary (copied from the bug description) > > > In certain cases, such as accessing a region of a memory > mapped file > which has been truncated on unix-style operating systems, > a SIGBUS > signal will be raised and the VM will process it in the > signal handler. 
> > How the signal is processed differs depending on the > operating system > and/or CPU architecture, with two major alternatives: > > * "stubless" > > Do the necessary thread state updates directly in the > signal handler, > and modify the context so that the signal handler returns > to the place > where the execution should continue > > * Using a stub > > Update the context so that when the signal handler returns > the thread > will continue execution in a generated stub, which in turn > will call > some native code in the VM to update the thread state and > figure out > where execution should continue. The stub will then jump > to that new place. > > > It should be noted that the work of updating the thread > state is very > small - it's setting a flag or two in the thread > structure, and figures > out where the next instruction starts. It should also be > noted that the > generated stubs today are broken, because they do not > preserve all the > live registers over the call into the VM. There are two > ways to address > this: > > * Preserve all the necessary registers > > This would mean implementing, in macro assembly, the > necessary logic for > preserving all the live registers, including, but not > limited to, > floating point registers, flag registers, etc. It quickly > becomes > obvious that this platform specific and error prone. > > * Leverage the fact that the operating system already does > this as part > of the signal handling > > Do the necessary work in the signal handler instead, > removing the need > for the stub alltogether > > As mentioned, on some platforms the latter model is > already in use. It > is dramatically easier and all platforms should be updated > to do it the > same way. 
> > > * Testing > > Just as mentioned in the RFR for JDK-8153890, a new test > was developed > to test this code path: > > http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java > > > In fact, it was when running this test I found the > register preservation > issue. JPRT also passes. Much like JDK-8153890 I wanted to > get some > feedback on this before running additional tests. > > > Cheers, > Mikael > > > From david.holmes at oracle.com Tue Apr 12 00:03:26 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 12 Apr 2016 10:03:26 +1000 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> Message-ID: <570C3B4E.4080903@oracle.com> On 11/04/2016 7:15 PM, Thomas Stüfe wrote: > > > On Mon, Apr 11, 2016 at 11:05 AM, Thomas Stüfe > wrote: > > Hi David, > > On Mon, Apr 11, 2016 at 2:57 AM, David Holmes > > wrote: > > Hi Mikael, > > I think we need to be able to answer the question as to why the > stubbed and stubless forms of this code exist to ensure that > converting all platforms to the same form is appropriate. > > I'm still going through this but my initial reaction is to > wonder why we don't use the same form of handle_unsafe_access on > all platforms and always pass in npc? (That seems to be the only > difference in code that otherwise seems platform independent.) > > > On Solaris we get the npc for free in the signal ucontext. On x86 it > has to be calculated. But yes, this could be moved out of the handle > functions and just passed in. > > I also saw that we apparently miss handling for ppc. No one seemed > to miss it until now, but it may make sense to add handling anyway. > > > Oh, we do not miss it. Volker just showed me that it is done directly > in the signal handlers for AIX and Linux ppc.
So is there more scope to refactor this out of the platform specific code altogether? David > > Kind Regards, Thomas > > > Thanks, > David > > > On 9/04/2016 8:33 AM, Mikael Vidstedt wrote: > > > Please review: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153892 > Webrev: > http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ > > > * Note: this is patch 2 in a set of 3 all aiming to clean up > and unify > the unsafe memory getters/setters, along with the handling > of unsafe > access errors. The other two issues are: > > https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle > unsafe access > error as an asynchronous exception > https://bugs.openjdk.java.net/browse/JDK-8150921 - Update Unsafe > getters/setters to use double-register variants > > > * Summary (copied from the bug description) > > > In certain cases, such as accessing a region of a memory > mapped file > which has been truncated on unix-style operating systems, a > SIGBUS > signal will be raised and the VM will process it in the > signal handler. > > How the signal is processed differs depending on the > operating system > and/or CPU architecture, with two major alternatives: > > * "stubless" > > Do the necessary thread state updates directly in the signal > handler, > and modify the context so that the signal handler returns to > the place > where the execution should continue > > * Using a stub > > Update the context so that when the signal handler returns > the thread > will continue execution in a generated stub, which in turn > will call > some native code in the VM to update the thread state and > figure out > where execution should continue. The stub will then jump to > that new place. > > > It should be noted that the work of updating the thread > state is very > small - it's setting a flag or two in the thread structure, > and figures > out where the next instruction starts. 
It should also be > noted that the > generated stubs today are broken, because they do not > preserve all the > live registers over the call into the VM. There are two ways > to address > this: > > * Preserve all the necessary registers > > This would mean implementing, in macro assembly, the > necessary logic for > preserving all the live registers, including, but not > limited to, > floating point registers, flag registers, etc. It quickly > becomes > obvious that this platform specific and error prone. > > * Leverage the fact that the operating system already does > this as part > of the signal handling > > Do the necessary work in the signal handler instead, > removing the need > for the stub alltogether > > As mentioned, on some platforms the latter model is already > in use. It > is dramatically easier and all platforms should be updated > to do it the > same way. > > > * Testing > > Just as mentioned in the RFR for JDK-8153890, a new test was > developed > to test this code path: > > http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java > > In fact, it was when running this test I found the register > preservation > issue. JPRT also passes. Much like JDK-8153890 I wanted to > get some > feedback on this before running additional tests. > > > Cheers, > Mikael > > > From david.holmes at oracle.com Tue Apr 12 00:29:51 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 12 Apr 2016 10:29:51 +1000 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: <570AF68B.9090707@oracle.com> References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> Message-ID: <570C417F.20600@oracle.com> On 11/04/2016 10:57 AM, David Holmes wrote: > Hi Mikael, > > I think we need to be able to answer the question as to why the stubbed > and stubless forms of this code exist to ensure that converting all > platforms to the same form is appropriate. 
The more I look at this the more the stubs make no sense :) AIUI a stub is generated when we need runtime code that may be different to that which we could write directly for compiling at build time - i.e. to use CPU specific features of the actual CPU. But I see nothing here that suggests any such usage. So I agree with removing the stubs. > I'm still going through this but my initial reaction is to wonder why we > don't use the same form of handle_unsafe_access on all platforms and > always pass in npc? (That seems to be the only difference in code that > otherwise seems platform independent.) Further to this and Thomas's comments I think handle_unsafe_access(thread, pc, npc) can be defined in shared code (where? not sure). Further, if we always pass in npc then we don't need to pass in pc as it is unused (seems unused in original code too for sparc). BTW I found this comment somewhat unfathomable (both now and in original code): + // pc is the instruction which we must emulate + // doing a no-op is fine: return garbage from the load but finally realized that it means that after the load that raised the signal the native code proceeds normally but the value apparently loaded is just garbage/arbitrary, and the only sign something went wrong is the setting of the pending unsafe-access-error bit. This would be a potential source of bugs I think, except that when we hit the Java level, we throw the exception and so never actually "return" the garbage value. But it does mean we would have to be careful if calling the unsafe routines from native code. Thanks, David > Thanks, > David > > On 9/04/2016 8:33 AM, Mikael Vidstedt wrote: >> >> Please review: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892 >> Webrev: >> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ >> >> >> >> * Note: this is patch 2 in a set of 3 all aiming to clean up and unify >> the unsafe memory getters/setters, along with the handling of unsafe >> access errors.
The other two issues are: >> >> https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle unsafe access >> error as an asynchronous exception >> https://bugs.openjdk.java.net/browse/JDK-8150921 - Update Unsafe >> getters/setters to use double-register variants >> >> >> * Summary (copied from the bug description) >> >> >> In certain cases, such as accessing a region of a memory mapped file >> which has been truncated on unix-style operating systems, a SIGBUS >> signal will be raised and the VM will process it in the signal handler. >> >> How the signal is processed differs depending on the operating system >> and/or CPU architecture, with two major alternatives: >> >> * "stubless" >> >> Do the necessary thread state updates directly in the signal handler, >> and modify the context so that the signal handler returns to the place >> where the execution should continue >> >> * Using a stub >> >> Update the context so that when the signal handler returns the thread >> will continue execution in a generated stub, which in turn will call >> some native code in the VM to update the thread state and figure out >> where execution should continue. The stub will then jump to that new >> place. >> >> >> It should be noted that the work of updating the thread state is very >> small - it's setting a flag or two in the thread structure, and figures >> out where the next instruction starts. It should also be noted that the >> generated stubs today are broken, because they do not preserve all the >> live registers over the call into the VM. There are two ways to address >> this: >> >> * Preserve all the necessary registers >> >> This would mean implementing, in macro assembly, the necessary logic for >> preserving all the live registers, including, but not limited to, >> floating point registers, flag registers, etc. It quickly becomes >> obvious that this platform specific and error prone. 
>> >> * Leverage the fact that the operating system already does this as part >> of the signal handling >> >> Do the necessary work in the signal handler instead, removing the need >> for the stub altogether >> >> As mentioned, on some platforms the latter model is already in use. It >> is dramatically easier and all platforms should be updated to do it the >> same way. >> >> >> * Testing >> >> Just as mentioned in the RFR for JDK-8153890, a new test was developed >> to test this code path: >> >> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java >> >> In fact, it was when running this test I found the register preservation >> issue. JPRT also passes. Much like JDK-8153890 I wanted to get some >> feedback on this before running additional tests. >> >> >> Cheers, >> Mikael >> From thomas.stuefe at gmail.com Tue Apr 12 09:15:37 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 12 Apr 2016 11:15:37 +0200 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: <570C417F.20600@oracle.com> References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> <570C417F.20600@oracle.com> Message-ID: Hi Mikael, David, On Tue, Apr 12, 2016 at 2:29 AM, David Holmes wrote: > On 11/04/2016 10:57 AM, David Holmes wrote: > >> Hi Mikael, >> >> I think we need to be able to answer the question as to why the stubbed >> and stubless forms of this code exist to ensure that converting all >> platforms to the same form is appropriate. >> > > The more I look at this the more the stubs make no sense :) AIUI a stub is > generated when we need runtime code that may be different to that which we > could write directly for compiling at build time - i.e. to use CPU specific > features of the actual CPU. But I see nothing here that suggests any such > usage. > > So I agree with removing the stubs.
> > I'm still going through this but my initial reaction is to wonder why we >> don't use the same form of handle_unsafe_access on all platforms and >> always pass in npc? (That seems to be the only difference in code that >> otherwise seems platform independent.) >> > > Further to this and Thomas's comments I think handle_unsafe_access(thread, > pc, npc) can be defined in shared code (where? not sure). Further, if we > always pass in npc then we don't need to pass in pc as it is unused (seems > unused in original code too for sparc). > > I agree. We commonized ucontext_set_pc for all Posix platforms, so we can make a common function "handle_unsafe_access(thread, npc)" and inside use os::Posix::ucontext_set_pc to modify the context. Then we can get rid of the special handling in the signal handlers inside os_aix_ppc.cpp and os_linux_ppc.cpp (for both the compiled and the interpreted case). > BTW I found this comment somewhat unfathomable (both now and in original > code): > > + // pc is the instruction which we must emulate > + // doing a no-op is fine: return garbage from the load > > but finally realized that it means that after the load that raised the > signal the native code proceeds normally but the value apparently loaded is > just garbage/arbitrary, and the only sign something went wrong is the > setting of the pending unsafe-access-error bit. This would be a potential > source of bugs I think, except that when we hit the Java level, we throw > the exception and so never actually "return" the garbage value. But it does > mean we would have to be careful if calling the unsafe routines from native > code. > > I admit I do not understand fully how the _special_runtime_exit_condition flag is processed later, at least not for all cases: If I have a Java method A using sun.misc.Unsafe, which gets compiled, the sun.misc.Unsafe intrinsic gets inlined into that method. So, the whole method A gets marked as "has unsafe access"?
So, any SIGBUS happening inside this method - which may be larger than the inlined sun.misc.unsafe call - will yield an InternalError? And when is the flag checked if that method A is called from another java method B? Sorry if the questions are stupid, I am not a JIT expert, but I try to understand how much can happen between the SIGBUS and the InternalError getting thrown. Thanks, Thomas > Thanks, > David > > > Thanks, >> David >> >> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote: >> >>> >>> Please review: >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892 >>> Webrev: >>> >>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ >>> >>> >>> >>> * Note: this is patch 2 in a set of 3 all aiming to clean up and unify >>> the unsafe memory getters/setters, along with the handling of unsafe >>> access errors. The other two issues are: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle unsafe access >>> error as an asynchronous exception >>> https://bugs.openjdk.java.net/browse/JDK-8150921 - Update Unsafe >>> getters/setters to use double-register variants >>> >>> >>> * Summary (copied from the bug description) >>> >>> >>> In certain cases, such as accessing a region of a memory mapped file >>> which has been truncated on unix-style operating systems, a SIGBUS >>> signal will be raised and the VM will process it in the signal handler. 
>>> >>> How the signal is processed differs depending on the operating system >>> and/or CPU architecture, with two major alternatives: >>> >>> * "stubless" >>> >>> Do the necessary thread state updates directly in the signal handler, >>> and modify the context so that the signal handler returns to the place >>> where the execution should continue >>> >>> * Using a stub >>> >>> Update the context so that when the signal handler returns the thread >>> will continue execution in a generated stub, which in turn will call >>> some native code in the VM to update the thread state and figure out >>> where execution should continue. The stub will then jump to that new >>> place. >>> >>> >>> It should be noted that the work of updating the thread state is very >>> small - it's setting a flag or two in the thread structure, and figures >>> out where the next instruction starts. It should also be noted that the >>> generated stubs today are broken, because they do not preserve all the >>> live registers over the call into the VM. There are two ways to address >>> this: >>> >>> * Preserve all the necessary registers >>> >>> This would mean implementing, in macro assembly, the necessary logic for >>> preserving all the live registers, including, but not limited to, >>> floating point registers, flag registers, etc. It quickly becomes >>> obvious that this platform specific and error prone. >>> >>> * Leverage the fact that the operating system already does this as part >>> of the signal handling >>> >>> Do the necessary work in the signal handler instead, removing the need >>> for the stub alltogether >>> >>> As mentioned, on some platforms the latter model is already in use. It >>> is dramatically easier and all platforms should be updated to do it the >>> same way. 
>>> >>> >>> * Testing >>> >>> Just as mentioned in the RFR for JDK-8153890, a new test was developed >>> to test this code path: >>> >>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java >>> >>> In fact, it was when running this test I found the register preservation >>> issue. JPRT also passes. Much like JDK-8153890 I wanted to get some >>> feedback on this before running additional tests. >>> >>> >>> Cheers, >>> Mikael >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.holmes at oracle.com Sat Apr 16 07:41:44 2016 From: david.holmes at oracle.com (David Holmes) Date: Sat, 16 Apr 2016 17:41:44 +1000 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> Message-ID: <5711ECB8.5090100@oracle.com> Hi Hiroshi, As the diff file does not survive the mail process I can't see the actual proposed changes. There doesn't seem to be a bug for this so could you please file one. Also can you get someone to host the webrev for you on cr.openjdk.java.net? Or else include the diff in the bug report. It is fine for ppc to have variations of cmpxchg with different memory barrier semantics, but the shared API must not be affected as there is a requirement that the basic form of this operation provide "full bi-directional fence" semantics. Note that these semantics are not in place to fulfill Java Memory Model requirements, but are an internal contract in hotspot. Thanks, David On 12/04/2016 3:59 AM, Christian Thalinger wrote: > [This should be on hotspot-runtime-dev. BCC'ing hotspot-compiler-dev.] > >> On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii wrote: >> >> Dear all: >> >> Can I please request reviews for the following change? >> This change was created for JDK 9 and ppc64. >> >> Description: >> This change adds options of compare-and-exchange for POWER architecture.
>> As described in atomic_linux_ppc.inline.hpp, the current implementation of >> cmpxchg is fence_cmpxchg_acquire. This implementation is useful for >> general purposes because twice calls of sync before and after cmpxchg will >> keep consistency. However, they sometimes cause overheads because >> sync instructions are very expensive in the current POWER chip design. >> With this change, callers can explicitly specify to run fence and acquire with >> two additional bool parameters. Because their default values are "true", >> it is not necessary to modify existing cmpxchg calls. >> >> In addition, with the new parameters of cmpxchg, this change improves >> performance of copy_to_survivor in the parallel GC. >> copy_to_survivor changes forward pointers by using cmpxchg. This >> operation doesn't require any sync instructions, in my understanding. >> A pointer is changed at most once in a GC and when cmpxchg fails, >> the latest pointer is available for the caller. >> >> When I evaluated SPECjbb2013 (slightly customized because obsolete grizzly >> doesn't support new version format of Java 9), pause time of young GC was >> reduced from 10% to 20%. >> >> Summary of source code changes: >> >> * src/share/vm/runtime/atomic.hpp >> * src/share/vm/runtime/atomic.cpp >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> - Add two arguments of fence and acquire to cmpxchg only for PPC64. >> Though cmpxchg in atomic_linux_ppc.inline.hpp has some branches, >> they are reduced while inlining to callers. >> >> * src/share/vm/oops/oop.inline.hpp >> - Changed cas_set_mark to call cmpxchg without fence and acquire. >> cas_set_mark is called only by cas_forward_to that is called only by >> copy_to_survivor_space and oop_promotion_failed in >> psPromotionManager. >> >> Code change: >> >> Please see an attached diff file that was generated with "hg diff -g" >> under the latest hotspot directory. 
>> >> Passed test: >> SPECjbb2013 (customized) >> >> * I believe some other cmpxchg will be optimized by reducing fence >> or acquire because twice calls of sync are too conservative to implement >> Java memory model. >> >> >> >> Regards, >> Hiroshi >> ----------------------- >> Hiroshi Horii, Ph.D. >> IBM Research - Tokyo >> > From david.holmes at oracle.com Sat Apr 16 07:43:20 2016 From: david.holmes at oracle.com (David Holmes) Date: Sat, 16 Apr 2016 17:43:20 +1000 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> Message-ID: <5711ED18.7000706@oracle.com> Hi Hiroshi, As the diff file does not survive the mail process I can't see the actual proposed changes. There doesn't seem to be a bug for this so could you please file one. Also can you get someone to host the webrev for you on cr.openjdk.java.net? Or else include the diff in the bug report. It is fine for ppc to have variations of cmpxchg with different memory barrier semantics, but the shared API must not be affected as there is a requirement that the basic form of this operation provide "full bi-directional fence" semantics. Note that these semantics are not in place to fulfill Java Memory Model requirements, but are an internal contract in hotspot. Thanks, David On 12/04/2016 3:59 AM, Christian Thalinger wrote: > [This should be on hotspot-runtime-dev. BCC'ing hotspot-compiler-dev.] > >> On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii wrote: >> >> Dear all: >> >> Can I please request reviews for the following change? >> This change was created for JDK 9 and ppc64. >> >> Description: >> This change adds options of compare-and-exchange for POWER architecture. >> As described in atomic_linux_ppc.inline.hpp, the current implementation of >> cmpxchg is fence_cmpxchg_acquire. This implementation is useful for >> general purposes because twice calls of sync before and after cmpxchg will >> keep consistency.
However, they sometimes cause overheads because >> sync instructions are very expensive in the current POWER chip design. >> With this change, callers can explicitly specify to run fence and acquire with >> two additional bool parameters. Because their default values are "true", >> it is not necessary to modify existing cmpxchg calls. >> >> In addition, with the new parameters of cmpxchg, this change improves >> performance of copy_to_survivor in the parallel GC. >> copy_to_survivor changes forward pointers by using cmpxchg. This >> operation doesn't require any sync instructions, in my understanding. >> A pointer is changed at most once in a GC and when cmpxchg fails, >> the latest pointer is available for the caller. >> >> When I evaluated SPECjbb2013 (slightly customized because obsolete grizzly >> doesn't support new version format of Java 9), pause time of young GC was >> reduced from 10% to 20%. >> >> Summary of source code changes: >> >> * src/share/vm/runtime/atomic.hpp >> * src/share/vm/runtime/atomic.cpp >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> - Add two arguments of fence and acquire to cmpxchg only for PPC64. >> Though cmpxchg in atomic_linux_ppc.inline.hpp has some branches, >> they are reduced while inlining to callers. >> >> * src/share/vm/oops/oop.inline.hpp >> - Changed cas_set_mark to call cmpxchg without fence and acquire. >> cas_set_mark is called only by cas_forward_to that is called only by >> copy_to_survivor_space and oop_promotion_failed in >> psPromotionManager. >> >> Code change: >> >> Please see an attached diff file that was generated with "hg diff -g" >> under the latest hotspot directory. >> >> Passed test: >> SPECjbb2013 (customized) >> >> * I believe some other cmpxchg will be optimized by reducing fence >> or acquire because twice calls of sync are too conservative to implement >> Java memory model. >> >> >> >> Regards, >> Hiroshi >> ----------------------- >> Hiroshi Horii, Ph.D. 
>> IBM Research - Tokyo >> > From HORII at jp.ibm.com Mon Apr 18 02:15:17 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Mon, 18 Apr 2016 02:15:17 +0000 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <5711ED18.7000706@oracle.com> References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> <5711ED18.7000706@oracle.com> Message-ID: <201604180215.u3I2FOH7026854@d19av06.sagamino.japan.ibm.com> Hi David, Thank you for your replying and sorry that I didn't append my diff file when the discussion was forwarded to this mailing list. I appended my original diff file (hg diff -g) to this mail. > It is fine for ppc to have variations of cmpxchg with different memory > barrier semantics, but the shared API must not be affected as there is a > requirement that the basic form of this operation provide "full > bi-directional fence" semantics. Note that these semantics are not in > place to fulfill Java Memory Model requirements, but are an internal > contract in hotspot. Sure. Probably, it is better for me to modify my patch because it changes the internal contract. I will create a new patch that adds new cmpxchg functions for ppc. > Also can you get someone to host the webrev > for you on cr.openjdk.java.net? Or else include the diff in the bug report. I will ask someone to create webrev after my next patch is created. Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. IBM Research - Tokyo David Holmes wrote on 04/16/2016 16:43:20: > From: David Holmes > To: Christian Thalinger , Hiroshi H > Horii/Japan/IBM at IBMJP > Cc: Tim Ellison , ppc-aix-port- > dev at openjdk.java.net, hotspot-runtime-dev at openjdk.java.net > Date: 04/16/2016 16:46 > Subject: Re: enhancement of cmpxchg and copy_to_survivor for ppc64 > > Hi Hiroshi, > > As the diff file does not survive the mail process I can't see the > actual proposed changes. There doesn't seem to be a bug for this so > could you please file one. 
Also can you get someone to host the webrev > for you on cr.openjdk.java.net? Or else include the diff in the bug report. > > It is fine for ppc to have variations of cmpxchg with different memory > barrier semantics, but the shared API must not be affected as there is a > requirement that the basic form of this operation provide "full > bi-directional fence" semantics. Note that these semantics are not in > place to fulfill Java Memory Model requirements, but are an internal > contract in hotspot. > > Thanks, > David > > On 12/04/2016 3:59 AM, Christian Thalinger wrote: > > [This should be on hotspot-runtime-dev. BCC'ing hotspot-compiler-dev.] > > > >> On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii wrote: > >> > >> Dear all: > >> > >> Can I please request reviews for the following change? > >> This change was created for JDK 9 and ppc64. > >> > >> Description: > >> This change adds options of compare-and-exchange for POWER architecture. > >> As described in atomic_linux_ppc.inline.hpp, the current implementation of > >> cmpxchg is fence_cmpxchg_acquire. This implementation is useful for > >> general purposes because twice calls of sync before and after cmpxchg will > >> keep consistency. However, they sometimes cause overheads because > >> sync instructions are very expensive in the current POWER chip design. > >> With this change, callers can explicitly specify to run fence and > acquire with > >> two additional bool parameters. Because their default values are "true", > >> it is not necessary to modify existing cmpxchg calls. > >> > >> In addition, with the new parameters of cmpxchg, this change improves > >> performance of copy_to_survivor in the parallel GC. > >> copy_to_survivor changes forward pointers by using cmpxchg. This > >> operation doesn't require any sync instructions, in my understanding. > >> A pointer is changed at most once in a GC and when cmpxchg fails, > >> the latest pointer is available for the caller.
> >> > >> When I evaluated SPECjbb2013 (slightly customized because obsolete grizzly > >> doesn't support new version format of Java 9), pause time of young GC was > >> reduced from 10% to 20%. > >> > >> Summary of source code changes: > >> > >> * src/share/vm/runtime/atomic.hpp > >> * src/share/vm/runtime/atomic.cpp > >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > >> - Add two arguments of fence and acquire to cmpxchg only for PPC64. > >> Though cmpxchg in atomic_linux_ppc.inline.hpp has some branches, > >> they are reduced while inlining to callers. > >> > >> * src/share/vm/oops/oop.inline.hpp > >> - Changed cas_set_mark to call cmpxchg without fence and acquire. > >> cas_set_mark is called only by cas_forward_to that is > called only by > >> copy_to_survivor_space and oop_promotion_failed in > >> psPromotionManager. > >> > >> Code change: > >> > >> Please see an attached diff file that was generated with "hg diff -g" > >> under the latest hotspot directory. > >> > >> Passed test: > >> SPECjbb2013 (customized) > >> > >> * I believe some other cmpxchg will be optimized by reducing fence > >> or acquire because twice calls of sync are too conservative to implement > >> Java memory model. > >> > >> > >> > >> Regards, > >> Hiroshi > >> ----------------------- > >> Hiroshi Horii, Ph.D. > >> IBM Research - Tokyo > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: ppc64_cmpxchg_opt.diff Type: application/octet-stream Size: 8837 bytes Desc: not available URL: From david.holmes at oracle.com Mon Apr 18 04:38:55 2016 From: david.holmes at oracle.com (David Holmes) Date: Mon, 18 Apr 2016 14:38:55 +1000 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <201604180215.u3I2FUZi001650@d19av07.sagamino.japan.ibm.com> References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> <5711ED18.7000706@oracle.com> <201604180215.u3I2FUZi001650@d19av07.sagamino.japan.ibm.com> Message-ID: <571464DF.3070706@oracle.com> Hi Hiroshi, On 18/04/2016 12:15 PM, Hiroshi H Horii wrote: > Hi David, > > Thank you for your replying and sorry that I didn't append my diff file > when the discussion was forwarded to this mailing list. I appended my > original diff file (hg diff -g) to this mail. > > > It is fine for ppc to have variations of cmpxchg with different memory > > barrier semantics, but the shared API must not be affected as there is a > > requirement that the basic form of this operation provide "full > > bi-directional fence" semantics. Note that these semantics are not in > > place to fulfill Java Memory Model requirements, but are an internal > > contract in hotspot. > > Sure. Probably, it is better for me to modify my patch because it > changes the internal contract. I will create a new patch that adds new > cmpxchg functions for ppc. I think this is only usable from PPC specific code, not from the shared code as per your original patch. The oopDesc::cas_set_mark may be written to expect the full bi-directional fence that is required by the atomic.hpp contract. If we break that contract we would have to prove correctness along all code paths using that code - well the ppc64 folk would have to do that :). But I would object to the platform-specific code in the shared file - sorry. Thanks, David > > Also can you get someone to host the webrev > > for you on cr.openjdk.java.net? 
Or else include the diff in the bug > report. > > I will ask someone to create webrev after my next patch is created. > > > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. > IBM Research - Tokyo > > > David Holmes wrote on 04/16/2016 16:43:20: > > > From: David Holmes > > To: Christian Thalinger , Hiroshi H > > Horii/Japan/IBM at IBMJP > > Cc: Tim Ellison , ppc-aix-port- > > dev at openjdk.java.net, hotspot-runtime-dev at openjdk.java.net > > Date: 04/16/2016 16:46 > > Subject: Re: enhancement of cmpxchg and copy_to_survivor for ppc64 > > > > Hi Hiroshi, > > > > As the diff file does not survive the mail process I can't see the > > actual proposed changes. There doesn't seem to be a bug for this so > > could you please file one. Also can you get someone to host the webrev > > for you on cr.openjdk.java.net? Or else include the diff in the bug > report. > > > > It is fine for ppc to have variations of cmpxchg with different memory > > barrier semantics, but the shared API must not be affected as there is a > > requirement that the basic form of this operation provide "full > > bi-directional fence" semantics. Note that these semantics are not in > > place to fulfill Java Memory Model requirements, but are an internal > > contract in hotspot. > > > > Thanks, > > David > > > > On 12/04/2016 3:59 AM, Christian Thalinger wrote: > > > [This should be on hotspot-runtime-dev. BCC'ing hotspot-compiler-dev.] > > > > > >> On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii wrote: > > >> > > >> Dear all: > > >> > > >> Can I please request reviews for the following change? > > >> This change was created for JDK 9 and ppc64. > > >> > > >> Description: > > >> This change adds options of compare-and-exchange for POWER > architecture. > > >> As described in atomic_linux_ppc.inline.hpp, the current > implementation of > > >> cmpxchg is fence_cmpxchg_acquire.
This implementation is useful for > > >> general purposes because twice calls of sync before and after > cmpxchg will > > >> keep consistency. However, they sometimes cause overheads because > > >> sync instructions are very expensive in the current POWER chip design. > > >> With this change, callers can explicitly specify to run fence and > > acquire with > > >> two additional bool parameters. Because their default values are > "true", > > >> it is not necessary to modify existing cmpxchg calls. > > >> > > >> In addition, with the new parameters of cmpxchg, this change improves > > >> performance of copy_to_survivor in the parallel GC. > > >> copy_to_survivor changes forward pointers by using cmpxchg. This > > >> operation doesn't require any sync instructions, in my understanding. > > >> A pointer is changed at most once in a GC and when cmpxchg fails, > > >> the latest pointer is available for the caller. > > >> > > >> When I evaluated SPECjbb2013 (slightly customized because obsolete > grizzly > > >> doesn't support new version format of Java 9), pause time of young > GC was > > >> reduced from 10% to 20%. > > >> > > >> Summary of source code changes: > > >> > > >> * src/share/vm/runtime/atomic.hpp > > >> * src/share/vm/runtime/atomic.cpp > > >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > > >> - Add two arguments of fence and acquire to cmpxchg only > for PPC64. > > >> Though cmpxchg in atomic_linux_ppc.inline.hpp has some > branches, > > >> they are reduced while inlining to callers. > > >> > > >> * src/share/vm/oops/oop.inline.hpp > > >> - Changed cas_set_mark to call cmpxchg without fence and > acquire. > > >> cas_set_mark is called only by cas_forward_to that is > > called only by > > >> copy_to_survivor_space and oop_promotion_failed in > > >> psPromotionManager. > > >> > > >> Code change: > > >> > > >> Please see an attached diff file that was generated with "hg > diff -g" > > >> under the latest hotspot directory. 
> > >> > > >> Passed test: > > >> SPECjbb2013 (customized) > > >> > > >> * I believe some other cmpxchg will be optimized by reducing fence > > >> or acquire because twice calls of sync are too conservative > to implement > > >> Java memory model. > > >> > > >> > > >> > > >> Regards, > > >> Hiroshi > > >> ----------------------- > > >> Hiroshi Horii, Ph.D. > > >> IBM Research - Tokyo > > >> > > > > > > From volker.simonis at gmail.com Mon Apr 18 16:23:25 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 18 Apr 2016 18:23:25 +0200 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <5714E416.6030300@redhat.com> References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> <5711ED18.7000706@oracle.com> <201604180215.u3I2FUZi001650@d19av07.sagamino.japan.ibm.com> <571464DF.3070706@oracle.com> <5714E416.6030300@redhat.com> Message-ID: We've looked at the proposed changes and we are pretty sure that the cmpxchg done during copy_to_survivor_space in the parallel GC doesn't require the full fence/acquire semantics. But we also agree that this should not be ifdefed PPC64 in shared code. Andrew's suggestion of using the new C++11 atomic memory operators is good, although in practice it may be hard to get all the different compilers under the hood. But now that we've even got the corresponding cmpxchg routines with various acquire/release semantics in Java-land in the new jdk.internal.Unsafe package, it would be a pity if it were not possible to use that functionality within HotSpot. I think one approach to enable an easy transition would be to do the proposed enhancements (or something similar) to cmpxchg unconditionally in atomic.hpp. For example, instead of two extra boolean parameters we could use an enum similar to the one in library_call.cpp: typedef enum { Relaxed, Opaque, Volatile, Acquire, Release } AccessKind; The default value of this parameter should of course be conservative (i.e.
Volatile) so we don't change the current behavior. After that, individual performance-critical callers of these routines can be examined to see whether they really require the most conservative setting, and optimized where they do not. What do you think? Regards, Martin and Volker On Mon, Apr 18, 2016 at 3:41 PM, Andrew Haley wrote: > On 04/18/2016 02:01 PM, Carsten Varming wrote: >> An important question is: Should the shared parts of hotspot move towards >> weaker memory models? If yes, then everybody should review code assuming >> the weaker semantics. If no, then there really isn't room for patches like >> this one :(. > > This would surely be useful. For example, the bitmap marking uses a > two-way acquire and release barrier at the moment, and I'm fairly sure > we don't need that. > > I don't think this change should be #ifdef PPC64. That disadvantages > other targets such as AArch64, to no advantage. I understand that > moving this to shared code requires more work, but we should do at > least some of it in the JDK9 timeframe. > > C++11 has a considerably greater variety of atomic memory operators > than the ones in HotSpot. Over time I believe we should migrate to > C++11-like operators in our code base. One way to do this would be to > create new operators which map in a simple way onto the standard ones. > Then we can get rid of much of this inline assembly code. > > Andrew.
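As a rough illustration of the idea discussed in this message (a cmpxchg that takes an ordering parameter whose default stays conservative), here is a minimal sketch in portable C++11. It is not the proposed patch and not HotSpot's Atomic API: cmpxchg_sketch, CmpxchgOrder and install_or_get are hypothetical names, std::atomic stands in for the PPC inline assembly, and the two fences around a seq_cst compare-exchange only approximate the fence_cmpxchg_acquire behaviour described in the thread. Whether the relaxed form is really sufficient for forwarding pointers is exactly the question the thread leaves open.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical ordering selector (illustrative names, not HotSpot's).
enum class CmpxchgOrder { Relaxed, Conservative };

// Compare-and-exchange that returns the value observed at *dest,
// i.e. compare_value on success and the current value on failure.
// The default ordering is the conservative one, so existing callers
// would keep their behaviour without modification.
inline intptr_t cmpxchg_sketch(intptr_t exchange_value,
                               std::atomic<intptr_t>* dest,
                               intptr_t compare_value,
                               CmpxchgOrder order = CmpxchgOrder::Conservative) {
  intptr_t expected = compare_value;
  if (order == CmpxchgOrder::Relaxed) {
    // Atomicity only, no ordering with respect to other memory accesses.
    dest->compare_exchange_strong(expected, exchange_value,
                                  std::memory_order_relaxed,
                                  std::memory_order_relaxed);
  } else {
    // Approximation of a full two-way fence around the operation.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    dest->compare_exchange_strong(expected, exchange_value,
                                  std::memory_order_seq_cst,
                                  std::memory_order_seq_cst);
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  // On failure compare_exchange_strong stored the observed value in 'expected'.
  return expected;
}

// Usage modelled loosely on the forwarding race: whoever loses the race
// learns the winner's value from the return value of the cmpxchg.
inline intptr_t install_or_get(std::atomic<intptr_t>* mark,
                               intptr_t unforwarded,
                               intptr_t forwardee) {
  assert(forwardee != unforwarded);
  intptr_t old = cmpxchg_sketch(forwardee, mark, unforwarded,
                                CmpxchgOrder::Relaxed);
  return old == unforwarded ? forwardee : old;
}
```

Only the call site that opts in (install_or_get here) sees the weaker ordering; every other caller gets the conservative default, which is the transition path the message suggests.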
From david.holmes at oracle.com Tue Apr 19 00:03:39 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 19 Apr 2016 10:03:39 +1000 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> <5711ED18.7000706@oracle.com> <201604180215.u3I2FUZi001650@d19av07.sagamino.japan.ibm.com> <571464DF.3070706@oracle.com> Message-ID: <571575DB.9090009@oracle.com> Hi Carsten, On 18/04/2016 11:01 PM, Carsten Varming wrote: > Dear Hiroshi and David, > > I took a brief look at the patch and the idea looks good to me, in fact, > a bunch of the garbage collectors could benefit from similar changes (G1 > and ParNew both use oopDesc::forward_to_atomic which does not call > cas_set_mark, and they could both benefit from the weaker semantics). > > An important question is: Should the shared parts of hotspot move > towards weaker memory models? If yes, then everybody should review code > assuming the weaker semantics. If no, then there really isn't room for > patches like this one :(. The shared code should be written in a form that is correct on the weakest memory model we support - always using the appropriate Atomic and/or OrderAccess operations. On individual platforms those operations then reduce - sometimes to no-ops - based on the platform semantics. But that also requires we have a comprehensive set of Atomic and/or OrderAccess operations to choose from, so that we don't define algorithms that are unnecessarily excessive in their use of memory barriers. That said, establishing that a given algorithm+data-structure can be coded in a more relaxed form is a significant challenge in terms of "proving" correctness and adequate testing. I expect conservatism to rule in many places. More on Volker's response ... Thanks, David > The patch needs some work (IMHO). The name cas_set_mark must be renamed > to reflect the weaker semantics, and the ifdefs in the shared code > should be removed. 
All platforms should be changed as the Atomic API is > extended, and platforms with weaker memory models should implement the > weaker semantics. Finally, oopDesc::forward_to_atomic should use the > weaker semantics, preferably by calling cas_set_mark instead of calling > into Atomic directly. > > BTW. what is going on in oopDesc::forward_to_atomic? If curMark != > oldMark, then curMark must have been marked (a similar fact is checked > by "assert(is_forwarded())" (modulo a short race)), oldMark is set to > curMark and then oldMark->is_marked is checked in the loop condition. > Why is there a loop? It seems much simpler if the assert is changed to a > "guarantee(curMark->is_marked)" and the loop is changed to an if > statement. Oh, and an assert is probably sufficient. > > My 2 cents, > Carsten > > On Mon, Apr 18, 2016 at 12:38 AM, David Holmes > wrote: > > Hi Hiroshi, > > On 18/04/2016 12:15 PM, Hiroshi H Horii wrote: > > Hi David, > > Thank you for your replying and sorry that I didn't append my > diff file > when the discussion was forwarded to this mailing list. I > appended my > original diff file (hg diff -g) to this mail. > > > It is fine for ppc to have variations of cmpxchg with > different memory > > barrier semantics, but the shared API must not be affected > as there is a > > requirement that the basic form of this operation provide "full > > bi-directional fence" semantics. Note that these semantics > are not in > > place to fulfill Java Memory Model requirements, but are an > internal > > contract in hotspot. > > Sure. Probably, it is better for me to modify my patch because it > changes the internal contract. I will create a new patch that > adds new > cmpxchg functions for ppc. > > > I think this is only usable from PPC specific code, not from the > shared code as per your original patch. The oopDesc::cas_set_mark > may be written to expect the full bi-directional fence that is > required by the atomic.hpp contract.
If we break that contract we > would have to prove correctness along all code paths using that code > - well the ppc64 folk would have to do that :). But I would object > to the platform-specific code in the shared file - sorry. > > Thanks, > David > > > Also can you get someone to host the webrev > > for you on cr.openjdk.java.net? > Or else include the diff in the bug > report. > > I will ask someone to create webrev after my next patch is created. > > > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. > IBM Research - Tokyo > > > David Holmes > wrote on 04/16/2016 16:43:20: > > > From: David Holmes > > > To: Christian Thalinger >, Hiroshi H > > Horii/Japan/IBM at IBMJP > > Cc: Tim Ellison >, ppc-aix-port- > > dev at openjdk.java.net , > hotspot-runtime-dev at openjdk.java.net > > > Date: 04/16/2016 16:46 > > Subject: Re: enhancement of cmpxchg and copy_to_survivor for > ppc64 > > > > > Hi Hiroshi, > > > > As the diff file does not survive the mail process I can't > see the > > actual proposed changes. There doesn't seem to be a bug for > this so > > could you please file one. Also can you get someone to host > the webrev > > for you on cr.openjdk.java.net? > Or else include the diff in the bug > report. > > > > It is fine for ppc to have variations of cmpxchg with > different memory > > barrier semantics, but the shared API must not be affected > as there is a > > requirement that the basic form of this operation provide "full > > bi-directional fence" semantics. Note that these semantics > are not in > > place to fulfill Java Memory Model requirements, but are an > internal > > contract in hotspot. > > > > Thanks, > > David > > > > On 12/04/2016 3:59 AM, Christian Thalinger wrote: > > > [This should be on hotspot-runtime-dev. BCC'ing > hotspot-compiler-dev.] > > > > > >> On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii > > wrote: > > >> > > >> Dear all: > > >> > > >> Can I please request reviews for the following change?
> > >> This change was created for JDK 9 and ppc64. > > >> > > >> Description: > > >> This change adds options of compare-and-exchange for POWER > architecture. > > >> As described in atomic_linux_ppc.inline.hpp, the current > implementation of > > >> cmpxchg is fence_cmpxchg_acquire. This implementation is > useful for > > >> general purposes because twice calls of sync before and after > cmpxchg will > > >> keep consistency. However, they sometimes cause overheads > because > > >> sync instructions are very expensive in the current POWER > chip design. > > >> With this change, callers can explicitly specify to run > fence and > > acquire with > > >> two additional bool parameters. Because their default > values are > "true", > > >> it is not necessary to modify existing cmpxchg calls. > > >> > > >> In addition, with the new parameters of cmpxchg, this > change improves > > >> performance of copy_to_survivor in the parallel GC. > > >> copy_to_survivor changes forward pointers by using > cmpxchg. This > > >> operation doesn't require any sync instructions, in my > understanding. > > >> A pointer is changed at most once in a GC and when > cmpxchg fails, > > >> the latest pointer is available for the caller. > > >> > > >> When I evaluated SPECjbb2013 (slightly customized because > obsolete > grizzly > > >> doesn't support new version format of Java 9), pause time > of young > GC was > > >> reduced from 10% to 20%. > > >> > > >> Summary of source code changes: > > >> > > >> * src/share/vm/runtime/atomic.hpp > > >> * src/share/vm/runtime/atomic.cpp > > >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > > >> - Add two arguments of fence and acquire to > cmpxchg only > for PPC64. > > >> Though cmpxchg in atomic_linux_ppc.inline.hpp > has some > branches, > > >> they are reduced while inlining to callers. > > >> > > >> * src/share/vm/oops/oop.inline.hpp > > >> - Changed cas_set_mark to call cmpxchg without > fence and > acquire. 
> > >> cas_set_mark is called only by cas_forward_to > that is > > called only by > > >> copy_to_survivor_space and oop_promotion_failed in > > >> psPromotionManager. > > >> > > >> Code change: > > >> > > >> Please see an attached diff file that was generated > with "hg > diff -g" > > >> under the latest hotspot directory. > > >> > > >> Passed test: > > >> SPECjbb2013 (customized) > > >> > > >> * I believe some other cmpxchg will be optimized by > reducing fence > > >> or acquire because twice calls of sync are too > conservative > to implement > > >> Java memory model. > > >> > > >> > > >> > > >> Regards, > > >> Hiroshi > > >> ----------------------- > > >> Hiroshi Horii, Ph.D. > > >> IBM Research - Tokyo > > >> > > > > > > > From david.holmes at oracle.com Tue Apr 19 00:12:27 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 19 Apr 2016 10:12:27 +1000 Subject: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> <5711ED18.7000706@oracle.com> <201604180215.u3I2FUZi001650@d19av07.sagamino.japan.ibm.com> <571464DF.3070706@oracle.com> <5714E416.6030300@redhat.com> Message-ID: <571577EB.1080907@oracle.com> On 19/04/2016 2:23 AM, Volker Simonis wrote: > We've looked at the proposed changes and we are pretty sure that the > cmpxchg done during copy_to_survivor_space in the parallel GC doesn't > require the full fence/acquire semantics. But we also agree that this > should not be ifdefed PPC64 in shared code. > > Andrew's suggestion of using the new C++11 atomic memory operators is > good, although in practice it may be hard to get all the different > compilers under the hood. > > But now that we've even got the corresponding cmpxchg routines with > various acquire/release semantics in Java-land in the new > jdk.internal.Unsafe package, it would be a pity if it were not > possible to use that functionality within HotSpot.
> > I think one approach to enable an easy transition would be to do the > proposed enhancements (or something similar) to cmpxchg > unconditionally in atomic.hpp. For example instead of two extra > boolean parameters we could use an enum similar to the one in > library_call.cpp: > > typedef enum { Relaxed, Opaque, Volatile, Acquire, Release } AccessKind; > > The default value of this parameter should of course be conservative > (i.e. Volatile) so we don't change the current behavior. After that > individual, performance critical callers of these routines can be > examined if they really require the most conservative setting and > maybe optimized. > > What do you think? I think expanding our Atomic and OrderAccess API's to align with the C++11 atomic and memory-ordering APIs is a good thing to do - even if we don't actually switch to direct compiler support for a while yet. It may be challenging to efficiently implement all of the C++11 semantics directly in the meantime. However this is too late for JDK 9 I think, with FC (for hotspot) on May 12. Though addressing the immediate concern, without trying to generalize to full C++11 semantic support may be feasible - eg add a "relaxed cas" for use in one or two particular pieces of code. Thanks, David > Regards, > Martin and Volker > > On Mon, Apr 18, 2016 at 3:41 PM, Andrew Haley wrote: >> On 04/18/2016 02:01 PM, Carsten Varming wrote: >>> An important question is: Should the shared parts of hotspot move towards >>> weaker memory models? If yes, then everybody should review code assuming >>> the weaker semantics. If no, then there really isn't room for patches like >>> this one :(. >> >> This would surely be useful. For example, the bitmap marking uses a >> two-way acquire and release barrier at the moment, and I'm fairly sure >> we don't need that. >> >> I don't think this change should be #ifdef PPC64. That disadvantages >> other targets such as AArch64, to no advantage. 
>> I understand that moving this to shared code requires more work, but
>> we should do at least some of it in the JDK9 timeframe.
>>
>> C++11 has a considerably greater variety of atomic memory operators
>> than the ones in HotSpot. Over time I believe we should migrate to
>> C++11-like operators in our code base. One way to do this would be to
>> create new operators which map in a simple way onto the standard ones.
>> Then we can get rid of much of this inline assembly code.
>>
>> Andrew.

From varming at gmail.com Mon Apr 18 13:01:47 2016
From: varming at gmail.com (Carsten Varming)
Date: Mon, 18 Apr 2016 09:01:47 -0400
Subject: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: <571464DF.3070706@oracle.com>
References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> <5711ED18.7000706@oracle.com> <201604180215.u3I2FUZi001650@d19av07.sagamino.japan.ibm.com> <571464DF.3070706@oracle.com>
Message-ID:

Dear Hiroshi and David,

I took a brief look at the patch and the idea looks good to me. In fact,
a bunch of the garbage collectors could benefit from similar changes (G1
and ParNew both use oopDesc::forward_to_atomic which does not call
cas_set_mark, and they could both benefit from the weaker semantics).

An important question is: Should the shared parts of hotspot move towards
weaker memory models? If yes, then everybody should review code assuming
the weaker semantics. If no, then there really isn't room for patches like
this one :(.

The patch needs some work (IMHO). cas_set_mark must be renamed to reflect
the weaker semantics, and the ifdefs in the shared code should be removed.
All platforms should be changed as the Atomic API is extended, and
platforms with weaker memory models should implement the weaker semantics.
Finally, oopDesc::forward_to_atomic should use the weaker semantics,
preferably by calling cas_set_mark instead of calling into Atomic directly.

BTW, what is going on in oopDesc::forward_to_atomic? If curMark !=
oldMark, then curMark must have been marked (a similar fact is checked by
"assert(is_forwarded())" (modulo a short race)), oldMark is set to curMark
and then oldMark->is_marked is checked in the loop condition. Why is there
a loop? It seems much simpler if the assert is changed to a
"guarantee(curMark->is_marked)" and the loop is changed to an if
statement. Oh, and an assert is probably sufficient.

My 2 cents,
Carsten

On Mon, Apr 18, 2016 at 12:38 AM, David Holmes wrote:

> Hi Hiroshi,
>
> On 18/04/2016 12:15 PM, Hiroshi H Horii wrote:
>
>> Hi David,
>>
>> Thank you for your reply and sorry that I didn't append my diff file
>> when the discussion was forwarded to this mailing list. I appended my
>> original diff file (hg diff -g) to this mail.
>>
>> > It is fine for ppc to have variations of cmpxchg with different memory
>> > barrier semantics, but the shared API must not be affected as there is a
>> > requirement that the basic form of this operation provide "full
>> > bi-directional fence" semantics. Note that these semantics are not in
>> > place to fulfill Java Memory Model requirements, but are an internal
>> > contract in hotspot.
>>
>> Sure. Probably, it is better for me to modify my patch because it
>> changes the internal contract. I will create a new patch that adds new
>> cmpxchg functions for ppc.
>>
>
> I think this is only usable from PPC specific code, not from the shared
> code as per your original patch. The oopDesc::cas_set_mark may be written
> to expect the full bi-directional fence that is required by the atomic.hpp
> contract. If we break that contract we would have to prove correctness
> along all code paths using that code - well the ppc64 folk would have to do
> that :). But I would object to the platform-specific code in the shared
> file - sorry.
>
> Thanks,
> David
>
>> > Also can you get someone to host the webrev
>> > for you on cr.openjdk.java.net? Or else include the diff in the bug
>> > report.
>> >> I will ask someone to create webrev after my next patch is created. >> >> >> >> Regards, >> Hiroshi >> ----------------------- >> Hiroshi Horii, Ph.D. >> IBM Research - Tokyo >> >> >> David Holmes wrote on 04/16/2016 16:43:20: >> >> > From: David Holmes >> > To: Christian Thalinger , Hiroshi H >> > Horii/Japan/IBM at IBMJP >> > Cc: Tim Ellison , ppc-aix-port- >> > dev at openjdk.java.net, hotspot-runtime-dev at openjdk.java.net >> > Date: 04/16/2016 16:46 >> > Subject: Re: enhancement of cmpxchg and copy_to_survivor for ppc64 >> >> > >> > Hi Hiroshi, >> > >> > As the diff file does not survive the mail process I can't see the >> > actual proposed changes. There doesn't seem to be a bug for this so >> > could you please file one. Also can you get someone to host the webrev >> > for you on cr.openjdk.java.net? Or else include the diff in the bug >> report. >> > >> > It is fine for ppc to have variations of cmpxchg with different memory >> > barrier semantics, but the shared API must not be affected as there is >> a >> > requirement that the basic form of this operation provide "full >> > bi-directional fence" semantics. Note that these semantics are not in >> > place to fulfill Java Memory Model requirements, but are an internal >> > contract in hotspot. >> > >> > Thanks, >> > David >> > >> > On 12/04/2016 3:59 AM, Christian Thalinger wrote: >> > > [This should be on hotspot-runtime-dev. BCC?ing >> hotspot-compiler-dev.] >> > > >> > >> On Apr 8, 2016, at 12:53 AM, Hiroshi H Horii >> wrote: >> > >> >> > >> Dear all: >> > >> >> > >> Can I please request reviews for the following change? >> > >> This change was created for JDK 9 and ppc64. >> > >> >> > >> Description: >> > >> This change adds options of compare-and-exchange for POWER >> architecture. >> > >> As described in atomic_linux_ppc.inline.hpp, the current >> implementation of >> > >> cmpxchg is fence_cmpxchg_acquire. 
This implementation is useful for >> > >> general purposes because twice calls of sync before and after >> cmpxchg will >> > >> keep consistency. However, they sometimes cause overheads because >> > >> sync instructions are very expensive in the current POWER chip >> design. >> > >> With this change, callers can explicitly specify to run fence and >> > acquire with >> > >> two additional bool parameters. Because their default values are >> "true", >> > >> it is not necessary to modify existing cmpxchg calls. >> > >> >> > >> In addition, with the new parameters of cmpxchg, this change >> improves >> > >> performance of copy_to_survivor in the parallel GC. >> > >> copy_to_survivor changes forward pointers by using cmpxchg. This >> > >> operation doesn't require any sync instructions, in my >> understanding. >> > >> A pointer is changed at most once in a GC and when cmpxchg fails, >> > >> the latest pointer is available for the caller. >> > >> >> > >> When I evaluated SPECjbb2013 (slightly customized because obsolete >> grizzly >> > >> doesn't support new version format of Java 9), pause time of young >> GC was >> > >> reduced from 10% to 20%. >> > >> >> > >> Summary of source code changes: >> > >> >> > >> * src/share/vm/runtime/atomic.hpp >> > >> * src/share/vm/runtime/atomic.cpp >> > >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> > >> - Add two arguments of fence and acquire to cmpxchg only >> for PPC64. >> > >> Though cmpxchg in atomic_linux_ppc.inline.hpp has some >> branches, >> > >> they are reduced while inlining to callers. >> > >> >> > >> * src/share/vm/oops/oop.inline.hpp >> > >> - Changed cas_set_mark to call cmpxchg without fence and >> acquire. >> > >> cas_set_mark is called only by cas_forward_to that is >> > called only by >> > >> copy_to_survivor_space and oop_promotion_failed in >> > >> psPromotionManager. 
>> > >>
>> > >> Code change:
>> > >>
>> > >> Please see an attached diff file that was generated with "hg diff -g"
>> > >> under the latest hotspot directory.
>> > >>
>> > >> Passed test:
>> > >>     SPECjbb2013 (customized)
>> > >>
>> > >> * I believe some other cmpxchg will be optimized by reducing fence
>> > >>   or acquire because the two sync calls are too conservative to
>> > >>   implement the Java memory model.
>> > >>
>> > >> Regards,
>> > >> Hiroshi
>> > >> -----------------------
>> > >> Hiroshi Horii, Ph.D.
>> > >> IBM Research - Tokyo

From HORII at jp.ibm.com Fri Apr 22 12:28:13 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Fri, 22 Apr 2016 21:28:13 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
Message-ID: <201604221228.u3MCSXXK020788@d19av05.sagamino.japan.ibm.com>

Dear all:

Can I please request reviews for the following change?

Code change:
http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
(I created the initial version and Martin enhanced it considerably)

This change follows the discussion started from this mail.
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html

Description:
This change provides relaxed compare-and-exchange by introducing
semantics similar to the C++ atomic memory operators, via enum
memory_order. As described in atomic_linux_ppc.inline.hpp, the current
implementation of cmpxchg is fence_cmpxchg_acquire. This implementation
is useful for general purposes because the two sync calls, before and
after cmpxchg, provide strict consistency. However, they sometimes cause
overhead because sync instructions are very expensive in the current
POWER chip design. In addition, on other platforms, such as aarch64,
these strict semantics may cause some overhead (according to Andrew's
mail).
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019073.html

With this change, callers can explicitly specify constraints of memory
ordering for cmpxchg with an additional parameter, memory_order order.

    typedef enum memory_order {
      memory_order_relaxed,
      memory_order_consume,
      memory_order_acquire,
      memory_order_release,
      memory_order_acq_rel,
      memory_order_seq_cst
    } memory_order;

Because the default value of the parameter is memory_order_seq_cst,
existing code can use the same semantics of cmpxchg without any
modification. The relaxed cmpxchg is implemented only on ppc in this
changeset. Therefore, the behavior on the other platforms will not be
changed with this changeset.

In addition, with the new parameter of cmpxchg, this change improves
performance of copy_to_survivor in the parallel GC. copy_to_survivor
changes forward pointers by using cmpxchg. This operation doesn't
require any sync instructions. A pointer is changed at most once in a GC
and when cmpxchg fails, the latest pointer is available for the caller.
cas_set_mark and cas_forward_to are extended with an additional
memory_order parameter, like cmpxchg, and copy_to_survivor uses
memory_order_relaxed to modify the forward pointers.

Summary of source code changes:

* src/share/vm/runtime/atomic.hpp
    - Defines enum memory_order and adds a parameter to cmpxchg.

* src/share/vm/runtime/atomic.cpp
* src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
* src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
* src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
* src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
* src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
* src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
* src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
* src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
* src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
    - Added a parameter for each cmpxchg function to follow
      the change of atomic.hpp. Their implementations are not changed.

* src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
* src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
    - Added a parameter for each cmpxchg function to follow
      the change of atomic.hpp. In addition, implementations
      are changed corresponding to the specified memory_order.

* src/share/vm/oops/oop.hpp
* src/share/vm/oops/oop.inline.hpp
    - Add a memory_order parameter to use relaxed cmpxchg in
      cas_set_mark and cas_forward_to.

* src/share/vm/gc/parallel/psPromotionManager.cpp
* src/share/vm/gc/parallel/psPromotionManager.inline.hpp

Martin tested this changeset on linuxx86_64, linuxppc64le and
darwinintel64. Though more time is needed to test on the other
platforms, we would like to ask for reviews and start a discussion on
this changeset. I also tested this changeset with SPECjbb2013 and
confirmed that gc pause time is reduced.

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

From david.holmes at oracle.com Fri Apr 22 12:57:07 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 22 Apr 2016 22:57:07 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
Message-ID: <571A1FA3.9030006@oracle.com>

Hi Hiroshi,

Two initial questions:

1. Are the current cmpxchg semantics exactly the same as
memory_order_seq_cst?

2. Has there been a discussion already, establishing that the modified
GC code can indeed use memory_order_relaxed? Otherwise who is
postulating that and based on what evidence?

Missing memory barriers have caused very difficult to track down bugs in
the past - very rare race conditions. So any relaxation here has to be
done with extreme confidence.

Thanks,
David

From HORII at jp.ibm.com Mon Apr 25 07:09:30 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Mon, 25 Apr 2016 16:09:30 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: <571A1FA3.9030006@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com>
Message-ID: <201604250709.u3P79c66017076@d19av06.sagamino.japan.ibm.com>

Hi David,

Thank you for your comments and questions.

> 1. Are the current cmpxchg semantics exactly the same as
> memory_order_seq_cst?

This is a very good question. I guess cmpxchg needs a more conservative
constraint for memory ordering than C++11, in that it adds a sync after
a compare-and-exchange operation.

Could someone give comments or thoughts?

memory_order_seq_cst is defined as
"Any operation with this memory order is both an acquire operation and
a release operation, plus a single total order exists in which all
threads observe all modifications (see below) in the same order."
(http://en.cppreference.com/w/cpp/atomic/memory_order)

In my environment, g++ and xlc generate the following assembly on
ppc64le. (Interestingly, they generate the same assembly for any
memory_order.)

g++ (4.9.2)
    100008a4:  ac 04 00 7c  sync
    100008a8:  28 50 20 7d  lwarx   r9,0,r10
    100008ac:  00 18 09 7c  cmpw    r9,r3
    100008b0:  0c 00 c2 40  bne-    100008bc
    100008b4:  2d 51 80 7c  stwcx.  r4,0,r10
    100008b8:  f0 ff c2 40  bne-    100008a8
    100008bc:  2c 01 00 4c  isync

xlc (13.1.3)
    10000888:  ac 04 00 7c  sync
    1000088c:  28 28 c0 7c  lwarx   r6,0,r5
    10000890:  40 00 26 7c  cmpld   r6,r0
    10000894:  0c 00 82 40  bne     100008a0
    10000898:  2d 29 80 7c  stwcx.  r4,0,r5
    1000089c:  f0 ff e2 40  bne+    1000088c
    100008a0:  2c 01 00 4c  isync

On the other hand, the current OpenJDK generates the following assembly.

    508:  ac 04 00 7c  sync
    50c:  00 00 5c e9  ld      r10,0(r28)
    510:  00 50 3b 7c  cmpd    r27,r10
    514:  1c 00 c2 40  bne-    530
    518:  a8 40 5c 7d  ldarx   r10,r28,r8
    51c:  00 50 3b 7c  cmpd    r27,r10
    520:  10 00 c2 40  bne-    530
    524:  ad 41 3c 7d  stdcx.  r9,r28,r8
    528:  f0 ff c2 40  bne-    518
    52c:  ac 04 00 7c  sync
    530:  00 50 bb 7f  ...

Though we can ignore 50c-514 (because they are a duplicated guard
condition), the last sync instruction (52c) makes cmpxchg stricter than
memory_order_seq_cst. In some cases, the last sync is necessary when
this thread must be able to read all of the changes made by the other
threads while executing from 508 to 530 (the instructions that process
the compare-and-exchange).

> 2. Has there been a discussion already, establishing that the modified
> GC code can indeed use memory_order_relaxed? Otherwise who is
> postulating that and based on what evidence?

Volker and his colleagues have investigated the current GC code,
according to this mail.
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html

However, I believe we need comments from other GC experts before
changing the shared code.

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

David Holmes wrote on 04/22/2016 21:57:07:

> From: David Holmes
> To: Hiroshi H Horii/Japan/IBM at IBMJP,
> hotspot-runtime-dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
> Cc: Tim Ellison , ppc-aix-port-dev at openjdk.java.net
> Date: 04/22/2016 21:58
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
>
> Hi Hiroshi,
>
> Two initial questions:
>
> 1. Are the current cmpxchg semantics exactly the same as
> memory_order_seq_cst?
>
> 2. Has there been a discussion already, establishing that the modified
> GC code can indeed use memory_order_relaxed? Otherwise who is
> postulating that and based on what evidence?
>
> Missing memory barriers have caused very difficult to track down bugs in
> the past - very rare race conditions. So any relaxation here has to be
> done with extreme confidence.
>
> Thanks,
> David

From martin.doerr at sap.com Mon Apr 25 10:25:15 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 25 Apr 2016 10:25:15 +0000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: <201604250709.u3P79lDi013336@d19av08.sagamino.japan.ibm.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79lDi013336@d19av08.sagamino.japan.ibm.com>
Message-ID:

Hi David and Hiroshi,

thank you very much for this interesting question and analysis.

I think we had better not use our own enum (e.g. like AccessKind in
library_call.cpp). Otherwise we'll get into trouble when we switch to
C++11. Would you agree?

Would it be better to split this bug into 2 and discuss the cmpxchg
interface change on the runtime list and the GC change on the gc list?

Best regards,
Martin
> > Though more time is needed to test on the other platform, we would like to > > ask > > reviews and start discussion on this changeset. > > I also tested this changeset with SPECjbb2013 and confirmed that gc pause > > time > > is reduced. > > > > Regards, > > Hiroshi > > ----------------------- > > Hiroshi Horii, Ph.D. > > IBM Research - Tokyo > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikael.vidstedt at oracle.com Tue Apr 26 18:35:58 2016 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Tue, 26 Apr 2016 11:35:58 -0700 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> <570C417F.20600@oracle.com> Message-ID: <571FB50E.6090108@oracle.com> On 4/12/2016 2:15 AM, Thomas St?fe wrote: > Hi Mikael, David, > > On Tue, Apr 12, 2016 at 2:29 AM, David Holmes > wrote: > > On 11/04/2016 10:57 AM, David Holmes wrote: > > Hi Mikael, > > I think we need to be able to answer the question as to why > the stubbed > and stubless forms of this code exist to ensure that > converting all > platforms to the same form is appropriate. > > > The more I look at this the more the stubs make no sense :) AIII a > stub is generated when we need runtime code that may be different > to that which we could write directly for compiling at build time > - ie to use CPU specific features of the actual CPU. But I see > nothing here that suggests any such usage. > > So I agree with removing the stubs. > > I'm still going through this but my initial reaction is to > wonder why we > don't use the same form of handle_unsafe_access on all > platforms and > always pass in npc? (That seems to be the only difference in > code that > otherwise seems platform independent.) > > > Futher to this and Thomas's comments I think > handle_unsafe_access(thread, pc, npc) can be defined in shared > code (where? 
not sure). Further, if we always pass in npc then we
> don't need to pass in pc as it is unused (seems unused in original
> code too for sparc).
>
>
> I agree. We commonized ucontext_set_pc for all Posix platforms, so we
> can make a common function "handle_unsafe_access(thread, npc)" and
> inside use os::Posix::ucontext_set_pc to modify the context. Then we
> can get rid of the special handling in the signal handlers inside
> os_aix_ppc.cpp and os_linux_ppc.cpp (for both the compiled and the
> interpreted case).

There is definitely room for unification and simplification here. Right
now the signal handling code is, sadly, different on all the different
platforms, despite the fact that in many cases it should be similar or
exactly the same. That said, as much as a refactoring/rewrite of the signal
handler code is needed, it will very quickly turn into a much larger
effort...

In this specific case, it would probably make more sense to pass in the
full context to the handle_unsafe_access method and have it do whatever it
feels is necessary to update it. However, a lot of the signal handler code
assumes that a "stub" variable gets set up and only at the end of the main
signal handler function does the actual context get updated. Changing how
that works only for this specific case is obviously not a good idea, which
means it's back to the full scale refactoring and out of scope for the bug
fix.

So to me the fact that the method prototypes differ depending on the exact
platform is just a reflection of how the contexts differ. Lacking the
full context, the handler method needs to take whatever parts of the
context are needed to do its job.
I could of course change the handler method to only take a single "next_pc"
argument, but to me that feels like putting a key part of the logic which
handles the unsafe access (specifically, the part which calculates the next
pc) in the wrong place. IMHO that should really be tightly coupled with the
rest of the logic needed to handle an unsafe access (updating the thread
state etc.), and hence I feel that it really belongs in the
handle_unsafe_access method itself. Happy to hear your thoughts, but I hope
we can agree that the suggested fix, even in its current state, is still
significantly better than what is there now.

Unless somebody has a better suggestion, I'm going to move the
implementations of the handle_unsafe_access methods to sharedRuntime
(instead of stubRoutines) and will send out a new webrev shortly.

Cheers,
Mikael

>
>
> BTW I found this comment somewhat unfathomable (both now and in
> original code):
>
> + // pc is the instruction which we must emulate
> + // doing a no-op is fine: return garbage from the load
>
> but finally realized that it means that after the load that raised
> the signal the native code proceeds normally but the value
> apparently loaded is just garbage/arbitrary, and the only sign
> something went wrong is the setting of the pending
> unsafe-access-error bit. This would be a potential source of bugs
> I think, except that when we hit the Java level, we throw the
> exception and so never actually "return" the garbage value. But it
> does mean we would have to be careful if calling the unsafe
> routines from native code.
>
>
> I admit I do not understand fully how the
> _special_runtime_exit_condition flag is processed later, at least not
> for all cases: If I have a java method A using sun.misc.unsafe, which
> gets compiled, the sun.misc.unsafe intrinsic gets inlined into that
> method. So, the whole method A gets marked as "has unsafe access"?
So, > any SIGBUS happening inside this method - which may be larger than the > inlined sun.misc.unsafe call - will yield an InternalError? And when > is the flag checked if that method A is called from another java method B? > > Sorry if the questions are stupid, I am not a JIT expert, but I try to > understand how much can happen between the SIGBUS and the > InternalError getting thrown. No questions are stupid here. As you may have seen in the other thread, I filed JDK-8154592[1] to cover making the handling of the faults synchronous. Hope that helps. Cheers, Mikael [1] https://bugs.openjdk.java.net/browse/JDK-8154592 > > Thanks, Thomas > > Thanks, > David > > > Thanks, > David > > On 9/04/2016 8:33 AM, Mikael Vidstedt wrote: > > > Please review: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8153892 > Webrev: > http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ > > > > > * Note: this is patch 2 in a set of 3 all aiming to clean > up and unify > the unsafe memory getters/setters, along with the handling > of unsafe > access errors. The other two issues are: > > https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle > unsafe access > error as an asynchronous exception > https://bugs.openjdk.java.net/browse/JDK-8150921 - Update > Unsafe > getters/setters to use double-register variants > > > * Summary (copied from the bug description) > > > In certain cases, such as accessing a region of a memory > mapped file > which has been truncated on unix-style operating systems, > a SIGBUS > signal will be raised and the VM will process it in the > signal handler. 
> > How the signal is processed differs depending on the > operating system > and/or CPU architecture, with two major alternatives: > > * "stubless" > > Do the necessary thread state updates directly in the > signal handler, > and modify the context so that the signal handler returns > to the place > where the execution should continue > > * Using a stub > > Update the context so that when the signal handler returns > the thread > will continue execution in a generated stub, which in turn > will call > some native code in the VM to update the thread state and > figure out > where execution should continue. The stub will then jump > to that new > place. > > > It should be noted that the work of updating the thread > state is very > small - it's setting a flag or two in the thread > structure, and figures > out where the next instruction starts. It should also be > noted that the > generated stubs today are broken, because they do not > preserve all the > live registers over the call into the VM. There are two > ways to address > this: > > * Preserve all the necessary registers > > This would mean implementing, in macro assembly, the > necessary logic for > preserving all the live registers, including, but not > limited to, > floating point registers, flag registers, etc. It quickly > becomes > obvious that this platform specific and error prone. > > * Leverage the fact that the operating system already does > this as part > of the signal handling > > Do the necessary work in the signal handler instead, > removing the need > for the stub alltogether > > As mentioned, on some platforms the latter model is > already in use. It > is dramatically easier and all platforms should be updated > to do it the > same way. 
> > > * Testing > > Just as mentioned in the RFR for JDK-8153890, a new test > was developed > to test this code path: > > http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java > > > In fact, it was when running this test I found the > register preservation > issue. JPRT also passes. Much like JDK-8153890 I wanted to > get some > feedback on this before running additional tests. > > > Cheers, > Mikael > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.stuefe at gmail.com Wed Apr 27 07:24:32 2016 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 27 Apr 2016 09:24:32 +0200 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: <571FB50E.6090108@oracle.com> References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> <570C417F.20600@oracle.com> <571FB50E.6090108@oracle.com> Message-ID: Hi Mikael, On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt wrote: > > > On 4/12/2016 2:15 AM, Thomas St?fe wrote: > > Hi Mikael, David, > > On Tue, Apr 12, 2016 at 2:29 AM, David Holmes > wrote: > >> On 11/04/2016 10:57 AM, David Holmes wrote: >> >>> Hi Mikael, >>> >>> I think we need to be able to answer the question as to why the stubbed >>> and stubless forms of this code exist to ensure that converting all >>> platforms to the same form is appropriate. >>> >> >> The more I look at this the more the stubs make no sense :) AIII a stub >> is generated when we need runtime code that may be different to that which >> we could write directly for compiling at build time - ie to use CPU >> specific features of the actual CPU. But I see nothing here that suggests >> any such usage. >> >> So I agree with removing the stubs. >> >> I'm still going through this but my initial reaction is to wonder why we >>> don't use the same form of handle_unsafe_access on all platforms and >>> always pass in npc? 
(That seems to be the only difference in code that >>> otherwise seems platform independent.) >>> >> >> Futher to this and Thomas's comments I think handle_unsafe_access(thread, >> pc, npc) can be defined in shared code (where? not sure). Further, if we >> always pass in npc then we don't need to pass in pc as it is unused (seems >> unused in original code too for sparc). >> >> > I agree. We commonized ucontext_set_pc for all Posix platforms, so we can > make a common function "handle_unsafe_access(thread, npc)" and inside use > os::Posix::ucontext_set_pc to modify the context. Then we can get rid of > the special handling in the signal handlers inside os_aix_ppc.cpp and > os_linux_ppc.cpp (for both the compiled and the interpreted case). > > > There is definitely room for unification and simplification here. Right > now the signal handling code is, sadly, different on all the different > platforms, despite the fact that in many cases it should be similar or the > exact same. That said, as much as a refactoring/rewrite of the signal > handler code is needed, it will very quickly turn into a much larger > effort... > > In this specific case, it would probably make more sense to pass in the > full context to the handle_unsafe_access method and have it do whatever it > feels is necessary to update it. However, a lot of the signal handler code > assumes that a "stub" variable gets set up and only at the end of the main > signal handler function does the actual context get updated. Changing how > that works only for this specific case is obviously not a good idea, which > means it's back to the full scale refactoring and out of scope for the bug > fix. > > So to me the fact that the method prototypes differ depending on the exact > platform is just a reflection of how the contexts differ. In lack of the > full context the handler method needs to take whatever parts of the context > is needed to do it's job. 
I could of course change the handler method to
> only take a single "next_pc" argument, but to me that feels like putting a
> key part of the logic which handles the unsafe access (specifically, the
> part which calculates the next pc) in the wrong place - IMHO that should
> really be tightly coupled with the rest of the logic needed to handle an
> unsafe access (updating the thread state etc.), and hence I feel that it
> really belongs in the handle_unsafe_access method itself. Happy to hear
> your thoughts, but I hope we can agree that the suggested fix, even in its
> current state, is still significantly better than what is there now.
>
>
> Unless somebody has a better suggestion, I'm going to be moving the
> implementations of the handle_unsafe_access methods to sharedRuntime
> (instead of stubRoutines) and will send out a new webrev shortly.
>
>

I am unhappy with the fact that we factor unsafe handling out for x86 and
sparc but do it inline for ppc. I know that was done before your change as
well, but I would be happy if that could be improved. I would prefer one
of:

1) flatten out the coding into the signal handlers like it is done in
   os_linux_ppc.cpp and os_aix_ppc.cpp
or
2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case

I would actually prefer (1) even though this would multiply the code out
for all os cases into ; we are only talking about 1-2 lines of additional
coding, and it would improve the readability of the signal handlers.

But this is only my personal opinion, and I do not feel strongly about it.
I agree with you that a full cleanup of the signal coding is out of scope
for this issue.
> Cheers, > Mikael > > > > BTW I found this comment somewhat unfathomable (both now and in original >> code): >> >> + // pc is the instruction which we must emulate >> + // doing a no-op is fine: return garbage from the load >> >> but finally realized that it means that after the load that raised the >> signal the native code proceeds normally but the value apparently loaded is >> just garbage/arbitrary, and the only sign something went wrong is the >> setting of the pending unsafe-access-error bit. This would be a potential >> source of bugs I think, except that when we hit the Java level, we throw >> the exception and so never actually "return" the garbage value. But it does >> mean we would have to be careful if calling the unsafe routines from native >> code. >> >> > I admit I do not understand fully how the _special_runtime_exit_condition > flag is processed later, at least not for all cases: If I have a java > method A using sun.misc.unsafe, which gets compiled, the sun.misc.unsafe > intrinsic gets inlined into that method. So, the whole method A gets marked > as "has unsafe access"? So, any SIGBUS happening inside this method - which > may be larger than the inlined sun.misc.unsafe call - will yield an > InternalError? And when is the flag checked if that method A is called from > another java method B? > > Sorry if the questions are stupid, I am not a JIT expert, but I try to > understand how much can happen between the SIGBUS and the InternalError > getting thrown. > > > No questions are stupid here. As you may have seen in the other thread, I > filed JDK-8154592[1] to cover making the handling of the faults > synchronous. Hope that helps. > > Thank you! 
Kind Regards, Thomas > Cheers, > Mikael > > [1] https://bugs.openjdk.java.net/browse/JDK-8154592 > > > > Thanks, Thomas > > >> Thanks, >> David >> >> >> Thanks, >>> David >>> >>> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote: >>> >>>> >>>> Please review: >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892 >>>> Webrev: >>>> >>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ >>>> >>>> >>>> >>>> * Note: this is patch 2 in a set of 3 all aiming to clean up and unify >>>> the unsafe memory getters/setters, along with the handling of unsafe >>>> access errors. The other two issues are: >>>> >>>> https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle unsafe access >>>> error as an asynchronous exception >>>> https://bugs.openjdk.java.net/browse/JDK-8150921 - Update Unsafe >>>> getters/setters to use double-register variants >>>> >>>> >>>> * Summary (copied from the bug description) >>>> >>>> >>>> In certain cases, such as accessing a region of a memory mapped file >>>> which has been truncated on unix-style operating systems, a SIGBUS >>>> signal will be raised and the VM will process it in the signal handler. >>>> >>>> How the signal is processed differs depending on the operating system >>>> and/or CPU architecture, with two major alternatives: >>>> >>>> * "stubless" >>>> >>>> Do the necessary thread state updates directly in the signal handler, >>>> and modify the context so that the signal handler returns to the place >>>> where the execution should continue >>>> >>>> * Using a stub >>>> >>>> Update the context so that when the signal handler returns the thread >>>> will continue execution in a generated stub, which in turn will call >>>> some native code in the VM to update the thread state and figure out >>>> where execution should continue. The stub will then jump to that new >>>> place. 
>>>> >>>> >>>> It should be noted that the work of updating the thread state is very >>>> small - it's setting a flag or two in the thread structure, and figures >>>> out where the next instruction starts. It should also be noted that the >>>> generated stubs today are broken, because they do not preserve all the >>>> live registers over the call into the VM. There are two ways to address >>>> this: >>>> >>>> * Preserve all the necessary registers >>>> >>>> This would mean implementing, in macro assembly, the necessary logic for >>>> preserving all the live registers, including, but not limited to, >>>> floating point registers, flag registers, etc. It quickly becomes >>>> obvious that this platform specific and error prone. >>>> >>>> * Leverage the fact that the operating system already does this as part >>>> of the signal handling >>>> >>>> Do the necessary work in the signal handler instead, removing the need >>>> for the stub alltogether >>>> >>>> As mentioned, on some platforms the latter model is already in use. It >>>> is dramatically easier and all platforms should be updated to do it the >>>> same way. >>>> >>>> >>>> * Testing >>>> >>>> Just as mentioned in the RFR for JDK-8153890, a new test was developed >>>> to test this code path: >>>> >>>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java >>>> >>>> In fact, it was when running this test I found the register preservation >>>> issue. JPRT also passes. Much like JDK-8153890 I wanted to get some >>>> feedback on this before running additional tests. >>>> >>>> >>>> Cheers, >>>> Mikael >>>> >>>> > > -------------- next part -------------- An HTML attachment was scrubbed... 
From HORII at jp.ibm.com Wed Apr 27 03:34:12 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Wed, 27 Apr 2016 12:34:12 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To:
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79lDi013336@d19av08.sagamino.japan.ibm.com>
Message-ID: <201604271338.u3RDcdPU015737@d19av07.sagamino.japan.ibm.com>

Hi Martin,

> I think we shouldn't better use an own enum (e.g. like AccessKind in
> library_call.cpp).
> Otherwise we'll get trouble when we switch to C++11. Would you agree?

I agree.

I think, to use the enum and semantics of C++11, callers of cmpxchg need to call
memory-barrier after cmpxchg when all of updates in the other processes must be
available for the following instructions of the cmpxchg. Correct?

> Would it be better to split this bug into 2 and discuss the cmpxchg
> interface change on the runtime list and the GC change on the gc list?

Do you mean that a new cmpxchg with relaxed semantics will be added and
used in the GC change? Or will the discussion of the GC change start only
after the discussion of the new cmpxchg interface?

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

"Doerr, Martin" wrote on 04/25/2016 19:25:15:

> From: "Doerr, Martin"
> To: Hiroshi H Horii/Japan/IBM at IBMJP, David Holmes
> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime-dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison, Volker Simonis, "Lindenmaier, Goetz"
> Date: 04/25/2016 19:26
> Subject: RE: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
>
> Hi David and Hiroshi,
>
> thank you very much for this interesting question and analysis.
>
> I think we shouldn't better use an own enum (e.g. like AccessKind in
> library_call.cpp).
> Otherwise we'll get trouble when we switch to C++11.
Would you agree? > > Would it be better to split this bug into 2 and discuss the cmpxchg > interface change on the runtime list and the GC change on the gc list? > > Best regards, > Martin > > From: Hiroshi H Horii [mailto:HORII at jp.ibm.com] > Sent: Montag, 25. April 2016 09:10 > To: David Holmes > Cc: hotspot-gc-dev at openjdk.java.net; hotspot-runtime- > dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Tim Ellison > ; Volker Simonis ; > Doerr, Martin ; Lindenmaier, Goetz > > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > copy_to_survivor for ppc64 > > Hi David, > > Thank you for your comments and questions. > > > 1. Are the current cmpxchg semantics exactly the same as > > memory_order_seq_cst? > > This is very good question.. > > I guess, cmpxchg needs a more conservative constraint for memory ordering > than C++11, to add sync after a compare-and-exchange operation. > > Could someone give comments or thoughts? > > memory_order_seq_cst is defined as > "Any operation with this memory order is both an acquire operation and > a release operation, plus a single total order exists in which > all threads > observe all modifications (see below) in the same order." > (http://en.cppreference.com/w/cpp/atomic/memory_order) > > In my environment, g++ and xlc generate following assemblies on ppc64le. > (interestingly, they generates the same assemblies for any memory_order) > > g++ (4.9.2) > 100008a4: ac 04 00 7c sync > 100008a8: 28 50 20 7d lwarx r9,0,r10 > 100008ac: 00 18 09 7c cmpw r9,r3 > 100008b0: 0c 00 c2 40 bne- 100008bc > 100008b4: 2d 51 80 7c stwcx. r4,0,r10 > 100008b8: f0 ff c2 40 bne- 100008a8 > 100008bc: 2c 01 00 4c isync > > xlc (13.1.3) > 10000888: ac 04 00 7c sync > 1000088c: 28 28 c0 7c lwarx r6,0,r5 > 10000890: 40 00 26 7c cmpld r6,r0 > 10000894: 0c 00 82 40 bne 100008a0 > 10000898: 2d 29 80 7c stwcx. 
r4,0,r5 > 1000089c: f0 ff e2 40 bne+ 1000088c > 100008a0: 2c 01 00 4c isync > > On the other hand, the current OpenJDK generates following assemblies. > > 508: ac 04 00 7c sync > 50c: 00 00 5c e9 ld r10,0(r28) > 510: 00 50 3b 7c cmpd r27,r10 > 514: 1c 00 c2 40 bne- 530 > 518: a8 40 5c 7d ldarx r10,r28,r8 > 51c: 00 50 3b 7c cmpd r27,r10 > 520: 10 00 c2 40 bne- 530 > 524: ad 41 3c 7d stdcx. r9,r28,r8 > 528: f0 ff c2 40 bne- 518 > 52c: ac 04 00 7c sync > 530: 00 50 bb 7f ... > > Though we can ignore 50c-514 (because they are a duplicated guard condition), > the last sync instruction (52c) makes cmpxchg more strict than > memory_order_seq_cst. > > In some cases, the last sync is necessary when this thread must be > able to read > all of the changes in the other threads while executing from 508 to 530 > (that processes compare-and-exchange). > > > 2. Has there been a discussion already, establishing that the modified > > GC code can indeed use memory_order_relaxed? Otherwise who is > > postulating that and based on what evidence? > > Volker and his colleagues have investigated the current GC codes > according to this. > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > April/019079.html > However, I believe, we need comments of other GC experts to change > the shared codes. > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. > IBM Research - Tokyo > > > David Holmes wrote on 04/22/2016 21:57:07: > > > From: David Holmes > > To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime- > > dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net > > Cc: Tim Ellison , ppc-aix-port-dev at openjdk.java.net > > Date: 04/22/2016 21:58 > > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > > copy_to_survivor for ppc64 > > > > Hi Hiroshi, > > > > Two initial questions: > > > > 1. Are the current cmpxchg semantics exactly the same as > > memory_order_seq_cst? > > > > 2. 
Has there been a discussion already, establishing that the modified > > GC code can indeed use memory_order_relaxed? Otherwise who is > > postulating that and based on what evidence? > > > > Missing memory barriers have caused very difficult to track down bugs in > > the past - very rare race conditions. So any relaxation here has to be > > done with extreme confidence. > > > > Thanks, > > David > > > > On 22/04/2016 10:28 PM, Hiroshi H Horii wrote: > > > Dear all: > > > > > > Can I please request reviews for the following change? > > > > > > Code change: > > > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/ > > > (I initially created and Martin enhanced so much) > > > > > > This change follows the discussion started from this mail. > > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > > April/018960.html > > > > > > Description: > > > This change provides relaxed compare-and-exchange by introducing > > > similar semantics of C++ atomic memory operators, enum memory_order. > > > As described in atomic_linux_ppc.inline.hpp, the current implementation of > > > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for > > > general purposes because twice calls of sync before and after cmpxchg will > > > provide strict consistency. However, they sometimes cause overheads > > > because > > > sync instructions are very expensive in the current POWER chip design. > > > In addition, for the other platforms, such as aarch64, this strict > > > semantics > > > may cause some overheads (according to the Andrew's mail). > > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > > April/019073.html > > > > > > With this change, callers can explicitly specify constraints of memory > > > ordering > > > for cmpxchg with an additional parameter, memory_order order. 
> > > > > > typedef enum memory_order { > > > memory_order_relaxed, > > > memory_order_consume, > > > memory_order_acquire, > > > memory_order_release, > > > memory_order_acq_rel, > > > memory_order_seq_cst > > > } memory_order; > > > > > > Because the default value of the parameter is memory_order_seq_cst, > > > existing codes can use the same semantics of cmpxchg without any > > > modification. The relaxed cmpxchg is implemented only on ppc > > > in this changeset. Therefore, the behavior on the other platforms will > > > not be changed with this changeset. > > > > > > In addition, with the new parameter of cmpxchg, this change improves > > > performance of copy_to_survivor in the parallel GC. > > > copy_to_survivor changes forward pointers by using cmpxchg. This > > > operation doesn't require any sync instructions. A pointer is changed > > > at most once in a GC and when cmpxchg fails, the latest pointer is > > > available for the caller. cas_set_mark and cas_forward_to are extended > > > with an additional memory_order parameter as cmpxchg and copy_to_survivor > > > uses memory_order_relaxed to modify the forward pointers. > > > > > > Summary of source code changes: > > > > > > * src/share/vm/runtime/atomic.hpp > > > - Defines enum memory_order and adds a parameter to cmpxchg. 
> > > > > > * src/share/vm/runtime/atomic.cpp > > > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp > > > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp > > > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp > > > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp > > > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp > > > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp > > > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp > > > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp > > > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp > > > - Added a parameter for each cmpxchg function to follow > > > the change of atomic.hpp. Their implementations are not changed. > > > > > > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp > > > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > > > - Added a parameter for each cmpxchg function to follow > > > the change of atomic.hpp. In addition, implementations > > > are changed corresponding to the specified memory_order. > > > > > > * src/share/vm/oops/oop.hpp > > > * src/share/vm/oops/oop.inline.hpp > > > - Add a memory_order parameter to use relaxed cmpxchg in > > > cas_set_mark and cas_forward_to. > > > > > > * src/share/vm/gc/parallel/psPromotionManager.cpp > > > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp > > > > > > Martin tested this changeset on linuxx86_64, linuxppc64le and > > > darwinintel64. > > > Though more time is needed to test on the other platform, we would like to > > > ask > > > reviews and start discussion on this changeset. > > > I also tested this changeset with SPECjbb2013 and confirmed that gc pause > > > time > > > is reduced. > > > > > > Regards, > > > Hiroshi > > > ----------------------- > > > Hiroshi Horii, Ph.D. > > > IBM Research - Tokyo > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
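The relaxed compare-and-exchange proposed in the RFR quoted above can be illustrated with standard C++11 atomics. This is only a minimal sketch under stated assumptions: `ObjHeader` and `forward_relaxed` are hypothetical names, not HotSpot's `markOop` or `cas_forward_to`, and whether relaxed ordering is actually sufficient for the real GC code is exactly the question the reviewers raise in this thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an object header word (not HotSpot's markOop).
struct ObjHeader {
    std::atomic<std::uintptr_t> mark{0};  // 0 means "not forwarded yet"
};

// Try to install new_addr as the forwarding pointer without any fences.
// Returns the address that ends up installed: ours if we won the race,
// otherwise the one a competing thread installed first.
inline std::uintptr_t forward_relaxed(ObjHeader& obj, std::uintptr_t new_addr) {
    std::uintptr_t expected = 0;
    if (obj.mark.compare_exchange_strong(expected, new_addr,
                                         std::memory_order_relaxed,
                                         std::memory_order_relaxed)) {
        return new_addr;
    }
    // On failure, compare_exchange_strong stores the current value in
    // expected, so the caller still learns the winning pointer.
    return expected;
}
```

The first caller for a given header gets its own address back; every later caller gets the address that was installed first, which mirrors the "a pointer is changed at most once in a GC" argument made in the description.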
URL:

From volker.simonis at gmail.com Wed Apr 27 14:35:04 2016
From: volker.simonis at gmail.com (Volker Simonis)
Date: Wed, 27 Apr 2016 16:35:04 +0200
Subject: RFR(S): 8155236: AIX: fix detection of Xrender extension
Message-ID:

Hi,

can somebody please review this AIX-only fix which I've found in one
of my old patch queues :)

http://cr.openjdk.java.net/~simonis/webrevs/2016/8155236/
https://bugs.openjdk.java.net/browse/JDK-8155236

On AIX we have to use a special syntax for the dlopen() string
argument because the shared libraries are packed in multi-architecture
archives. We first try to load the system default libXrender which is
contained in the 'X11.base.lib' fileset starting with AIX 6.1.

If the latter wasn't successful, we also try to load the version under
/opt/freeware. This may be downloaded from the "AIX Toolbox for Linux
Applications" even for AIX 5.3.

Thank you and best regards,
Volker

From martin.doerr at sap.com Wed Apr 27 14:46:18 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 27 Apr 2016 14:46:18 +0000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: <201604271338.u3RDcRUd029939@d19av06.sagamino.japan.ibm.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79lDi013336@d19av08.sagamino.japan.ibm.com> <201604271338.u3RDcRUd029939@d19av06.sagamino.japan.ibm.com>
Message-ID: <03ca7f623bee439885e9bdeaca5ab80b@DEWDFE13DE14.global.corp.sap>

Hi Hiroshi,

> I think, to use the enum and semantics of C++11, callers of cmpxchg need to call
> memory-barrier after cmpxchg when all of updates in the other processes must be
> available for the following instructions of the cmpxchg. Correct?

I think the problem is that our current C++ code doesn't use seq_cst semantics for volatile loads.
The sync at the end of the cmpxchg is only needed if a volatile load is following which is not preceded by a sync instruction.
As long as this is the case, I think we should keep the sync at the end of cmpxchg.

> Do you mean that a new cmpxchg with relaxed semantics will be added and used in
> the GC change? Or, after the discussion of the new cmpxchg interface, will be
> the discussion of the GC change started?

The second.

Best regards,
Martin

From: Hiroshi H Horii [mailto:HORII at jp.ibm.com]
Sent: Wednesday, 27 April 2016 05:34
To: Doerr, Martin
Cc: David Holmes; Lindenmaier, Goetz; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Tim Ellison; Volker Simonis
Subject: RE: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Hi Martin,

> I think we shouldn't better use an own enum (e.g. like AccessKind in
> library_call.cpp).
> Otherwise we'll get trouble when we switch to C++11. Would you agree?

I agree.
I think, to use the enum and semantics of C++11, callers of cmpxchg need to call
memory-barrier after cmpxchg when all of updates in the other processes must be
available for the following instructions of the cmpxchg. Correct?

> Would it be better to split this bug into 2 and discuss the cmpxchg
> interface change on the runtime list and the GC change on the gc list?

Do you mean that a new cmpxchg with relaxed semantics will be added and used in
the GC change? Or, after the discussion of the new cmpxchg interface, will be
the discussion of the GC change started?

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

"Doerr, Martin" wrote on 04/25/2016 19:25:15:

> From: "Doerr, Martin"
> To: Hiroshi H Horii/Japan/IBM at IBMJP, David Holmes
> Cc: "hotspot-gc-dev at openjdk.java.net", "hotspot-runtime-dev at openjdk.java.net", "ppc-aix-port-dev at openjdk.java.net", Tim Ellison, Volker Simonis, "Lindenmaier, Goetz"
> Date: 04/25/2016 19:26
> Subject: RE: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
>
> Hi David and Hiroshi,
>
> thank you very much for this interesting question and analysis.
>
> I think we shouldn't better use an own enum (e.g. like AccessKind in
> library_call.cpp).
> Otherwise we'll get trouble when we switch to C++11. Would you agree?
>
> Would it be better to split this bug into 2 and discuss the cmpxchg
> interface change on the runtime list and the GC change on the gc list?
>
> Best regards,
> Martin
>
> From: Hiroshi H Horii [mailto:HORII at jp.ibm.com]
> Sent: Monday, 25 April 2016 09:10
> To: David Holmes
> Cc: hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Tim Ellison; Volker Simonis; Doerr, Martin; Lindenmaier, Goetz
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
>
> Hi David,
>
> Thank you for your comments and questions.
>
> > 1. Are the current cmpxchg semantics exactly the same as
> > memory_order_seq_cst?
>
> This is a very good question.
>
> I guess, cmpxchg needs a more conservative constraint for memory ordering
> than C++11, to add sync after a compare-and-exchange operation.
>
> Could someone give comments or thoughts?
>
> memory_order_seq_cst is defined as
> "Any operation with this memory order is both an acquire operation and
> a release operation, plus a single total order exists in which
> all threads observe all modifications (see below) in the same order."
> (http://en.cppreference.com/w/cpp/atomic/memory_order)
>
> In my environment, g++ and xlc generate following assemblies on ppc64le.
> (interestingly, they generates the same assemblies for any memory_order)
>
> g++ (4.9.2)
>     100008a4:  ac 04 00 7c  sync
>     100008a8:  28 50 20 7d  lwarx   r9,0,r10
>     100008ac:  00 18 09 7c  cmpw    r9,r3
>     100008b0:  0c 00 c2 40  bne-    100008bc
>     100008b4:  2d 51 80 7c  stwcx.  r4,0,r10
>     100008b8:  f0 ff c2 40  bne-    100008a8
>     100008bc:  2c 01 00 4c  isync
>
> xlc (13.1.3)
>     10000888:  ac 04 00 7c  sync
>     1000088c:  28 28 c0 7c  lwarx   r6,0,r5
>     10000890:  40 00 26 7c  cmpld   r6,r0
>     10000894:  0c 00 82 40  bne     100008a0
>     10000898:  2d 29 80 7c  stwcx.  r4,0,r5
>     1000089c:  f0 ff e2 40  bne+    1000088c
>     100008a0:  2c 01 00 4c  isync
>
> On the other hand, the current OpenJDK generates following assemblies.
>
>     508:  ac 04 00 7c  sync
>     50c:  00 00 5c e9  ld      r10,0(r28)
>     510:  00 50 3b 7c  cmpd    r27,r10
>     514:  1c 00 c2 40  bne-    530
>     518:  a8 40 5c 7d  ldarx   r10,r28,r8
>     51c:  00 50 3b 7c  cmpd    r27,r10
>     520:  10 00 c2 40  bne-    530
>     524:  ad 41 3c 7d  stdcx.  r9,r28,r8
>     528:  f0 ff c2 40  bne-    518
>     52c:  ac 04 00 7c  sync
>     530:  00 50 bb 7f  ...
>
> Though we can ignore 50c-514 (because they are a duplicated guard condition),
> the last sync instruction (52c) makes cmpxchg more strict than
> memory_order_seq_cst.
>
> In some cases, the last sync is necessary when this thread must be
> able to read
> all of the changes in the other threads while executing from 508 to 530
> (that processes compare-and-exchange).
>
> > 2. Has there been a discussion already, establishing that the modified
> > GC code can indeed use memory_order_relaxed? Otherwise who is
> > postulating that and based on what evidence?
>
> Volker and his colleagues have investigated the current GC codes
> according to this.
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html
> However, I believe, we need comments of other GC experts to change
> the shared codes.
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> David Holmes wrote on 04/22/2016 21:57:07:
>
> > From: David Holmes
> > To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
> > Cc: Tim Ellison, ppc-aix-port-dev at openjdk.java.net
> > Date: 04/22/2016 21:58
> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> > copy_to_survivor for ppc64
> >
> > Hi Hiroshi,
> >
> > Two initial questions:
> >
> > 1. Are the current cmpxchg semantics exactly the same as
> > memory_order_seq_cst?
> >
> > 2. Has there been a discussion already, establishing that the modified
> > GC code can indeed use memory_order_relaxed? Otherwise who is
> > postulating that and based on what evidence?
> >
> > Missing memory barriers have caused very difficult to track down bugs in
> > the past - very rare race conditions. So any relaxation here has to be
> > done with extreme confidence.
> >
> > Thanks,
> > David

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tony.reix at atos.net Wed Apr 27 14:53:09 2016
From: tony.reix at atos.net (REIX, Tony)
Date: Wed, 27 Apr 2016 14:53:09 +0000
Subject: RFR(S): 8155236: AIX: fix detection of Xrender extension
In-Reply-To:
References:
Message-ID: <5720D220.6020906@atos.net>

Hi Volker,

I'm managing the BullFreeware web-site, which provides LPPs for AIX.
http://bullfreeware.com/toolbox.php

Notice that AIX 5.3 is no longer maintained.

I'm interested in having an idea about which LPPs the port of OpenJDK on AIX is using.
Do you link with libX11? GTK+? Glib2? Other LPP packages?

Thx/Regards,

Tony Reix

On 27/04/2016 16:35, Volker Simonis wrote:

Hi,

can somebody please review this AIX-only fix which I've found in one
of my old patch queues :)

http://cr.openjdk.java.net/~simonis/webrevs/2016/8155236/
https://bugs.openjdk.java.net/browse/JDK-8155236

On AIX we have to use a special syntax for the dlopen() string
argument because the shared libraries are packed in multi-architecture
archives. We first try to load the system default libXrender which is
contained in the 'X11.base.lib' fileset starting with AIX 6.1.

If the latter wasn't successful, we also try to load the version under
/opt/freeware. This may be downloaded from the "AIX Toolbox for Linux
Applications" even for AIX 5.3.

Thank you and best regards,
Volker

-------------- next part --------------
An HTML attachment was scrubbed...
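The load order described in the RFR (system libXrender first, then the copy under /opt/freeware) can be sketched roughly as below. This is an illustration under assumptions, not the code from the webrev: `open_first` and `open_xrender` are hypothetical helpers, and the archive member names and the exact /opt/freeware path are placeholders. The only AIX-specific fact relied on is that `dlopen()` accepts the `archive(member)` form when `RTLD_MEMBER` is passed.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <cstddef>

// RTLD_MEMBER exists only on AIX; define it as 0 elsewhere so the sketch compiles.
#ifndef RTLD_MEMBER
#define RTLD_MEMBER 0
#endif

// Try each candidate name in order; return the first handle that loads,
// or nullptr if none of them can be opened.
inline void* open_first(const char* const* names, std::size_t count, int flags) {
    for (std::size_t i = 0; i < count; ++i) {
        if (void* handle = dlopen(names[i], flags)) {
            return handle;
        }
    }
    return nullptr;
}

inline void* open_xrender() {
#if defined(_AIX)
    // Placeholder member names: the real ones depend on how the archives are built.
    static const char* const names[] = {
        "libXrender.a(shr.o)",                              // system default (assumed member)
        "/opt/freeware/lib/libXrender.a(libXrender.so.0)"   // AIX Toolbox copy (assumed path)
    };
    const int flags = RTLD_LAZY | RTLD_GLOBAL | RTLD_MEMBER;
#else
    static const char* const names[] = { "libXrender.so.1", "libXrender.so" };
    const int flags = RTLD_LAZY | RTLD_GLOBAL;
#endif
    return open_first(names, sizeof(names) / sizeof(names[0]), flags);
}
```

A caller would treat a null result from `open_xrender()` as "Xrender extension not available" and fall back to another rendering path.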
URL:

From volker.simonis at gmail.com Wed Apr 27 15:29:00 2016
From: volker.simonis at gmail.com (Volker Simonis)
Date: Wed, 27 Apr 2016 17:29:00 +0200
Subject: RFR(S): 8155236: AIX: fix detection of Xrender extension
In-Reply-To: <5720D220.6020906@atos.net>
References: <5720D220.6020906@atos.net>
Message-ID:

Hi Tony,

the OpenJDK build currently only depends on original IBM AIX packages
or packages from the "AIX Toolbox for Linux Applications"
(http://www-03.ibm.com/systems/power/software/aix/linux/toolbox/download.html)

These basically are:

freetype2-2.3.9-1.aix5.2.ppc.rpm
freetype2-devel-2.3.9-1.aix5.2.ppc.rpm
fontconfig-2.4.2-1.aix5.2.ppc.rpm
zlib-devel-1.2.3-4.aix5.2.ppc.rpm
pkg-config-0.19-6.aix5.2.ppc.rpm

http://hg.openjdk.java.net/ppc-aix-port/jdk7u/raw-file/tip/README-ppc.html
contains some outdated build instructions for OpenJDK 7 on AIX, but
the dependencies should be still the same.

Notice that OpenJDK currently only builds with XLC on AIX.

Regards,
Volker

On Wed, Apr 27, 2016 at 4:53 PM, REIX, Tony wrote:
> Hi Volker,
>
> I'm managing the BullFreeware web-site, which provides LPPs for AIX.
> http://bullfreeware.com/toolbox.php
>
> Notice that AIX 5.3 is no longer maintained.
>
> I'm interested in having an idea about which LPPs the port of OpenJDK on AIX
> is using.
> Do you link with libX11? GTK+? Glib2? Other LPP packages?
>
> Thx/Regards,
>
> Tony Reix
>
> On 27/04/2016 16:35, Volker Simonis wrote:
>
> Hi,
>
> can somebody please review this AIX-only fix which I've found in one
> of my old patch queues :)
>
> http://cr.openjdk.java.net/~simonis/webrevs/2016/8155236/
> https://bugs.openjdk.java.net/browse/JDK-8155236
>
> On AIX we have to use a special syntax for the dlopen() string
> argument because the shared libraries are packed in multi-architecture
> archives. We first try to load the system default libXrender which is
> contained in the 'X11.base.lib' fileset starting with AIX 6.1.
>
> If the latter wasn't successful, we also try to load the version under
> /opt/freeware. This may be downloaded from the "AIX Toolbox for Linux
> Applications" even for AIX 5.3.
>
> Thank you and best regards,
> Volker
>
>

From mikael.vidstedt at oracle.com Wed Apr 27 15:54:48 2016
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Wed, 27 Apr 2016 08:54:48 -0700
Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub
In-Reply-To:
References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> <570C417F.20600@oracle.com> <571FB50E.6090108@oracle.com>
Message-ID: <5720E0C8.608@oracle.com>

On 4/27/2016 12:24 AM, Thomas Stüfe wrote:
> Hi Mikael,
>
> On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt wrote:
>
>
>
> On 4/12/2016 2:15 AM, Thomas Stüfe wrote:
>> Hi Mikael, David,
>>
>> On Tue, Apr 12, 2016 at 2:29 AM, David Holmes wrote:
>>
>> On 11/04/2016 10:57 AM, David Holmes wrote:
>>
>> Hi Mikael,
>>
>> I think we need to be able to answer the question as to why the
>> stubbed and stubless forms of this code exist to ensure that
>> converting all platforms to the same form is appropriate.
>>
>>
>> The more I look at this the more the stubs make no sense :)
>> AIII a stub is generated when we need runtime code that may
>> be different to that which we could write directly for
>> compiling at build time - ie to use CPU specific features of
>> the actual CPU. But I see nothing here that suggests any such
>> usage.
>>
>> So I agree with removing the stubs.
>>
>> I'm still going through this but my initial reaction is to wonder
>> why we don't use the same form of handle_unsafe_access on all
>> platforms and always pass in npc? (That seems to be the only
>> difference in code that otherwise seems platform independent.)
>>
>>
>> Further to this and Thomas's comments I think
>> handle_unsafe_access(thread, pc, npc) can be defined in
>> shared code (where? not sure).
Further, if we always pass in >> npc then we don't need to pass in pc as it is unused (seems >> unused in original code too for sparc). >> >> >> I agree. We commonized ucontext_set_pc for all Posix platforms, >> so we can make a common function "handle_unsafe_access(thread, >> npc)" and inside use os::Posix::ucontext_set_pc to modify the >> context. Then we can get rid of the special handling in the >> signal handlers inside os_aix_ppc.cpp and os_linux_ppc.cpp (for >> both the compiled and the interpreted case). > > There is definitely room for unification and simplification here. > Right now the signal handling code is, sadly, different on all the > different platforms, despite the fact that in many cases it should > be similar or the exact same. That said, as much as a > refactoring/rewrite of the signal handler code is needed, it will > very quickly turn into a much larger effort... > > In this specific case, it would probably make more sense to pass > in the full context to the handle_unsafe_access method and have it > do whatever it feels is necessary to update it. However, a lot of > the signal handler code assumes that a "stub" variable gets set up > and only at the end of the main signal handler function does the > actual context get updated. Changing how that works only for this > specific case is obviously not a good idea, which means it's back > to the full scale refactoring and out of scope for the bug fix. > > So to me the fact that the method prototypes differ depending on > the exact platform is just a reflection of how the contexts > differ. In lack of the full context the handler method needs to > take whatever parts of the context is needed to do it's job. 
I > could of course change the handler method to only take a single > "next_pc" argument, but to me that feels like putting a key part > of the logic which handles the unsafe access (specifically, the > part which calculates the next pc) in the wrong place - IMHO that > should really be tightly coupled with the rest of the logic needed > to handle an unsafe access (updating the thread state etc.), and > hence I feel that it really belongs in the handle_unsafe_access > method itself. Happy to hear your thoughts, but I hope we can > agree that the suggested fix, even in its current state, is still > significantly better than what is there now. > > > Unless somebody has a better suggestion, I'm going to be moving > the implementations of the handle_unsafe_access methods to > sharedRuntime (instead of stubRoutines) and will send out a new > webrev shortly. > > > I am unhappy with the fact that we factor unsafe handling out for x86 > and sparc but do it inline for ppc. I know that was done before your > change as well but would be happy if that could be improved. I would > prefer either one of: Fully agree - this is an example of the more general problem of logic which is /almost/ the same across different platforms, but which has been effectively copy/pasted and drifted apart over time. > > 1) flatten out the coding into the signal handlers like it is done in > os_linux_ppc.cpp and os_aix_ppc.cpp or > 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case > > I would actually prefer (1) even though this would multiply the code > out for all os cases into ; we are only talking about 1-2 > lines of additional coding, and it would improve the readability of > the signal handlers. > > But this is only my personal opinion, and I do not have strong > emotions. I agree with you that a full cleanup of the signal coding is > out of scope for this issue. 
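As an illustration of the "shared helper that only takes the next pc" idea debated above, a minimal sketch might look like the following. `ThreadState`, its flag, and `resume_after_fault_fixed_width` are hypothetical stand-ins rather than HotSpot's actual types or functions, and a real signal handler would additionally have to write the returned pc back into the signal context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using address = std::uint8_t*;

// Hypothetical, minimal stand-in for the VM's per-thread state.
struct ThreadState {
    bool pending_unsafe_access_error = false;
};

// Platform-independent part: record that an unsafe access faulted and
// hand back the pc at which execution should resume.
inline address handle_unsafe_access(ThreadState& thread, address next_pc) {
    thread.pending_unsafe_access_error = true;
    return next_pc;
}

// Platform-specific part for an ISA with fixed 4-byte instructions
// (as assumed here for PPC64): skip the faulting load or store.
inline address resume_after_fault_fixed_width(ThreadState& thread, address faulting_pc) {
    const std::size_t kInstructionSize = 4;
    return handle_unsafe_access(thread, faulting_pc + kInstructionSize);
}
```

The split shows the trade-off under discussion: computing the next pc is platform knowledge, while recording the pending error is not, so only the latter fits naturally in shared code.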
I spent yesterday going back and forth on the various alternatives and
the only thing I can say with certainty now is that apart from
refactoring the whole thing everything else is ugly...

For example, I agree that consistency is an important goal here, but
since there's little to no consistency there today it's really hard to
make a relevant dent in it. :(

Flattening it out is an alternative (and a good one), but that is not
something I'm willing to do as part of this change because only
flattening this specific case/return will actually add to the
inconsistency...

So ultimately yesterday I chose to do something closer to your
alternative 2). Is it still ugly? Yes; lipstick on pig and all of that.
Have a look at it and see how you feel about it. I try to keep in mind
that what is there today is (more) broken. :)

Webrev:
http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/

Incremental from webrev.01:
http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/

Cheers,
Mikael

>
>
> Cheers,
> Mikael
>
>>
>>
>> BTW I found this comment somewhat unfathomable (both now and
>> in original code):
>>
>> + // pc is the instruction which we must emulate
>> + // doing a no-op is fine: return garbage from the load
>>
>> but finally realized that it means that after the load that
>> raised the signal the native code proceeds normally but the
>> value apparently loaded is just garbage/arbitrary, and the
>> only sign something went wrong is the setting of the pending
>> unsafe-access-error bit. This would be a potential source of
>> bugs I think, except that when we hit the Java level, we
>> throw the exception and so never actually "return" the
>> garbage value. But it does mean we would have to be careful
>> if calling the unsafe routines from native code.
>> >> >> I admit I do not understand fully how the >> _special_runtime_exit_condition flag is processed later, at least >> not for all cases: If I have a java method A using >> sun.misc.unsafe, which gets compiled, the sun.misc.unsafe >> intrinsic gets inlined into that method. So, the whole method A >> gets marked as "has unsafe access"? So, any SIGBUS happening >> inside this method - which may be larger than the inlined >> sun.misc.unsafe call - will yield an InternalError? And when is >> the flag checked if that method A is called from another java >> method B? >> >> Sorry if the questions are stupid, I am not a JIT expert, but I >> try to understand how much can happen between the SIGBUS and the >> InternalError getting thrown. > > No questions are stupid here. As you may have seen in the other > thread, I filed JDK-8154592[1] to cover making the handling of the > faults synchronous. Hope that helps. > > > Thank you! > > Kind Regards, Thomas > > Cheers, > Mikael > > [1] https://bugs.openjdk.java.net/browse/JDK-8154592 > > >> >> Thanks, Thomas >> >> Thanks, >> David >> >> >> Thanks, >> David >> >> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote: >> >> >> Please review: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892 >> Webrev: >> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ >> >> >> >> >> * Note: this is patch 2 in a set of 3 all aiming to >> clean up and unify >> the unsafe memory getters/setters, along with the >> handling of unsafe >> access errors. 
The other two issues are: >> >> https://bugs.openjdk.java.net/browse/JDK-8153890 - >> Handle unsafe access >> error as an asynchronous exception >> https://bugs.openjdk.java.net/browse/JDK-8150921 - >> Update Unsafe >> getters/setters to use double-register variants >> >> >> * Summary (copied from the bug description) >> >> >> In certain cases, such as accessing a region of a >> memory mapped file >> which has been truncated on unix-style operating >> systems, a SIGBUS >> signal will be raised and the VM will process it in >> the signal handler. >> >> How the signal is processed differs depending on the >> operating system >> and/or CPU architecture, with two major alternatives: >> >> * "stubless" >> >> Do the necessary thread state updates directly in the >> signal handler, >> and modify the context so that the signal handler >> returns to the place >> where the execution should continue >> >> * Using a stub >> >> Update the context so that when the signal handler >> returns the thread >> will continue execution in a generated stub, which in >> turn will call >> some native code in the VM to update the thread state >> and figure out >> where execution should continue. The stub will then >> jump to that new >> place. >> >> >> It should be noted that the work of updating the >> thread state is very >> small - it's setting a flag or two in the thread >> structure, and figures >> out where the next instruction starts. It should also >> be noted that the >> generated stubs today are broken, because they do not >> preserve all the >> live registers over the call into the VM. There are >> two ways to address >> this: >> >> * Preserve all the necessary registers >> >> This would mean implementing, in macro assembly, the >> necessary logic for >> preserving all the live registers, including, but not >> limited to, >> floating point registers, flag registers, etc. It >> quickly becomes >> obvious that this platform specific and error prone. 
>>
>> * Leverage the fact that the operating system already does this as
>> part of the signal handling
>>
>> Do the necessary work in the signal handler instead, removing the
>> need for the stub altogether
>>
>> As mentioned, on some platforms the latter model is already in use.
>> It is dramatically easier and all platforms should be updated to do
>> it the same way.
>>
>>
>> * Testing
>>
>> Just as mentioned in the RFR for JDK-8153890, a new test was
>> developed to test this code path:
>>
>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java
>>
>>
>> In fact, it was when running this test I found the register
>> preservation issue. JPRT also passes. Much like JDK-8153890 I wanted
>> to get some feedback on this before running additional tests.
>>
>>
>> Cheers,
>> Mikael
>>
>>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From philip.race at oracle.com Wed Apr 27 16:35:15 2016
From: philip.race at oracle.com (Philip Race)
Date: Wed, 27 Apr 2016 09:35:15 -0700
Subject: [OpenJDK 2D-Dev] RFR(S): 8155236: AIX: fix detection of Xrender extension
In-Reply-To:
References:
Message-ID: <5720EA43.7040007@oracle.com>

Whilst the syntax is all new to me, since it is contained
entirely in #if defined(_AIX) I am OK with this fix.

phil.

On 4/27/16, 7:35 AM, Volker Simonis wrote:
> Hi,
>
> can somebody please review this AIX-only fix which I've found in one
> of my old patch queues :)
>
> http://cr.openjdk.java.net/~simonis/webrevs/2016/8155236/
> https://bugs.openjdk.java.net/browse/JDK-8155236
>
> On AIX we have to use a special syntax for the dlopen() string
> argument because the shared libraries are packed in multi-architecture
> archives. We first try to load the system default libXrender which is
> contained in the 'X11.base.lib' fileset starting with AIX 6.1.
>
> If the latter wasn't successful, we also try to load the version under
> /opt/freeware.
> This may be downloaded from the "AIX Toolbox for Linux
> Applications" even for AIX 5.3.
>
> Thank you and best regards,
> Volker

From volker.simonis at gmail.com Wed Apr 27 16:49:43 2016
From: volker.simonis at gmail.com (Volker Simonis)
Date: Wed, 27 Apr 2016 18:49:43 +0200
Subject: [OpenJDK 2D-Dev] RFR(S): 8155236: AIX: fix detection of Xrender extension
In-Reply-To: <5720EA43.7040007@oracle.com>
References: <5720EA43.7040007@oracle.com>
Message-ID:

I've checked that it really works :)

Thanks,
Volker

On Wed, Apr 27, 2016 at 6:35 PM, Philip Race wrote:
> Whilst the syntax is all new to me, since it is contained
> entirely in #if defined(_AIX) I am OK with this fix.
>
> phil.
>
>
> On 4/27/16, 7:35 AM, Volker Simonis wrote:
>>
>> Hi,
>>
>> can somebody please review this AIX-only fix which I've found in one
>> of my old patch queues :)
>>
>> http://cr.openjdk.java.net/~simonis/webrevs/2016/8155236/
>> https://bugs.openjdk.java.net/browse/JDK-8155236
>>
>> On AIX we have to use a special syntax for the dlopen() string
>> argument because the shared libraries are packed in multi-architecture
>> archives. We first try to load the system default libXrender which is
>> contained in the 'X11.base.lib' fileset starting with AIX 6.1.
>>
>> If the latter wasn't successful, we also try to load the version under
>> /opt/freeware. This may be downloaded from the "AIX Toolbox for Linux
>> Applications" even for AIX 5.3.
>> >> Thank you and best regards, >> Volker From thomas.stuefe at gmail.com Thu Apr 28 12:25:53 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 28 Apr 2016 14:25:53 +0200 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: <5720E0C8.608@oracle.com> References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> <570C417F.20600@oracle.com> <571FB50E.6090108@oracle.com> <5720E0C8.608@oracle.com> Message-ID: Hi Mikael, On Wed, Apr 27, 2016 at 5:54 PM, Mikael Vidstedt wrote: > > > On 4/27/2016 12:24 AM, Thomas Stüfe wrote: > > Hi Mikael, > > On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt < > mikael.vidstedt at oracle.com> wrote: > >> >> >> On 4/12/2016 2:15 AM, Thomas Stüfe wrote: >> >> Hi Mikael, David, >> >> On Tue, Apr 12, 2016 at 2:29 AM, David Holmes < >> david.holmes at oracle.com> wrote: >> >>> On 11/04/2016 10:57 AM, David Holmes wrote: >>> >>>> Hi Mikael, >>>> >>>> I think we need to be able to answer the question as to why the stubbed >>>> and stubless forms of this code exist to ensure that converting all >>>> platforms to the same form is appropriate. >>>> >>> >>> The more I look at this the more the stubs make no sense :) AIUI a stub >>> is generated when we need runtime code that may be different to that which >>> we could write directly for compiling at build time - i.e. to use CPU >>> specific features of the actual CPU. But I see nothing here that suggests >>> any such usage. >>> >>> So I agree with removing the stubs. >>> >>> I'm still going through this but my initial reaction is to wonder why we >>>> don't use the same form of handle_unsafe_access on all platforms and >>>> always pass in npc? (That seems to be the only difference in code that >>>> otherwise seems platform independent.) >>>> >>> >>> Further to this and Thomas's comments I think >>> handle_unsafe_access(thread, pc, npc) can be defined in shared code (where? >>> not sure).
Further, if we always pass in npc then we don't need to pass in >>> pc as it is unused (seems unused in original code too for sparc). >>> >>> >> I agree. We commonized ucontext_set_pc for all Posix platforms, so we can >> make a common function "handle_unsafe_access(thread, npc)" and inside use >> os::Posix::ucontext_set_pc to modify the context. Then we can get rid of >> the special handling in the signal handlers inside os_aix_ppc.cpp and >> os_linux_ppc.cpp (for both the compiled and the interpreted case). >> >> >> There is definitely room for unification and simplification here. Right >> now the signal handling code is, sadly, different on all the different >> platforms, despite the fact that in many cases it should be similar or the >> exact same. That said, as much as a refactoring/rewrite of the signal >> handler code is needed, it will very quickly turn into a much larger >> effort... >> >> In this specific case, it would probably make more sense to pass in the >> full context to the handle_unsafe_access method and have it do whatever it >> feels is necessary to update it. However, a lot of the signal handler code >> assumes that a "stub" variable gets set up and only at the end of the main >> signal handler function does the actual context get updated. Changing how >> that works only for this specific case is obviously not a good idea, which >> means it's back to the full scale refactoring and out of scope for the bug >> fix. >> >> So to me the fact that the method prototypes differ depending on the >> exact platform is just a reflection of how the contexts differ. In lack of >> the full context the handler method needs to take whatever parts of the >> context is needed to do it's job. 
I could of course change the handler >> method to only take a single "next_pc" argument, but to me that feels like >> putting a key part of the logic which handles the unsafe access >> (specifically, the part which calculates the next pc) in the wrong place - >> IMHO that should really be tightly coupled with the rest of the logic >> needed to handle an unsafe access (updating the thread state etc.), and >> hence I feel that it really belongs in the handle_unsafe_access method >> itself. Happy to hear your thoughts, but I hope we can agree that the >> suggested fix, even in its current state, is still significantly better >> than what is there now. >> >> >> Unless somebody has a better suggestion, I'm going to be moving the >> implementations of the handle_unsafe_access methods to sharedRuntime >> (instead of stubRoutines) and will send out a new webrev shortly. >> >> > I am unhappy with the fact that we factor unsafe handling out for x86 and > sparc but do it inline for ppc. I know that was done before your change as > well but would be happy if that could be improved. I would prefer either > one of: > > > Fully agree - this is an example of the more general problem of logic > which is /almost/ the same across different platforms, but which has been > effectively copy/pasted and drifted apart over time. > > > 1) flatten out the coding into the signal handlers like it is done in > os_linux_ppc.cpp and os_aix_ppc.cpp or > 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case > > I would actually prefer (1) even though this would multiply the code out > for all os cases into ; we are only talking about 1-2 lines of > additional coding, and it would improve the readability of the signal > handlers. > > But this is only my personal opinion, and I do not have strong emotions. I > agree with you that a full cleanup of the signal coding is out of scope for > this issue. 
> > > I spent yesterday going back and forth on the various alternatives and the > only thing I can say with certainty now is that apart from refactoring the > whole thing everything else is ugly... For example, I agree that > consistency is an important goal here, but since there's little to no > consistency there today it's really hard to make a relevant dent in it. :( > > Flattening it out is an alternative (and a good one), but that is not > something I'm willing to do as part of this change because only flattening > this specific case/return will actually add to the inconsistency... So > ultimately yesterday I chose to do something closer to your alternative 2). > Is it still ugly? Yes; lipstick on a pig and all of that. Have a look at it > and see how you feel about it. I try to keep in mind that what is there > today is (more) broken. :) > > Webrev: > > > http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/ > > Incremental from webrev.01: > > > http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/ > > Cheers, > Mikael > > I am fine with the changes as they are in webrev.02. Any further cleanup can be deferred to a later change. I built your change on AIX, and it built OK. Thanks for taking my input! Kind Regards, Thomas > > > > >> Cheers, >> Mikael >> >> >> >> BTW I found this comment somewhat unfathomable (both now and in original >>> code): >>> >>> + // pc is the instruction which we must emulate >>> + // doing a no-op is fine: return garbage from the load >>> >>> but finally realized that it means that after the load that raised the >>> signal the native code proceeds normally but the value apparently loaded is >>> just garbage/arbitrary, and the only sign something went wrong is the >>> setting of the pending unsafe-access-error bit. This would be a potential >>> source of bugs I think, except that when we hit the Java level, we throw >>> the exception and so never actually "return" the garbage value.
But it does >>> mean we would have to be careful if calling the unsafe routines from native >>> code. >>> >>> >> I admit I do not understand fully how the _special_runtime_exit_condition >> flag is processed later, at least not for all cases: If I have a java >> method A using sun.misc.unsafe, which gets compiled, the sun.misc.unsafe >> intrinsic gets inlined into that method. So, the whole method A gets marked >> as "has unsafe access"? So, any SIGBUS happening inside this method - which >> may be larger than the inlined sun.misc.unsafe call - will yield an >> InternalError? And when is the flag checked if that method A is called from >> another java method B? >> >> Sorry if the questions are stupid, I am not a JIT expert, but I try to >> understand how much can happen between the SIGBUS and the InternalError >> getting thrown. >> >> >> No questions are stupid here. As you may have seen in the other thread, I >> filed JDK-8154592[1] to cover making the handling of the faults >> synchronous. Hope that helps. >> >> > Thank you! > > Kind Regards, Thomas > > > >> Cheers, >> Mikael >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8154592 >> >> >> >> Thanks, Thomas >> >> >>> Thanks, >>> David >>> >>> >>> Thanks, >>>> David >>>> >>>> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote: >>>> >>>>> >>>>> Please review: >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892 >>>>> Webrev: >>>>> >>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ >>>>> >>>>> >>>>> >>>>> * Note: this is patch 2 in a set of 3 all aiming to clean up and unify >>>>> the unsafe memory getters/setters, along with the handling of unsafe >>>>> access errors. 
The other two issues are: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle unsafe >>>>> access >>>>> error as an asynchronous exception >>>>> https://bugs.openjdk.java.net/browse/JDK-8150921 - Update Unsafe >>>>> getters/setters to use double-register variants >>>>> >>>>> >>>>> * Summary (copied from the bug description) >>>>> >>>>> >>>>> In certain cases, such as accessing a region of a memory mapped file >>>>> which has been truncated on unix-style operating systems, a SIGBUS >>>>> signal will be raised and the VM will process it in the signal handler. >>>>> >>>>> How the signal is processed differs depending on the operating system >>>>> and/or CPU architecture, with two major alternatives: >>>>> >>>>> * "stubless" >>>>> >>>>> Do the necessary thread state updates directly in the signal handler, >>>>> and modify the context so that the signal handler returns to the place >>>>> where the execution should continue >>>>> >>>>> * Using a stub >>>>> >>>>> Update the context so that when the signal handler returns the thread >>>>> will continue execution in a generated stub, which in turn will call >>>>> some native code in the VM to update the thread state and figure out >>>>> where execution should continue. The stub will then jump to that new >>>>> place. >>>>> >>>>> >>>>> It should be noted that the work of updating the thread state is very >>>>> small - it's setting a flag or two in the thread structure, and figures >>>>> out where the next instruction starts. It should also be noted that the >>>>> generated stubs today are broken, because they do not preserve all the >>>>> live registers over the call into the VM. There are two ways to address >>>>> this: >>>>> >>>>> * Preserve all the necessary registers >>>>> >>>>> This would mean implementing, in macro assembly, the necessary logic >>>>> for >>>>> preserving all the live registers, including, but not limited to, >>>>> floating point registers, flag registers, etc. 
It quickly becomes >>>>> obvious that this is platform specific and error prone. >>>>> >>>>> * Leverage the fact that the operating system already does this as part >>>>> of the signal handling >>>>> >>>>> Do the necessary work in the signal handler instead, removing the need >>>>> for the stub altogether >>>>> >>>>> As mentioned, on some platforms the latter model is already in use. It >>>>> is dramatically easier and all platforms should be updated to do it the >>>>> same way. >>>>> >>>>> >>>>> * Testing >>>>> >>>>> Just as mentioned in the RFR for JDK-8153890, a new test was developed >>>>> to test this code path: >>>>> >>>>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java >>>>> >>>>> In fact, it was when running this test I found the register >>>>> preservation >>>>> issue. JPRT also passes. Much like JDK-8153890 I wanted to get some >>>>> feedback on this before running additional tests. >>>>> >>>>> >>>>> Cheers, >>>>> Mikael >>>>> >>>>> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From david.holmes at oracle.com Thu Apr 28 12:41:57 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 28 Apr 2016 22:41:57 +1000 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: <5720E0C8.608@oracle.com> References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> <570C417F.20600@oracle.com> <571FB50E.6090108@oracle.com> <5720E0C8.608@oracle.com> Message-ID: <57220515.6080309@oracle.com> Hi Mikael, On 28/04/2016 1:54 AM, Mikael Vidstedt wrote: > > > On 4/27/2016 12:24 AM, Thomas St?fe wrote: >> Hi Mikael, >> >> On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt >> <mikael.vidstedt at oracle.com> wrote: >> >> >> >> On 4/12/2016 2:15 AM, Thomas St?fe wrote: >>> Hi Mikael, David, >>> >>> On Tue, Apr 12, 2016 at 2:29 AM, David Holmes >>> <david.holmes at oracle.com> wrote: >>> >>> On 11/04/2016 10:57 AM, David Holmes wrote: >>> >>> Hi Mikael, >>> >>> I think we need to be able to answer the question as to >>> why the stubbed >>> and stubless forms of this code exist to ensure that >>> converting all >>> platforms to the same form is appropriate. >>> >>> >>> The more I look at this the more the stubs make no sense :) >>> AIII a stub is generated when we need runtime code that may >>> be different to that which we could write directly for >>> compiling at build time - ie to use CPU specific features of >>> the actual CPU. But I see nothing here that suggests any such >>> usage. >>> >>> So I agree with removing the stubs. >>> >>> I'm still going through this but my initial reaction is >>> to wonder why we >>> don't use the same form of handle_unsafe_access on all >>> platforms and >>> always pass in npc? (That seems to be the only difference >>> in code that >>> otherwise seems platform independent.) >>> >>> >>> Futher to this and Thomas's comments I think >>> handle_unsafe_access(thread, pc, npc) can be defined in >>> shared code (where? not sure). 
Further, if we always pass in >>> npc then we don't need to pass in pc as it is unused (seems >>> unused in original code too for sparc). >>> >>> >>> I agree. We commonized ucontext_set_pc for all Posix platforms, >>> so we can make a common function "handle_unsafe_access(thread, >>> npc)" and inside use os::Posix::ucontext_set_pc to modify the >>> context. Then we can get rid of the special handling in the >>> signal handlers inside os_aix_ppc.cpp and os_linux_ppc.cpp (for >>> both the compiled and the interpreted case). >> >> There is definitely room for unification and simplification here. >> Right now the signal handling code is, sadly, different on all the >> different platforms, despite the fact that in many cases it should >> be similar or the exact same. That said, as much as a >> refactoring/rewrite of the signal handler code is needed, it will >> very quickly turn into a much larger effort... >> >> In this specific case, it would probably make more sense to pass >> in the full context to the handle_unsafe_access method and have it >> do whatever it feels is necessary to update it. However, a lot of >> the signal handler code assumes that a "stub" variable gets set up >> and only at the end of the main signal handler function does the >> actual context get updated. Changing how that works only for this >> specific case is obviously not a good idea, which means it's back >> to the full scale refactoring and out of scope for the bug fix. >> >> So to me the fact that the method prototypes differ depending on >> the exact platform is just a reflection of how the contexts >> differ. In lack of the full context the handler method needs to >> take whatever parts of the context is needed to do it's job. 
I >> could of course change the handler method to only take a single >> "next_pc" argument, but to me that feels like putting a key part >> of the logic which handles the unsafe access (specifically, the >> part which calculates the next pc) in the wrong place - IMHO that >> should really be tightly coupled with the rest of the logic needed >> to handle an unsafe access (updating the thread state etc.), and >> hence I feel that it really belongs in the handle_unsafe_access >> method itself. Happy to hear your thoughts, but I hope we can >> agree that the suggested fix, even in its current state, is still >> significantly better than what is there now. >> >> >> Unless somebody has a better suggestion, I'm going to be moving >> the implementations of the handle_unsafe_access methods to >> sharedRuntime (instead of stubRoutines) and will send out a new >> webrev shortly. >> >> >> I am unhappy with the fact that we factor unsafe handling out for x86 >> and sparc but do it inline for ppc. I know that was done before your >> change as well but would be happy if that could be improved. I would >> prefer either one of: > > Fully agree - this is an example of the more general problem of logic > which is /almost/ the same across different platforms, but which has > been effectively copy/pasted and drifted apart over time. > >> >> 1) flatten out the coding into the signal handlers like it is done in >> os_linux_ppc.cpp and os_aix_ppc.cpp or >> 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case >> >> I would actually prefer (1) even though this would multiply the code >> out for all os cases into ; we are only talking about 1-2 >> lines of additional coding, and it would improve the readability of >> the signal handlers. >> >> But this is only my personal opinion, and I do not have strong >> emotions. I agree with you that a full cleanup of the signal coding is >> out of scope for this issue. 
> > I spent yesterday going back and forth on the various alternatives and > the only thing I can say with certainty now is that apart from > refactoring the whole thing everything else is ugly... For example, I > agree that consistency is an important goal here, but since there's > little to no consistency there today it's really hard to make a relevant > dent in it. :( > > Flattening it out is an alternative (and a good one), but that is not > something I'm willing to do as part of this change because only > flattening this specific case/return will actually add to the > inconstency... So ultimately yesterday I chose to do something closer to > your alternative 2). Is it still ugly? Yes; lipstick on pig and all of > that. Have a look at it and see how you feel about it. I try to keep in > mind that what is there today is (more) broken. :) > > Webrev: > > http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/ Now I see this in code form I really don't understand why next_pc is passed in, unused and then returned ?? Otherwise in src/share/vm/runtime/sharedRuntime.cpp in the comment block - capitals after periods please :) Stub removal seems fine. Thanks, David > Incremental from webrev.01: > > http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/ > > Cheers, > Mikael > >> >> >> Cheers, >> Mikael >> >>> >>> >>> BTW I found this comment somewhat unfathomable (both now and >>> in original code): >>> >>> + // pc is the instruction which we must emulate >>> + // doing a no-op is fine: return garbage from the load >>> >>> but finally realized that it means that after the load that >>> raised the signal the native code proceeds normally but the >>> value apparently loaded is just garbage/arbitrary, and the >>> only sign something went wrong is the setting of the pending >>> unsafe-access-error bit. 
This would be a potential source of >>> bugs I think, except that when we hit the Java level, we >>> throw the exception and so never actually "return" the >>> garbage value. But it does mean we would have to be careful >>> if calling the unsafe routines from native code. >>> >>> >>> I admit I do not understand fully how the >>> _special_runtime_exit_condition flag is processed later, at least >>> not for all cases: If I have a java method A using >>> sun.misc.unsafe, which gets compiled, the sun.misc.unsafe >>> intrinsic gets inlined into that method. So, the whole method A >>> gets marked as "has unsafe access"? So, any SIGBUS happening >>> inside this method - which may be larger than the inlined >>> sun.misc.unsafe call - will yield an InternalError? And when is >>> the flag checked if that method A is called from another java >>> method B? >>> >>> Sorry if the questions are stupid, I am not a JIT expert, but I >>> try to understand how much can happen between the SIGBUS and the >>> InternalError getting thrown. >> >> No questions are stupid here. As you may have seen in the other >> thread, I filed JDK-8154592[1] to cover making the handling of the >> faults synchronous. Hope that helps. >> >> >> Thank you! >> >> Kind Regards, Thomas >> >> Cheers, >> Mikael >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8154592 >> >> >>> >>> Thanks, Thomas >>> >>> Thanks, >>> David >>> >>> >>> Thanks, >>> David >>> >>> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote: >>> >>> >>> Please review: >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892 >>> Webrev: >>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ >>> >>> >>> >>> >>> * Note: this is patch 2 in a set of 3 all aiming to >>> clean up and unify >>> the unsafe memory getters/setters, along with the >>> handling of unsafe >>> access errors. 
The other two issues are: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8153890 - >>> Handle unsafe access >>> error as an asynchronous exception >>> https://bugs.openjdk.java.net/browse/JDK-8150921 - >>> Update Unsafe >>> getters/setters to use double-register variants >>> >>> >>> * Summary (copied from the bug description) >>> >>> >>> In certain cases, such as accessing a region of a >>> memory mapped file >>> which has been truncated on unix-style operating >>> systems, a SIGBUS >>> signal will be raised and the VM will process it in >>> the signal handler. >>> >>> How the signal is processed differs depending on the >>> operating system >>> and/or CPU architecture, with two major alternatives: >>> >>> * "stubless" >>> >>> Do the necessary thread state updates directly in the >>> signal handler, >>> and modify the context so that the signal handler >>> returns to the place >>> where the execution should continue >>> >>> * Using a stub >>> >>> Update the context so that when the signal handler >>> returns the thread >>> will continue execution in a generated stub, which in >>> turn will call >>> some native code in the VM to update the thread state >>> and figure out >>> where execution should continue. The stub will then >>> jump to that new >>> place. >>> >>> >>> It should be noted that the work of updating the >>> thread state is very >>> small - it's setting a flag or two in the thread >>> structure, and figures >>> out where the next instruction starts. It should also >>> be noted that the >>> generated stubs today are broken, because they do not >>> preserve all the >>> live registers over the call into the VM. There are >>> two ways to address >>> this: >>> >>> * Preserve all the necessary registers >>> >>> This would mean implementing, in macro assembly, the >>> necessary logic for >>> preserving all the live registers, including, but not >>> limited to, >>> floating point registers, flag registers, etc. 
It >>> quickly becomes >>> obvious that this platform specific and error prone. >>> >>> * Leverage the fact that the operating system already >>> does this as part >>> of the signal handling >>> >>> Do the necessary work in the signal handler instead, >>> removing the need >>> for the stub alltogether >>> >>> As mentioned, on some platforms the latter model is >>> already in use. It >>> is dramatically easier and all platforms should be >>> updated to do it the >>> same way. >>> >>> >>> * Testing >>> >>> Just as mentioned in the RFR for JDK-8153890, a new >>> test was developed >>> to test this code path: >>> >>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java >>> >>> >>> In fact, it was when running this test I found the >>> register preservation >>> issue. JPRT also passes. Much like JDK-8153890 I >>> wanted to get some >>> feedback on this before running additional tests. >>> >>> >>> Cheers, >>> Mikael >>> >>> >> >> > From mikael.vidstedt at oracle.com Thu Apr 28 15:44:46 2016 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Thu, 28 Apr 2016 08:44:46 -0700 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: <57220515.6080309@oracle.com> References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> <570C417F.20600@oracle.com> <571FB50E.6090108@oracle.com> <5720E0C8.608@oracle.com> <57220515.6080309@oracle.com> Message-ID: On 4/28/2016 5:41 AM, David Holmes wrote: > Hi Mikael, > > On 28/04/2016 1:54 AM, Mikael Vidstedt wrote: >> >> >> On 4/27/2016 12:24 AM, Thomas St?fe wrote: >>> Hi Mikael, >>> >>> On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt >>> <mikael.vidstedt at oracle.com> wrote: >>> >>> >>> >>> On 4/12/2016 2:15 AM, Thomas St?fe wrote: >>>> Hi Mikael, David, >>>> >>>> On Tue, Apr 12, 2016 at 2:29 AM, David Holmes >>>> <david.holmes at oracle.com> wrote: >>>> >>>> On 11/04/2016 10:57 AM, David Holmes wrote: >>>> >>>> Hi Mikael, >>>> >>>> 
I think we need to be able to answer the question as to >>>> why the stubbed >>>> and stubless forms of this code exist to ensure that >>>> converting all >>>> platforms to the same form is appropriate. >>>> >>>> >>>> The more I look at this the more the stubs make no sense :) >>>> AIII a stub is generated when we need runtime code that may >>>> be different to that which we could write directly for >>>> compiling at build time - ie to use CPU specific features of >>>> the actual CPU. But I see nothing here that suggests any such >>>> usage. >>>> >>>> So I agree with removing the stubs. >>>> >>>> I'm still going through this but my initial reaction is >>>> to wonder why we >>>> don't use the same form of handle_unsafe_access on all >>>> platforms and >>>> always pass in npc? (That seems to be the only difference >>>> in code that >>>> otherwise seems platform independent.) >>>> >>>> >>>> Futher to this and Thomas's comments I think >>>> handle_unsafe_access(thread, pc, npc) can be defined in >>>> shared code (where? not sure). Further, if we always pass in >>>> npc then we don't need to pass in pc as it is unused (seems >>>> unused in original code too for sparc). >>>> >>>> >>>> I agree. We commonized ucontext_set_pc for all Posix platforms, >>>> so we can make a common function "handle_unsafe_access(thread, >>>> npc)" and inside use os::Posix::ucontext_set_pc to modify the >>>> context. Then we can get rid of the special handling in the >>>> signal handlers inside os_aix_ppc.cpp and os_linux_ppc.cpp (for >>>> both the compiled and the interpreted case). >>> >>> There is definitely room for unification and simplification here. >>> Right now the signal handling code is, sadly, different on all the >>> different platforms, despite the fact that in many cases it should >>> be similar or the exact same. That said, as much as a >>> refactoring/rewrite of the signal handler code is needed, it will >>> very quickly turn into a much larger effort... 
>>> >>> In this specific case, it would probably make more sense to pass >>> in the full context to the handle_unsafe_access method and have it >>> do whatever it feels is necessary to update it. However, a lot of >>> the signal handler code assumes that a "stub" variable gets set up >>> and only at the end of the main signal handler function does the >>> actual context get updated. Changing how that works only for this >>> specific case is obviously not a good idea, which means it's back >>> to the full scale refactoring and out of scope for the bug fix. >>> >>> So to me the fact that the method prototypes differ depending on >>> the exact platform is just a reflection of how the contexts >>> differ. In lack of the full context the handler method needs to >>> take whatever parts of the context is needed to do it's job. I >>> could of course change the handler method to only take a single >>> "next_pc" argument, but to me that feels like putting a key part >>> of the logic which handles the unsafe access (specifically, the >>> part which calculates the next pc) in the wrong place - IMHO that >>> should really be tightly coupled with the rest of the logic needed >>> to handle an unsafe access (updating the thread state etc.), and >>> hence I feel that it really belongs in the handle_unsafe_access >>> method itself. Happy to hear your thoughts, but I hope we can >>> agree that the suggested fix, even in its current state, is still >>> significantly better than what is there now. >>> >>> >>> Unless somebody has a better suggestion, I'm going to be moving >>> the implementations of the handle_unsafe_access methods to >>> sharedRuntime (instead of stubRoutines) and will send out a new >>> webrev shortly. >>> >>> >>> I am unhappy with the fact that we factor unsafe handling out for x86 >>> and sparc but do it inline for ppc. I know that was done before your >>> change as well but would be happy if that could be improved. 
I would >>> prefer either one of: >> >> Fully agree - this is an example of the more general problem of logic >> which is /almost/ the same across different platforms, but which has >> been effectively copy/pasted and drifted apart over time. >> >>> >>> 1) flatten out the coding into the signal handlers like it is done in >>> os_linux_ppc.cpp and os_aix_ppc.cpp or >>> 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case >>> >>> I would actually prefer (1) even though this would multiply the code >>> out for all os cases into ; we are only talking about 1-2 >>> lines of additional coding, and it would improve the readability of >>> the signal handlers. >>> >>> But this is only my personal opinion, and I do not have strong >>> emotions. I agree with you that a full cleanup of the signal coding is >>> out of scope for this issue. >> >> I spent yesterday going back and forth on the various alternatives and >> the only thing I can say with certainty now is that apart from >> refactoring the whole thing everything else is ugly... For example, I >> agree that consistency is an important goal here, but since there's >> little to no consistency there today it's really hard to make a relevant >> dent in it. :( >> >> Flattening it out is an alternative (and a good one), but that is not >> something I'm willing to do as part of this change because only >> flattening this specific case/return will actually add to the >> inconstency... So ultimately yesterday I chose to do something closer to >> your alternative 2). Is it still ugly? Yes; lipstick on pig and all of >> that. Have a look at it and see how you feel about it. I try to keep in >> mind that what is there today is (more) broken. :) >> >> Webrev: >> >> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/ >> > > Now I see this in code form I really don't understand why next_pc is > passed in, unused and then returned ?? 
Given that the full context can't easily be passed in and updated (which I still think is the right, long term way of doing this), I chose to do it this way instead. It is a signal to a caller that *only* calling handle_unsafe_access is not enough, there's more to it in that the context also needs to be updated. I see it as a way to make sure the actual next_pc calculation and update is not forgotten, and make it clear that it goes hand in hand with updating the thread state. It's obviously not perfect, but I do feel like it ever so slightly helps clarify how the access fault needs to be handled. > > Otherwise in src/share/vm/runtime/sharedRuntime.cpp in the comment > block - capitals after periods please :) Fixed, I'll not post a new webrev for it though :) Cheers, Mikael > > Stub removal seems fine. > > Thanks, > David > >> Incremental from webrev.01: >> >> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/ >> >> >> Cheers, >> Mikael >> >>> >>> >>> Cheers, >>> Mikael >>> >>>> >>>> >>>> BTW I found this comment somewhat unfathomable (both now and >>>> in original code): >>>> >>>> + // pc is the instruction which we must emulate >>>> + // doing a no-op is fine: return garbage from the load >>>> >>>> but finally realized that it means that after the load that >>>> raised the signal the native code proceeds normally but the >>>> value apparently loaded is just garbage/arbitrary, and the >>>> only sign something went wrong is the setting of the pending >>>> unsafe-access-error bit. This would be a potential source of >>>> bugs I think, except that when we hit the Java level, we >>>> throw the exception and so never actually "return" the >>>> garbage value. But it does mean we would have to be careful >>>> if calling the unsafe routines from native code. 
>>>>
>>>>
>>>> I admit I do not understand fully how the
>>>> _special_runtime_exit_condition flag is processed later, at least
>>>> not for all cases: If I have a java method A using
>>>> sun.misc.unsafe, which gets compiled, the sun.misc.unsafe
>>>> intrinsic gets inlined into that method. So, the whole method A
>>>> gets marked as "has unsafe access"? So, any SIGBUS happening
>>>> inside this method - which may be larger than the inlined
>>>> sun.misc.unsafe call - will yield an InternalError? And when is
>>>> the flag checked if that method A is called from another java
>>>> method B?
>>>>
>>>> Sorry if the questions are stupid, I am not a JIT expert, but I
>>>> try to understand how much can happen between the SIGBUS and the
>>>> InternalError getting thrown.
>>>
>>> No questions are stupid here. As you may have seen in the other
>>> thread, I filed JDK-8154592[1] to cover making the handling of the
>>> faults synchronous. Hope that helps.
>>>
>>>
>>> Thank you!
>>>
>>> Kind Regards, Thomas
>>>
>>> Cheers,
>>> Mikael
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8154592
>>>
>>>
>>>>
>>>> Thanks, Thomas
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote:
>>>>
>>>>
>>>> Please review:
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/
>>>>
>>>>
>>>> * Note: this is patch 2 in a set of 3 all aiming to clean up and unify
>>>> the unsafe memory getters/setters, along with the handling of unsafe
>>>> access errors.
>>>> The other two issues are:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle unsafe access
>>>> error as an asynchronous exception
>>>> https://bugs.openjdk.java.net/browse/JDK-8150921 - Update Unsafe
>>>> getters/setters to use double-register variants
>>>>
>>>>
>>>> * Summary (copied from the bug description)
>>>>
>>>>
>>>> In certain cases, such as accessing a region of a memory mapped file
>>>> which has been truncated on unix-style operating systems, a SIGBUS
>>>> signal will be raised and the VM will process it in the signal handler.
>>>>
>>>> How the signal is processed differs depending on the operating system
>>>> and/or CPU architecture, with two major alternatives:
>>>>
>>>> * "stubless"
>>>>
>>>> Do the necessary thread state updates directly in the signal handler,
>>>> and modify the context so that the signal handler returns to the place
>>>> where the execution should continue
>>>>
>>>> * Using a stub
>>>>
>>>> Update the context so that when the signal handler returns the thread
>>>> will continue execution in a generated stub, which in turn will call
>>>> some native code in the VM to update the thread state and figure out
>>>> where execution should continue. The stub will then jump to that new
>>>> place.
>>>>
>>>>
>>>> It should be noted that the work of updating the thread state is very
>>>> small - it's setting a flag or two in the thread structure, and figuring
>>>> out where the next instruction starts. It should also be noted that the
>>>> generated stubs today are broken, because they do not preserve all the
>>>> live registers over the call into the VM.
>>>> There are two ways to address this:
>>>>
>>>> * Preserve all the necessary registers
>>>>
>>>> This would mean implementing, in macro assembly, the necessary logic for
>>>> preserving all the live registers, including, but not limited to,
>>>> floating point registers, flag registers, etc. It quickly becomes
>>>> obvious that this is platform specific and error prone.
>>>>
>>>> * Leverage the fact that the operating system already does this as part
>>>> of the signal handling
>>>>
>>>> Do the necessary work in the signal handler instead, removing the need
>>>> for the stub altogether
>>>>
>>>> As mentioned, on some platforms the latter model is already in use. It
>>>> is dramatically easier and all platforms should be updated to do it the
>>>> same way.
>>>>
>>>>
>>>> * Testing
>>>>
>>>> Just as mentioned in the RFR for JDK-8153890, a new test was developed
>>>> to test this code path:
>>>>
>>>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java
>>>>
>>>>
>>>> In fact, it was when running this test I found the register preservation
>>>> issue. JPRT also passes. Much like JDK-8153890 I wanted to get some
>>>> feedback on this before running additional tests.
>>>>
>>>>
>>>> Cheers,
>>>> Mikael
>>>>
>>>
>>
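
For readers following the thread, the "stubless" model and the next_pc
convention discussed in this message can be sketched roughly as below.
This is only an illustration under stated assumptions, not the code in
the webrev: FakeThread, FakeContext, handle_sigbus and kInsnLength are
hypothetical stand-ins, the signature given to handle_unsafe_access is
assumed, and the fixed 4-byte instruction length only holds on
fixed-width ISAs such as PPC64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; these are NOT HotSpot's real types.
using address = std::uintptr_t;

struct FakeThread {
    // Models the "pending unsafe access error" flag kept in the thread structure.
    bool pending_unsafe_access_error = false;
};

struct FakeContext {
    // Models the saved program counter in the signal context.
    address pc;
};

// Assumption: fixed-width instructions (true for PPC64, not for x86).
constexpr address kInsnLength = 4;

// Records the fault on the thread and hands next_pc back unchanged.
// Returning next_pc reminds the caller that the context must be updated too.
inline address handle_unsafe_access(FakeThread& thread, address next_pc) {
    thread.pending_unsafe_access_error = true;
    return next_pc;
}

// "Stubless" handling: all work happens in the signal handler itself, so the
// OS-saved register state is left alone and no generated stub is involved.
inline bool handle_sigbus(FakeThread& thread, FakeContext& ctx) {
    address next_pc = ctx.pc + kInsnLength;          // skip the faulting load
    ctx.pc = handle_unsafe_access(thread, next_pc);  // resume after the load
    return true;                                     // signal was handled
}
```

Because the caller has to compute next_pc before it can call
handle_unsafe_access, and has to do something with the return value, the
flag update and the context update are harder to separate by accident.
That is the point made in the reply above; a real handler would read and
write the pc through the platform's ucontext accessors instead.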