From david.holmes at oracle.com Mon May 2 23:04:24 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 3 May 2016 09:04:24 +1000 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> <570C417F.20600@oracle.com> <571FB50E.6090108@oracle.com> <5720E0C8.608@oracle.com> <57220515.6080309@oracle.com> Message-ID: <71ac3531-8c7b-44dc-23ac-09c9867f8c20@oracle.com> On 29/04/2016 1:44 AM, Mikael Vidstedt wrote: > > > On 4/28/2016 5:41 AM, David Holmes wrote: >> Hi Mikael, >> >> On 28/04/2016 1:54 AM, Mikael Vidstedt wrote: >>> >>> >>> On 4/27/2016 12:24 AM, Thomas Stüfe wrote: >>>> Hi Mikael, >>>> >>>> On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt >>>> <mikael.vidstedt at oracle.com> wrote: >>>> >>>> >>>> >>>> On 4/12/2016 2:15 AM, Thomas Stüfe wrote: >>>>> Hi Mikael, David, >>>>> >>>>> On Tue, Apr 12, 2016 at 2:29 AM, David Holmes >>>>> <david.holmes at oracle.com> wrote: >>>>> >>>>> On 11/04/2016 10:57 AM, David Holmes wrote: >>>>> >>>>> Hi Mikael, >>>>> >>>>> I think we need to be able to answer the question as to >>>>> why the stubbed >>>>> and stubless forms of this code exist to ensure that >>>>> converting all >>>>> platforms to the same form is appropriate. >>>>> >>>>> >>>>> The more I look at this the more the stubs make no sense :) >>>>> AIUI a stub is generated when we need runtime code that may >>>>> be different to that which we could write directly for >>>>> compiling at build time - ie to use CPU specific features of >>>>> the actual CPU. But I see nothing here that suggests any such >>>>> usage. >>>>> >>>>> So I agree with removing the stubs. >>>>> >>>>> I'm still going through this but my initial reaction is >>>>> to wonder why we >>>>> don't use the same form of handle_unsafe_access on all >>>>> platforms and >>>>> always pass in npc?
(That seems to be the only difference >>>>> in code that >>>>> otherwise seems platform independent.) >>>>> >>>>> >>>>> Further to this and Thomas's comments I think >>>>> handle_unsafe_access(thread, pc, npc) can be defined in >>>>> shared code (where? not sure). Further, if we always pass in >>>>> npc then we don't need to pass in pc as it is unused (seems >>>>> unused in original code too for sparc). >>>>> >>>>> >>>>> I agree. We commonized ucontext_set_pc for all Posix platforms, >>>>> so we can make a common function "handle_unsafe_access(thread, >>>>> npc)" and inside use os::Posix::ucontext_set_pc to modify the >>>>> context. Then we can get rid of the special handling in the >>>>> signal handlers inside os_aix_ppc.cpp and os_linux_ppc.cpp (for >>>>> both the compiled and the interpreted case). >>>> >>>> There is definitely room for unification and simplification here. >>>> Right now the signal handling code is, sadly, different on all the >>>> different platforms, despite the fact that in many cases it should >>>> be similar or the exact same. That said, as much as a >>>> refactoring/rewrite of the signal handler code is needed, it will >>>> very quickly turn into a much larger effort... >>>> >>>> In this specific case, it would probably make more sense to pass >>>> in the full context to the handle_unsafe_access method and have it >>>> do whatever it feels is necessary to update it. However, a lot of >>>> the signal handler code assumes that a "stub" variable gets set up >>>> and only at the end of the main signal handler function does the >>>> actual context get updated. Changing how that works only for this >>>> specific case is obviously not a good idea, which means it's back >>>> to the full scale refactoring and out of scope for the bug fix. >>>> >>>> So to me the fact that the method prototypes differ depending on >>>> the exact platform is just a reflection of how the contexts >>>> differ.
Lacking the full context the handler method needs to >>>> take whatever parts of the context are needed to do its job. I >>>> could of course change the handler method to only take a single >>>> "next_pc" argument, but to me that feels like putting a key part >>>> of the logic which handles the unsafe access (specifically, the >>>> part which calculates the next pc) in the wrong place - IMHO that >>>> should really be tightly coupled with the rest of the logic needed >>>> to handle an unsafe access (updating the thread state etc.), and >>>> hence I feel that it really belongs in the handle_unsafe_access >>>> method itself. Happy to hear your thoughts, but I hope we can >>>> agree that the suggested fix, even in its current state, is still >>>> significantly better than what is there now. >>>> >>>> >>>> Unless somebody has a better suggestion, I'm going to be moving >>>> the implementations of the handle_unsafe_access methods to >>>> sharedRuntime (instead of stubRoutines) and will send out a new >>>> webrev shortly. >>>> >>>> >>>> I am unhappy with the fact that we factor unsafe handling out for x86 >>>> and sparc but do it inline for ppc. I know that was done before your >>>> change as well but would be happy if that could be improved. I would >>>> prefer either one of: >>> >>> Fully agree - this is an example of the more general problem of logic >>> which is /almost/ the same across different platforms, but which has >>> been effectively copy/pasted and drifted apart over time. >>> >>>> >>>> 1) flatten out the coding into the signal handlers like it is done in >>>> os_linux_ppc.cpp and os_aix_ppc.cpp or >>>> 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case >>>> >>>> I would actually prefer (1) even though this would multiply the code >>>> out for all os cases into ; we are only talking about 1-2 >>>> lines of additional coding, and it would improve the readability of >>>> the signal handlers.
>>>> >>>> But this is only my personal opinion, and I do not have strong >>>> emotions. I agree with you that a full cleanup of the signal coding is >>>> out of scope for this issue. >>> >>> I spent yesterday going back and forth on the various alternatives and >>> the only thing I can say with certainty now is that apart from >>> refactoring the whole thing everything else is ugly... For example, I >>> agree that consistency is an important goal here, but since there's >>> little to no consistency there today it's really hard to make a relevant >>> dent in it. :( >>> >>> Flattening it out is an alternative (and a good one), but that is not >>> something I'm willing to do as part of this change because only >>> flattening this specific case/return will actually add to the >>> inconsistency... So ultimately yesterday I chose to do something closer to >>> your alternative 2). Is it still ugly? Yes; lipstick on pig and all of >>> that. Have a look at it and see how you feel about it. I try to keep in >>> mind that what is there today is (more) broken. :) >>> >>> Webrev: >>> >>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/ >>> >> >> Now I see this in code form I really don't understand why next_pc is >> passed in, unused and then returned ?? > > Given that the full context can't easily be passed in and updated (which > I still think is the right, long term way of doing this), I chose to do > it this way instead. It is a signal to a caller that *only* calling > handle_unsafe_access is not enough, there's more to it in that the > context also needs to be updated. I see it as a way to make sure the > actual next_pc calculation and update is not forgotten, and make it > clear that it goes hand in hand with updating the thread state. It's > obviously not perfect, but I do feel like it ever so slightly helps > clarify how the access fault needs to be handled. Okay.
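The calling convention Mikael describes (the helper records the error in the thread state and passes next_pc back so the caller cannot forget to update the saved context) can be sketched roughly as below. Every name here (ThreadStateSketch, FakeContext, handle_unsafe_access_sketch, on_unsafe_fault, pc_t) is a hypothetical stand-in, not a HotSpot type or function. The fixed instruction width passed in is also an assumption: it matches PPC's 4-byte instructions, but x86 would have to decode the faulting instruction to find its length.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a program counter value.
using pc_t = std::uintptr_t;

// Hypothetical, simplified per-thread state; the real flag lives in the VM's thread object.
struct ThreadStateSketch {
  bool pending_unsafe_access_error = false;
};

// Hypothetical stand-in for the saved signal context (ucontext_t in real code).
struct FakeContext {
  pc_t pc;
};

// "Stubless" helper: mark the error as pending and return next_pc unchanged.
// [[nodiscard]] expresses the intent that the caller must still use the result.
[[nodiscard]] pc_t handle_unsafe_access_sketch(ThreadStateSketch* thread, pc_t next_pc) {
  thread->pending_unsafe_access_error = true;
  return next_pc;
}

// Sketch of the signal-handler side: skip the faulting load by resuming at the
// next instruction. The load's destination register keeps whatever value it had,
// so only the pending flag records that something went wrong.
void on_unsafe_fault(ThreadStateSketch* thread, FakeContext* ctx,
                     pc_t fault_pc, unsigned insn_len) {
  pc_t next_pc = fault_pc + insn_len;  // assumes a fixed-width instruction set
  ctx->pc = handle_unsafe_access_sketch(thread, next_pc);
}
```

Returning the value, instead of writing it into the context inside the helper, keeps the helper free of platform-specific context layouts. That matches the trade-off Mikael describes.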
Thanks, David ----- >> >> Otherwise in src/share/vm/runtime/sharedRuntime.cpp in the comment >> block - capitals after periods please :) > > Fixed, I'll not post a new webrev for it though :) > > Cheers, > Mikael > >> >> Stub removal seems fine. >> >> Thanks, >> David >> >>> Incremental from webrev.01: >>> >>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/ >>> >>> >>> Cheers, >>> Mikael >>> >>>> >>>> >>>> Cheers, >>>> Mikael >>>> >>>>> >>>>> >>>>> BTW I found this comment somewhat unfathomable (both now and >>>>> in original code): >>>>> >>>>> + // pc is the instruction which we must emulate >>>>> + // doing a no-op is fine: return garbage from the load >>>>> >>>>> but finally realized that it means that after the load that >>>>> raised the signal the native code proceeds normally but the >>>>> value apparently loaded is just garbage/arbitrary, and the >>>>> only sign something went wrong is the setting of the pending >>>>> unsafe-access-error bit. This would be a potential source of >>>>> bugs I think, except that when we hit the Java level, we >>>>> throw the exception and so never actually "return" the >>>>> garbage value. But it does mean we would have to be careful >>>>> if calling the unsafe routines from native code. >>>>> >>>>> >>>>> I admit I do not understand fully how the >>>>> _special_runtime_exit_condition flag is processed later, at least >>>>> not for all cases: If I have a java method A using >>>>> sun.misc.unsafe, which gets compiled, the sun.misc.unsafe >>>>> intrinsic gets inlined into that method. So, the whole method A >>>>> gets marked as "has unsafe access"? So, any SIGBUS happening >>>>> inside this method - which may be larger than the inlined >>>>> sun.misc.unsafe call - will yield an InternalError? And when is >>>>> the flag checked if that method A is called from another java >>>>> method B? 
>>>>> >>>>> Sorry if the questions are stupid, I am not a JIT expert, but I >>>>> try to understand how much can happen between the SIGBUS and the >>>>> InternalError getting thrown. >>>> >>>> No questions are stupid here. As you may have seen in the other >>>> thread, I filed JDK-8154592[1] to cover making the handling of the >>>> faults synchronous. Hope that helps. >>>> >>>> >>>> Thank you! >>>> >>>> Kind Regards, Thomas >>>> >>>> Cheers, >>>> Mikael >>>> >>>> [1] https://bugs.openjdk.java.net/browse/JDK-8154592 >>>> >>>> >>>>> >>>>> Thanks, Thomas >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote: >>>>> >>>>> >>>>> Please review: >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892 >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> * Note: this is patch 2 in a set of 3 all aiming to >>>>> clean up and unify >>>>> the unsafe memory getters/setters, along with the >>>>> handling of unsafe >>>>> access errors. The other two issues are: >>>>> >>>>> https://bugs.openjdk.java.net/browse/JDK-8153890 - >>>>> Handle unsafe access >>>>> error as an asynchronous exception >>>>> https://bugs.openjdk.java.net/browse/JDK-8150921 - >>>>> Update Unsafe >>>>> getters/setters to use double-register variants >>>>> >>>>> >>>>> * Summary (copied from the bug description) >>>>> >>>>> >>>>> In certain cases, such as accessing a region of a >>>>> memory mapped file >>>>> which has been truncated on unix-style operating >>>>> systems, a SIGBUS >>>>> signal will be raised and the VM will process it in >>>>> the signal handler. 
>>>>> >>>>> How the signal is processed differs depending on the >>>>> operating system >>>>> and/or CPU architecture, with two major alternatives: >>>>> >>>>> * "stubless" >>>>> >>>>> Do the necessary thread state updates directly in the >>>>> signal handler, >>>>> and modify the context so that the signal handler >>>>> returns to the place >>>>> where the execution should continue >>>>> >>>>> * Using a stub >>>>> >>>>> Update the context so that when the signal handler >>>>> returns the thread >>>>> will continue execution in a generated stub, which in >>>>> turn will call >>>>> some native code in the VM to update the thread state >>>>> and figure out >>>>> where execution should continue. The stub will then >>>>> jump to that new >>>>> place. >>>>> >>>>> >>>>> It should be noted that the work of updating the >>>>> thread state is very >>>>> small - it's setting a flag or two in the thread >>>>> structure, and figuring >>>>> out where the next instruction starts. It should also >>>>> be noted that the >>>>> generated stubs today are broken, because they do not >>>>> preserve all the >>>>> live registers over the call into the VM. There are >>>>> two ways to address >>>>> this: >>>>> >>>>> * Preserve all the necessary registers >>>>> >>>>> This would mean implementing, in macro assembly, the >>>>> necessary logic for >>>>> preserving all the live registers, including, but not >>>>> limited to, >>>>> floating point registers, flag registers, etc. It >>>>> quickly becomes >>>>> obvious that this is platform specific and error prone. >>>>> >>>>> * Leverage the fact that the operating system already >>>>> does this as part >>>>> of the signal handling >>>>> >>>>> Do the necessary work in the signal handler instead, >>>>> removing the need >>>>> for the stub altogether >>>>> >>>>> As mentioned, on some platforms the latter model is >>>>> already in use. 
It >>>>> is dramatically easier and all platforms should be >>>>> updated to do it the >>>>> same way.
>>>>> >>>>> >>>>> * Testing >>>>> >>>>> Just as mentioned in the RFR for JDK-8153890, a new >>>>> test was developed >>>>> to test this code path: >>>>> >>>>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java >>>>> >>>>> >>>>> >>>>> >>>>> In fact, it was when running this test I found the >>>>> register preservation >>>>> issue. JPRT also passes. Much like JDK-8153890 I >>>>> wanted to get some >>>>> feedback on this before running additional tests. >>>>> >>>>> >>>>> Cheers, >>>>> Mikael >>>>> >>>>> >>>> >>>> >>> > From mikael.vidstedt at oracle.com Tue May 3 04:12:09 2016 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Mon, 2 May 2016 21:12:09 -0700 Subject: RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub In-Reply-To: <71ac3531-8c7b-44dc-23ac-09c9867f8c20@oracle.com> References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com> <570C417F.20600@oracle.com> <571FB50E.6090108@oracle.com> <5720E0C8.608@oracle.com> <57220515.6080309@oracle.com> <71ac3531-8c7b-44dc-23ac-09c9867f8c20@oracle.com> Message-ID: <50fd5d01-2186-8b07-20bf-72ad382eb6d6@oracle.com> On 5/2/2016 4:04 PM, David Holmes wrote: > On 29/04/2016 1:44 AM, Mikael Vidstedt wrote: >> >> >> On 4/28/2016 5:41 AM, David Holmes wrote: >>> Hi Mikael, >>> >>> On 28/04/2016 1:54 AM, Mikael Vidstedt wrote: >>>> >>>> >>>> On 4/27/2016 12:24 AM, Thomas Stüfe wrote: >>>>> Hi Mikael, >>>>> >>>>> On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt >>>>> <mikael.vidstedt at oracle.com> >>>>> wrote: >>>>> >>>>> >>>>> >>>>> On 4/12/2016 2:15 AM, Thomas Stüfe wrote: >>>>>> Hi Mikael, David, >>>>>> >>>>>> On Tue, Apr 12, 2016 at 2:29 AM, David Holmes >>>>>> <david.holmes at oracle.com> wrote: >>>>>> >>>>>> On 11/04/2016 10:57 AM, David Holmes wrote: >>>>>> >>>>>> Hi Mikael, >>>>>> >>>>>> I think we need to be able to answer the question as to >>>>>> why the stubbed >>>>>> and stubless forms of this code exist to ensure that >>>>>>
converting all >>>>>> platforms to the same form is appropriate. >>>>>> >>>>>> >>>>>> The more I look at this the more the stubs make no sense :) >>>>>> AIUI a stub is generated when we need runtime code that may >>>>>> be different to that which we could write directly for >>>>>> compiling at build time - ie to use CPU specific features of >>>>>> the actual CPU. But I see nothing here that suggests any >>>>>> such >>>>>> usage. >>>>>> >>>>>> So I agree with removing the stubs. >>>>>> >>>>>> I'm still going through this but my initial reaction is >>>>>> to wonder why we >>>>>> don't use the same form of handle_unsafe_access on all >>>>>> platforms and >>>>>> always pass in npc? (That seems to be the only >>>>>> difference >>>>>> in code that >>>>>> otherwise seems platform independent.) >>>>>> >>>>>> >>>>>> Further to this and Thomas's comments I think >>>>>> handle_unsafe_access(thread, pc, npc) can be defined in >>>>>> shared code (where? not sure). Further, if we always pass in >>>>>> npc then we don't need to pass in pc as it is unused (seems >>>>>> unused in original code too for sparc). >>>>>> >>>>>> >>>>>> I agree. We commonized ucontext_set_pc for all Posix platforms, >>>>>> so we can make a common function "handle_unsafe_access(thread, >>>>>> npc)" and inside use os::Posix::ucontext_set_pc to modify the >>>>>> context. Then we can get rid of the special handling in the >>>>>> signal handlers inside os_aix_ppc.cpp and os_linux_ppc.cpp (for >>>>>> both the compiled and the interpreted case). >>>>> >>>>> There is definitely room for unification and simplification here. >>>>> Right now the signal handling code is, sadly, different on all >>>>> the >>>>> different platforms, despite the fact that in many cases it >>>>> should >>>>> be similar or the exact same. 
That said, as much as a >>>>> refactoring/rewrite of the signal handler code is needed, it will >>>>> very quickly turn into a much larger effort...
>>>>> >>>>> In this specific case, it would probably make more sense to pass >>>>> in the full context to the handle_unsafe_access method and >>>>> have it >>>>> do whatever it feels is necessary to update it. However, a lot of >>>>> the signal handler code assumes that a "stub" variable gets >>>>> set up >>>>> and only at the end of the main signal handler function does the >>>>> actual context get updated. Changing how that works only for this >>>>> specific case is obviously not a good idea, which means it's back >>>>> to the full scale refactoring and out of scope for the bug fix. >>>>> >>>>> So to me the fact that the method prototypes differ depending on >>>>> the exact platform is just a reflection of how the contexts >>>>> differ. Lacking the full context the handler method needs to >>>>> take whatever parts of the context are needed to do its job. I >>>>> could of course change the handler method to only take a single >>>>> "next_pc" argument, but to me that feels like putting a key part >>>>> of the logic which handles the unsafe access (specifically, the >>>>> part which calculates the next pc) in the wrong place - IMHO that >>>>> should really be tightly coupled with the rest of the logic >>>>> needed >>>>> to handle an unsafe access (updating the thread state etc.), and >>>>> hence I feel that it really belongs in the handle_unsafe_access >>>>> method itself. Happy to hear your thoughts, but I hope we can >>>>> agree that the suggested fix, even in its current state, is still >>>>> significantly better than what is there now. >>>>> >>>>> >>>>> Unless somebody has a better suggestion, I'm going to be moving >>>>> the implementations of the handle_unsafe_access methods to >>>>> sharedRuntime (instead of stubRoutines) and will send out a new >>>>> webrev shortly. >>>>> >>>>> >>>>> I am unhappy with the fact that we factor unsafe handling out for x86 >>>>> and sparc but do it inline for ppc.
I know that was done before your >>>>> change as well but would be happy if that could be improved. I would >>>>> prefer either one of: >>>> >>>> Fully agree - this is an example of the more general problem of logic >>>> which is /almost/ the same across different platforms, but which has >>>> been effectively copy/pasted and drifted apart over time. >>>> >>>>> >>>>> 1) flatten out the coding into the signal handlers like it is done in >>>>> os_linux_ppc.cpp and os_aix_ppc.cpp or >>>>> 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case >>>>> >>>>> I would actually prefer (1) even though this would multiply the code >>>>> out for all os cases into ; we are only talking about 1-2 >>>>> lines of additional coding, and it would improve the readability of >>>>> the signal handlers. >>>>> >>>>> But this is only my personal opinion, and I do not have strong >>>>> emotions. I agree with you that a full cleanup of the signal >>>>> coding is >>>>> out of scope for this issue. >>>> >>>> I spent yesterday going back and forth on the various alternatives and >>>> the only thing I can say with certainty now is that apart from >>>> refactoring the whole thing everything else is ugly... For example, I >>>> agree that consistency is an important goal here, but since there's >>>> little to no consistency there today it's really hard to make a >>>> relevant >>>> dent in it. :( >>>> >>>> Flattening it out is an alternative (and a good one), but that is not >>>> something I'm willing to do as part of this change because only >>>> flattening this specific case/return will actually add to the >>>> inconsistency... So ultimately yesterday I chose to do something >>>> closer to >>>> your alternative 2). Is it still ugly? Yes; lipstick on pig and all of >>>> that. Have a look at it and see how you feel about it. I try to >>>> keep in >>>> mind that what is there today is (more) broken.
:) >>>> >>>> Webrev: >>>> >>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/ >>>> >>>> >>> >>> Now I see this in code form I really don't understand why next_pc is >>> passed in, unused and then returned ?? >> >> Given that the full context can't easily be passed in and updated (which >> I still think is the right, long term way of doing this), I chose to do >> it this way instead. It is a signal to a caller that *only* calling >> handle_unsafe_access is not enough, there's more to it in that the >> context also needs to be updated. I see it as a way to make sure the >> actual next_pc calculation and update is not forgotten, and make it >> clear that it goes hand in hand with updating the thread state. It's >> obviously not perfect, but I do feel like it ever so slightly helps >> clarify how the access fault needs to be handled. > > Okay. > > Thanks, > David > ----- Thomas/David - thank you very much for the feedback and reviews! Cheers, Mikael > > >>> >>> Otherwise in src/share/vm/runtime/sharedRuntime.cpp in the comment >>> block - capitals after periods please :) >> >> Fixed, I'll not post a new webrev for it though :) >> >> Cheers, >> Mikael >> >>> >>> Stub removal seems fine. 
>>> >>> Thanks, >>> David >>> >>>> Incremental from webrev.01: >>>> >>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/ >>>> >>>> >>>> >>>> Cheers, >>>> Mikael >>>> >>>>> >>>>> >>>>> Cheers, >>>>> Mikael >>>>> >>>>>> >>>>>> >>>>>> BTW I found this comment somewhat unfathomable (both now and >>>>>> in original code): >>>>>> >>>>>> + // pc is the instruction which we must emulate >>>>>> + // doing a no-op is fine: return garbage from the load >>>>>> >>>>>> but finally realized that it means that after the load that >>>>>> raised the signal the native code proceeds normally but the >>>>>> value apparently loaded is just garbage/arbitrary, and the >>>>>> only sign something went wrong is the setting of the pending >>>>>> unsafe-access-error bit. This would be a potential source of >>>>>> bugs I think, except that when we hit the Java level, we >>>>>> throw the exception and so never actually "return" the >>>>>> garbage value. But it does mean we would have to be careful >>>>>> if calling the unsafe routines from native code. >>>>>> >>>>>> >>>>>> I admit I do not understand fully how the >>>>>> _special_runtime_exit_condition flag is processed later, at >>>>>> least >>>>>> not for all cases: If I have a java method A using >>>>>> sun.misc.unsafe, which gets compiled, the sun.misc.unsafe >>>>>> intrinsic gets inlined into that method. So, the whole method A >>>>>> gets marked as "has unsafe access"? So, any SIGBUS happening >>>>>> inside this method - which may be larger than the inlined >>>>>> sun.misc.unsafe call - will yield an InternalError? And when is >>>>>> the flag checked if that method A is called from another java >>>>>> method B? >>>>>> >>>>>> Sorry if the questions are stupid, I am not a JIT expert, but I >>>>>> try to understand how much can happen between the SIGBUS and the >>>>>> InternalError getting thrown. >>>>> >>>>> No questions are stupid here. 
As you may have seen in the other >>>>> thread, I filed JDK-8154592[1] to cover making the handling of >>>>> the >>>>> faults synchronous. Hope that helps. >>>>> >>>>> >>>>> Thank you! >>>>> >>>>> Kind Regards, Thomas >>>>> >>>>> Cheers, >>>>> Mikael >>>>> >>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8154592 >>>>> >>>>> >>>>>> >>>>>> Thanks, Thomas >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote: >>>>>> >>>>>> >>>>>> Please review: >>>>>> >>>>>> Bug: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8153892 >>>>>> Webrev: >>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> * Note: this is patch 2 in a set of 3 all aiming to >>>>>> clean up and unify >>>>>> the unsafe memory getters/setters, along with the >>>>>> handling of unsafe >>>>>> access errors. The other two issues are: >>>>>> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8153890 - >>>>>> Handle unsafe access >>>>>> error as an asynchronous exception >>>>>> https://bugs.openjdk.java.net/browse/JDK-8150921 - >>>>>> Update Unsafe >>>>>> getters/setters to use double-register variants >>>>>> >>>>>> >>>>>> * Summary (copied from the bug description) >>>>>> >>>>>> >>>>>> In certain cases, such as accessing a region of a >>>>>> memory mapped file >>>>>> which has been truncated on unix-style operating >>>>>> systems, a SIGBUS >>>>>> signal will be raised and the VM will process it in >>>>>> the signal handler. 
>>>>>> >>>>>> How the signal is processed differs depending on the >>>>>> operating system >>>>>> and/or CPU architecture, with two major >>>>>> alternatives: >>>>>> >>>>>> * "stubless" >>>>>> >>>>>> Do the necessary thread state updates directly in >>>>>> the >>>>>> signal handler, >>>>>> and modify the context so that the signal handler >>>>>> returns to the place >>>>>> where the execution should continue >>>>>> >>>>>> * Using a stub >>>>>> >>>>>> Update the context so that when the signal handler >>>>>> returns the thread >>>>>> will continue execution in a generated stub, >>>>>> which in >>>>>> turn will call >>>>>> some native code in the VM to update the thread >>>>>> state >>>>>> and figure out >>>>>> where execution should continue. The stub will then >>>>>> jump to that new >>>>>> place. >>>>>> >>>>>> >>>>>> It should be noted that the work of updating the >>>>>> thread state is very >>>>>> small - it's setting a flag or two in the thread >>>>>> structure, and figuring >>>>>> out where the next instruction starts. It should >>>>>> also >>>>>> be noted that the >>>>>> generated stubs today are broken, because they do >>>>>> not >>>>>> preserve all the >>>>>> live registers over the call into the VM. There are >>>>>> two ways to address >>>>>> this: >>>>>> >>>>>> * Preserve all the necessary registers >>>>>> >>>>>> This would mean implementing, in macro assembly, the >>>>>> necessary logic for >>>>>> preserving all the live registers, including, but >>>>>> not >>>>>> limited to, >>>>>> floating point registers, flag registers, etc. It >>>>>> quickly becomes >>>>>> obvious that this is platform specific and error prone. >>>>>> >>>>>> * Leverage the fact that the operating system >>>>>> already >>>>>> does this as part >>>>>> of the signal handling >>>>>> >>>>>> Do the necessary work in the signal handler instead, >>>>>> removing the need >>>>>> for the stub altogether >>>>>> >>>>>> As mentioned, on some platforms the latter model is >>>>>> already in use.
It >>>>>> is dramatically easier and all platforms should be >>>>>> updated to do it the >>>>>> same way. >>>>>> >>>>>> >>>>>> * Testing >>>>>> >>>>>> Just as mentioned in the RFR for JDK-8153890, a new >>>>>> test was developed >>>>>> to test this code path: >>>>>> >>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> In fact, it was when running this test I found the >>>>>> register preservation >>>>>> issue. JPRT also passes. Much like JDK-8153890 I >>>>>> wanted to get some >>>>>> feedback on this before running additional tests. >>>>>> >>>>>> >>>>>> Cheers, >>>>>> Mikael >>>>>> >>>>>> >>>>> >>>>> >>>> >> From david.holmes at oracle.com Wed May 4 05:55:29 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 4 May 2016 15:55:29 +1000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com> Message-ID: <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> Hi Hiroshi, Sorry for the delay on getting back to this. On 25/04/2016 5:09 PM, Hiroshi H Horii wrote: > Hi David, > > Thank you for your comments and questions. > >> 1. Are the current cmpxchg semantics exactly the same as >> memory_order_seq_cst? > > This is very good question.. > > I guess, cmpxchg needs a more conservative constraint for memory ordering > than C++11, to add sync after a compare-and-exchange operation. > > Could someone give comments or thoughts? I don't want to comment on the comparison with C++11. What I would prefer to see is an additional memory_order value (such as memory_order_ignored) which is the default for all methods declared to take a memory_order parameter. 
That way existing implementations are clearly ignoring the memory_order attribute and there is no potential for confusion as to whether the existing implementations equate to memory_order_seq_cst or not. That said, I'm not sure it makes sense to add the memory_order parameter to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark, oopDesc::cas_forward_to, unless those methods can sensibly be called with any value for memory_order - which seems highly unlikely. Perhaps those methods should identify the weakest form of memory_order they support and that should be hard-wired into them? Thanks, David > memory_order_seq_cst is defined as > "Any operation with this memory order is both an acquire operation and > a release operation, plus a single total order exists in which all > threads > observe all modifications (see below) in the same order." > (http://en.cppreference.com/w/cpp/atomic/memory_order) > > In my environment, g++ and xlc generate following assemblies on ppc64le. > (interestingly, they generates the same assemblies for any memory_order) > > g++ (4.9.2) > 100008a4: ac 04 00 7c sync > 100008a8: 28 50 20 7d lwarx r9,0,r10 > 100008ac: 00 18 09 7c cmpw r9,r3 > 100008b0: 0c 00 c2 40 bne- 100008bc > 100008b4: 2d 51 80 7c stwcx. r4,0,r10 > 100008b8: f0 ff c2 40 bne- 100008a8 > 100008bc: 2c 01 00 4c isync > > xlc (13.1.3) > 10000888: ac 04 00 7c sync > 1000088c: 28 28 c0 7c lwarx r6,0,r5 > 10000890: 40 00 26 7c cmpld r6,r0 > 10000894: 0c 00 82 40 bne 100008a0 > 10000898: 2d 29 80 7c stwcx. r4,0,r5 > 1000089c: f0 ff e2 40 bne+ 1000088c > 100008a0: 2c 01 00 4c isync > > On the other hand, the current OpenJDK generates following assemblies. > > 508: ac 04 00 7c sync > 50c: 00 00 5c e9 ld r10,0(r28) > 510: 00 50 3b 7c cmpd r27,r10 > 514: 1c 00 c2 40 bne- 530 > 518: a8 40 5c 7d ldarx r10,r28,r8 > 51c: 00 50 3b 7c cmpd r27,r10 > 520: 10 00 c2 40 bne- 530 > 524: ad 41 3c 7d stdcx. 
r9,r28,r8 > 528: f0 ff c2 40 bne- 518 > 52c: ac 04 00 7c sync > 530: 00 50 bb 7f ... > > Though we can ignore 50c-514 (because they are a duplicated guard > condition), > the last sync instruction (52c) makes cmpxchg more strict than > memory_order_seq_cst. > > In some cases, the last sync is necessary when this thread must be able > to read > all of the changes in the other threads while executing from 508 to 530 > (that processes compare-and-exchange). > >> 2. Has there been a discussion already, establishing that the modified >> GC code can indeed use memory_order_relaxed? Otherwise who is >> postulating that and based on what evidence? > > Volker and his colleagues have investigated the current GC codes > according to this. > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html > However, I believe, we need comments of other GC experts to change > the shared codes. > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. > IBM Research - Tokyo > > > David Holmes wrote on 04/22/2016 21:57:07: > >> From: David Holmes >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime- >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net >> Cc: Tim Ellison , > ppc-aix-port-dev at openjdk.java.net >> Date: 04/22/2016 21:58 >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >> copy_to_survivor for ppc64 >> >> Hi Hiroshi, >> >> Two initial questions: >> >> 1. Are the current cmpxchg semantics exactly the same as >> memory_order_seq_cst? >> >> 2. Has there been a discussion already, establishing that the modified >> GC code can indeed use memory_order_relaxed? Otherwise who is >> postulating that and based on what evidence? >> >> Missing memory barriers have caused very difficult to track down bugs in >> the past - very rare race conditions. So any relaxation here has to be >> done with extreme confidence. 
>> >> Thanks, >> David >> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote: >> > Dear all: >> > >> > Can I please request reviews for the following change? >> > >> > Code change: >> > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/ >> > (I initially created and Martin enhanced so much) >> > >> > This change follows the discussion started from this mail. >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >> April/018960.html >> > >> > Description: >> > This change provides relaxed compare-and-exchange by introducing >> > similar semantics of C++ atomic memory operators, enum memory_order. >> > As described in atomic_linux_ppc.inline.hpp, the current > implementation of >> > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for >> > general purposes because twice calls of sync before and after > cmpxchg will >> > provide strict consistency. However, they sometimes cause overheads >> > because >> > sync instructions are very expensive in the current POWER chip design. >> > In addition, for the other platforms, such as aarch64, this strict >> > semantics >> > may cause some overheads (according to the Andrew's mail). >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >> April/019073.html >> > >> > With this change, callers can explicitly specify constraints of memory >> > ordering >> > for cmpxchg with an additional parameter, memory_order order. >> > >> > typedef enum memory_order { >> > memory_order_relaxed, >> > memory_order_consume, >> > memory_order_acquire, >> > memory_order_release, >> > memory_order_acq_rel, >> > memory_order_seq_cst >> > } memory_order; >> > >> > Because the default value of the parameter is memory_order_seq_cst, >> > existing codes can use the same semantics of cmpxchg without any >> > modification. The relaxed cmpxchg is implemented only on ppc >> > in this changeset. Therefore, the behavior on the other platforms will >> > not be changed with this changeset. 
>> > >> > In addition, with the new parameter of cmpxchg, this change improves >> > performance of copy_to_survivor in the parallel GC. >> > copy_to_survivor changes forward pointers by using cmpxchg. This >> > operation doesn't require any sync instructions. A pointer is changed >> > at most once in a GC and when cmpxchg fails, the latest pointer is >> > available for the caller. cas_set_mark and cas_forward_to are extended >> > with an additional memory_order parameter as cmpxchg and > copy_to_survivor >> > uses memory_order_relaxed to modify the forward pointers. >> > >> > Summary of source code changes: >> > >> > * src/share/vm/runtime/atomic.hpp >> > - Defines enum memory_order and adds a parameter to cmpxchg. >> > >> > * src/share/vm/runtime/atomic.cpp >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp >> > - Added a parameter for each cmpxchg function to follow >> > the change of atomic.hpp. Their implementations are not > changed. >> > >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> > - Added a parameter for each cmpxchg function to follow >> > the change of atomic.hpp. In addition, implementations >> > are changed corresponding to the specified memory_order. >> > >> > * src/share/vm/oops/oop.hpp >> > * src/share/vm/oops/oop.inline.hpp >> > - Add a memory_order parameter to use relaxed cmpxchg in >> > cas_set_mark and cas_forward_to. 
>> > >> > * src/share/vm/gc/parallel/psPromotionManager.cpp >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp >> > >> > Martin tested this changeset on linuxx86_64, linuxppc64le and >> > darwinintel64. >> > Though more time is needed to test on the other platform, we would > like to >> > ask >> > reviews and start discussion on this changeset. >> > I also tested this changeset with SPECjbb2013 and confirmed that gc > pause >> > time >> > is reduced. >> > >> > Regards, >> > Hiroshi >> > ----------------------- >> > Hiroshi Horii, Ph.D. >> > IBM Research - Tokyo >> > >> > >> > From HORII at jp.ibm.com Fri May 6 10:11:24 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Fri, 6 May 2016 19:11:24 +0900 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com> <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> Message-ID: <201605061011.u46ABbAa024898@d19av08.sagamino.japan.ibm.com> Hi David, Thank you for your comments. As Martin suggested, I would like to split this proposal into - relaxing the memory order of cmpxchg - improving copy_to_survivor with a relaxed cmpxchg and discuss the former first. Martin kindly created a new webrev that includes the cmpxchg change. http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/ He has already tested it on AIX, linuxx86_64, linuxppc64le and darwinintel64. (Please tell me if I need to send a new mail for this RFR.) > What I would prefer to see is an additional memory_order value (such as > memory_order_ignored) which is the default for all methods declared to > take a memory_order parameter. We added a simple enum to specify the memory order in atomic.hpp as follows.
typedef enum cmpxchg_cmpxchg_memory_order { memory_order_relaxed, memory_order_conservative } cmpxchg_memory_order; All cmpxchg functions take a cmpxchg_memory_order argument whose default value, memory_order_conservative, keeps the semantics of the existing cmpxchg, so existing callers need no change. If you think "memory_order_ignored" is better than "memory_order_conservative", I will be happy to modify this change. (I thought "ignored" might resemble "relaxed" and confuse people who are familiar with C++11's memory semantics. I would like to hear what native speakers think.) Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. IBM Research - Tokyo David Holmes wrote on 05/04/2016 14:55:29: > From: David Holmes > To: Hiroshi H Horii/Japan/IBM at IBMJP > Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime- > dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison > , Volker Simonis , > "Doerr, Martin" , "Lindenmaier, Goetz" > > Date: 05/04/2016 14:57 > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > copy_to_survivor for ppc64 > > Hi Hiroshi, > > Sorry for the delay on getting back to this. > > On 25/04/2016 5:09 PM, Hiroshi H Horii wrote: > > Hi David, > > > > Thank you for your comments and questions. > > > >> 1. Are the current cmpxchg semantics exactly the same as > >> memory_order_seq_cst? > > > > This is very good question.. > > > > I guess, cmpxchg needs a more conservative constraint for memory ordering > > than C++11, to add sync after a compare-and-exchange operation. > > > > Could someone give comments or thoughts? > > I don't want to comment on the comparison with C++11. What I would > prefer to see is an additional memory_order value (such as > memory_order_ignored) which is the default for all methods declared to > take a memory_order parameter.
That way existing implementations are > clearly ignoring the memory_order attribute and there is no potential > for confusion as to whether the existing implementations equate to > memory_order_seq_cst or not. > > That said, I'm not sure it makes sense to add the memory_order parameter > to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark, > oopDesc::cas_forward_to, unless those methods can sensibly be called > with any value for memory_order - which seems highly unlikely. Perhaps > those methods should identify the weakest form of memory_order they > support and that should be hard-wired into them? > > Thanks, > David > > > memory_order_seq_cst is defined as > > "Any operation with this memory order is both an acquire operation and > > a release operation, plus a single total order exists in which all > > threads > > observe all modifications (see below) in the same order." > > (http://en.cppreference.com/w/cpp/atomic/memory_order) > > > > In my environment, g++ and xlc generate following assemblies on ppc64le. > > (interestingly, they generates the same assemblies for any memory_order) > > > > g++ (4.9.2) > > 100008a4: ac 04 00 7c sync > > 100008a8: 28 50 20 7d lwarx r9,0,r10 > > 100008ac: 00 18 09 7c cmpw r9,r3 > > 100008b0: 0c 00 c2 40 bne- 100008bc > > 100008b4: 2d 51 80 7c stwcx. r4,0,r10 > > 100008b8: f0 ff c2 40 bne- 100008a8 > > 100008bc: 2c 01 00 4c isync > > > > xlc (13.1.3) > > 10000888: ac 04 00 7c sync > > 1000088c: 28 28 c0 7c lwarx r6,0,r5 > > 10000890: 40 00 26 7c cmpld r6,r0 > > 10000894: 0c 00 82 40 bne 100008a0 > > 10000898: 2d 29 80 7c stwcx. r4,0,r5 > > 1000089c: f0 ff e2 40 bne+ 1000088c > > 100008a0: 2c 01 00 4c isync > > > > On the other hand, the current OpenJDK generates following assemblies. 
> > > > 508: ac 04 00 7c sync > > 50c: 00 00 5c e9 ld r10,0(r28) > > 510: 00 50 3b 7c cmpd r27,r10 > > 514: 1c 00 c2 40 bne- 530 > > 518: a8 40 5c 7d ldarx r10,r28,r8 > > 51c: 00 50 3b 7c cmpd r27,r10 > > 520: 10 00 c2 40 bne- 530 > > 524: ad 41 3c 7d stdcx. r9,r28,r8 > > 528: f0 ff c2 40 bne- 518 > > 52c: ac 04 00 7c sync > > 530: 00 50 bb 7f ... > > > > Though we can ignore 50c-514 (because they are a duplicated guard > > condition), > > the last sync instruction (52c) makes cmpxchg more strict than > > memory_order_seq_cst. > > > > In some cases, the last sync is necessary when this thread must be able > > to read > > all of the changes in the other threads while executing from 508 to 530 > > (that processes compare-and-exchange). > > > >> 2. Has there been a discussion already, establishing that the modified > >> GC code can indeed use memory_order_relaxed? Otherwise who is > >> postulating that and based on what evidence? > > > > Volker and his colleagues have investigated the current GC codes > > according to this. > > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > April/019079.html > > However, I believe, we need comments of other GC experts to change > > the shared codes. > > > > Regards, > > Hiroshi > > ----------------------- > > Hiroshi Horii, Ph.D. > > IBM Research - Tokyo > > > > > > David Holmes wrote on 04/22/2016 21:57:07: > > > >> From: David Holmes > >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime- > >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net > >> Cc: Tim Ellison , > > ppc-aix-port-dev at openjdk.java.net > >> Date: 04/22/2016 21:58 > >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > >> copy_to_survivor for ppc64 > >> > >> Hi Hiroshi, > >> > >> Two initial questions: > >> > >> 1. Are the current cmpxchg semantics exactly the same as > >> memory_order_seq_cst? > >> > >> 2. 
Has there been a discussion already, establishing that the modified > >> GC code can indeed use memory_order_relaxed? Otherwise who is > >> postulating that and based on what evidence? > >> > >> Missing memory barriers have caused very difficult to track down bugs in > >> the past - very rare race conditions. So any relaxation here has to be > >> done with extreme confidence. > >> > >> Thanks, > >> David > >> > >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote: > >> > Dear all: > >> > > >> > Can I please request reviews for the following change? > >> > > >> > Code change: > >> > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/ > >> > (I initially created and Martin enhanced so much) > >> > > >> > This change follows the discussion started from this mail. > >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > >> April/018960.html > >> > > >> > Description: > >> > This change provides relaxed compare-and-exchange by introducing > >> > similar semantics of C++ atomic memory operators, enum memory_order. > >> > As described in atomic_linux_ppc.inline.hpp, the current > > implementation of > >> > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for > >> > general purposes because twice calls of sync before and after > > cmpxchg will > >> > provide strict consistency. However, they sometimes cause overheads > >> > because > >> > sync instructions are very expensive in the current POWER chip design. > >> > In addition, for the other platforms, such as aarch64, this strict > >> > semantics > >> > may cause some overheads (according to the Andrew's mail). > >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > >> April/019073.html > >> > > >> > With this change, callers can explicitly specify constraints of memory > >> > ordering > >> > for cmpxchg with an additional parameter, memory_order order. 
> >> > > >> > typedef enum memory_order { > >> > memory_order_relaxed, > >> > memory_order_consume, > >> > memory_order_acquire, > >> > memory_order_release, > >> > memory_order_acq_rel, > >> > memory_order_seq_cst > >> > } memory_order; > >> > > >> > Because the default value of the parameter is memory_order_seq_cst, > >> > existing codes can use the same semantics of cmpxchg without any > >> > modification. The relaxed cmpxchg is implemented only on ppc > >> > in this changeset. Therefore, the behavior on the other platforms will > >> > not be changed with this changeset. > >> > > >> > In addition, with the new parameter of cmpxchg, this change improves > >> > performance of copy_to_survivor in the parallel GC. > >> > copy_to_survivor changes forward pointers by using cmpxchg. This > >> > operation doesn't require any sync instructions. A pointer is changed > >> > at most once in a GC and when cmpxchg fails, the latest pointer is > >> > available for the caller. cas_set_mark and cas_forward_to are extended > >> > with an additional memory_order parameter as cmpxchg and > > copy_to_survivor > >> > uses memory_order_relaxed to modify the forward pointers. > >> > > >> > Summary of source code changes: > >> > > >> > * src/share/vm/runtime/atomic.hpp > >> > - Defines enum memory_order and adds a parameter to cmpxchg. 
> >> > > >> > * src/share/vm/runtime/atomic.cpp > >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp > >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp > >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp > >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp > >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp > >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp > >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp > >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp > >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp > >> > - Added a parameter for each cmpxchg function to follow > >> > the change of atomic.hpp. Their implementations are not > > changed. > >> > > >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp > >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > >> > - Added a parameter for each cmpxchg function to follow > >> > the change of atomic.hpp. In addition, implementations > >> > are changed corresponding to the specified memory_order. > >> > > >> > * src/share/vm/oops/oop.hpp > >> > * src/share/vm/oops/oop.inline.hpp > >> > - Add a memory_order parameter to use relaxed cmpxchg in > >> > cas_set_mark and cas_forward_to. > >> > > >> > * src/share/vm/gc/parallel/psPromotionManager.cpp > >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp > >> > > >> > Martin tested this changeset on linuxx86_64, linuxppc64le and > >> > darwinintel64. > >> > Though more time is needed to test on the other platform, we would > > like to > >> > ask > >> > reviews and start discussion on this changeset. > >> > I also tested this changeset with SPECjbb2013 and confirmed that gc > > pause > >> > time > >> > is reduced. > >> > > >> > Regards, > >> > Hiroshi > >> > ----------------------- > >> > Hiroshi Horii, Ph.D. > >> > IBM Research - Tokyo > >> > > >> > > >> > > >
From david.holmes at oracle.com Tue May 10 07:34:32 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 May 2016 17:34:32 +1000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com> <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> <201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com> Message-ID: <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> Hi Hiroshi, On 6/05/2016 8:11 PM, Hiroshi H Horii wrote: > Hi David, > > Thank you for your comments. > > As Martin suggested me, I would like to separate this proposal to > - relaxing memory order of cmpxchg > - improvement of copy_to_survivior with relaxed cmpxchg > and discuss the former first. > > Martin thankfully created a new webrev that include a change of cmpxchg. > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/ > He has already tested it with AIX, linuxx86_64, linuxppc64le and > darwinintel64. > (Please tell me if I need to send a new mail for this PFR) Please do as it will be simpler to track that way. >> What I would prefer to see is an additional memory_order value (such as >> memory_order_ignored) which is the default for all methods declared to >> take a memory_order parameter. > > We added simple enum to specify memory order in atomic.hpp as follows. > > typedef enum cmpxchg_cmpxchg_memory_order { > memory_order_relaxed, > memory_order_conservative > } cmpxchg_memory_order; > > All of cmpxchg functions have an argument of cmpxchg_memory_order > with a default value memory_order_conservative that uses the same > semantics with the existing cmpxchg and requires no change for the existing > callers.
If you think "memory_order_ignored" is better than > "memory_order_conservative", I will be happy to modify this change. > (I just thought, "ignored" may resemble "relaxed" and may make > people who are familiar with C++11's memory semantics confused. > I would like to know thoughts of native speakers.) That is fine by me. I don't think "ignored" would be confused with "relaxed", but "conservative" is fine. I will run the patch through our internal build system while you prepare the updated RFR. My only concern is "unused argument" warnings from the compiler. :) We are quickly running into a hard deadline with Feature Complete however - possibly less than 24 hours - for hotspot changes. If this doesn't get in in time I will see if I can shepherd it through the approval process. Thanks, David > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. > IBM Research - Tokyo > > > David Holmes wrote on 05/04/2016 14:55:29: > >> From: David Holmes >> To: Hiroshi H Horii/Japan/IBM at IBMJP >> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime- >> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison >> , Volker Simonis , >> "Doerr, Martin" , "Lindenmaier, Goetz" >> >> Date: 05/04/2016 14:57 >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >> copy_to_survivor for ppc64 >> >> Hi Hiroshi, >> >> Sorry for the delay on getting back to this. >> >> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote: >> > Hi David, >> > >> > Thank you for your comments and questions. >> > >> >> 1. Are the current cmpxchg semantics exactly the same as >> >> memory_order_seq_cst? >> > >> > This is very good question.. >> > >> > I guess, cmpxchg needs a more conservative constraint for memory > ordering >> > than C++11, to add sync after a compare-and-exchange operation. >> > >> > Could someone give comments or thoughts? >> >> I don't want to comment on the comparison with C++11. 
What I would >> prefer to see is an additional memory_order value (such as >> memory_order_ignored) which is the default for all methods declared to >> take a memory_order parameter. That way existing implementations are >> clearly ignoring the memory_order attribute and there is no potential >> for confusion as to whether the existing implementations equate to >> memory_order_seq_cst or not. >> >> That said, I'm not sure it makes sense to add the memory_order parameter >> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark, >> oopDesc::cas_forward_to, unless those methods can sensibly be called >> with any value for memory_order - which seems highly unlikely. Perhaps >> those methods should identify the weakest form of memory_order they >> support and that should be hard-wired into them? >> >> Thanks, >> David >> >> > memory_order_seq_cst is defined as >> > "Any operation with this memory order is both an acquire > operation and >> > a release operation, plus a single total order exists in which all >> > threads >> > observe all modifications (see below) in the same order." >> > (http://en.cppreference.com/w/cpp/atomic/memory_order) >> > >> > In my environment, g++ and xlc generate following assemblies on ppc64le. >> > (interestingly, they generates the same assemblies for any memory_order) >> > >> > g++ (4.9.2) >> > 100008a4: ac 04 00 7c sync >> > 100008a8: 28 50 20 7d lwarx r9,0,r10 >> > 100008ac: 00 18 09 7c cmpw r9,r3 >> > 100008b0: 0c 00 c2 40 bne- 100008bc >> > 100008b4: 2d 51 80 7c stwcx. r4,0,r10 >> > 100008b8: f0 ff c2 40 bne- 100008a8 >> > 100008bc: 2c 01 00 4c isync >> > >> > xlc (13.1.3) >> > 10000888: ac 04 00 7c sync >> > 1000088c: 28 28 c0 7c lwarx r6,0,r5 >> > 10000890: 40 00 26 7c cmpld r6,r0 >> > 10000894: 0c 00 82 40 bne 100008a0 >> > 10000898: 2d 29 80 7c stwcx. r4,0,r5 >> > 1000089c: f0 ff e2 40 bne+ 1000088c >> > 100008a0: 2c 01 00 4c isync >> > >> > On the other hand, the current OpenJDK generates following assemblies. 
>> > >> > 508: ac 04 00 7c sync >> > 50c: 00 00 5c e9 ld r10,0(r28) >> > 510: 00 50 3b 7c cmpd r27,r10 >> > 514: 1c 00 c2 40 bne- 530 >> > 518: a8 40 5c 7d ldarx r10,r28,r8 >> > 51c: 00 50 3b 7c cmpd r27,r10 >> > 520: 10 00 c2 40 bne- 530 >> > 524: ad 41 3c 7d stdcx. r9,r28,r8 >> > 528: f0 ff c2 40 bne- 518 >> > 52c: ac 04 00 7c sync >> > 530: 00 50 bb 7f ... >> > >> > Though we can ignore 50c-514 (because they are a duplicated guard >> > condition), >> > the last sync instruction (52c) makes cmpxchg more strict than >> > memory_order_seq_cst. >> > >> > In some cases, the last sync is necessary when this thread must be able >> > to read >> > all of the changes in the other threads while executing from 508 to 530 >> > (that processes compare-and-exchange). >> > >> >> 2. Has there been a discussion already, establishing that the modified >> >> GC code can indeed use memory_order_relaxed? Otherwise who is >> >> postulating that and based on what evidence? >> > >> > Volker and his colleagues have investigated the current GC codes >> > according to this. >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >> April/019079.html >> > However, I believe, we need comments of other GC experts to change >> > the shared codes. >> > >> > Regards, >> > Hiroshi >> > ----------------------- >> > Hiroshi Horii, Ph.D. >> > IBM Research - Tokyo >> > >> > >> > David Holmes wrote on 04/22/2016 21:57:07: >> > >> >> From: David Holmes >> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime- >> >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net >> >> Cc: Tim Ellison , >> > ppc-aix-port-dev at openjdk.java.net >> >> Date: 04/22/2016 21:58 >> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >> >> copy_to_survivor for ppc64 >> >> >> >> Hi Hiroshi, >> >> >> >> Two initial questions: >> >> >> >> 1. Are the current cmpxchg semantics exactly the same as >> >> memory_order_seq_cst? >> >> >> >> 2. 
Has there been a discussion already, establishing that the modified >> >> GC code can indeed use memory_order_relaxed? Otherwise who is >> >> postulating that and based on what evidence? >> >> >> >> Missing memory barriers have caused very difficult to track down > bugs in >> >> the past - very rare race conditions. So any relaxation here has to be >> >> done with extreme confidence. >> >> >> >> Thanks, >> >> David >> >> >> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote: >> >> > Dear all: >> >> > >> >> > Can I please request reviews for the following change? >> >> > >> >> > Code change: >> >> > > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/ >> >> > (I initially created and Martin enhanced so much) >> >> > >> >> > This change follows the discussion started from this mail. >> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >> >> April/018960.html >> >> > >> >> > Description: >> >> > This change provides relaxed compare-and-exchange by introducing >> >> > similar semantics of C++ atomic memory operators, enum memory_order. >> >> > As described in atomic_linux_ppc.inline.hpp, the current >> > implementation of >> >> > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for >> >> > general purposes because twice calls of sync before and after >> > cmpxchg will >> >> > provide strict consistency. However, they sometimes cause overheads >> >> > because >> >> > sync instructions are very expensive in the current POWER chip > design. >> >> > In addition, for the other platforms, such as aarch64, this strict >> >> > semantics >> >> > may cause some overheads (according to the Andrew's mail). >> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >> >> April/019073.html >> >> > >> >> > With this change, callers can explicitly specify constraints of > memory >> >> > ordering >> >> > for cmpxchg with an additional parameter, memory_order order. 
>> >> > >> >> > typedef enum memory_order { >> >> > memory_order_relaxed, >> >> > memory_order_consume, >> >> > memory_order_acquire, >> >> > memory_order_release, >> >> > memory_order_acq_rel, >> >> > memory_order_seq_cst >> >> > } memory_order; >> >> > >> >> > Because the default value of the parameter is memory_order_seq_cst, >> >> > existing codes can use the same semantics of cmpxchg without any >> >> > modification. The relaxed cmpxchg is implemented only on ppc >> >> > in this changeset. Therefore, the behavior on the other platforms > will >> >> > not be changed with this changeset. >> >> > >> >> > In addition, with the new parameter of cmpxchg, this change improves >> >> > performance of copy_to_survivor in the parallel GC. >> >> > copy_to_survivor changes forward pointers by using cmpxchg. This >> >> > operation doesn't require any sync instructions. A pointer is > changed >> >> > at most once in a GC and when cmpxchg fails, the latest pointer is >> >> > available for the caller. cas_set_mark and cas_forward_to are > extended >> >> > with an additional memory_order parameter as cmpxchg and >> > copy_to_survivor >> >> > uses memory_order_relaxed to modify the forward pointers. >> >> > >> >> > Summary of source code changes: >> >> > >> >> > * src/share/vm/runtime/atomic.hpp >> >> > - Defines enum memory_order and adds a parameter to cmpxchg. 
>> >> > >> >> > * src/share/vm/runtime/atomic.cpp >> >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp >> >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >> >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp >> >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp >> >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp >> >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp >> >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp >> >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp >> >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp >> >> > - Added a parameter for each cmpxchg function to follow >> >> > the change of atomic.hpp. Their implementations are not >> > changed. >> >> > >> >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp >> >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> >> > - Added a parameter for each cmpxchg function to follow >> >> > the change of atomic.hpp. In addition, implementations >> >> > are changed corresponding to the specified memory_order. >> >> > >> >> > * src/share/vm/oops/oop.hpp >> >> > * src/share/vm/oops/oop.inline.hpp >> >> > - Add a memory_order parameter to use relaxed cmpxchg in >> >> > cas_set_mark and cas_forward_to. >> >> > >> >> > * src/share/vm/gc/parallel/psPromotionManager.cpp >> >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp >> >> > >> >> > Martin tested this changeset on linuxx86_64, linuxppc64le and >> >> > darwinintel64. >> >> > Though more time is needed to test on the other platform, we would >> > like to >> >> > ask >> >> > reviews and start discussion on this changeset. >> >> > I also tested this changeset with SPECjbb2013 and confirmed that gc >> > pause >> >> > time >> >> > is reduced. >> >> > >> >> > Regards, >> >> > Hiroshi >> >> > ----------------------- >> >> > Hiroshi Horii, Ph.D. 
>> >> > IBM Research - Tokyo >> >> > >> >> > >> >> >> > >> > From david.holmes at oracle.com Tue May 10 09:11:20 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 May 2016 19:11:20 +1000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com> <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> <201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com> <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> Message-ID: <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com> The fix seems incomplete for solaris: make/Main.gmk:232: recipe for target 'hotspot' failed "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp", line 124: Error: Too many arguments in call to "_Atomic_cmpxchg_long(long, volatile long*, long)". "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp", line 128: Error: Too many arguments in call to "_Atomic_cmpxchg_long(long, volatile long*, long)". David On 10/05/2016 5:34 PM, David Holmes wrote: > Hi Hiroshi, > > On 6/05/2016 8:11 PM, Hiroshi H Horii wrote: >> Hi David, >> >> Thank you for your comments. >> >> As Martin suggested me, I would like to separate this proposal to >> - relaxing memory order of cmpxchg >> - improvement of copy_to_survivior with relaxed cmpxchg >> and discuss the former first. >> >> Martin thankfully created a new webrev that include a change of cmpxchg. >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/ >> He has already tested it with AIX, linuxx86_64, linuxppc64le and >> darwinintel64. >> (Please tell me if I need to send a new mail for this PFR) > > Please do as it will be simpler to track that way. 
> >>> What I would prefer to see is an additional memory_order value (such as >>> memory_order_ignored) which is the default for all methods declared to >>> take a memory_order parameter. >> >> We added simple enum to specify memory order in atomic.hpp as follows. >> >> typedef enum cmpxchg_cmpxchg_memory_order { >> memory_order_relaxed, >> memory_order_conservative >> } cmpxchg_memory_order; >> >> All of cmpxchg functions have an argument of cmpxchg_memory_order >> with a default value memory_order_conservative that uses the same >> semantics with the existing cmpxchg and requires no change for the >> existing >> callers. If you think "memory_order_ignored" is better than >> "memory_order_conservative", I will be happy to modify this change. >> (I just thought, "ignored" may resemble "relaxed" and may make >> people who are familiar with C++11's memory semantics confused. >> I would like to know thoughts of native speakers.) > > That is fine by me. I don't think "ignored" would be confused with > "relaxed", but "conservative" is fine. > > I will run the patch through our internal build system while you prepare > the updated RFR. My only concern is "unused argument" warnings from the > compiler. :) > > We are quickly running into a hard deadline with Feature Complete > however - possibly less than 24 hours - for hotspot changes. If this > doesn't get in in time I will see if I can shepherd it through the > approval process. > > Thanks, > David > > >> Regards, >> Hiroshi >> ----------------------- >> Hiroshi Horii, Ph.D. 
>> IBM Research - Tokyo >> >> >> David Holmes wrote on 05/04/2016 14:55:29: >> >>> From: David Holmes >>> To: Hiroshi H Horii/Japan/IBM at IBMJP >>> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime- >>> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison >>> , Volker Simonis , >>> "Doerr, Martin" , "Lindenmaier, Goetz" >>> >>> Date: 05/04/2016 14:57 >>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >>> copy_to_survivor for ppc64 >>> >>> Hi Hiroshi, >>> >>> Sorry for the delay on getting back to this. >>> >>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote: >>> > Hi David, >>> > >>> > Thank you for your comments and questions. >>> > >>> >> 1. Are the current cmpxchg semantics exactly the same as >>> >> memory_order_seq_cst? >>> > >>> > This is very good question.. >>> > >>> > I guess, cmpxchg needs a more conservative constraint for memory >> ordering >>> > than C++11, to add sync after a compare-and-exchange operation. >>> > >>> > Could someone give comments or thoughts? >>> >>> I don't want to comment on the comparison with C++11. What I would >>> prefer to see is an additional memory_order value (such as >>> memory_order_ignored) which is the default for all methods declared to >>> take a memory_order parameter. That way existing implementations are >>> clearly ignoring the memory_order attribute and there is no potential >>> for confusion as to whether the existing implementations equate to >>> memory_order_seq_cst or not. >>> >>> That said, I'm not sure it makes sense to add the memory_order parameter >>> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark, >>> oopDesc::cas_forward_to, unless those methods can sensibly be called >>> with any value for memory_order - which seems highly unlikely. Perhaps >>> those methods should identify the weakest form of memory_order they >>> support and that should be hard-wired into them? 
>>> >>> Thanks, >>> David >>> >>> > memory_order_seq_cst is defined as >>> > "Any operation with this memory order is both an acquire >> operation and >>> > a release operation, plus a single total order exists in which >>> all >>> > threads >>> > observe all modifications (see below) in the same order." >>> > (http://en.cppreference.com/w/cpp/atomic/memory_order) >>> > >>> > In my environment, g++ and xlc generate following assemblies on >>> ppc64le. >>> > (interestingly, they generates the same assemblies for any >>> memory_order) >>> > >>> > g++ (4.9.2) >>> > 100008a4: ac 04 00 7c sync >>> > 100008a8: 28 50 20 7d lwarx r9,0,r10 >>> > 100008ac: 00 18 09 7c cmpw r9,r3 >>> > 100008b0: 0c 00 c2 40 bne- 100008bc >>> > 100008b4: 2d 51 80 7c stwcx. r4,0,r10 >>> > 100008b8: f0 ff c2 40 bne- 100008a8 >>> > 100008bc: 2c 01 00 4c isync >>> > >>> > xlc (13.1.3) >>> > 10000888: ac 04 00 7c sync >>> > 1000088c: 28 28 c0 7c lwarx r6,0,r5 >>> > 10000890: 40 00 26 7c cmpld r6,r0 >>> > 10000894: 0c 00 82 40 bne 100008a0 >>> > 10000898: 2d 29 80 7c stwcx. r4,0,r5 >>> > 1000089c: f0 ff e2 40 bne+ 1000088c >>> > 100008a0: 2c 01 00 4c isync >>> > >>> > On the other hand, the current OpenJDK generates following assemblies. >>> > >>> > 508: ac 04 00 7c sync >>> > 50c: 00 00 5c e9 ld r10,0(r28) >>> > 510: 00 50 3b 7c cmpd r27,r10 >>> > 514: 1c 00 c2 40 bne- 530 >>> > 518: a8 40 5c 7d ldarx r10,r28,r8 >>> > 51c: 00 50 3b 7c cmpd r27,r10 >>> > 520: 10 00 c2 40 bne- 530 >>> > 524: ad 41 3c 7d stdcx. r9,r28,r8 >>> > 528: f0 ff c2 40 bne- 518 >>> > 52c: ac 04 00 7c sync >>> > 530: 00 50 bb 7f ... >>> > >>> > Though we can ignore 50c-514 (because they are a duplicated guard >>> > condition), >>> > the last sync instruction (52c) makes cmpxchg more strict than >>> > memory_order_seq_cst. 
>>> > >>> > In some cases, the last sync is necessary when this thread must be >>> able >>> > to read >>> > all of the changes in the other threads while executing from 508 to >>> 530 >>> > (that processes compare-and-exchange). >>> > >>> >> 2. Has there been a discussion already, establishing that the >>> modified >>> >> GC code can indeed use memory_order_relaxed? Otherwise who is >>> >> postulating that and based on what evidence? >>> > >>> > Volker and his colleagues have investigated the current GC codes >>> > according to this. >>> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>> April/019079.html >>> > However, I believe, we need comments of other GC experts to change >>> > the shared codes. >>> > >>> > Regards, >>> > Hiroshi >>> > ----------------------- >>> > Hiroshi Horii, Ph.D. >>> > IBM Research - Tokyo >>> > >>> > >>> > David Holmes wrote on 04/22/2016 21:57:07: >>> > >>> >> From: David Holmes >>> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime- >>> >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net >>> >> Cc: Tim Ellison , >>> > ppc-aix-port-dev at openjdk.java.net >>> >> Date: 04/22/2016 21:58 >>> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >>> >> copy_to_survivor for ppc64 >>> >> >>> >> Hi Hiroshi, >>> >> >>> >> Two initial questions: >>> >> >>> >> 1. Are the current cmpxchg semantics exactly the same as >>> >> memory_order_seq_cst? >>> >> >>> >> 2. Has there been a discussion already, establishing that the >>> modified >>> >> GC code can indeed use memory_order_relaxed? Otherwise who is >>> >> postulating that and based on what evidence? >>> >> >>> >> Missing memory barriers have caused very difficult to track down >> bugs in >>> >> the past - very rare race conditions. So any relaxation here has >>> to be >>> >> done with extreme confidence. 
>>> >> >>> >> Thanks, >>> >> David >>> >> >>> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote: >>> >> > Dear all: >>> >> > >>> >> > Can I please request reviews for the following change? >>> >> > >>> >> > Code change: >>> >> > >> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/ >>> >> > (I initially created and Martin enhanced so much) >>> >> > >>> >> > This change follows the discussion started from this mail. >>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>> >> April/018960.html >>> >> > >>> >> > Description: >>> >> > This change provides relaxed compare-and-exchange by introducing >>> >> > similar semantics of C++ atomic memory operators, enum >>> memory_order. >>> >> > As described in atomic_linux_ppc.inline.hpp, the current >>> > implementation of >>> >> > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for >>> >> > general purposes because twice calls of sync before and after >>> > cmpxchg will >>> >> > provide strict consistency. However, they sometimes cause overheads >>> >> > because >>> >> > sync instructions are very expensive in the current POWER chip >> design. >>> >> > In addition, for the other platforms, such as aarch64, this strict >>> >> > semantics >>> >> > may cause some overheads (according to the Andrew's mail). >>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>> >> April/019073.html >>> >> > >>> >> > With this change, callers can explicitly specify constraints of >> memory >>> >> > ordering >>> >> > for cmpxchg with an additional parameter, memory_order order. 
>>> >> > >>> >> > typedef enum memory_order { >>> >> > memory_order_relaxed, >>> >> > memory_order_consume, >>> >> > memory_order_acquire, >>> >> > memory_order_release, >>> >> > memory_order_acq_rel, >>> >> > memory_order_seq_cst >>> >> > } memory_order; >>> >> > >>> >> > Because the default value of the parameter is memory_order_seq_cst, >>> >> > existing codes can use the same semantics of cmpxchg without any >>> >> > modification. The relaxed cmpxchg is implemented only on ppc >>> >> > in this changeset. Therefore, the behavior on the other platforms >> will >>> >> > not be changed with this changeset. >>> >> > >>> >> > In addition, with the new parameter of cmpxchg, this change >>> improves >>> >> > performance of copy_to_survivor in the parallel GC. >>> >> > copy_to_survivor changes forward pointers by using cmpxchg. This >>> >> > operation doesn't require any sync instructions. A pointer is >> changed >>> >> > at most once in a GC and when cmpxchg fails, the latest pointer is >>> >> > available for the caller. cas_set_mark and cas_forward_to are >> extended >>> >> > with an additional memory_order parameter as cmpxchg and >>> > copy_to_survivor >>> >> > uses memory_order_relaxed to modify the forward pointers. >>> >> > >>> >> > Summary of source code changes: >>> >> > >>> >> > * src/share/vm/runtime/atomic.hpp >>> >> > - Defines enum memory_order and adds a parameter to cmpxchg. 
>>> >> > >>> >> > * src/share/vm/runtime/atomic.cpp >>> >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp >>> >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >>> >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp >>> >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp >>> >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp >>> >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp >>> >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp >>> >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp >>> >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp >>> >> > - Added a parameter for each cmpxchg function to follow >>> >> > the change of atomic.hpp. Their implementations are not >>> > changed. >>> >> > >>> >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp >>> >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >>> >> > - Added a parameter for each cmpxchg function to follow >>> >> > the change of atomic.hpp. In addition, implementations >>> >> > are changed corresponding to the specified memory_order. >>> >> > >>> >> > * src/share/vm/oops/oop.hpp >>> >> > * src/share/vm/oops/oop.inline.hpp >>> >> > - Add a memory_order parameter to use relaxed cmpxchg in >>> >> > cas_set_mark and cas_forward_to. >>> >> > >>> >> > * src/share/vm/gc/parallel/psPromotionManager.cpp >>> >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp >>> >> > >>> >> > Martin tested this changeset on linuxx86_64, linuxppc64le and >>> >> > darwinintel64. >>> >> > Though more time is needed to test on the other platform, we would >>> > like to >>> >> > ask >>> >> > reviews and start discussion on this changeset. >>> >> > I also tested this changeset with SPECjbb2013 and confirmed that gc >>> > pause >>> >> > time >>> >> > is reduced. >>> >> > >>> >> > Regards, >>> >> > Hiroshi >>> >> > ----------------------- >>> >> > Hiroshi Horii, Ph.D. 
>>> >> > IBM Research - Tokyo >>> >> > >>> >> > >>> >> >>> > >>> >> From HORII at jp.ibm.com Tue May 10 09:35:51 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Tue, 10 May 2016 18:35:51 +0900 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com> <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> <201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com> <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> Message-ID: <201605100936.u4A9a6IP008871@d19av05.sagamino.japan.ibm.com> Hi David, > > Martin thankfully created a new webrev that include a change of cmpxchg. > > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/ > > He has already tested it with AIX, linuxx86_64, linuxppc64le and > > darwinintel64. > > (Please tell me if I need to send a new mail for this PFR) > > Please do as it will be simpler to track that way. I will send a new RFR for this cmpxchg change. > That is fine by me. I don't think "ignored" would be confused with > "relaxed", but "conservative" is fine. Sure. > We are quickly running into a hard deadline with Feature Complete > however - possibly less than 24 hours - for hotspot changes. If this > doesn't get in in time I will see if I can shepherd it through the > approval process. Thanks. Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D.
IBM Research - Tokyo David Holmes wrote on 05/10/2016 16:34:32: > From: David Holmes > To: Hiroshi H Horii/Japan/IBM at IBMJP > Cc: "Lindenmaier, Goetz" , hotspot-gc- > dev at openjdk.java.net, hotspot-runtime-dev at openjdk.java.net, "Doerr, > Martin" , ppc-aix-port-dev at openjdk.java.net, > Tim Ellison , Volker Simonis > > Date: 05/10/2016 16:35 > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > copy_to_survivor for ppc64 > > Hi Hiroshi, > > On 6/05/2016 8:11 PM, Hiroshi H Horii wrote: > > Hi David, > > > > Thank you for your comments. > > > > As Martin suggested me, I would like to separate this proposal to > > - relaxing memory order of cmpxchg > > - improvement of copy_to_survivior with relaxed cmpxchg > > and discuss the former first. > > > > Martin thankfully created a new webrev that include a change of cmpxchg. > > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/ > > He has already tested it with AIX, linuxx86_64, linuxppc64le and > > darwinintel64. > > (Please tell me if I need to send a new mail for this PFR) > > Please do as it will be simpler to track that way. > > >> What I would prefer to see is an additional memory_order value (such as > >> memory_order_ignored) which is the default for all methods declared to > >> take a memory_order parameter. > > > > We added simple enum to specify memory order in atomic.hpp as follows. > > > > typedef enum cmpxchg_cmpxchg_memory_order { > > memory_order_relaxed, > > memory_order_conservative > > } cmpxchg_memory_order; > > > > All of cmpxchg functions have an argument of cmpxchg_memory_order > > with a default value memory_order_conservative that uses the same > > semantics with the existing cmpxchg and requires no change for the existing > > callers. If you think "memory_order_ignored" is better than > > "memory_order_conservative", I will be happy to modify this change. 
> > (I just thought, "ignored" may resemble "relaxed" and may make > > people who are familiar with C++11's memory semantics confused. > > I would like to know thoughts of native speakers.) > > That is fine by me. I don't think "ignored" would be confused with > "relaxed", but "conservative" is fine. > > I will run the patch through our internal build system while you prepare > the updated RFR. My only concern is "unused argument" warnings from the > compiler. :) > > We are quickly running into a hard deadline with Feature Complete > however - possibly less than 24 hours - for hotspot changes. If this > doesn't get in in time I will see if I can shepherd it through the > approval process. > > Thanks, > David > > > > Regards, > > Hiroshi > > ----------------------- > > Hiroshi Horii, Ph.D. > > IBM Research - Tokyo > > > > > > David Holmes wrote on 05/04/2016 14:55:29: > > > >> From: David Holmes > >> To: Hiroshi H Horii/Japan/IBM at IBMJP > >> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime- > >> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison > >> , Volker Simonis , > >> "Doerr, Martin" , "Lindenmaier, Goetz" > >> > >> Date: 05/04/2016 14:57 > >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > >> copy_to_survivor for ppc64 > >> > >> Hi Hiroshi, > >> > >> Sorry for the delay on getting back to this. > >> > >> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote: > >> > Hi David, > >> > > >> > Thank you for your comments and questions. > >> > > >> >> 1. Are the current cmpxchg semantics exactly the same as > >> >> memory_order_seq_cst? > >> > > >> > This is very good question.. > >> > > >> > I guess, cmpxchg needs a more conservative constraint for memory > > ordering > >> > than C++11, to add sync after a compare-and-exchange operation. > >> > > >> > Could someone give comments or thoughts? > >> > >> I don't want to comment on the comparison with C++11. 
What I would > >> prefer to see is an additional memory_order value (such as > >> memory_order_ignored) which is the default for all methods declared to > >> take a memory_order parameter. That way existing implementations are > >> clearly ignoring the memory_order attribute and there is no potential > >> for confusion as to whether the existing implementations equate to > >> memory_order_seq_cst or not. > >> > >> That said, I'm not sure it makes sense to add the memory_order parameter > >> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark, > >> oopDesc::cas_forward_to, unless those methods can sensibly be called > >> with any value for memory_order - which seems highly unlikely. Perhaps > >> those methods should identify the weakest form of memory_order they > >> support and that should be hard-wired into them? > >> > >> Thanks, > >> David > >> > >> > memory_order_seq_cst is defined as > >> > "Any operation with this memory order is both an acquire > > operation and > >> > a release operation, plus a single total order exists in which all > >> > threads > >> > observe all modifications (see below) in the same order." > >> > (http://en.cppreference.com/w/cpp/atomic/memory_order) > >> > > >> > In my environment, g++ and xlc generate following assemblies on ppc64le. > >> > (interestingly, they generates the same assemblies for any memory_order) > >> > > >> > g++ (4.9.2) > >> > 100008a4: ac 04 00 7c sync > >> > 100008a8: 28 50 20 7d lwarx r9,0,r10 > >> > 100008ac: 00 18 09 7c cmpw r9,r3 > >> > 100008b0: 0c 00 c2 40 bne- 100008bc > >> > 100008b4: 2d 51 80 7c stwcx. r4,0,r10 > >> > 100008b8: f0 ff c2 40 bne- 100008a8 > >> > 100008bc: 2c 01 00 4c isync > >> > > >> > xlc (13.1.3) > >> > 10000888: ac 04 00 7c sync > >> > 1000088c: 28 28 c0 7c lwarx r6,0,r5 > >> > 10000890: 40 00 26 7c cmpld r6,r0 > >> > 10000894: 0c 00 82 40 bne 100008a0 > >> > 10000898: 2d 29 80 7c stwcx. 
r4,0,r5 > >> > 1000089c: f0 ff e2 40 bne+ 1000088c > >> > 100008a0: 2c 01 00 4c isync > >> > > >> > On the other hand, the current OpenJDK generates following assemblies. > >> > > >> > 508: ac 04 00 7c sync > >> > 50c: 00 00 5c e9 ld r10,0(r28) > >> > 510: 00 50 3b 7c cmpd r27,r10 > >> > 514: 1c 00 c2 40 bne- 530 > >> > 518: a8 40 5c 7d ldarx r10,r28,r8 > >> > 51c: 00 50 3b 7c cmpd r27,r10 > >> > 520: 10 00 c2 40 bne- 530 > >> > 524: ad 41 3c 7d stdcx. r9,r28,r8 > >> > 528: f0 ff c2 40 bne- 518 > >> > 52c: ac 04 00 7c sync > >> > 530: 00 50 bb 7f ... > >> > > >> > Though we can ignore 50c-514 (because they are a duplicated guard > >> > condition), > >> > the last sync instruction (52c) makes cmpxchg more strict than > >> > memory_order_seq_cst. > >> > > >> > In some cases, the last sync is necessary when this thread must be able > >> > to read > >> > all of the changes in the other threads while executing from 508 to 530 > >> > (that processes compare-and-exchange). > >> > > >> >> 2. Has there been a discussion already, establishing that the modified > >> >> GC code can indeed use memory_order_relaxed? Otherwise who is > >> >> postulating that and based on what evidence? > >> > > >> > Volker and his colleagues have investigated the current GC codes > >> > according to this. > >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > >> April/019079.html > >> > However, I believe, we need comments of other GC experts to change > >> > the shared codes. > >> > > >> > Regards, > >> > Hiroshi > >> > ----------------------- > >> > Hiroshi Horii, Ph.D. 
> >> > IBM Research - Tokyo > >> > > >> > > >> > David Holmes wrote on 04/22/2016 21:57:07: > >> > > >> >> From: David Holmes > >> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime- > >> >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net > >> >> Cc: Tim Ellison , > >> > ppc-aix-port-dev at openjdk.java.net > >> >> Date: 04/22/2016 21:58 > >> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > >> >> copy_to_survivor for ppc64 > >> >> > >> >> Hi Hiroshi, > >> >> > >> >> Two initial questions: > >> >> > >> >> 1. Are the current cmpxchg semantics exactly the same as > >> >> memory_order_seq_cst? > >> >> > >> >> 2. Has there been a discussion already, establishing that the modified > >> >> GC code can indeed use memory_order_relaxed? Otherwise who is > >> >> postulating that and based on what evidence? > >> >> > >> >> Missing memory barriers have caused very difficult to track down > > bugs in > >> >> the past - very rare race conditions. So any relaxation here has to be > >> >> done with extreme confidence. > >> >> > >> >> Thanks, > >> >> David > >> >> > >> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote: > >> >> > Dear all: > >> >> > > >> >> > Can I please request reviews for the following change? > >> >> > > >> >> > Code change: > >> >> > > > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/ > >> >> > (I initially created and Martin enhanced so much) > >> >> > > >> >> > This change follows the discussion started from this mail. > >> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > >> >> April/018960.html > >> >> > > >> >> > Description: > >> >> > This change provides relaxed compare-and-exchange by introducing > >> >> > similar semantics of C++ atomic memory operators, enum memory_order. > >> >> > As described in atomic_linux_ppc.inline.hpp, the current > >> > implementation of > >> >> > cmpxchg is fence_cmpxchg_acquire. 
This implementation is useful for > >> >> > general purposes because twice calls of sync before and after > >> > cmpxchg will > >> >> > provide strict consistency. However, they sometimes cause overheads > >> >> > because > >> >> > sync instructions are very expensive in the current POWER chip > > design. > >> >> > In addition, for the other platforms, such as aarch64, this strict > >> >> > semantics > >> >> > may cause some overheads (according to the Andrew's mail). > >> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > >> >> April/019073.html > >> >> > > >> >> > With this change, callers can explicitly specify constraints of > > memory > >> >> > ordering > >> >> > for cmpxchg with an additional parameter, memory_order order. > >> >> > > >> >> > typedef enum memory_order { > >> >> > memory_order_relaxed, > >> >> > memory_order_consume, > >> >> > memory_order_acquire, > >> >> > memory_order_release, > >> >> > memory_order_acq_rel, > >> >> > memory_order_seq_cst > >> >> > } memory_order; > >> >> > > >> >> > Because the default value of the parameter is memory_order_seq_cst, > >> >> > existing codes can use the same semantics of cmpxchg without any > >> >> > modification. The relaxed cmpxchg is implemented only on ppc > >> >> > in this changeset. Therefore, the behavior on the other platforms > > will > >> >> > not be changed with this changeset. > >> >> > > >> >> > In addition, with the new parameter of cmpxchg, this change improves > >> >> > performance of copy_to_survivor in the parallel GC. > >> >> > copy_to_survivor changes forward pointers by using cmpxchg. This > >> >> > operation doesn't require any sync instructions. A pointer is > > changed > >> >> > at most once in a GC and when cmpxchg fails, the latest pointer is > >> >> > available for the caller. 
cas_set_mark and cas_forward_to are > > extended > >> >> > with an additional memory_order parameter as cmpxchg and > >> > copy_to_survivor > >> >> > uses memory_order_relaxed to modify the forward pointers. > >> >> > > >> >> > Summary of source code changes: > >> >> > > >> >> > * src/share/vm/runtime/atomic.hpp > >> >> > - Defines enum memory_order and adds a parameter to cmpxchg. > >> >> > > >> >> > * src/share/vm/runtime/atomic.cpp > >> >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp > >> >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp > >> >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp > >> >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp > >> >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp > >> >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp > >> >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp > >> >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp > >> >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp > >> >> > - Added a parameter for each cmpxchg function to follow > >> >> > the change of atomic.hpp. Their implementations are not > >> > changed. > >> >> > > >> >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp > >> >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > >> >> > - Added a parameter for each cmpxchg function to follow > >> >> > the change of atomic.hpp. In addition, implementations > >> >> > are changed corresponding to the specified memory_order. > >> >> > > >> >> > * src/share/vm/oops/oop.hpp > >> >> > * src/share/vm/oops/oop.inline.hpp > >> >> > - Add a memory_order parameter to use relaxed cmpxchg in > >> >> > cas_set_mark and cas_forward_to. > >> >> > > >> >> > * src/share/vm/gc/parallel/psPromotionManager.cpp > >> >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp > >> >> > > >> >> > Martin tested this changeset on linuxx86_64, linuxppc64le and > >> >> > darwinintel64. 
> >> >> > Though more time is needed to test on the other platform, we would > >> > like to > >> >> > ask > >> >> > reviews and start discussion on this changeset. > >> >> > I also tested this changeset with SPECjbb2013 and confirmed that gc > >> > pause > >> >> > time > >> >> > is reduced. > >> >> > > >> >> > Regards, > >> >> > Hiroshi > >> >> > ----------------------- > >> >> > Hiroshi Horii, Ph.D. > >> >> > IBM Research - Tokyo > >> >> > > >> >> > > >> >> > >> > > >> > > > From martin.doerr at sap.com Tue May 10 09:41:31 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 10 May 2016 09:41:31 +0000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com> <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> <201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com> <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com> Message-ID: <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> Hi David, thank you very much for testing the other platforms. Here's an updated webrev: http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ Best regards, Martin -----Original Message----- From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of David Holmes Sent: Tuesday, 10 May 2016 11:11
To: Hiroshi H Horii Cc: Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 The fix seems incomplete for solaris: make/Main.gmk:232: recipe for target 'hotspot' failed "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp", line 124: Error: Too many arguments in call to "_Atomic_cmpxchg_long(long, volatile long*, long)". "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp", line 128: Error: Too many arguments in call to "_Atomic_cmpxchg_long(long, volatile long*, long)". David On 10/05/2016 5:34 PM, David Holmes wrote: > Hi Hiroshi, > > On 6/05/2016 8:11 PM, Hiroshi H Horii wrote: >> Hi David, >> >> Thank you for your comments. >> >> As Martin suggested me, I would like to separate this proposal to >> - relaxing memory order of cmpxchg >> - improvement of copy_to_survivior with relaxed cmpxchg >> and discuss the former first. >> >> Martin thankfully created a new webrev that include a change of cmpxchg. >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/ >> He has already tested it with AIX, linuxx86_64, linuxppc64le and >> darwinintel64. >> (Please tell me if I need to send a new mail for this PFR) > > Please do as it will be simpler to track that way. > >>> What I would prefer to see is an additional memory_order value (such as >>> memory_order_ignored) which is the default for all methods declared to >>> take a memory_order parameter. >> >> We added simple enum to specify memory order in atomic.hpp as follows.
>> >> typedef enum cmpxchg_cmpxchg_memory_order { >> memory_order_relaxed, >> memory_order_conservative >> } cmpxchg_memory_order; >> >> All of cmpxchg functions have an argument of cmpxchg_memory_order >> with a default value memory_order_conservative that uses the same >> semantics with the existing cmpxchg and requires no change for the >> existing >> callers. If you think "memory_order_ignored" is better than >> "memory_order_conservative", I will be happy to modify this change. >> (I just thought, "ignored" may resemble "relaxed" and may make >> people who are familiar with C++11's memory semantics confused. >> I would like to know thoughts of native speakers.) > > That is fine by me. I don't think "ignored" would be confused with > "relaxed", but "conservative" is fine. > > I will run the patch through our internal build system while you prepare > the updated RFR. My only concern is "unused argument" warnings from the > compiler. :) > > We are quickly running into a hard deadline with Feature Complete > however - possibly less than 24 hours - for hotspot changes. If this > doesn't get in in time I will see if I can shepherd it through the > approval process. > > Thanks, > David > > >> Regards, >> Hiroshi >> ----------------------- >> Hiroshi Horii, Ph.D. >> IBM Research - Tokyo >> >> >> David Holmes wrote on 05/04/2016 14:55:29: >> >>> From: David Holmes >>> To: Hiroshi H Horii/Japan/IBM at IBMJP >>> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime- >>> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison >>> , Volker Simonis , >>> "Doerr, Martin" , "Lindenmaier, Goetz" >>> >>> Date: 05/04/2016 14:57 >>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >>> copy_to_survivor for ppc64 >>> >>> Hi Hiroshi, >>> >>> Sorry for the delay on getting back to this. >>> >>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote: >>> > Hi David, >>> > >>> > Thank you for your comments and questions. >>> > >>> >> 1. 
Are the current cmpxchg semantics exactly the same as >>> >> memory_order_seq_cst? >>> > >>> > This is very good question.. >>> > >>> > I guess, cmpxchg needs a more conservative constraint for memory >> ordering >>> > than C++11, to add sync after a compare-and-exchange operation. >>> > >>> > Could someone give comments or thoughts? >>> >>> I don't want to comment on the comparison with C++11. What I would >>> prefer to see is an additional memory_order value (such as >>> memory_order_ignored) which is the default for all methods declared to >>> take a memory_order parameter. That way existing implementations are >>> clearly ignoring the memory_order attribute and there is no potential >>> for confusion as to whether the existing implementations equate to >>> memory_order_seq_cst or not. >>> >>> That said, I'm not sure it makes sense to add the memory_order parameter >>> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark, >>> oopDesc::cas_forward_to, unless those methods can sensibly be called >>> with any value for memory_order - which seems highly unlikely. Perhaps >>> those methods should identify the weakest form of memory_order they >>> support and that should be hard-wired into them? >>> >>> Thanks, >>> David >>> >>> > memory_order_seq_cst is defined as >>> > "Any operation with this memory order is both an acquire >> operation and >>> > a release operation, plus a single total order exists in which >>> all >>> > threads >>> > observe all modifications (see below) in the same order." >>> > (http://en.cppreference.com/w/cpp/atomic/memory_order) >>> > >>> > In my environment, g++ and xlc generate following assemblies on >>> ppc64le. >>> > (interestingly, they generates the same assemblies for any >>> memory_order) >>> > >>> > g++ (4.9.2) >>> > 100008a4: ac 04 00 7c sync >>> > 100008a8: 28 50 20 7d lwarx r9,0,r10 >>> > 100008ac: 00 18 09 7c cmpw r9,r3 >>> > 100008b0: 0c 00 c2 40 bne- 100008bc >>> > 100008b4: 2d 51 80 7c stwcx. 
r4,0,r10 >>> > 100008b8: f0 ff c2 40 bne- 100008a8 >>> > 100008bc: 2c 01 00 4c isync >>> > >>> > xlc (13.1.3) >>> > 10000888: ac 04 00 7c sync >>> > 1000088c: 28 28 c0 7c lwarx r6,0,r5 >>> > 10000890: 40 00 26 7c cmpld r6,r0 >>> > 10000894: 0c 00 82 40 bne 100008a0 >>> > 10000898: 2d 29 80 7c stwcx. r4,0,r5 >>> > 1000089c: f0 ff e2 40 bne+ 1000088c >>> > 100008a0: 2c 01 00 4c isync >>> > >>> > On the other hand, the current OpenJDK generates following assemblies. >>> > >>> > 508: ac 04 00 7c sync >>> > 50c: 00 00 5c e9 ld r10,0(r28) >>> > 510: 00 50 3b 7c cmpd r27,r10 >>> > 514: 1c 00 c2 40 bne- 530 >>> > 518: a8 40 5c 7d ldarx r10,r28,r8 >>> > 51c: 00 50 3b 7c cmpd r27,r10 >>> > 520: 10 00 c2 40 bne- 530 >>> > 524: ad 41 3c 7d stdcx. r9,r28,r8 >>> > 528: f0 ff c2 40 bne- 518 >>> > 52c: ac 04 00 7c sync >>> > 530: 00 50 bb 7f ... >>> > >>> > Though we can ignore 50c-514 (because they are a duplicated guard >>> > condition), >>> > the last sync instruction (52c) makes cmpxchg more strict than >>> > memory_order_seq_cst. >>> > >>> > In some cases, the last sync is necessary when this thread must be >>> able >>> > to read >>> > all of the changes in the other threads while executing from 508 to >>> 530 >>> > (that processes compare-and-exchange). >>> > >>> >> 2. Has there been a discussion already, establishing that the >>> modified >>> >> GC code can indeed use memory_order_relaxed? Otherwise who is >>> >> postulating that and based on what evidence? >>> > >>> > Volker and his colleagues have investigated the current GC codes >>> > according to this. >>> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>> April/019079.html >>> > However, I believe, we need comments of other GC experts to change >>> > the shared codes. >>> > >>> > Regards, >>> > Hiroshi >>> > ----------------------- >>> > Hiroshi Horii, Ph.D. 
>>> > IBM Research - Tokyo >>> > >>> > >>> > David Holmes wrote on 04/22/2016 21:57:07: >>> > >>> >> From: David Holmes >>> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime- >>> >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net >>> >> Cc: Tim Ellison , >>> > ppc-aix-port-dev at openjdk.java.net >>> >> Date: 04/22/2016 21:58 >>> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >>> >> copy_to_survivor for ppc64 >>> >> >>> >> Hi Hiroshi, >>> >> >>> >> Two initial questions: >>> >> >>> >> 1. Are the current cmpxchg semantics exactly the same as >>> >> memory_order_seq_cst? >>> >> >>> >> 2. Has there been a discussion already, establishing that the >>> modified >>> >> GC code can indeed use memory_order_relaxed? Otherwise who is >>> >> postulating that and based on what evidence? >>> >> >>> >> Missing memory barriers have caused very difficult to track down >> bugs in >>> >> the past - very rare race conditions. So any relaxation here has >>> to be >>> >> done with extreme confidence. >>> >> >>> >> Thanks, >>> >> David >>> >> >>> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote: >>> >> > Dear all: >>> >> > >>> >> > Can I please request reviews for the following change? >>> >> > >>> >> > Code change: >>> >> > >> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/ >>> >> > (I initially created and Martin enhanced so much) >>> >> > >>> >> > This change follows the discussion started from this mail. >>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>> >> April/018960.html >>> >> > >>> >> > Description: >>> >> > This change provides relaxed compare-and-exchange by introducing >>> >> > similar semantics of C++ atomic memory operators, enum >>> memory_order. >>> >> > As described in atomic_linux_ppc.inline.hpp, the current >>> > implementation of >>> >> > cmpxchg is fence_cmpxchg_acquire. 
This implementation is useful for >>> >> > general purposes because twice calls of sync before and after >>> > cmpxchg will >>> >> > provide strict consistency. However, they sometimes cause overheads >>> >> > because >>> >> > sync instructions are very expensive in the current POWER chip >> design. >>> >> > In addition, for the other platforms, such as aarch64, this strict >>> >> > semantics >>> >> > may cause some overheads (according to the Andrew's mail). >>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>> >> April/019073.html >>> >> > >>> >> > With this change, callers can explicitly specify constraints of >> memory >>> >> > ordering >>> >> > for cmpxchg with an additional parameter, memory_order order. >>> >> > >>> >> > typedef enum memory_order { >>> >> > memory_order_relaxed, >>> >> > memory_order_consume, >>> >> > memory_order_acquire, >>> >> > memory_order_release, >>> >> > memory_order_acq_rel, >>> >> > memory_order_seq_cst >>> >> > } memory_order; >>> >> > >>> >> > Because the default value of the parameter is memory_order_seq_cst, >>> >> > existing codes can use the same semantics of cmpxchg without any >>> >> > modification. The relaxed cmpxchg is implemented only on ppc >>> >> > in this changeset. Therefore, the behavior on the other platforms >> will >>> >> > not be changed with this changeset. >>> >> > >>> >> > In addition, with the new parameter of cmpxchg, this change >>> improves >>> >> > performance of copy_to_survivor in the parallel GC. >>> >> > copy_to_survivor changes forward pointers by using cmpxchg. This >>> >> > operation doesn't require any sync instructions. A pointer is >> changed >>> >> > at most once in a GC and when cmpxchg fails, the latest pointer is >>> >> > available for the caller. cas_set_mark and cas_forward_to are >> extended >>> >> > with an additional memory_order parameter as cmpxchg and >>> > copy_to_survivor >>> >> > uses memory_order_relaxed to modify the forward pointers. 
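The forwarding-pointer argument above says that each pointer is installed at most once per GC, and that a thread whose cmpxchg fails can simply use the pointer that won. That pattern can be illustrated with a small standalone C++11 sketch. It uses std::atomic rather than HotSpot's Atomic class. ToyObject and forward_to_atomic are hypothetical names, not code from the webrev. The memory order of the successful CAS is left as a parameter. A relaxed CAS is enough for all threads to agree on a single winner. However, if a losing thread goes on to read the fields of the winner's copy, the winning CAS generally needs at least release semantics, and the reading side needs acquire or dependency ordering. That is the kind of question David raises about relaxing barriers in GC code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an object header: 0 means "not forwarded yet",
// any other value is the address of the object's copy.
struct ToyObject {
  std::atomic<std::uintptr_t> forwardee{0};
};

// Try to install `copy` as the forwardee of `obj`. Returns whichever copy
// ends up installed: `copy` if this thread won the race, otherwise the
// copy that another thread installed first.
ToyObject* forward_to_atomic(ToyObject* obj, ToyObject* copy,
                             std::memory_order success_order) {
  assert(copy != nullptr && "0 is reserved for 'not forwarded'");
  std::uintptr_t expected = 0;
  const std::uintptr_t desired = reinterpret_cast<std::uintptr_t>(copy);
  if (obj->forwardee.compare_exchange_strong(expected, desired,
                                             success_order,
                                             std::memory_order_relaxed)) {
    return copy;  // This thread's copy became the forwardee.
  }
  // On failure, compare_exchange_strong stores the current value in
  // `expected`, so the winner's pointer is available without a reload.
  // Only the pointer value is used here; dereferencing the winner's copy
  // would also need acquire ordering on this path.
  return reinterpret_cast<ToyObject*>(expected);
}
```

A second call on the same object returns the first copy regardless of the order argument. This matches the point above that "when cmpxchg fails, the latest pointer is available for the caller".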
>>> >> > >>> >> > Summary of source code changes: >>> >> > >>> >> > * src/share/vm/runtime/atomic.hpp >>> >> > - Defines enum memory_order and adds a parameter to cmpxchg. >>> >> > >>> >> > * src/share/vm/runtime/atomic.cpp >>> >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp >>> >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >>> >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp >>> >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp >>> >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp >>> >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp >>> >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp >>> >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp >>> >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp >>> >> > - Added a parameter for each cmpxchg function to follow >>> >> > the change of atomic.hpp. Their implementations are not >>> > changed. >>> >> > >>> >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp >>> >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >>> >> > - Added a parameter for each cmpxchg function to follow >>> >> > the change of atomic.hpp. In addition, implementations >>> >> > are changed corresponding to the specified memory_order. >>> >> > >>> >> > * src/share/vm/oops/oop.hpp >>> >> > * src/share/vm/oops/oop.inline.hpp >>> >> > - Add a memory_order parameter to use relaxed cmpxchg in >>> >> > cas_set_mark and cas_forward_to. >>> >> > >>> >> > * src/share/vm/gc/parallel/psPromotionManager.cpp >>> >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp >>> >> > >>> >> > Martin tested this changeset on linuxx86_64, linuxppc64le and >>> >> > darwinintel64. >>> >> > Though more time is needed to test on the other platform, we would >>> > like to >>> >> > ask >>> >> > reviews and start discussion on this changeset. 
> >>> >> > I also tested this changeset with SPECjbb2013 and confirmed that gc >>> > pause >>> >> > time >>> >> > is reduced. >>> >> > >>> >> > Regards, >>> >> > Hiroshi >>> >> > ----------------------- >>> >> > Hiroshi Horii, Ph.D. >>> >> > IBM Research - Tokyo >>> >> > >>> >> > >>> >> >>> > >>> >> From david.holmes at oracle.com Tue May 10 10:30:36 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 May 2016 20:30:36 +1000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <571A1FA3.9030006@oracle.com> <201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com> <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> <201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com> <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> Message-ID: On 10/05/2016 7:41 PM, Doerr, Martin wrote: > Hi David, > > thank you very much for testing the other platforms. > > Here's an updated webrev: > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ Thanks. Second test run on its way. David ----- > Best regards, > Martin > > -----Original Message----- > From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of David Holmes > Sent: Tuesday, 10 May 2016 11:11
> To: Hiroshi H Horii > Cc: Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 > > The fix seems incomplete for solaris: > > make/Main.gmk:232: recipe for target 'hotspot' failed > "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp", > line 124: Error: Too many arguments in call to > "_Atomic_cmpxchg_long(long, volatile long*, long)". > "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp", > line 128: Error: Too many arguments in call to > "_Atomic_cmpxchg_long(long, volatile long*, long)". > > David > > On 10/05/2016 5:34 PM, David Holmes wrote: >> Hi Hiroshi, >> >> On 6/05/2016 8:11 PM, Hiroshi H Horii wrote: >>> Hi David, >>> >>> Thank you for your comments. >>> >>> As Martin suggested me, I would like to separate this proposal to >>> - relaxing memory order of cmpxchg >>> - improvement of copy_to_survivior with relaxed cmpxchg >>> and discuss the former first. >>> >>> Martin thankfully created a new webrev that include a change of cmpxchg. >>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/ >>> He has already tested it with AIX, linuxx86_64, linuxppc64le and >>> darwinintel64. >>> (Please tell me if I need to send a new mail for this PFR) >> >> Please do as it will be simpler to track that way. >> >>>> What I would prefer to see is an additional memory_order value (such as >>>> memory_order_ignored) which is the default for all methods declared to >>>> take a memory_order parameter. >>> >>> We added simple enum to specify memory order in atomic.hpp as follows.
>>> >>> typedef enum cmpxchg_cmpxchg_memory_order { >>> memory_order_relaxed, >>> memory_order_conservative >>> } cmpxchg_memory_order; >>> >>> All of cmpxchg functions have an argument of cmpxchg_memory_order >>> with a default value memory_order_conservative that uses the same >>> semantics with the existing cmpxchg and requires no change for the >>> existing >>> callers. If you think "memory_order_ignored" is better than >>> "memory_order_conservative", I will be happy to modify this change. >>> (I just thought, "ignored" may resemble "relaxed" and may make >>> people who are familiar with C++11's memory semantics confused. >>> I would like to know thoughts of native speakers.) >> >> That is fine by me. I don't think "ignored" would be confused with >> "relaxed", but "conservative" is fine. >> >> I will run the patch through our internal build system while you prepare >> the updated RFR. My only concern is "unused argument" warnings from the >> compiler. :) >> >> We are quickly running into a hard deadline with Feature Complete >> however - possibly less than 24 hours - for hotspot changes. If this >> doesn't get in in time I will see if I can shepherd it through the >> approval process. >> >> Thanks, >> David >> >> >>> Regards, >>> Hiroshi >>> ----------------------- >>> Hiroshi Horii, Ph.D. >>> IBM Research - Tokyo >>> >>> >>> David Holmes wrote on 05/04/2016 14:55:29: >>> >>>> From: David Holmes >>>> To: Hiroshi H Horii/Japan/IBM at IBMJP >>>> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime- >>>> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison >>>> , Volker Simonis , >>>> "Doerr, Martin" , "Lindenmaier, Goetz" >>>> >>>> Date: 05/04/2016 14:57 >>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >>>> copy_to_survivor for ppc64 >>>> >>>> Hi Hiroshi, >>>> >>>> Sorry for the delay on getting back to this. 
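The scheme quoted above adds a cmpxchg_memory_order parameter that defaults to memory_order_conservative, so existing three-argument callers keep the old semantics. Here is a rough, portable C++11 sketch of that scheme. It is not the webrev's code. It uses std::atomic<int> instead of HotSpot's volatile pointers and wraps everything in a hypothetical namespace toy. It also approximates the "conservative" ppc64 sequence (sync, ll/sc loop, sync) with two sequentially consistent fences around a relaxed CAS. The second function shows one conventional way for a platform that ignores the hint to avoid the "unused argument" warnings David mentions: leave the parameter unnamed.

```cpp
#include <atomic>

namespace toy {

// Mirrors the enum quoted in the thread (names only; values are implicit).
typedef enum cmpxchg_cmpxchg_memory_order {
  memory_order_relaxed,
  memory_order_conservative
} cmpxchg_memory_order;

// Honors the hint. The default argument keeps old three-argument call
// sites compiling with the old, conservative behaviour. Returns the value
// found in *dest, as HotSpot's cmpxchg does.
inline int cmpxchg(int exchange_value, std::atomic<int>* dest,
                   int compare_value,
                   cmpxchg_memory_order order = memory_order_conservative) {
  int observed = compare_value;
  if (order == memory_order_relaxed) {
    // Atomicity only; no ordering with surrounding accesses.
    dest->compare_exchange_strong(observed, exchange_value,
                                  std::memory_order_relaxed);
  } else {
    // Full fences on both sides, roughly "sync; CAS loop; sync".
    std::atomic_thread_fence(std::memory_order_seq_cst);
    dest->compare_exchange_strong(observed, exchange_value,
                                  std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  return observed;  // Equals compare_value exactly when the swap happened.
}

// A platform that always uses its strongest form can leave the parameter
// unnamed, which avoids unused-parameter warnings.
inline int cmpxchg_ignoring_order(int exchange_value, std::atomic<int>* dest,
                                  int compare_value,
                                  cmpxchg_memory_order /* order */ =
                                      memory_order_conservative) {
  dest->compare_exchange_strong(compare_value, exchange_value,
                                std::memory_order_seq_cst);
  return compare_value;
}

}  // namespace toy
```

A call such as toy::cmpxchg(2, &v, 1) picks up memory_order_conservative through the default argument. This is why, per the description above, existing callers need no change.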
>>>> >>>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote: >>>>> Hi David, >>>>> >>>>> Thank you for your comments and questions. >>>>> >>>>>> 1. Are the current cmpxchg semantics exactly the same as >>>>>> memory_order_seq_cst? >>>>> >>>>> This is very good question.. >>>>> >>>>> I guess, cmpxchg needs a more conservative constraint for memory >>> ordering >>>>> than C++11, to add sync after a compare-and-exchange operation. >>>>> >>>>> Could someone give comments or thoughts? >>>> >>>> I don't want to comment on the comparison with C++11. What I would >>>> prefer to see is an additional memory_order value (such as >>>> memory_order_ignored) which is the default for all methods declared to >>>> take a memory_order parameter. That way existing implementations are >>>> clearly ignoring the memory_order attribute and there is no potential >>>> for confusion as to whether the existing implementations equate to >>>> memory_order_seq_cst or not. >>>> >>>> That said, I'm not sure it makes sense to add the memory_order parameter >>>> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark, >>>> oopDesc::cas_forward_to, unless those methods can sensibly be called >>>> with any value for memory_order - which seems highly unlikely. Perhaps >>>> those methods should identify the weakest form of memory_order they >>>> support and that should be hard-wired into them? >>>> >>>> Thanks, >>>> David >>>> >>>>> memory_order_seq_cst is defined as >>>>> "Any operation with this memory order is both an acquire >>> operation and >>>>> a release operation, plus a single total order exists in which >>>> all >>>>> threads >>>>> observe all modifications (see below) in the same order." >>>>> (http://en.cppreference.com/w/cpp/atomic/memory_order) >>>>> >>>>> In my environment, g++ and xlc generate following assemblies on >>>> ppc64le. 
>>>>> (interestingly, they generates the same assemblies for any >>>> memory_order) >>>>> >>>>> g++ (4.9.2) >>>>> 100008a4: ac 04 00 7c sync >>>>> 100008a8: 28 50 20 7d lwarx r9,0,r10 >>>>> 100008ac: 00 18 09 7c cmpw r9,r3 >>>>> 100008b0: 0c 00 c2 40 bne- 100008bc >>>>> 100008b4: 2d 51 80 7c stwcx. r4,0,r10 >>>>> 100008b8: f0 ff c2 40 bne- 100008a8 >>>>> 100008bc: 2c 01 00 4c isync >>>>> >>>>> xlc (13.1.3) >>>>> 10000888: ac 04 00 7c sync >>>>> 1000088c: 28 28 c0 7c lwarx r6,0,r5 >>>>> 10000890: 40 00 26 7c cmpld r6,r0 >>>>> 10000894: 0c 00 82 40 bne 100008a0 >>>>> 10000898: 2d 29 80 7c stwcx. r4,0,r5 >>>>> 1000089c: f0 ff e2 40 bne+ 1000088c >>>>> 100008a0: 2c 01 00 4c isync >>>>> >>>>> On the other hand, the current OpenJDK generates following assemblies. >>>>> >>>>> 508: ac 04 00 7c sync >>>>> 50c: 00 00 5c e9 ld r10,0(r28) >>>>> 510: 00 50 3b 7c cmpd r27,r10 >>>>> 514: 1c 00 c2 40 bne- 530 >>>>> 518: a8 40 5c 7d ldarx r10,r28,r8 >>>>> 51c: 00 50 3b 7c cmpd r27,r10 >>>>> 520: 10 00 c2 40 bne- 530 >>>>> 524: ad 41 3c 7d stdcx. r9,r28,r8 >>>>> 528: f0 ff c2 40 bne- 518 >>>>> 52c: ac 04 00 7c sync >>>>> 530: 00 50 bb 7f ... >>>>> >>>>> Though we can ignore 50c-514 (because they are a duplicated guard >>>>> condition), >>>>> the last sync instruction (52c) makes cmpxchg more strict than >>>>> memory_order_seq_cst. >>>>> >>>>> In some cases, the last sync is necessary when this thread must be >>>> able >>>>> to read >>>>> all of the changes in the other threads while executing from 508 to >>>> 530 >>>>> (that processes compare-and-exchange). >>>>> >>>>>> 2. Has there been a discussion already, establishing that the >>>> modified >>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is >>>>>> postulating that and based on what evidence? >>>>> >>>>> Volker and his colleagues have investigated the current GC codes >>>>> according to this. 
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>>> April/019079.html >>>>> However, I believe, we need comments of other GC experts to change >>>>> the shared codes. >>>>> >>>>> Regards, >>>>> Hiroshi >>>>> ----------------------- >>>>> Hiroshi Horii, Ph.D. >>>>> IBM Research - Tokyo >>>>> >>>>> >>>>> David Holmes wrote on 04/22/2016 21:57:07: >>>>> >>>>>> From: David Holmes >>>>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime- >>>>>> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net >>>>>> Cc: Tim Ellison , >>>>> ppc-aix-port-dev at openjdk.java.net >>>>>> Date: 04/22/2016 21:58 >>>>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >>>>>> copy_to_survivor for ppc64 >>>>>> >>>>>> Hi Hiroshi, >>>>>> >>>>>> Two initial questions: >>>>>> >>>>>> 1. Are the current cmpxchg semantics exactly the same as >>>>>> memory_order_seq_cst? >>>>>> >>>>>> 2. Has there been a discussion already, establishing that the >>>> modified >>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is >>>>>> postulating that and based on what evidence? >>>>>> >>>>>> Missing memory barriers have caused very difficult to track down >>> bugs in >>>>>> the past - very rare race conditions. So any relaxation here has >>>> to be >>>>>> done with extreme confidence. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote: >>>>>>> Dear all: >>>>>>> >>>>>>> Can I please request reviews for the following change? >>>>>>> >>>>>>> Code change: >>>>>>> >>> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/ >>>>>>> (I initially created and Martin enhanced so much) >>>>>>> >>>>>>> This change follows the discussion started from this mail. 
>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>>>>> April/018960.html >>>>>>> >>>>>>> Description: >>>>>>> This change provides relaxed compare-and-exchange by introducing >>>>>>> similar semantics of C++ atomic memory operators, enum >>>> memory_order. >>>>>>> As described in atomic_linux_ppc.inline.hpp, the current >>>>> implementation of >>>>>>> cmpxchg is fence_cmpxchg_acquire. This implementation is useful for >>>>>>> general purposes because twice calls of sync before and after >>>>> cmpxchg will >>>>>>> provide strict consistency. However, they sometimes cause overheads >>>>>>> because >>>>>>> sync instructions are very expensive in the current POWER chip >>> design. >>>>>>> In addition, for the other platforms, such as aarch64, this strict >>>>>>> semantics >>>>>>> may cause some overheads (according to the Andrew's mail). >>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>>>>> April/019073.html >>>>>>> >>>>>>> With this change, callers can explicitly specify constraints of >>> memory >>>>>>> ordering >>>>>>> for cmpxchg with an additional parameter, memory_order order. >>>>>>> >>>>>>> typedef enum memory_order { >>>>>>> memory_order_relaxed, >>>>>>> memory_order_consume, >>>>>>> memory_order_acquire, >>>>>>> memory_order_release, >>>>>>> memory_order_acq_rel, >>>>>>> memory_order_seq_cst >>>>>>> } memory_order; >>>>>>> >>>>>>> Because the default value of the parameter is memory_order_seq_cst, >>>>>>> existing codes can use the same semantics of cmpxchg without any >>>>>>> modification. The relaxed cmpxchg is implemented only on ppc >>>>>>> in this changeset. Therefore, the behavior on the other platforms >>> will >>>>>>> not be changed with this changeset. >>>>>>> >>>>>>> In addition, with the new parameter of cmpxchg, this change >>>> improves >>>>>>> performance of copy_to_survivor in the parallel GC. >>>>>>> copy_to_survivor changes forward pointers by using cmpxchg. 
This >>>>>>> operation doesn't require any sync instructions. A pointer is >>> changed >>>>>>> at most once in a GC and when cmpxchg fails, the latest pointer is >>>>>>> available for the caller. cas_set_mark and cas_forward_to are >>> extended >>>>>>> with an additional memory_order parameter as cmpxchg and >>>>> copy_to_survivor >>>>>>> uses memory_order_relaxed to modify the forward pointers. >>>>>>> >>>>>>> Summary of source code changes: >>>>>>> >>>>>>> * src/share/vm/runtime/atomic.hpp >>>>>>> - Defines enum memory_order and adds a parameter to cmpxchg. >>>>>>> >>>>>>> * src/share/vm/runtime/atomic.cpp >>>>>>> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp >>>>>>> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >>>>>>> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp >>>>>>> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp >>>>>>> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp >>>>>>> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp >>>>>>> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp >>>>>>> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp >>>>>>> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp >>>>>>> - Added a parameter for each cmpxchg function to follow >>>>>>> the change of atomic.hpp. Their implementations are not >>>>> changed. >>>>>>> >>>>>>> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp >>>>>>> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >>>>>>> - Added a parameter for each cmpxchg function to follow >>>>>>> the change of atomic.hpp. In addition, implementations >>>>>>> are changed corresponding to the specified memory_order. >>>>>>> >>>>>>> * src/share/vm/oops/oop.hpp >>>>>>> * src/share/vm/oops/oop.inline.hpp >>>>>>> - Add a memory_order parameter to use relaxed cmpxchg in >>>>>>> cas_set_mark and cas_forward_to. 
>>>>>>> >>>>>>> * src/share/vm/gc/parallel/psPromotionManager.cpp >>>>>>> * src/share/vm/gc/parallel/psPromotionManager.inline.hpp >>>>>>> >>>>>>> Martin tested this changeset on linuxx86_64, linuxppc64le and >>>>>>> darwinintel64. >>>>>>> Though more time is needed to test on the other platform, we would >>>>> like to >>>>>>> ask >>>>>>> reviews and start discussion on this changeset. >>>>>>> I also tested this changeset with SPECjbb2013 and confirmed that gc >>>>> pause >>>>>>> time >>>>>>> is reduced. >>>>>>> >>>>>>> Regards, >>>>>>> Hiroshi >>>>>>> ----------------------- >>>>>>> Hiroshi Horii, Ph.D. >>>>>>> IBM Research - Tokyo >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> From HORII at jp.ibm.com Tue May 10 10:44:36 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Tue, 10 May 2016 19:44:36 +0900 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg Message-ID: <201605101044.u4AAinRB025272@d19av05.sagamino.japan.ibm.com> Hi All, Can I please request reviews for the following change? Code change: http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ This change follows the discussion started from these mails. http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019148.html http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-May/019320.html Description: This change provides relaxed compare-and-exchange by introducing relaxed memory order. As described in atomic_linux_ppc.inline.hpp, the current implementation of cmpxchg is fence_cmpxchg_acquire. This implementation is useful for general purposes because twice calls of sync before and after cmpxchg will provide strict consistency. However, they sometimes cause overheads because sync instructions are very expensive in the current POWER chip design. We confirmed this change improves performance of copy_to_survivor in the parallel GC. 
However, we will need more investigation of GC by more experts. So, We would like to request a review of the change of cmpxchg first (as Martin requested). http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019188.html Summary of source code changes: * src/share/vm/runtime/atomic.hpp - Defines enum memory_order and adds a parameter to cmpxchg. * src/share/vm/runtime/atomic.cpp * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp - Added a parameter for each cmpxchg function to follow the change of atomic.hpp. Their implementations are not changed. * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp - Added a parameter for each cmpxchg function to follow the change of atomic.hpp. In addition, implementations are changed corresponding to the specified memory_order. Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. IBM Research - Tokyo
From david.holmes at oracle.com Tue May 10 11:04:15 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 May 2016 21:04:15 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> Message-ID: <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> Hi Hiroshi, On 10/05/2016 8:44 PM, Hiroshi H Horii wrote: > Hi All, > > Can I please request reviews for the following change? > > Code change: > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ Changes look good. I'm currently running them through our internal build system. I will sponsor this and push the change through JPRT. Just need another reviewer to chime in - given you and Martin are both contributors. Or are you the main contributor with Martin being a reviewer? Thanks, David PS. It's my night now so I'll be signing off and will pick this up in the morning. > This change follows the discussion started from these mails. > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019148.html > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-May/019320.html > > Description: > This change provides relaxed compare-and-exchange by introducing > relaxed memory order. As described in atomic_linux_ppc.inline.hpp, > the current implementation of cmpxchg is fence_cmpxchg_acquire. > This implementation is useful for general purposes because twice calls of > sync before and after cmpxchg will provide strict consistency. > However, they sometimes cause overheads because sync instructions are > very expensive in the current POWER chip design. > > We confirmed this change improves performance of copy_to_survivor > in the parallel GC. However, we will need more investigation of GC > by more experts.
So, We would like to request a review of the change > of cmpxchg first (as Martin requested). > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019188.html > > Summary of source code changes: > > * src/share/vm/runtime/atomic.hpp > - Defines enum memory_order and adds a parameter to cmpxchg. > > * src/share/vm/runtime/atomic.cpp > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp > - Added a parameter for each cmpxchg function to follow > the change of atomic.hpp. Their implementations are not changed. > > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > - Added a parameter for each cmpxchg function to follow > the change of atomic.hpp. In addition, implementations > are changed corresponding to the specified memory_order. > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. 
> IBM Research - Tokyo > From david.holmes at oracle.com Tue May 10 12:29:53 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 10 May 2016 22:29:53 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> Message-ID: <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> On 10/05/2016 9:04 PM, David Holmes wrote: > Hi Hiroshi, > > On 10/05/2016 8:44 PM, Hiroshi H Horii wrote: >> Hi All, >> >> Can I please request reviews for the following change? >> >> Code change: >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ > > Changes look good. I'm currently running them through our internal build > system. I will sponsor this and push the change through JPRT. Still a problem on Solaris sparc: "/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/atomic.inline.hpp", line 96: Error: Could not find a match for static Atomic::cmpxchg(signed char, volatile signed char*, signed char). 1 Error(s) detected. Needs this patch:

diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp
--- a/src/share/vm/runtime/atomic.inline.hpp
+++ b/src/share/vm/runtime/atomic.inline.hpp
@@ -92,7 +92,7 @@
 
 #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE
 // See comment in atomic.cpp how to override.
-inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte *dest, jbyte comparand)
+inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte *dest, jbyte comparand, cmpxchg_memory_order order)
 {
   return cmpxchg_general(exchange_value, dest, comparand);
 }

David ----- > Just need another reviewer to chime in - given you and Martin are both > contributors. Or are you the main contributor with Martin being a reviewer? > > Thanks, > David > > PS. It's my night now so I'll be signing off and will pick this up in > the morning.
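The Solaris SPARC error above is a plain C++ declaration/definition mismatch. Once the declaration of Atomic::cmpxchg gained a fourth parameter with a default value, an out-of-line definition that still lists three parameters no longer matches any declaration. The default value belongs on the declaration only, and the definition must repeat the parameter without it. Below is a minimal standalone sketch. The simplified Atomic class and the jbyte typedef are stand-ins for HotSpot's, and the body is a non-atomic placeholder because only the signatures matter here.

```cpp
typedef signed char jbyte;  // Stand-in for HotSpot's jbyte.

enum cmpxchg_memory_order { memory_order_relaxed, memory_order_conservative };

class Atomic {
 public:
  // The default argument is given once, on the declaration.
  static jbyte cmpxchg(jbyte exchange_value, volatile jbyte* dest,
                       jbyte comparand,
                       cmpxchg_memory_order order = memory_order_conservative);
};

// The out-of-line definition must list the same four parameters and must
// not repeat the default. Dropping the last parameter here reproduces the
// "could not find a match" error. Leaving it unnamed is one way to avoid
// an unused-parameter warning when, as in the generic byte version above,
// the order is ignored.
inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte* dest,
                             jbyte comparand, cmpxchg_memory_order) {
  // Placeholder, NOT atomic: only the signature is being illustrated.
  jbyte old = *dest;
  if (old == comparand) *dest = exchange_value;
  return old;
}
```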
> >> This change follows the discussion started from these mails. >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html >> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019148.html >> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-May/019320.html >> >> >> Description: >> This change provides relaxed compare-and-exchange by introducing >> relaxed memory order. As described in atomic_linux_ppc.inline.hpp, >> the current implementation of cmpxchg is fence_cmpxchg_acquire. >> This implementation is useful for general purposes because twice calls of >> sync before and after cmpxchg will provide strict consistency. >> However, they sometimes cause overheads because sync instructions are >> very expensive in the current POWER chip design. >> >> We confirmed this change improves performance of copy_to_survivor >> in the parallel GC. However, we will need more investigation of GC >> by more experts. So, We would like to request a review of the change >> of cmpxchg first (as Martin requested). >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019188.html >> >> >> Summary of source code changes: >> >> * src/share/vm/runtime/atomic.hpp >> - Defines enum memory_order and adds a parameter to cmpxchg. >> >> * src/share/vm/runtime/atomic.cpp >> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp >> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp >> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp >> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp >> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp >> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp >> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp >> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp >> - Added a parameter for each cmpxchg function to follow >> the change of atomic.hpp. 
Their implementations are not changed. >> >> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> - Added a parameter for each cmpxchg function to follow >> the change of atomic.hpp. In addition, implementations >> are changed corresponding to the specified memory_order. >> >> Regards, >> Hiroshi >> ----------------------- >> Hiroshi Horii, Ph.D. >> IBM Research - Tokyo >> From HORII at jp.ibm.com Tue May 10 13:17:46 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Tue, 10 May 2016 22:17:46 +0900 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> Message-ID: <201605101318.u4ADI0TG028528@d19av07.sagamino.japan.ibm.com> Hi David, > Just need another reviewer to chime in - given you and Martin are both > contributors. Or are you the main contributor with Martin being a reviewer? Martin and I are contributors of this change. > Still a problem on Solaris sparc: Martin, could you create a new change in webrev with the patch that David sent? Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. IBM Research - Tokyo David Holmes wrote on 05/10/2016 21:29:53: > From: David Holmes > To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-runtime- > dev at openjdk.java.net" > Cc: Tim Ellison , "ppc-aix-port- > dev at openjdk.java.net" , "hotspot- > gc-dev at openjdk.java.net" > Date: 05/10/2016 21:31 > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > On 10/05/2016 9:04 PM, David Holmes wrote: > > Hi Hiroshi, > > > > On 10/05/2016 8:44 PM, Hiroshi H Horii wrote: > >> Hi All, > >> > >> Can I please request reviews for the following change? 
> >> > >> Code change: > >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ > > > > Changes look good. I'm currently running them through our internal build > > system. I will sponsor this and push the change through JPRT. > > Still a problem on Solaris sparc: > > "/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/ > atomic.inline.hpp", > line 96: Error: Could not find a match for static Atomic::cmpxchg(signed > char, volatile signed char*, signed char). > 1 Error(s) detected. > > Needs this patch: > > diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp > --- a/src/share/vm/runtime/atomic.inline.hpp > +++ b/src/share/vm/runtime/atomic.inline.hpp > @@ -92,7 +92,7 @@ > > #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE > // See comment in atomic.cpp how to override. > -inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte > *dest, jbyte comparand) > +inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte > *dest, jbyte comparand, cmpxchg_memory_order order) > { > return cmpxchg_general(exchange_value, dest, comparand); > } > > David > ----- > > > Just need another reviewer to chime in - given you and Martin are both > > contributors. Or are you the main contributor with Martin being a reviewer? > > > > Thanks, > > David > > > > PS. It's my night now so I'll be signing off and will pick this up in > > the morning. > > > >> This change follows the discussion started from these mails. > >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > April/018960.html > >> > >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > April/019148.html > >> > >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > May/019320.html > >> > >> > >> Description: > >> This change provides relaxed compare-and-exchange by introducing > >> relaxed memory order. As described in atomic_linux_ppc.inline.hpp, > >> the current implementation of cmpxchg is fence_cmpxchg_acquire. 
> >> This implementation is useful for general purposes because twice calls of > >> sync before and after cmpxchg will provide strict consistency. > >> However, they sometimes cause overheads because sync instructions are > >> very expensive in the current POWER chip design. > >> > >> We confirmed this change improves performance of copy_to_survivor > >> in the parallel GC. However, we will need more investigation of GC > >> by more experts. So, We would like to request a review of the change > >> of cmpxchg first (as Martin requested). > >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > April/019188.html > >> > >> > >> Summary of source code changes: > >> > >> * src/share/vm/runtime/atomic.hpp > >> - Defines enum memory_order and adds a parameter to cmpxchg. > >> > >> * src/share/vm/runtime/atomic.cpp > >> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp > >> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp > >> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp > >> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp > >> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp > >> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp > >> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp > >> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp > >> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp > >> - Added a parameter for each cmpxchg function to follow > >> the change of atomic.hpp. Their implementations are not changed. > >> > >> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp > >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > >> - Added a parameter for each cmpxchg function to follow > >> the change of atomic.hpp. In addition, implementations > >> are changed corresponding to the specified memory_order. > >> > >> Regards, > >> Hiroshi > >> ----------------------- > >> Hiroshi Horii, Ph.D. 
> >> IBM Research - Tokyo > >> > From martin.doerr at sap.com Tue May 10 14:27:52 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 10 May 2016 14:27:52 +0000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> Message-ID: <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> Hello everybody, thanks for finding this issue. New webrev is here: http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/ Best regards, Martin From: Hiroshi H Horii [mailto:HORII at jp.ibm.com] Sent: Tuesday, 10 May 2016 15:18 To: David Holmes Cc: hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Tim Ellison ; Doerr, Martin Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg Hi David, > Just need another reviewer to chime in - given you and Martin are both > contributors. Or are you the main contributor with Martin being a reviewer? Martin and I are contributors of this change. > Still a problem on Solaris sparc: Martin, could you create a new change in webrev with the patch that David sent? Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D.
IBM Research - Tokyo David Holmes > wrote on 05/10/2016 21:29:53: > From: David Holmes > > To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-runtime- > dev at openjdk.java.net" > > Cc: Tim Ellison >, "ppc-aix-port- > dev at openjdk.java.net" >, "hotspot- > gc-dev at openjdk.java.net" > > Date: 05/10/2016 21:31 > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > On 10/05/2016 9:04 PM, David Holmes wrote: > > Hi Hiroshi, > > > > On 10/05/2016 8:44 PM, Hiroshi H Horii wrote: > >> Hi All, > >> > >> Can I please request reviews for the following change? > >> > >> Code change: > >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ > > > > Changes look good. I'm currently running them through our internal build > > system. I will sponsor this and push the change through JPRT. > > Still a problem on Solaris sparc: > > "/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/ > atomic.inline.hpp", > line 96: Error: Could not find a match for static Atomic::cmpxchg(signed > char, volatile signed char*, signed char). > 1 Error(s) detected. > > Needs this patch: > > diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp > --- a/src/share/vm/runtime/atomic.inline.hpp > +++ b/src/share/vm/runtime/atomic.inline.hpp > @@ -92,7 +92,7 @@ > > #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE > // See comment in atomic.cpp how to override. > -inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte > *dest, jbyte comparand) > +inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte > *dest, jbyte comparand, cmpxchg_memory_order order) > { > return cmpxchg_general(exchange_value, dest, comparand); > } > > David > ----- > > > Just need another reviewer to chime in - given you and Martin are both > > contributors. Or are you the main contributor with Martin being a reviewer? > > > > Thanks, > > David > > > > PS. It's my night now so I'll be signing off and will pick this up in > > the morning. 
> > > >> This change follows the discussion started from these mails. > >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > April/018960.html > >> > >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > April/019148.html > >> > >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > May/019320.html > >> > >> > >> Description: > >> This change provides relaxed compare-and-exchange by introducing > >> relaxed memory order. As described in atomic_linux_ppc.inline.hpp, > >> the current implementation of cmpxchg is fence_cmpxchg_acquire. > >> This implementation is useful for general purposes because twice calls of > >> sync before and after cmpxchg will provide strict consistency. > >> However, they sometimes cause overheads because sync instructions are > >> very expensive in the current POWER chip design. > >> > >> We confirmed this change improves performance of copy_to_survivor > >> in the parallel GC. However, we will need more investigation of GC > >> by more experts. So, We would like to request a review of the change > >> of cmpxchg first (as Martin requested). > >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > April/019188.html > >> > >> > >> Summary of source code changes: > >> > >> * src/share/vm/runtime/atomic.hpp > >> - Defines enum memory_order and adds a parameter to cmpxchg. 
> >> > >> * src/share/vm/runtime/atomic.cpp > >> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp > >> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp > >> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp > >> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp > >> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp > >> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp > >> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp > >> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp > >> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp > >> - Added a parameter for each cmpxchg function to follow > >> the change of atomic.hpp. Their implementations are not changed. > >> > >> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp > >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > >> - Added a parameter for each cmpxchg function to follow > >> the change of atomic.hpp. In addition, implementations > >> are changed corresponding to the specified memory_order. > >> > >> Regards, > >> Hiroshi > >> ----------------------- > >> Hiroshi Horii, Ph.D. > >> IBM Research - Tokyo > >> > From david.holmes at oracle.com Tue May 10 20:56:09 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 11 May 2016 06:56:09 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> Message-ID: On 11/05/2016 12:27 AM, Doerr, Martin wrote: > Hello everybody, > > thanks for finding this issue.
New webrev is here: > > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/ Unfortunately my test run hit a crash on Solaris sparc: # Problematic frame: # V [libjvm.so+0xcc35c4] markOopDesc*markOopDesc::displaced_mark_helper()const+0x64 I'm going to have to do some more testing to see if that is actually related to the change. I know it should not be, but given we CAS marks I have to wonder if there's some subtle bad interaction here. :( David ----- > > > Best regards, > > Martin > > > > *From:*Hiroshi H Horii [mailto:HORII at jp.ibm.com] > *Sent:* Dienstag, 10. Mai 2016 15:18 > *To:* David Holmes > *Cc:* hotspot-gc-dev at openjdk.java.net; > hotspot-runtime-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; > Tim Ellison ; Doerr, Martin > *Subject:* Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > > > Hi David, > >> Just need another reviewer to chime in - given you and Martin are both >> contributors. Or are you the main contributor with Martin being a reviewer? > > Martin and I are contributors of this change. > >> Still a problem on Solaris sparc: > > Martin, could you create a new change in webrev with the patch that > David sent? > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. > IBM Research - Tokyo > > > David Holmes > > wrote on 05/10/2016 21:29:53: > >> From: David Holmes > >> To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-runtime- >> dev at openjdk.java.net " > > >> Cc: Tim Ellison >, "ppc-aix-port- >> dev at openjdk.java.net " > >, "hotspot- >> gc-dev at openjdk.java.net " > > >> Date: 05/10/2016 21:31 >> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >> >> On 10/05/2016 9:04 PM, David Holmes wrote: >> > Hi Hiroshi, >> > >> > On 10/05/2016 8:44 PM, Hiroshi H Horii wrote: >> >> Hi All, >> >> >> >> Can I please request reviews for the following change? 
>> >> >> >> Code change: >> >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ >> > >> > Changes look good. I'm currently running them through our internal build >> > system. I will sponsor this and push the change through JPRT. >> >> Still a problem on Solaris sparc: >> >> "/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/ >> atomic.inline.hpp", >> line 96: Error: Could not find a match for static Atomic::cmpxchg(signed >> char, volatile signed char*, signed char). >> 1 Error(s) detected. >> >> Needs this patch: >> >> diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp >> --- a/src/share/vm/runtime/atomic.inline.hpp >> +++ b/src/share/vm/runtime/atomic.inline.hpp >> @@ -92,7 +92,7 @@ >> >> #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE >> // See comment in atomic.cpp how to override. >> -inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte >> *dest, jbyte comparand) >> +inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte >> *dest, jbyte comparand, cmpxchg_memory_order order) >> { >> return cmpxchg_general(exchange_value, dest, comparand); >> } >> >> David >> ----- >> >> > Just need another reviewer to chime in - given you and Martin are both >> > contributors. Or are you the main contributor with Martin being a reviewer? >> > >> > Thanks, >> > David >> > >> > PS. It's my night now so I'll be signing off and will pick this up in >> > the morning. >> > >> >> This change follows the discussion started from these mails. >> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >> April/018960.html >> >> >> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >> April/019148.html >> >> >> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >> May/019320.html >> >> >> >> >> >> Description: >> >> This change provides relaxed compare-and-exchange by introducing >> >> relaxed memory order. 
As described in atomic_linux_ppc.inline.hpp, >> >> the current implementation of cmpxchg is fence_cmpxchg_acquire. >> >> This implementation is useful for general purposes because twice calls of >> >> sync before and after cmpxchg will provide strict consistency. >> >> However, they sometimes cause overheads because sync instructions are >> >> very expensive in the current POWER chip design. >> >> >> >> We confirmed this change improves performance of copy_to_survivor >> >> in the parallel GC. However, we will need more investigation of GC >> >> by more experts. So, We would like to request a review of the change >> >> of cmpxchg first (as Martin requested). >> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >> April/019188.html >> >> >> >> >> >> Summary of source code changes: >> >> >> >> * src/share/vm/runtime/atomic.hpp >> >> - Defines enum memory_order and adds a parameter to cmpxchg. >> >> >> >> * src/share/vm/runtime/atomic.cpp >> >> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp >> >> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >> >> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp >> >> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp >> >> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp >> >> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp >> >> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp >> >> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp >> >> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp >> >> - Added a parameter for each cmpxchg function to follow >> >> the change of atomic.hpp. Their implementations are not changed. >> >> >> >> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp >> >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> >> - Added a parameter for each cmpxchg function to follow >> >> the change of atomic.hpp. In addition, implementations >> >> are changed corresponding to the specified memory_order. 
>> >> >> >> Regards, >> >> Hiroshi >> >> ----------------------- >> >> Hiroshi Horii, Ph.D. >> >> IBM Research - Tokyo >> >> >> > From david.holmes at oracle.com Wed May 11 04:41:06 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 11 May 2016 14:41:06 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> Message-ID: Adding hotspot-dev to cc to expand scope of reviewer pool :) On 11/05/2016 6:56 AM, David Holmes wrote: > On 11/05/2016 12:27 AM, Doerr, Martin wrote: >> Hello everybody, >> >> thanks for finding this issue. New webrev is here: >> >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/ > > Unfortunately my test run hit a crash on Solaris sparc: > > # Problematic frame: > # V [libjvm.so+0xcc35c4] > markOopDesc*markOopDesc::displaced_mark_helper()const+0x64 > > I'm going to have to do some more testing to see if that is actually > related to the change. I know it should not be, but given we CAS marks I > have to wonder if there's some subtle bad interaction here. :( Further testing has not shown any failures on Solaris sparc, and the same testing showed some spurious failures on other platforms even without these changes. So while I will file a bug for this crash I think it unlikely to be related to the current changes. So on that note we just need a second hotspot reviewer to sign off on this. Thanks, David > David > ----- > >> >> >> Best regards, >> >> Martin >> >> >> >> *From:*Hiroshi H Horii [mailto:HORII at jp.ibm.com] >> *Sent:* Dienstag, 10. 
Mai 2016 15:18 >> *To:* David Holmes >> *Cc:* hotspot-gc-dev at openjdk.java.net; >> hotspot-runtime-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; >> Tim Ellison ; Doerr, Martin >> >> *Subject:* Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >> >> >> >> Hi David, >> >>> Just need another reviewer to chime in - given you and Martin are both >>> contributors. Or are you the main contributor with Martin being a >>> reviewer? >> >> Martin and I are contributors of this change. >> >>> Still a problem on Solaris sparc: >> >> Martin, could you create a new change in webrev with the patch that >> David sent? >> >> Regards, >> Hiroshi >> ----------------------- >> Hiroshi Horii, Ph.D. >> IBM Research - Tokyo >> >> >> David Holmes > >> wrote on 05/10/2016 21:29:53: >> >>> From: David Holmes >> > >>> To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-runtime- >>> dev at openjdk.java.net " >> > > >>> Cc: Tim Ellison >> >, "ppc-aix-port- >>> dev at openjdk.java.net " >> > >, "hotspot- >>> gc-dev at openjdk.java.net " >> > > >>> Date: 05/10/2016 21:31 >>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>> >>> On 10/05/2016 9:04 PM, David Holmes wrote: >>> > Hi Hiroshi, >>> > >>> > On 10/05/2016 8:44 PM, Hiroshi H Horii wrote: >>> >> Hi All, >>> >> >>> >> Can I please request reviews for the following change? >>> >> >>> >> Code change: >>> >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ >>> > >>> > Changes look good. I'm currently running them through our internal >>> build >>> > system. I will sponsor this and push the change through JPRT. >>> >>> Still a problem on Solaris sparc: >>> >>> "/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/ >>> atomic.inline.hpp", >>> line 96: Error: Could not find a match for static Atomic::cmpxchg(signed >>> char, volatile signed char*, signed char). >>> 1 Error(s) detected. 
>>> >>> Needs this patch: >>> >>> diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp >>> --- a/src/share/vm/runtime/atomic.inline.hpp >>> +++ b/src/share/vm/runtime/atomic.inline.hpp >>> @@ -92,7 +92,7 @@ >>> >>> #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE >>> // See comment in atomic.cpp how to override. >>> -inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte >>> *dest, jbyte comparand) >>> +inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte >>> *dest, jbyte comparand, cmpxchg_memory_order order) >>> { >>> return cmpxchg_general(exchange_value, dest, comparand); >>> } >>> >>> David >>> ----- >>> >>> > Just need another reviewer to chime in - given you and Martin are both >>> > contributors. Or are you the main contributor with Martin being a >>> reviewer? >>> > >>> > Thanks, >>> > David >>> > >>> > PS. It's my night now so I'll be signing off and will pick this up in >>> > the morning. >>> > >>> >> This change follows the discussion started from these mails. >>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>> April/018960.html >>> >> >>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>> April/019148.html >>> >> >>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>> May/019320.html >>> >> >>> >> >>> >> Description: >>> >> This change provides relaxed compare-and-exchange by introducing >>> >> relaxed memory order. As described in atomic_linux_ppc.inline.hpp, >>> >> the current implementation of cmpxchg is fence_cmpxchg_acquire. >>> >> This implementation is useful for general purposes because twice >>> calls of >>> >> sync before and after cmpxchg will provide strict consistency. >>> >> However, they sometimes cause overheads because sync instructions are >>> >> very expensive in the current POWER chip design. >>> >> >>> >> We confirmed this change improves performance of copy_to_survivor >>> >> in the parallel GC. 
However, we will need more investigation of GC >>> >> by more experts. So, We would like to request a review of the change >>> >> of cmpxchg first (as Martin requested). >>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>> April/019188.html >>> >> >>> >> >>> >> Summary of source code changes: >>> >> >>> >> * src/share/vm/runtime/atomic.hpp >>> >> - Defines enum memory_order and adds a parameter to cmpxchg. >>> >> >>> >> * src/share/vm/runtime/atomic.cpp >>> >> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp >>> >> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >>> >> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp >>> >> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp >>> >> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp >>> >> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp >>> >> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp >>> >> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp >>> >> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp >>> >> - Added a parameter for each cmpxchg function to follow >>> >> the change of atomic.hpp. Their implementations are not >>> changed. >>> >> >>> >> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp >>> >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >>> >> - Added a parameter for each cmpxchg function to follow >>> >> the change of atomic.hpp. In addition, implementations >>> >> are changed corresponding to the specified memory_order. >>> >> >>> >> Regards, >>> >> Hiroshi >>> >> ----------------------- >>> >> Hiroshi Horii, Ph.D. 
>>> >> IBM Research - Tokyo >>> >> >>> >>

From gromero at linux.vnet.ibm.com Wed May 11 21:06:41 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 11 May 2016 18:06:41 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To:
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
Message-ID: <57339EE1.2040500@linux.vnet.ibm.com>

Hi Volker, Hi Martin

Sincere apologies for the long delay.

My initial approach to testing the VSX loads/stores used an extracted snippet containing just the mass-copy loop, "grafted" into inline asm, and ran isolated tests with the "perf" tool, focusing only on aligned source and destination (the best case).

The extracted code, called "Original" in the plot below (black line), is here:

https://github.com/gromero/arraycopy/blob/2pairs/arraycopy.c#L27-L36

After some experiments, that extracted code evolved into this version, which employs VSX loads/stores, the deepest Data Stream prefetch, d-cache touch, and a back-branch target aligned to 32 bytes:

https://github.com/gromero/arraycopy/blob/2pairs/arraycopy_vsx.c#L27-L41

All runs were "pinned" using `numactl --cpunodebind --membind` to avoid any scheduler decision that could add noise to the measurements.

In the isolated code, VSX with the deepest data prefetch, d-cache touch, and 32-byte alignment (red line) proved better than the original extracted code (black line):

http://gromero.github.io/openjdk/original_vsx_non_pf_vsx_pf_deepest.pdf

So I proceeded to implement the VSX loop in OpenJDK based on the best-case result (VSX, deepest prefetch, d-cache touch, and back-branch target alignment, per the goetz TODO note).
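For readers without the webrev at hand, the loop shape can be modeled portably in C++. This is a sketch under stated assumptions, not the HotSpot stub. It relies only on what this message says further on: 16 jshort elements (32 bytes) per iteration, held in the two registers vsr0 and vsr1. Each 16-byte `std::memcpy` stands in for one lxvd2x/stxvd2x pair. The function name `disjoint_short_copy_model` is hypothetical, and the dcbt prefetch, d-cache touch, and back-branch alignment are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using jshort = std::int16_t;  // HotSpot's jshort is a signed 16-bit integer

// Hypothetical, portable model of the loop shape only (not the HotSpot stub):
// the main loop moves 16 jshorts (32 bytes) per iteration as two 16-byte
// chunks; on POWER8 each chunk would be one lxvd2x/stxvd2x pair through
// vsr0/vsr1. Prefetching and back-branch alignment are not modeled.
void disjoint_short_copy_model(const jshort* src, jshort* dst, std::size_t count) {
  // A *disjoint* copy requires non-overlapping source and destination.
  const std::uintptr_t s = reinterpret_cast<std::uintptr_t>(src);
  const std::uintptr_t d = reinterpret_cast<std::uintptr_t>(dst);
  const std::uintptr_t bytes = count * sizeof(jshort);
  assert(s + bytes <= d || d + bytes <= s);
  (void)s; (void)d; (void)bytes;  // avoid unused warnings when NDEBUG is set

  constexpr std::size_t kElemsPerIter = 16;  // 16 x 2 bytes = 32 bytes
  constexpr std::size_t kChunkBytes = 16;    // width of one VSX register
  std::size_t i = 0;
  for (; i + kElemsPerIter <= count; i += kElemsPerIter) {
    const unsigned char* from = reinterpret_cast<const unsigned char*>(src + i);
    unsigned char* to = reinterpret_cast<unsigned char*>(dst + i);
    unsigned char v0[kChunkBytes];  // stands in for vsr0
    unsigned char v1[kChunkBytes];  // stands in for vsr1
    std::memcpy(v0, from, kChunkBytes);                // first 16-byte load
    std::memcpy(v1, from + kChunkBytes, kChunkBytes);  // second 16-byte load
    std::memcpy(to, v0, kChunkBytes);                  // first 16-byte store
    std::memcpy(to + kChunkBytes, v1, kChunkBytes);    // second 16-byte store
  }
  for (; i < count; ++i) {  // remaining 0..15 elements, one at a time
    dst[i] = src[i];
  }
}
```

For example, a 37-element array goes through two 32-byte iterations, and the last five elements are copied in the scalar tail.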
OpenJDK 8 webrev: http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/8/
OpenJDK 9 webrev: http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/9/

I've tested the change on OpenJDK 8 using this script, which calls System.arraycopy() on shorts:

https://goo.gl/8UWtLm

The results for all data alignment cases:

http://gromero.github.io/openjdk/src_0_dst_0.pdf
http://gromero.github.io/openjdk/src_1_dst_0.pdf
http://gromero.github.io/openjdk/src_0_dst_1.pdf
http://gromero.github.io/openjdk/src_1_dst_1.pdf

Martin, I added the vsx test to the feature-string. Regarding the ABI, I'm only using two VSRs, vsr0 and vsr1, and both are volatile.

Volker, since the loop unrolling was removed, the loop now copies 16 elements at a time, like the non-VSX loop, rather than 32 elements. I have only verified the change on little endian.

Sorry, I didn't understand your question regarding "instructions for aligned load/stores". Did you mean instructions for unaligned loads/stores? I think both fixed-point (ld/std) and VSX loads/stores are slower in the unaligned case. However, VMX loads/stores are different and expect aligned operands.

Thank you very much for opening the bug:

https://bugs.openjdk.java.net/browse/JDK-8154156

I don't have per-function profiling for each SPEC{jbb,jvm} benchmark, so I can't determine which one would stress the proposed change best. Could I use a better benchmark?

Thank you!

Best regards,
Gustavo

On 05-04-2016 14:23, Volker Simonis wrote:
> Hi Gustavo,
>
> thanks a lot for your contribution.
>
> Can you please describe if you've run benchmarks and which performance
> improvements you saw?
>
> With your change if we're running on Power 8, we will only use the
> fast path for arrays with at least 32 elements. For smaller arrays, we
> will fall-back to copying only 2 elements at a time which will be
> slower than the initial version which copied 4 at a time in that case.
>
> Did you verified your changes on both, little and big endian?
>
> And what about unaligned memory accesses? As far as I read,
> lxvd2x/stxvd2x still work, but may be slower. I saw there also exist
> instructions for aligned load/stores. Would it make sens
> (performance-wise) to use them for the cases where we can be sure that
> we have aligned memory accesses?
>
> Thank you and best regards,
> Volker
>
>
> On Fri, Apr 1, 2016 at 10:36 PM, Gustavo Romero
> wrote:
>> Hi Martin, Hi Volker
>>
>> Currently VSX load/store instructions are not being used in PPC64 stubs,
>> particularly in arraycopy stubs inside generate_arraycopy_stubs() like,
>> but not limited to, generate_disjoint_{byte,short,int,long}_copy.
>>
>> We can speed up mass copy using VSX (Vector-Scalar Extension) load/store
>> instruction in processors >= POWER8, the same way it's already done for
>> libc memcpy().
>>
>> This is an initial patch just for jshort_disjoint_arraycopy() VSX vector
>> load/store:
>>
>> http://81.de.7a9f.ip4.static.sl-reverse.com/202539/webrev
>>
>> What are your thoughts on that? Is there any impediment to use VSX
>> instructions in OpenJDK at the moment?
>>
>> Thank you.
>>
>> Best regards,
>> Gustavo
>>
>

From gromero at linux.vnet.ibm.com Wed May 11 21:26:43 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 11 May 2016 18:26:43 -0300
Subject: JVM 24.95 SIGSEGV on C2 Compiler Thread
In-Reply-To: <56EAC8EB.9020609@linux.vnet.ibm.com>
References: <56EAB89B.9050206@linux.vnet.ibm.com> <56EABF93.2070707@oracle.com> <56EAC8EB.9020609@linux.vnet.ibm.com>
Message-ID: <5733A393.8010005@linux.vnet.ibm.com>

Hi Tobias,

Your wild guess on https://bugs.openjdk.java.net/browse/JDK-6675699 was correct: a loop-peeling issue was the culprit.

I was also finally able to test the code exhaustively on OpenJDK 8, and I could not reproduce the problem there. It is fixed in 8, as you said.

Thanks a lot.

Best regards,
Gustavo

On 17-03-2016 12:10, Gustavo Romero wrote:
> Hi Tobias,
>
> I'll try to reproduce it with 8 on PPC64 and let you know about the
> > Thank you. > > Regards, > Gustavo > > On 17-03-2016 11:30, Tobias Hartmann wrote: >> Hi Gustavo, >> >> just a wild guess, but this could be one of >> https://bugs.openjdk.java.net/browse/JDK-6675699 >> https://bugs.openjdk.java.net/browse/JDK-8027388 >> >> Both were not backported to 7. Did you try to reproduce this with 8? >> >> Best regards, >> Tobias >> >> On 17.03.2016 15:00, Gustavo Romero wrote: >>> Hi Martin, >>> >>> I'm facing a problem with a JVM 24.95 when running an application. >>> However it's being hard to reproduce it, ie many times C2 will >>> optimize the method fine and the application terminates fine. Just >>> after many complete runs, one of them will crash. Apparently it is >>> related to https://bugs.openjdk.java.net/browse/JDK-7068051 bug, >>> but it was fixed in hs22 and as I could not isolate it on PPC64, >>> I can't tell if it still exists upstream on PPC64. >>> >>> Do you have any clue on how to isolate/debug this problem? >>> >>> Hotspot error log: >>> http://hastebin.com/raw/pepajuwepu >>> >>> Backtrace from the thread that caused the segfault: >>> http://hastebin.com/raw/zirelokuto >>> >>> Thank you! >>> >>> Best regards, >>> Gustavo >>> >> > From gromero at linux.vnet.ibm.com Wed May 11 22:32:45 2016 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Wed, 11 May 2016 19:32:45 -0300 Subject: SIGILL crashes JVM on PPC64 LE Message-ID: <5733B30D.6010201@linux.vnet.ibm.com> Hi I'm getting a nasty SIGILL that crashes the JVM on PPC64 LE. hs_err log: http://hastebin.com/raw/fovagunaci The application employs methods from both java.nio.ByteBuffer and sun.misc.Unsafe classes in order to write and read from an allocated buffer. 
An interesting thing is that, after debugging the instruction that caused the SIGILL:

0x3fff902839a4: cmpwi cr6,r17,0
0x3fff902839a8: beq cr6,0x3fff90283ae4
0x3fff902839ac: .long 0xea2f0013 <============ illegal instruction
0x3fff902839b0: add r15,r15,r17
0x3fff902839b4: add r14,r17,r14

I found that when its endianness is swapped it turns out to be a valid instruction: vsel v24,v0,v5,v31

However, I'm still unable to determine whether it's an application issue, something in the JVM's Unsafe interface code, or something else. Any clue on how to narrow down this SIGILL? Thank you! Regards, Gustavo From david.holmes at oracle.com Wed May 11 22:50:21 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 12 May 2016 08:50:21 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> Message-ID: <53b7bb82-89f0-bac5-5e5a-3234ffecdb50@oracle.com> This has about 3 hours to be reviewed and pushed to make the FC deadline. David On 11/05/2016 2:41 PM, David Holmes wrote: > Adding hotspot-dev to cc to expand scope of reviewer pool :) > > On 11/05/2016 6:56 AM, David Holmes wrote: >> On 11/05/2016 12:27 AM, Doerr, Martin wrote: >>> Hello everybody, >>> >>> thanks for finding this issue. New webrev is here: >>> >>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/ >> >> Unfortunately my test run hit a crash on Solaris sparc: >> >> # Problematic frame: >> # V [libjvm.so+0xcc35c4] >> markOopDesc*markOopDesc::displaced_mark_helper()const+0x64 >> >> I'm going to have to do some more testing to see if that is actually >> related to the change.
I know it should not be, but given we CAS marks I >> have to wonder if there's some subtle bad interaction here. :( > > Further testing has not shown any failures on Solaris sparc, and the > same testing showed some spurious failures on other platforms even > without these changes. So while I will file a bug for this crash I think > it unlikely to be related to the current changes. > > So on that note we just need a second hotspot reviewer to sign off on this. > > Thanks, > David > > >> David >> ----- >> >>> >>> >>> Best regards, >>> >>> Martin >>> >>> >>> >>> *From:*Hiroshi H Horii [mailto:HORII at jp.ibm.com] >>> *Sent:* Dienstag, 10. Mai 2016 15:18 >>> *To:* David Holmes >>> *Cc:* hotspot-gc-dev at openjdk.java.net; >>> hotspot-runtime-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; >>> Tim Ellison ; Doerr, Martin >>> >>> *Subject:* Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>> >>> >>> >>> Hi David, >>> >>>> Just need another reviewer to chime in - given you and Martin are both >>>> contributors. Or are you the main contributor with Martin being a >>>> reviewer? >>> >>> Martin and I are contributors of this change. >>> >>>> Still a problem on Solaris sparc: >>> >>> Martin, could you create a new change in webrev with the patch that >>> David sent? >>> >>> Regards, >>> Hiroshi >>> ----------------------- >>> Hiroshi Horii, Ph.D. 
>>> IBM Research - Tokyo >>> >>> >>> David Holmes > >>> wrote on 05/10/2016 21:29:53: >>> >>>> From: David Holmes >>> > >>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-runtime- >>>> dev at openjdk.java.net " >>> >> > >>>> Cc: Tim Ellison >>> >, "ppc-aix-port- >>>> dev at openjdk.java.net " >>> >> >, "hotspot- >>>> gc-dev at openjdk.java.net " >>> >> > >>>> Date: 05/10/2016 21:31 >>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>>> >>>> On 10/05/2016 9:04 PM, David Holmes wrote: >>>> > Hi Hiroshi, >>>> > >>>> > On 10/05/2016 8:44 PM, Hiroshi H Horii wrote: >>>> >> Hi All, >>>> >> >>>> >> Can I please request reviews for the following change? >>>> >> >>>> >> Code change: >>>> >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ >>>> > >>>> > Changes look good. I'm currently running them through our internal >>>> build >>>> > system. I will sponsor this and push the change through JPRT. >>>> >>>> Still a problem on Solaris sparc: >>>> >>>> "/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/ >>>> atomic.inline.hpp", >>>> line 96: Error: Could not find a match for static >>>> Atomic::cmpxchg(signed >>>> char, volatile signed char*, signed char). >>>> 1 Error(s) detected. >>>> >>>> Needs this patch: >>>> >>>> diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp >>>> --- a/src/share/vm/runtime/atomic.inline.hpp >>>> +++ b/src/share/vm/runtime/atomic.inline.hpp >>>> @@ -92,7 +92,7 @@ >>>> >>>> #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE >>>> // See comment in atomic.cpp how to override. 
>>>> -inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte >>>> *dest, jbyte comparand) >>>> +inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte >>>> *dest, jbyte comparand, cmpxchg_memory_order order) >>>> { >>>> return cmpxchg_general(exchange_value, dest, comparand); >>>> } >>>> >>>> David >>>> ----- >>>> >>>> > Just need another reviewer to chime in - given you and Martin are >>>> both >>>> > contributors. Or are you the main contributor with Martin being a >>>> reviewer? >>>> > >>>> > Thanks, >>>> > David >>>> > >>>> > PS. It's my night now so I'll be signing off and will pick this up in >>>> > the morning. >>>> > >>>> >> This change follows the discussion started from these mails. >>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>>> April/018960.html >>>> >> >>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>>> April/019148.html >>>> >> >>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>>> May/019320.html >>>> >> >>>> >> >>>> >> Description: >>>> >> This change provides relaxed compare-and-exchange by introducing >>>> >> relaxed memory order. As described in atomic_linux_ppc.inline.hpp, >>>> >> the current implementation of cmpxchg is fence_cmpxchg_acquire. >>>> >> This implementation is useful for general purposes because twice >>>> calls of >>>> >> sync before and after cmpxchg will provide strict consistency. >>>> >> However, they sometimes cause overheads because sync instructions >>>> are >>>> >> very expensive in the current POWER chip design. >>>> >> >>>> >> We confirmed this change improves performance of copy_to_survivor >>>> >> in the parallel GC. However, we will need more investigation of GC >>>> >> by more experts. So, We would like to request a review of the change >>>> >> of cmpxchg first (as Martin requested). 
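The description contrasts the existing fully fenced cmpxchg (a sync before and after the exchange) with a new relaxed mode. The minimal sketch below illustrates only that distinction. It uses C++11 std::atomic as a stand-in for HotSpot's Atomic class, and mapping the existing behavior to memory_order_seq_cst is an approximation of the fence/cmpxchg/acquire sequence, not the webrev's code. CasOrder and cas() are hypothetical names, not the identifiers used in the patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the proposed ordering parameter; the real
// enum in the webrev may use different names and values.
enum class CasOrder { relaxed, conservative };

// Compare-and-exchange that returns the value found at *dest, which equals
// compare_value exactly when the exchange happened (the return convention
// of HotSpot's Atomic::cmpxchg).
int32_t cas(int32_t exchange_value, std::atomic<int32_t>* dest,
            int32_t compare_value, CasOrder order) {
  int32_t observed = compare_value;
  switch (order) {
    case CasOrder::relaxed:
      // Atomic, but imposes no ordering on surrounding loads and stores.
      dest->compare_exchange_strong(observed, exchange_value,
                                    std::memory_order_relaxed,
                                    std::memory_order_relaxed);
      break;
    case CasOrder::conservative:
      // Fully ordered. On POWER, compilers typically bracket the
      // lwarx/stwcx. loop with sync ... isync for this, and those
      // barriers are the cost the relaxed mode is meant to avoid.
      dest->compare_exchange_strong(observed, exchange_value,
                                    std::memory_order_seq_cst,
                                    std::memory_order_seq_cst);
      break;
    default:
      assert(false && "unknown CasOrder");
  }
  // On failure compare_exchange_strong writes the current value into
  // observed; on success observed still holds compare_value.
  return observed;
}
```

A relaxed CAS still guarantees that exactly one thread wins a given race on the word. What it gives up is the ordering of the caller's other memory accesses, so it only suits call sites that do not publish other data through the CAS.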
>>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>>> April/019188.html >>>> >> >>>> >> >>>> >> Summary of source code changes: >>>> >> >>>> >> * src/share/vm/runtime/atomic.hpp >>>> >> - Defines enum memory_order and adds a parameter to cmpxchg. >>>> >> >>>> >> * src/share/vm/runtime/atomic.cpp >>>> >> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp >>>> >> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >>>> >> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp >>>> >> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp >>>> >> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp >>>> >> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp >>>> >> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp >>>> >> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp >>>> >> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp >>>> >> - Added a parameter for each cmpxchg function to follow >>>> >> the change of atomic.hpp. Their implementations are not >>>> changed. >>>> >> >>>> >> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp >>>> >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >>>> >> - Added a parameter for each cmpxchg function to follow >>>> >> the change of atomic.hpp. In addition, implementations >>>> >> are changed corresponding to the specified memory_order. >>>> >> >>>> >> Regards, >>>> >> Hiroshi >>>> >> ----------------------- >>>> >> Hiroshi Horii, Ph.D. 
>>>> >> IBM Research - Tokyo >>>> >> >>>> >>> From tobias.hartmann at oracle.com Thu May 12 06:36:00 2016 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Thu, 12 May 2016 08:36:00 +0200 Subject: JVM 24.95 SIGSEGV on C2 Compiler Thread In-Reply-To: <5733A393.8010005@linux.vnet.ibm.com> References: <56EAB89B.9050206@linux.vnet.ibm.com> <56EABF93.2070707@oracle.com> <56EAC8EB.9020609@linux.vnet.ibm.com> <5733A393.8010005@linux.vnet.ibm.com> Message-ID: <57342450.5050901@oracle.com> Hi Gustavo, On 11.05.2016 23:26, Gustavo Romero wrote: > Hi Tobias > > Your wild guess on > https://bugs.openjdk.java.net/browse/JDK-6675699 > was correct. Loop peeling issue was the culprit. > > Also I finally was able to test exhaustively the code on OpenJDK 8 > and could not reproduce it. Fixed on 8 as you said. Good, thanks for the update! [CC'ing hotspot-dev for the record] Best regards, Tobias > Thanks a lot. > > Best regards, > Gustavo > > > On 17-03-2016 12:10, Gustavo Romero wrote: >> Hi Tobias, >> >> I'll try to reproduce it with 8 on PPC64 and let you know about the >> result. >> >> Thank you. >> >> Regards, >> Gustavo >> >> On 17-03-2016 11:30, Tobias Hartmann wrote: >>> Hi Gustavo, >>> >>> just a wild guess, but this could be one of >>> https://bugs.openjdk.java.net/browse/JDK-6675699 >>> https://bugs.openjdk.java.net/browse/JDK-8027388 >>> >>> Both were not backported to 7. Did you try to reproduce this with 8? >>> >>> Best regards, >>> Tobias >>> >>> On 17.03.2016 15:00, Gustavo Romero wrote: >>>> Hi Martin, >>>> >>>> I'm facing a problem with a JVM 24.95 when running an application. >>>> However it's being hard to reproduce it, ie many times C2 will >>>> optimize the method fine and the application terminates fine. Just >>>> after many complete runs, one of them will crash. 
Apparently it is >>>> related to https://bugs.openjdk.java.net/browse/JDK-7068051 bug, >>>> but it was fixed in hs22 and as I could not isolate it on PPC64, >>>> I can't tell if it still exists upstream on PPC64. >>>> >>>> Do you have any clue on how to isolate/debug this problem? >>>> >>>> Hotspot error log: >>>> http://hastebin.com/raw/pepajuwepu >>>> >>>> Backtrace from the thread that caused the segfault: >>>> http://hastebin.com/raw/zirelokuto >>>> >>>> Thank you! >>>> >>>> Best regards, >>>> Gustavo >>>> >>> >> > From goetz.lindenmaier at sap.com Thu May 12 08:50:09 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 12 May 2016 08:50:09 +0000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <53b7bb82-89f0-bac5-5e5a-3234ffecdb50@oracle.com> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <53b7bb82-89f0-bac5-5e5a-3234ffecdb50@oracle.com> Message-ID: Hi, atomic_bsd_zero.inline.hpp:303 The order argument is not passed on to the inner cmpxchg_ptr call. But I guess this is not really relevant as the argument is not used anyways. (This method should be moved to the shared atomic.inline.hpp file, but not in this change.) Besides that the change looks good. Reviewed. In case this now really is too late (http://openjdk.java.net/projects/jdk9/ states 26.5. for dev close, but hs is more early?) will there be jdk10 repos soon, or jdk9u? Best regards, Goetz. > -----Original Message----- > From: hotspot-gc-dev [mailto:hotspot-gc-dev-bounces at openjdk.java.net] > On Behalf Of David Holmes > Sent: Donnerstag, 12. 
Mai 2016 00:50 > To: Doerr, Martin ; Hiroshi H Horii > > Cc: Tim Ellison ; ppc-aix-port- > dev at openjdk.java.net; hotspot-dev developers dev at openjdk.java.net>; hotspot-gc-dev at openjdk.java.net; hotspot- > runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > This has about 3 hours to be reviewed and pushed to make the FC deadline. > > David > > On 11/05/2016 2:41 PM, David Holmes wrote: > > Adding hotspot-dev to cc to expand scope of reviewer pool :) > > > > On 11/05/2016 6:56 AM, David Holmes wrote: > >> On 11/05/2016 12:27 AM, Doerr, Martin wrote: > >>> Hello everybody, > >>> > >>> thanks for finding this issue. New webrev is here: > >>> > >>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/ > >> > >> Unfortunately my test run hit a crash on Solaris sparc: > >> > >> # Problematic frame: > >> # V [libjvm.so+0xcc35c4] > >> markOopDesc*markOopDesc::displaced_mark_helper()const+0x64 > >> > >> I'm going to have to do some more testing to see if that is actually > >> related to the change. I know it should not be, but given we CAS marks I > >> have to wonder if there's some subtle bad interaction here. :( > > > > Further testing has not shown any failures on Solaris sparc, and the > > same testing showed some spurious failures on other platforms even > > without these changes. So while I will file a bug for this crash I think > > it unlikely to be related to the current changes. > > > > So on that note we just need a second hotspot reviewer to sign off on this. > > > > Thanks, > > David > > > > > >> David > >> ----- > >> > >>> > >>> > >>> Best regards, > >>> > >>> Martin > >>> > >>> > >>> > >>> *From:*Hiroshi H Horii [mailto:HORII at jp.ibm.com] > >>> *Sent:* Dienstag, 10. 
Mai 2016 15:18 > >>> *To:* David Holmes > >>> *Cc:* hotspot-gc-dev at openjdk.java.net; > >>> hotspot-runtime-dev at openjdk.java.net; ppc-aix-port- > dev at openjdk.java.net; > >>> Tim Ellison ; Doerr, Martin > >>> > >>> *Subject:* Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > >>> > >>> > >>> > >>> Hi David, > >>> > >>>> Just need another reviewer to chime in - given you and Martin are both > >>>> contributors. Or are you the main contributor with Martin being a > >>>> reviewer? > >>> > >>> Martin and I are contributors of this change. > >>> > >>>> Still a problem on Solaris sparc: > >>> > >>> Martin, could you create a new change in webrev with the patch that > >>> David sent? > >>> > >>> Regards, > >>> Hiroshi > >>> ----------------------- > >>> Hiroshi Horii, Ph.D. > >>> IBM Research - Tokyo > >>> > >>> > >>> David Holmes > > >>> wrote on 05/10/2016 21:29:53: > >>> > >>>> From: David Holmes >>>> > > >>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-runtime- > >>>> dev at openjdk.java.net " > >>> >>> > > >>>> Cc: Tim Ellison >>>> >, "ppc-aix-port- > >>>> dev at openjdk.java.net " > >>> >>> >, "hotspot- > >>>> gc-dev at openjdk.java.net " > >>> >>> > > >>>> Date: 05/10/2016 21:31 > >>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > >>>> > >>>> On 10/05/2016 9:04 PM, David Holmes wrote: > >>>> > Hi Hiroshi, > >>>> > > >>>> > On 10/05/2016 8:44 PM, Hiroshi H Horii wrote: > >>>> >> Hi All, > >>>> >> > >>>> >> Can I please request reviews for the following change? > >>>> >> > >>>> >> Code change: > >>>> >> > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ > >>>> > > >>>> > Changes look good. I'm currently running them through our internal > >>>> build > >>>> > system. I will sponsor this and push the change through JPRT. 
> >>>> > >>>> Still a problem on Solaris sparc: > >>>> > >>>> "/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/ > >>>> atomic.inline.hpp", > >>>> line 96: Error: Could not find a match for static > >>>> Atomic::cmpxchg(signed > >>>> char, volatile signed char*, signed char). > >>>> 1 Error(s) detected. > >>>> > >>>> Needs this patch: > >>>> > >>>> diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp > >>>> --- a/src/share/vm/runtime/atomic.inline.hpp > >>>> +++ b/src/share/vm/runtime/atomic.inline.hpp > >>>> @@ -92,7 +92,7 @@ > >>>> > >>>> #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE > >>>> // See comment in atomic.cpp how to override. > >>>> -inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte > >>>> *dest, jbyte comparand) > >>>> +inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte > >>>> *dest, jbyte comparand, cmpxchg_memory_order order) > >>>> { > >>>> return cmpxchg_general(exchange_value, dest, comparand); > >>>> } > >>>> > >>>> David > >>>> ----- > >>>> > >>>> > Just need another reviewer to chime in - given you and Martin are > >>>> both > >>>> > contributors. Or are you the main contributor with Martin being a > >>>> reviewer? > >>>> > > >>>> > Thanks, > >>>> > David > >>>> > > >>>> > PS. It's my night now so I'll be signing off and will pick this up in > >>>> > the morning. > >>>> > > >>>> >> This change follows the discussion started from these mails. > >>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > >>>> April/018960.html > >>>> >> > >>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > >>>> April/019148.html > >>>> >> > >>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > >>>> May/019320.html > >>>> >> > >>>> >> > >>>> >> Description: > >>>> >> This change provides relaxed compare-and-exchange by introducing > >>>> >> relaxed memory order. 
As described in > atomic_linux_ppc.inline.hpp, > >>>> >> the current implementation of cmpxchg is fence_cmpxchg_acquire. > >>>> >> This implementation is useful for general purposes because twice > >>>> calls of > >>>> >> sync before and after cmpxchg will provide strict consistency. > >>>> >> However, they sometimes cause overheads because sync > instructions > >>>> are > >>>> >> very expensive in the current POWER chip design. > >>>> >> > >>>> >> We confirmed this change improves performance of > copy_to_survivor > >>>> >> in the parallel GC. However, we will need more investigation of GC > >>>> >> by more experts. So, We would like to request a review of the > change > >>>> >> of cmpxchg first (as Martin requested). > >>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- > >>>> April/019188.html > >>>> >> > >>>> >> > >>>> >> Summary of source code changes: > >>>> >> > >>>> >> * src/share/vm/runtime/atomic.hpp > >>>> >> - Defines enum memory_order and adds a parameter to > cmpxchg. > >>>> >> > >>>> >> * src/share/vm/runtime/atomic.cpp > >>>> >> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp > >>>> >> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp > >>>> >> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp > >>>> >> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp > >>>> >> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp > >>>> >> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp > >>>> >> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp > >>>> >> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp > >>>> >> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp > >>>> >> - Added a parameter for each cmpxchg function to follow > >>>> >> the change of atomic.hpp. Their implementations are not > >>>> changed. 
> >>>> >> > >>>> >> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp > >>>> >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > >>>> >> - Added a parameter for each cmpxchg function to follow > >>>> >> the change of atomic.hpp. In addition, implementations > >>>> >> are changed corresponding to the specified memory_order. > >>>> >> > >>>> >> Regards, > >>>> >> Hiroshi > >>>> >> ----------------------- > >>>> >> Hiroshi Horii, Ph.D. > >>>> >> IBM Research - Tokyo > >>>> >> > >>>> > >>> From martin.doerr at sap.com Thu May 12 09:33:03 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 12 May 2016 09:33:03 +0000 Subject: PPC64 VSX load/store instructions in stubs In-Reply-To: <57339EE1.2040500@linux.vnet.ibm.com> References: <56FEDBB3.5030106@linux.vnet.ibm.com> <57339EE1.2040500@linux.vnet.ibm.com> Message-ID: Hi Gustavo, thanks for providing the webrevs. The change looks basically good. I only have the following concerns: - We basically support configuring dscr by various DSCR switches. Your code resets the value to hardware default instead of the possibly modified values. We're currently only using default DSCR values, but maybe we may want to play with them in the future. We could use a static variable for the default dscr value. It could be modified in VM_Version::config_dscr() and used by your restore code (load_const_optimized(tmp1, ...) instead of li(tmp1, 0)). - The PPC-elf64abi-1.9 says: "Functions must ensure that the appropriate bits in the vrsave register are set for any vector registers they use. ...". I think not touching vrsave is the right thing for AIX and ppc64le, but I think we will either have to skip the optimization on ppc64 big endian or handle vrsave. Do you agree? Best regards, Martin -----Original Message----- From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] Sent: Mittwoch, 11. 
Mai 2016 23:07 To: Volker Simonis Cc: Doerr, Martin ; Simonis, Volker ; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; brenohl at br.ibm.com Subject: Re: PPC64 VSX load/store instructions in stubs Importance: High Hi Volker, Hi Martin, Sincere apologies for the long delay. My initial approach to testing the VSX load/store was to take an extracted snippet of just the mass copy loop, "grafted" inside inline asm, and run isolated tests with the "perf" tool, focusing only on aligned source and destination (best case). The extracted code, called "Original" in the plot below (black line), is here: https://github.com/gromero/arraycopy/blob/2pairs/arraycopy.c#L27-L36 After some experiments, that extracted code evolved into this one, which employs VSX load/store, Data Stream deepest prefetch, d-cache touch, and a backbranch aligned to 32 bytes: https://github.com/gromero/arraycopy/blob/2pairs/arraycopy_vsx.c#L27-L41 All runs were "pinned" using `numactl --cpunodebind --membind` to avoid any scheduler decision that could add noise to the measurements. VSX, deepest data prefetch, d-cache touch, and 32-byte alignment proved better in the isolated code (red line) than the original extracted code (black line): http://gromero.github.io/openjdk/original_vsx_non_pf_vsx_pf_deepest.pdf So I proceeded to implement the VSX loop in OpenJDK based on the best-case result (VSX, deepest prefetch, d-cache touch, and backbranch target alignment - goetz TODO note).
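The loop shape discussed in this thread is: 16 jshorts (32 bytes) per iteration, held in two 16-byte vector registers, then a tail for the remainder. The portable C model below sketches that shape only. It is not the webrev's PPC64 assembler. Each 16-byte memcpy stands in for a VSX load or store such as lxvd2x/stxvd2x, and the deepest prefetch, d-cache touch, and 32-byte back-branch alignment are not modeled. The names disjoint_short_copy and jshort_t are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int16_t jshort_t; /* hypothetical stand-in for HotSpot's jshort */

/* Copies count elements between non-overlapping arrays. The main loop moves
 * 16 elements (32 bytes) per iteration through two 16-byte temporaries,
 * mirroring a loop that uses two vector registers; the tail copies the
 * remaining 0..15 elements one at a time. */
void disjoint_short_copy(jshort_t *dst, const jshort_t *src, size_t count) {
  /* A "disjoint" copy may assume the source and destination do not overlap. */
  assert((uintptr_t)(dst + count) <= (uintptr_t)src ||
         (uintptr_t)(src + count) <= (uintptr_t)dst);
  size_t i = 0;
  for (; i + 16 <= count; i += 16) {
    unsigned char v0[16], v1[16];       /* the two 16-byte "registers" */
    memcpy(v0, src + i, sizeof v0);     /* load elements i .. i+7      */
    memcpy(v1, src + i + 8, sizeof v1); /* load elements i+8 .. i+15   */
    memcpy(dst + i, v0, sizeof v0);     /* store elements i .. i+7     */
    memcpy(dst + i + 8, v1, sizeof v1); /* store elements i+8 .. i+15  */
  }
  for (; i < count; i++) {              /* scalar tail                 */
    dst[i] = src[i];
  }
}
```

Because memcpy places no alignment requirement on its arguments, the model also covers the unaligned src/dst cases that were benchmarked. That matches Volker's note that lxvd2x/stxvd2x still work, possibly more slowly, on unaligned data.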
OpenJDK 8 webrev: http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/8/ OpenJDK 9 webrev: http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/9/ I've tested the change on OpenJDK 8 using this script, which calls System.arraycopy() on shorts: https://goo.gl/8UWtLm The results for all data alignment cases: http://gromero.github.io/openjdk/src_0_dst_0.pdf http://gromero.github.io/openjdk/src_1_dst_0.pdf http://gromero.github.io/openjdk/src_0_dst_1.pdf http://gromero.github.io/openjdk/src_1_dst_1.pdf Martin, I added the vsx test to the feature string. Regarding the ABI, I'm only using two VSRs, vsr0 and vsr1, both volatile. Volker, since the loop unrolling was removed, the loop now copies 16 elements at a time, like the non-VSX loop, rather than 32. I just verified the change on little endian. Sorry, I didn't understand your question regarding "instructions for aligned load/stores". Did you mean instructions for unaligned load/stores? I think both fixed-point (ld/std) and VSX loads/stores will be slower in the unaligned case. VMX load/store is different, however: it expects aligned operands. Thank you very much for opening the bug https://bugs.openjdk.java.net/browse/JDK-8154156 I don't have per-function profiling for each SPEC{jbb,jvm} benchmark to determine which one would best stress the proposed change. Is there a better benchmark I could use? Thank you! Best regards, Gustavo On 05-04-2016 14:23, Volker Simonis wrote: > Hi Gustavo, > > thanks a lot for your contribution. > > Can you please describe whether you've run benchmarks and which performance > improvements you saw? > > With your change, if we're running on Power 8, we will only use the > fast path for arrays with at least 32 elements. For smaller arrays, we > will fall back to copying only 2 elements at a time, which will be > slower than the initial version, which copied 4 at a time in that case. > > Did you verify your changes on both little and big endian?
> > And what about unaligned memory accesses? As far as I read, > lxvd2x/stxvd2x still work, but may be slower. I saw there also exist > instructions for aligned load/stores. Would it make sens > (performance-wise) to use them for the cases where we can be sure that > we have aligned memory accesses? > > Thank you and best regards, > Volker > > > On Fri, Apr 1, 2016 at 10:36 PM, Gustavo Romero > wrote: >> Hi Martin, Hi Volker >> >> Currently VSX load/store instructions are not being used in PPC64 stubs, >> particularly in arraycopy stubs inside generate_arraycopy_stubs() like, >> but not limited to, generate_disjoint_{byte,short,int,long}_copy. >> >> We can speed up mass copy using VSX (Vector-Scalar Extension) load/store >> instruction in processors >= POWER8, the same way it's already done for >> libc memcpy(). >> >> This is an initial patch just for jshort_disjoint_arraycopy() VSX vector >> load/store: >> >> http://81.de.7a9f.ip4.static.sl-reverse.com/202539/webrev >> >> What are your thoughts on that? Is there any impediment to use VSX >> instructions in OpenJDK at the moment? >> >> Thank you. >> >> Best regards, >> Gustavo >> > From david.holmes at oracle.com Thu May 12 09:52:14 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 12 May 2016 19:52:14 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <53b7bb82-89f0-bac5-5e5a-3234ffecdb50@oracle.com> Message-ID: <63c2519a-7909-804c-b30c-bd8ee814a328@oracle.com> On 12/05/2016 6:50 PM, Lindenmaier, Goetz wrote: > Hi, > > atomic_bsd_zero.inline.hpp:303 > The order argument is not passed on to the inner cmpxchg_ptr call. 
> But I guess this is not really relevant as the argument is not used Right - that pattern is used throughout the changes. > anyways. (This method should be moved to the shared atomic.inline.hpp > file, but not in this change.) > > Besides that the change looks good. Reviewed. > > In case this now really is too late (http://openjdk.java.net/projects/jdk9/ states 26.5. for dev close, but hs is more early?) > will there be jdk10 repos soon, or jdk9u? Yes hs has to finalize sooner as the FC date is for things to be in jdk9/jdk9 and it takes time for changes to get from hs to jdk9. There will be a process for requesting approval for changes post FC but that hasn't yet been announced either. No word yet on when jdk10 forests will open up. David ----- > Best regards, > Goetz. > > > > > >> -----Original Message----- >> From: hotspot-gc-dev [mailto:hotspot-gc-dev-bounces at openjdk.java.net] >> On Behalf Of David Holmes >> Sent: Donnerstag, 12. Mai 2016 00:50 >> To: Doerr, Martin ; Hiroshi H Horii >> >> Cc: Tim Ellison ; ppc-aix-port- >> dev at openjdk.java.net; hotspot-dev developers > dev at openjdk.java.net>; hotspot-gc-dev at openjdk.java.net; hotspot- >> runtime-dev at openjdk.java.net >> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >> >> This has about 3 hours to be reviewed and pushed to make the FC deadline. >> >> David >> >> On 11/05/2016 2:41 PM, David Holmes wrote: >>> Adding hotspot-dev to cc to expand scope of reviewer pool :) >>> >>> On 11/05/2016 6:56 AM, David Holmes wrote: >>>> On 11/05/2016 12:27 AM, Doerr, Martin wrote: >>>>> Hello everybody, >>>>> >>>>> thanks for finding this issue. 
New webrev is here: >>>>> >>>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/ >>>> >>>> Unfortunately my test run hit a crash on Solaris sparc: >>>> >>>> # Problematic frame: >>>> # V [libjvm.so+0xcc35c4] >>>> markOopDesc*markOopDesc::displaced_mark_helper()const+0x64 >>>> >>>> I'm going to have to do some more testing to see if that is actually >>>> related to the change. I know it should not be, but given we CAS marks I >>>> have to wonder if there's some subtle bad interaction here. :( >>> >>> Further testing has not shown any failures on Solaris sparc, and the >>> same testing showed some spurious failures on other platforms even >>> without these changes. So while I will file a bug for this crash I think >>> it unlikely to be related to the current changes. >>> >>> So on that note we just need a second hotspot reviewer to sign off on this. >>> >>> Thanks, >>> David >>> >>> >>>> David >>>> ----- >>>> >>>>> >>>>> >>>>> Best regards, >>>>> >>>>> Martin >>>>> >>>>> >>>>> >>>>> *From:*Hiroshi H Horii [mailto:HORII at jp.ibm.com] >>>>> *Sent:* Dienstag, 10. Mai 2016 15:18 >>>>> *To:* David Holmes >>>>> *Cc:* hotspot-gc-dev at openjdk.java.net; >>>>> hotspot-runtime-dev at openjdk.java.net; ppc-aix-port- >> dev at openjdk.java.net; >>>>> Tim Ellison ; Doerr, Martin >>>>> >>>>> *Subject:* Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>>>> >>>>> >>>>> >>>>> Hi David, >>>>> >>>>>> Just need another reviewer to chime in - given you and Martin are both >>>>>> contributors. Or are you the main contributor with Martin being a >>>>>> reviewer? >>>>> >>>>> Martin and I are contributors of this change. >>>>> >>>>>> Still a problem on Solaris sparc: >>>>> >>>>> Martin, could you create a new change in webrev with the patch that >>>>> David sent? >>>>> >>>>> Regards, >>>>> Hiroshi >>>>> ----------------------- >>>>> Hiroshi Horii, Ph.D. 
>>>>> IBM Research - Tokyo >>>>> >>>>> >>>>> David Holmes > > >>>>> wrote on 05/10/2016 21:29:53: >>>>> >>>>>> From: David Holmes >>>>> > >>>>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-runtime- >>>>>> dev at openjdk.java.net " >>>>> >>>> > >>>>>> Cc: Tim Ellison >>>>> >, "ppc-aix-port- >>>>>> dev at openjdk.java.net " >>>>> >>>> >, "hotspot- >>>>>> gc-dev at openjdk.java.net " >>>>> >>>> > >>>>>> Date: 05/10/2016 21:31 >>>>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>>>>> >>>>>> On 10/05/2016 9:04 PM, David Holmes wrote: >>>>>>> Hi Hiroshi, >>>>>>> >>>>>>> On 10/05/2016 8:44 PM, Hiroshi H Horii wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> Can I please request reviews for the following change? >>>>>>>> >>>>>>>> Code change: >>>>>>>> >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/ >>>>>>> >>>>>>> Changes look good. I'm currently running them through our internal >>>>>> build >>>>>>> system. I will sponsor this and push the change through JPRT. >>>>>> >>>>>> Still a problem on Solaris sparc: >>>>>> >>>>>> "/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/ >>>>>> atomic.inline.hpp", >>>>>> line 96: Error: Could not find a match for static >>>>>> Atomic::cmpxchg(signed >>>>>> char, volatile signed char*, signed char). >>>>>> 1 Error(s) detected. >>>>>> >>>>>> Needs this patch: >>>>>> >>>>>> diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp >>>>>> --- a/src/share/vm/runtime/atomic.inline.hpp >>>>>> +++ b/src/share/vm/runtime/atomic.inline.hpp >>>>>> @@ -92,7 +92,7 @@ >>>>>> >>>>>> #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE >>>>>> // See comment in atomic.cpp how to override. 
>>>>>> -inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte >>>>>> *dest, jbyte comparand) >>>>>> +inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte >>>>>> *dest, jbyte comparand, cmpxchg_memory_order order) >>>>>> { >>>>>> return cmpxchg_general(exchange_value, dest, comparand); >>>>>> } >>>>>> >>>>>> David >>>>>> ----- >>>>>> >>>>>>> Just need another reviewer to chime in - given you and Martin are >>>>>> both >>>>>>> contributors. Or are you the main contributor with Martin being a >>>>>> reviewer? >>>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> PS. It's my night now so I'll be signing off and will pick this up in >>>>>>> the morning. >>>>>>> >>>>>>>> This change follows the discussion started from these mails. >>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>>>>> April/018960.html >>>>>>>> >>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>>>>> April/019148.html >>>>>>>> >>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>>>>> May/019320.html >>>>>>>> >>>>>>>> >>>>>>>> Description: >>>>>>>> This change provides relaxed compare-and-exchange by introducing >>>>>>>> relaxed memory order. As described in >> atomic_linux_ppc.inline.hpp, >>>>>>>> the current implementation of cmpxchg is fence_cmpxchg_acquire. >>>>>>>> This implementation is useful for general purposes because twice >>>>>> calls of >>>>>>>> sync before and after cmpxchg will provide strict consistency. >>>>>>>> However, they sometimes cause overheads because sync >> instructions >>>>>> are >>>>>>>> very expensive in the current POWER chip design. >>>>>>>> >>>>>>>> We confirmed this change improves performance of >> copy_to_survivor >>>>>>>> in the parallel GC. However, we will need more investigation of GC >>>>>>>> by more experts. So, We would like to request a review of the >> change >>>>>>>> of cmpxchg first (as Martin requested). 
>>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016- >>>>>> April/019188.html >>>>>>>> >>>>>>>> >>>>>>>> Summary of source code changes: >>>>>>>> >>>>>>>> * src/share/vm/runtime/atomic.hpp >>>>>>>> - Defines enum memory_order and adds a parameter to >> cmpxchg. >>>>>>>> >>>>>>>> * src/share/vm/runtime/atomic.cpp >>>>>>>> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp >>>>>>>> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >>>>>>>> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp >>>>>>>> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp >>>>>>>> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp >>>>>>>> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp >>>>>>>> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp >>>>>>>> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp >>>>>>>> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp >>>>>>>> - Added a parameter for each cmpxchg function to follow >>>>>>>> the change of atomic.hpp. Their implementations are not >>>>>> changed. >>>>>>>> >>>>>>>> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp >>>>>>>> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >>>>>>>> - Added a parameter for each cmpxchg function to follow >>>>>>>> the change of atomic.hpp. In addition, implementations >>>>>>>> are changed corresponding to the specified memory_order. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Hiroshi >>>>>>>> ----------------------- >>>>>>>> Hiroshi Horii, Ph.D. >>>>>>>> IBM Research - Tokyo >>>>>>>> >>>>>> >>>>> From volker.simonis at gmail.com Thu May 12 12:26:59 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Thu, 12 May 2016 14:26:59 +0200 Subject: SIGILL crashes JVM on PPC64 LE In-Reply-To: <5733B30D.6010201@linux.vnet.ibm.com> References: <5733B30D.6010201@linux.vnet.ibm.com> Message-ID: Hi Gustavo, thanks for the bug report. The hs_err file you provided indicates that this crash happened with Ubuntu's openjdk 8 version. 
Can you still reproduce this with the newest jdk9 builds? Also, I can see from the hs_err file that the crash happened in the C2 compiled method java.util.TimSort.countRunAndMakeAscending which doesn't seem to be related to nio and unsafe. Ideally, you could post an easy test case to reproduce the problem. If that's not possible, it would be helpful if you could post the output of a failing run with "-XX:CompileCommand=print,java.util.TimSort::countRunAndMakeAscending -XX:CompileCommand=option,java.util.TimSort::countRunAndMakeAscending,PrintOptoAssembly". In order to get the disassembly output for compiled methods you have to build the hsdis library from hotspot/src/share/tools/hsdis (it has a README with build instructions). Regards, Volker On Thu, May 12, 2016 at 12:32 AM, Gustavo Romero wrote: > Hi > > I'm getting a nasty SIGILL that crashes the JVM on PPC64 LE. > > hs_err log: > http://hastebin.com/raw/fovagunaci > > The application employs methods from both java.nio.ByteBuffer and > sun.misc.Unsafe classes in order to write and read from an allocated buffer. > > An interesting thing is that after debugging the instruction that caused the > said SIGILL: > > 0x3fff902839a4: cmpwi cr6,r17,0 > 0x3fff902839a8: beq cr6,0x3fff90283ae4 > 0x3fff902839ac: .long 0xea2f0013 <============ illegal instruction > 0x3fff902839b0: add r15,r15,r17 > 0x3fff902839b4: add r14,r17,r14 > > I found that when its endianness is changed it turns out to be a valid > instruction: vsel v24,v0,v5,v31 > > However, I'm still unable to determine if it's an application issue, something > with JVM unsafe interface code, or something else. > > Any clue on how to narrow down this SIGILL? > > Thank you!
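To double-check the byte-order observation by hand, a small decoder for the two encodings involved can help. This is a sketch, not part of any tool mentioned in the thread. The helper names are hypothetical, and the field positions follow our reading of the Power ISA: DS-form for primary opcode 58, and VA-form for `vsel` (primary opcode 4, extended opcode 42).

```cpp
#include <cstdint>
#include <string>

// Primary opcode: the top 6 bits (ISA bits 0-5) of a 32-bit instruction word.
inline unsigned primary_opcode(std::uint32_t insn) { return insn >> 26; }

// Reverse the byte order of a 32-bit word.
inline std::uint32_t byte_swap32(std::uint32_t x) {
  return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
         ((x & 0x00FF0000u) >> 8) | ((x & 0xFF000000u) >> 24);
}

// DS-form fields, as used by primary opcode 58 (ld / ldu / lwa).
struct DsForm {
  unsigned rt;       // target GPR
  unsigned ra;       // base GPR
  int displacement;  // DS || 0b00, sign-extended
  unsigned xo;       // 2-bit extended opcode
};

inline DsForm decode_ds(std::uint32_t insn) {
  DsForm f;
  f.rt = (insn >> 21) & 31u;
  f.ra = (insn >> 16) & 31u;
  // Low 16 bits with the two XO bits cleared, then sign-extended.
  f.displacement = static_cast<std::int16_t>(insn & 0xFFFCu);
  f.xo = insn & 3u;
  return f;
}

inline std::string opcode58_mnemonic(unsigned xo) {
  switch (xo) {
    case 0: return "ld";
    case 1: return "ldu";
    case 2: return "lwa";
    default: return "reserved";  // XO=3: not assigned, as far as we know
  }
}

// VA-form fields (vsel: primary opcode 4, extended opcode 42 in the low 6 bits).
struct VaForm { unsigned vrt, vra, vrb, vrc, xo; };

inline VaForm decode_va(std::uint32_t insn) {
  VaForm f;
  f.vrt = (insn >> 21) & 31u;
  f.vra = (insn >> 16) & 31u;
  f.vrb = (insn >> 11) & 31u;
  f.vrc = (insn >> 6) & 31u;
  f.xo = insn & 63u;
  return f;
}
```

Read in native order, 0xea2f0013 decodes as primary opcode 58 (the `ld`/`ldu`/`lwa` group) with RT=r17, RA=r15, displacement 16 and XO=3. As far as we know, that XO value is unassigned, which by itself would be enough to raise SIGILL. The word differs from `ld r17,16(r15)` (0xea2f0010) only in the two XO bits. This decode alone cannot establish whether that is significant. Byte-swapped, the same word decodes as `vsel v24,v0,v5,v31`, matching the observation above.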
> > Regards, > Gustavo > From volker.simonis at gmail.com Thu May 12 12:39:41 2016 From: volker.simonis at gmail.com (Volker Simonis) Date: Thu, 12 May 2016 14:39:41 +0200 Subject: SIGILL crashes JVM on PPC64 LE In-Reply-To: References: <5733B30D.6010201@linux.vnet.ibm.com> Message-ID: And I forgot to mention: I've checked and we don't emit vsel instructions in jdk8 on ppc. So it must be a coincidence that changing the endianness of the offending instruction yields a valid 'vsel' instruction. On Thu, May 12, 2016 at 2:26 PM, Volker Simonis wrote: > Hi Gustavo, > > thanks for the bug report. The hs_err file you provided indicates that > this crash happened with Ubuntu's openjdk 8 version. Can you still > reproduce this with the newest jdk9 builds? > > Also, I can see from the hs_err file that the crash happened in the C2 > compiled method java.util.TimSort.countRunAndMakeAscending which > doesn't seem to be related to nio and unsafe. > > Ideally, you could post an easy test case to reproduce the problem. If > that's not possible, it would be helpful if you could post the output > of a failing run with > "-XX:CompileCommand=print,java.util.TimSort::countRunAndMakeAscending > -XX:CompileCommand=option,java.util.TimSort::countRunAndMakeAscending,PrintOptoAssembly". > In order to get the disassembly output for compiled methods you have > to build the hsdis library from hotspot/src/share/tools/hsdis (it has > a README with build instructions). > > Regards, > Volker > > > On Thu, May 12, 2016 at 12:32 AM, Gustavo Romero > wrote: >> Hi >> >> I'm getting a nasty SIGILL that crashes the JVM on PPC64 LE. >> >> hs_err log: >> http://hastebin.com/raw/fovagunaci >> >> The application employs methods from both java.nio.ByteBuffer and >> sun.misc.Unsafe classes in order to write and read from an allocated buffer.
>> >> An interesting thing is that after debugging the instruction that caused the >> said SIGILL: >> >> 0x3fff902839a4: cmpwi cr6,r17,0 >> 0x3fff902839a8: beq cr6,0x3fff90283ae4 >> 0x3fff902839ac: .long 0xea2f0013 <============ illegal instruction >> 0x3fff902839b0: add r15,r15,r17 >> 0x3fff902839b4: add r14,r17,r14 >> >> I found that when its endianness is changed it turns out to be a valid >> instruction: vsel v24,v0,v5,v31 >> >> However, I'm still unable to determine if it's an application issue, something >> with JVM unsafe interface code, or something else. >> >> Any clue on how to narrow down this SIGILL? >> >> Thank you! >> >> Regards, >> Gustavo >> From ENOMIKI at jp.ibm.com Mon May 16 05:53:48 2016 From: ENOMIKI at jp.ibm.com (Miki M Enoki) Date: Mon, 16 May 2016 14:53:48 +0900 Subject: PPC64 VSX load/store instructions in stubs In-Reply-To: References: <56FEDBB3.5030106@linux.vnet.ibm.com><57339EE1.2040500@linux.vnet.ibm.com> Message-ID: <201605160554.u4G5s3nn030257@d19av05.sagamino.japan.ibm.com> Dear Gustavo, Volker, and Martin, I have also implemented a VSX disjoint long arraycopy. I would appreciate it if it could be applied to OpenJDK, too. The performance was better than the original code in almost all cases. VSX(max) denotes the aligned case and VSX(min) the unaligned case. In addition, VMX can perform better in the unaligned case. http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/result-0001.jpg The benchmark code is here: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/ArrayCopyTest1-0001.java Server: 8247-22L (POWER8 (3.3GHz 12 cores) x2, 512GB memory), Ubuntu Linux 15.04 ppc64LE (kernel: 3.19.0-18-generic), OpenJDK (build based on 1.9), JVMARGS: "-Xmx40g -Xms40g -Xmn20g". The patches were created for Java 9.
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/ppc64le_vsx-0001.diff http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/ppc64le_vmx-0001.diff I would appreciate your comments. Best regards, Miki "ppc-aix-port-dev" wrote on 2016/05/12 18:33:03: > From: "Doerr, Martin" > To: Gustavo Romero , Volker Simonis > > Cc: "Simonis, Volker" , "ppc-aix-port- > dev at openjdk.java.net" , "hotspot- > dev at openjdk.java.net" , > "brenohl at br.ibm.com" > Date: 2016/05/12 18:34 > Subject: RE: PPC64 VSX load/store instructions in stubs > Sent by: "ppc-aix-port-dev" > > Hi Gustavo, > > thanks for providing the webrevs. The change looks basically good. > > I only have the following concerns: > - We basically support configuring dscr by various DSCR switches. > Your code resets the value to hardware default instead of the > possibly modified values. We're currently only using default DSCR > values, but maybe we may want to play with them in the future. > We could use a static variable for the default dscr value. It could > be modified in VM_Version::config_dscr() and used by your restore > code (load_const_optimized(tmp1, ...) instead of li(tmp1, 0)). > > - The PPC-elf64abi-1.9 says: "Functions must ensure that the > appropriate bits in the vrsave register are set for any vector > registers they use. ...". I think not touching vrsave is the right > thing for AIX and ppc64le, but I think we will either have to skip > the optimization on ppc64 big endian or handle vrsave. Do you agree? > > Best regards, > Martin > > > -----Original Message----- > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > Sent: Mittwoch, 11. 
Mai 2016 23:07 > To: Volker Simonis > Cc: Doerr, Martin ; Simonis, Volker > ; ppc-aix-port-dev at openjdk.java.net; > hotspot-dev at openjdk.java.net; brenohl at br.ibm.com > Subject: Re: PPC64 VSX load/store instructions in stubs > Importance: High > > Hi Volker, Hi Martin > > Sincere apologies for the long delay. > > My initial approach to testing the VSX load/store was an > extracted snippet of just the mass copy loop, "grafted" into an inline > asm block, with isolated tests using the "perf" tool focused only on > aligned source and > destination (best case). > > The extracted code, called "Original" in the plot below (black line), is here: > https://github.com/gromero/arraycopy/blob/2pairs/arraycopy.c#L27-L36 > > After some experiments, that extracted code evolved into this version, which employs VSX > load/store, Data Stream deepest pre-fetch, d-cache touch, and a > backbranch aligned > to 32 bytes: > https://github.com/gromero/arraycopy/blob/2pairs/arraycopy_vsx.c#L27-L41 > > All runs were "pinned" using `numactl --cpunodebind --membind` to avoid any > scheduler decision that could add noise to the measurement. > > VSX, deepest data pre-fetch, d-cache touch, and 32-byte alignment > proved to be better > in the isolated code (red line) in comparison to the original extracted code > (black line): > http://gromero.github.io/openjdk/original_vsx_non_pf_vsx_pf_deepest.pdf > > So I proceeded to implement the VSX loop in OpenJDK based on the best case > result (VSX, pre-fetch deepest, d-cache touch, and backbranch target align - > goetz TODO note).
> > OpenJDK 8 webrev: > http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/8/ > > OpenJDK 9 webrev: > http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/9/ > > I've tested the change on OpenJDK 8 using this script that calls > System.arraycopy() on shorts: > https://goo.gl/8UWtLm > > The results for all data alignment cases: > http://gromero.github.io/openjdk/src_0_dst_0.pdf > http://gromero.github.io/openjdk/src_1_dst_0.pdf > http://gromero.github.io/openjdk/src_0_dst_1.pdf > http://gromero.github.io/openjdk/src_1_dst_1.pdf > > Martin, I added the vsx test to the feature-string. Regarding the > ABI, I'm just > using two VSRs: vsr0 and vsr1, both volatile. > > Volker, since the loop unrolling was removed, the loop now copies 16 > elements at a time, > like the non-VSX loop, and not 32 elements. So far I have only verified the > change on little > endian. Sorry, I didn't understand your question regarding "instructions for > aligned load/stores". Did you mean instructions for unaligned load/ > stores? I think > both fixed-point (ld/std) and VSX instructions will load/store more slowly in > the unaligned scenario. However, VMX load/store is different and expects aligned > operands. Thank you very much for opening the bug > https://bugs.openjdk.java.net/browse/JDK-8154156 > > I don't have per-function profiling for each SPEC{jbb,jvm} benchmark > that would show which one stresses the proposed change best. > Is there a better benchmark I could use? > > Thank you! > > Best regards, > Gustavo > > On 05-04-2016 14:23, Volker Simonis wrote: > > Hi Gustavo, > > > > thanks a lot for your contribution. > > > > Can you please describe whether you've run benchmarks and which performance > > improvements you saw? > > > > With your change, if we're running on Power 8, we will only use the > > fast path for arrays with at least 32 elements. For smaller arrays, we > > will fall back to copying only 2 elements at a time, which will be > > slower than the initial version, which copied 4 at a time in that case.
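The loop structure discussed here (a wide main loop plus narrower tail loops) can be sketched in portable C++. This is not the PPC stub itself. The chunk sizes are chosen only for illustration, `std::memcpy` of 32 bytes plays the role of the vector load/store pair, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Disjoint copy: src and dst must not overlap (std::memcpy requires this too).
// Returns the number of 16-element iterations, reported only so a test can observe the dispatch.
std::size_t disjoint_short_copy_sketch(const std::int16_t* src, std::int16_t* dst,
                                       std::size_t count) {
  std::size_t wide_iterations = 0;
  // Main loop: 16 jshorts (32 bytes) per iteration, standing in for the vector load/store pair.
  while (count >= 16) {
    std::memcpy(dst, src, 16 * sizeof(std::int16_t));
    src += 16;
    dst += 16;
    count -= 16;
    ++wide_iterations;
  }
  // Medium tail: 4 elements (one 8-byte doubleword) at a time.
  while (count >= 4) {
    std::memcpy(dst, src, 4 * sizeof(std::int16_t));
    src += 4;
    dst += 4;
    count -= 4;
  }
  // Final tail: one element at a time.
  while (count > 0) {
    *dst++ = *src++;
    --count;
  }
  return wide_iterations;
}
```

The trade-off Volker raises for short arrays is exactly here: where the threshold for entering the wide loop sits, and how wide the tail steps are.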
> > > > Did you verify your changes on both little and big endian? > > > > And what about unaligned memory accesses? As far as I read, > > lxvd2x/stxvd2x still work, but may be slower. I saw there also exist > > instructions for aligned load/stores. Would it make sense > > (performance-wise) to use them for the cases where we can be sure that > > we have aligned memory accesses? > > > > Thank you and best regards, > > Volker > > > > > > On Fri, Apr 1, 2016 at 10:36 PM, Gustavo Romero > > wrote: > >> Hi Martin, Hi Volker > >> > >> Currently VSX load/store instructions are not being used in PPC64 stubs, > >> particularly in arraycopy stubs inside generate_arraycopy_stubs() like, > >> but not limited to, generate_disjoint_{byte,short,int,long}_copy. > >> > >> We can speed up mass copy using VSX (Vector-Scalar Extension) load/store > >> instructions on processors >= POWER8, the same way it's already done for > >> libc memcpy(). > >> > >> This is an initial patch just for jshort_disjoint_arraycopy() VSX vector > >> load/store: > >> > >> http://81.de.7a9f.ip4.static.sl-reverse.com/202539/webrev > >> > >> What are your thoughts on that? Is there any impediment to using VSX > >> instructions in OpenJDK at the moment? > >> > >> Thank you. > >> > >> Best regards, > >> Gustavo > >> > > > From brenohl at br.ibm.com Mon May 16 17:28:44 2016 From: brenohl at br.ibm.com (Breno Leitao) Date: Mon, 16 May 2016 14:28:44 -0300 Subject: PPC64 VSX load/store instructions in stubs In-Reply-To: References: <56FEDBB3.5030106@linux.vnet.ibm.com><57339EE1.2040500@linux.vnet.ibm.com> Message-ID: <573A034C.9060602@br.ibm.com> Hi Miki, On 05/16/2016 02:53 AM, Miki M Enoki wrote: > I also implemented VSX disjoint long arraycopy. > I appreciate it if it is applied to OpenJDK, too. Thanks for the summarized information; this is helpful.
Based on your plot, I understand we can split the whole scenario in two: * Array size smaller than 4k: use VSX instructions to perform the copy * Array size bigger than 4k: use VMX instructions to perform the copy The same mechanism could be used to copy arrays of short elements, which Gustavo was working on. Do you agree? That said, I understand that a new patch should be generated that covers both cases in a single patch, ready to be applied to the OpenJDK 9 source code. Hence a webrev should be generated mapping to bug id https://bugs.openjdk.java.net/browse/JDK-8154156 If you need any help with the webrev[1] creation and hosting, Gustavo might help, since he has already gone through this process. [1] http://openjdk.java.net/guide/webrevHelp.html From gromero at linux.vnet.ibm.com Mon May 16 18:25:10 2016 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 16 May 2016 15:25:10 -0300 Subject: SIGILL crashes JVM on PPC64 LE In-Reply-To: References: <5733B30D.6010201@linux.vnet.ibm.com> Message-ID: <201605161825.u4GIOpXL005912@mx0b-001b2d01.pphosted.com> Hi Volker Thanks for inspecting the HotSpot crash log. At the moment it's not possible, AFAIK (and as far as I could try), to run Cassandra on OpenJDK 9. It will hit another missing class issue before it runs into the SIGILL issue. I'm still trying to reproduce it with an easy test case. However, here is the C2 compiled method disassembly: hs_err log: http://hastebin.com/raw/orufukacos hs_err method disasm: http://hastebin.com/raw/owoxamodok Source for one of the four problematic classes that will crash the JVM when compiled (we can see them in the hs_err method disasm comments): java/org/apache/cassandra/db/rows/NativeCell.java#L133 https://goo.gl/Uefq8Y Thanks for letting me know that `isel` is never emitted. Thank you! Best regards, Gustavo On 12-05-2016 09:39, Volker Simonis wrote: > And I forgot to mention: I've checked and we don't emit vsel > instructions in jdk8 on ppc.
So it must be a coincidence that changing > the endianess of the offending instruction yields a valid 'vsel' > instruction. > > > > On Thu, May 12, 2016 at 2:26 PM, Volker Simonis > wrote: >> Hi Gustavo, >> >> thanks for the bug report. The hs_err file you provided indicates that >> this crash happened with Ubuntu's openjdk 8 version. Can you still >> reproduce this with the the newest jdk9 builds? >> >> Also, I can see from the hs_err file that the crash happened in the C2 >> compiled method java.util.TimSort.countRunAndMakeAscending which >> doesn't seem to be related to nio and unsafe. >> >> Ideally, you could post an easy test case to reproduce the problem. If >> that's not possible, it would be helpful if you could post the output >> of a failing run with >> "-XX:CompileCommand=print,java.util.TimSort::countRunAndMakeAscending >> -XX:CompileCommand=option,java.util.TimSort::countRunAndMakeAscending,PrintOptoAssembly". >> In order to get the disassembly output for compiled methods you have >> to build the hsdis library from hotspot/src/share/tools/hsdis (it has >> a README with build instructions). >> >> Regards, >> Volker >> >> >> On Thu, May 12, 2016 at 12:32 AM, Gustavo Romero >> wrote: >>> Hi >>> >>> I'm getting a nasty SIGILL that crashes the JVM on PPC64 LE. >>> >>> hs_err log: >>> http://hastebin.com/raw/fovagunaci >>> >>> The application employs methods from both java.nio.ByteBuffer and >>> sun.misc.Unsafe classes in order to write and read from an allocated buffer. 
>>> >>> A interesting thing is that after debugging the instruction that caused the >>> said SIGILL: >>> >>> 0x3fff902839a4: cmpwi cr6,r17,0 >>> 0x3fff902839a8: beq cr6,0x3fff90283ae4 >>> 0x3fff902839ac: .long 0xea2f0013 <============ illegal instruction >>> 0x3fff902839b0: add r15,r15,r17 >>> 0x3fff902839b4: add r14,r17,r14 >>> >>> I found that when its endianness is changed it turns out to be a valid >>> instruction: vsel v24,v0,v5,v31 >>> >>> However, I'm still unable to determine if it's an application issue, something >>> with JVM unsafe interface code, or something else. >>> >>> Any clue on how to narrow down this SIGILL? >>> >>> Thank you! >>> >>> Regards, >>> Gustavo >>> > From gromero at linux.vnet.ibm.com Mon May 16 22:09:40 2016 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 16 May 2016 19:09:40 -0300 Subject: SIGILL crashes JVM on PPC64 LE In-Reply-To: <201605161825.u4GIO8g8023200@mx0a-001b2d01.pphosted.com> References: <5733B30D.6010201@linux.vnet.ibm.com> <201605161825.u4GIO8g8023200@mx0a-001b2d01.pphosted.com> Message-ID: <201605162209.u4GM9Usm008246@mx0b-001b2d01.pphosted.com> Hi Volker I'm not sure, but it seems that the bytecode i2l is wrong for some reason. It should be mapped to extsw asm instruction I think. Is there just one code path that controls this mapping on PPC? Thank you. Regards, Gustavo On 16-05-2016 15:25, Gustavo Romero wrote: > Hi Volker > > Thanks for inspecting the Hotspot crash log. > > At the moment it's no possible, AFIAK - and as I could try - to run > Cassandra on OpenJDK 9. It will hit another missing class issue before > it runs into the SIGILL issue. > > I'm still trying to reproduce it with an easy test case. 
> > However, I provide the C2 compiled method disasm: > > hs_err log: > http://hastebin.com/raw/orufukacos > > hs_err method disasm: > http://hastebin.com/raw/owoxamodok > > Source for one of the four problematic classes that will crash JVM when > compiled (we can see them in hs_err method disasm comments): > java/org/apache/cassandra/db/rows/NativeCell.java#L133 > https://goo.gl/Uefq8Y > > Thanks for letting me know that `isel` is never emitted. > > Thank you! > > Best regards, > Gustavo > > On 12-05-2016 09:39, Volker Simonis wrote: >> And I forgot to mention: I've checked and we don't emit vsel >> instructions in jdk8 on ppc. So it must be a coincidence that changing >> the endianess of the offending instruction yields a valid 'vsel' >> instruction. >> >> >> >> On Thu, May 12, 2016 at 2:26 PM, Volker Simonis >> wrote: >>> Hi Gustavo, >>> >>> thanks for the bug report. The hs_err file you provided indicates that >>> this crash happened with Ubuntu's openjdk 8 version. Can you still >>> reproduce this with the the newest jdk9 builds? >>> >>> Also, I can see from the hs_err file that the crash happened in the C2 >>> compiled method java.util.TimSort.countRunAndMakeAscending which >>> doesn't seem to be related to nio and unsafe. >>> >>> Ideally, you could post an easy test case to reproduce the problem. If >>> that's not possible, it would be helpful if you could post the output >>> of a failing run with >>> "-XX:CompileCommand=print,java.util.TimSort::countRunAndMakeAscending >>> -XX:CompileCommand=option,java.util.TimSort::countRunAndMakeAscending,PrintOptoAssembly". >>> In order to get the disassembly output for compiled methods you have >>> to build the hsdis library from hotspot/src/share/tools/hsdis (it has >>> a README with build instructions). >>> >>> Regards, >>> Volker >>> >>> >>> On Thu, May 12, 2016 at 12:32 AM, Gustavo Romero >>> wrote: >>>> Hi >>>> >>>> I'm getting a nasty SIGILL that crashes the JVM on PPC64 LE. 
>>>> >>>> hs_err log: >>>> http://hastebin.com/raw/fovagunaci >>>> >>>> The application employs methods from both java.nio.ByteBuffer and >>>> sun.misc.Unsafe classes in order to write and read from an allocated buffer. >>>> >>>> A interesting thing is that after debugging the instruction that caused the >>>> said SIGILL: >>>> >>>> 0x3fff902839a4: cmpwi cr6,r17,0 >>>> 0x3fff902839a8: beq cr6,0x3fff90283ae4 >>>> 0x3fff902839ac: .long 0xea2f0013 <============ illegal instruction >>>> 0x3fff902839b0: add r15,r15,r17 >>>> 0x3fff902839b4: add r14,r17,r14 >>>> >>>> I found that when its endianness is changed it turns out to be a valid >>>> instruction: vsel v24,v0,v5,v31 >>>> >>>> However, I'm still unable to determine if it's an application issue, something >>>> with JVM unsafe interface code, or something else. >>>> >>>> Any clue on how to narrow down this SIGILL? >>>> >>>> Thank you! >>>> >>>> Regards, >>>> Gustavo >>>> >> > From kim.barrett at oracle.com Wed May 18 01:26:19 2016 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 17 May 2016 21:26:19 -0400 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> Message-ID: > On May 10, 2016, at 10:27 AM, Doerr, Martin wrote: > > Hello everybody, > > thanks for finding this issue. New webrev is here: > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/ > ------------------------------------------------------------------------------ src/share/vm/runtime/atomic.hpp 30 typedef enum cmpxchg_cmpxchg_memory_order { 31 memory_order_relaxed, 32 // Use value which doesn't interfere with C++2011. We need to be more conservative. 
33 memory_order_conservative = 8 34 } cmpxchg_memory_order; This is C++, where enum tag names are types, so we don't need a typedef here. Just use "enum cmpxchg_memory_order { ... };". ------------------------------------------------------------------------------ src/share/vm/runtime/atomic.cpp 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, 60 volatile unsigned int* dest, unsigned int compare_value, 61 cmpxchg_memory_order order) { Misaligned parameters. I'm surprised this was ever out-of-line. But with this change it's quite bad to be out-of-line, as that's going to kill the constant propagation of the order value. ------------------------------------------------------------------------------ src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp In each use, the cmpxchg_post_membar's are after the exit label, whereas the acquire fences they are replacing were before the exit label. This means we'll be fencing on failure exit, where we weren't doing so before. It's not clear whether this change is intentional. Note that this change is consistent with the C++11 one-order forms of cmpxchg, where the single order argument is used as the success order and (with potentially some modification) as the failure order. [I was going to suggest the asm goto syntax might be used to obtain the original ordering, but "An asm goto statement cannot have outputs." So some non-trivial restructuring will probably be needed to get the original ordering.] Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. ------------------------------------------------------------------------------ src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp These repeated comments need updating: 315 // Note that cmpxchg guarantees a two-way memory barrier across 316 // the cmpxchg, so it's really a a 'fence_cmpxchg_acquire' 317 // (see atomic.hpp). Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp.
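The fence-placement question can be made concrete with a sketch that uses C++11 fences as stand-ins for the PPC `sync` instructions. This is an approximation, not a claim that the two are equivalent. The sketch also declares the order enum without a typedef, as suggested above. A fence counter, which has no counterpart in HotSpot, lets a test observe that a trailing fence placed after the exit label also runs when the compare fails. All names are hypothetical.

```cpp
#include <atomic>

namespace sketch {

// Plain enum: in C++ the tag name is already a type, so no typedef is needed.
enum cmpxchg_memory_order {
  memory_order_relaxed,
  memory_order_conservative = 8
};

// Illustration only: counts the full fences issued so a test can compare placements.
inline int& fence_count() {
  static int count = 0;
  return count;
}

inline void full_fence() {
  std::atomic_thread_fence(std::memory_order_seq_cst);  // stand-in for PPC 'sync'
  ++fence_count();
}

// Trailing fence after the "exit label": runs on success and on failure.
inline int cmpxchg_fence_both_paths(int exchange_value,
                                    std::atomic<int>* dest,
                                    int compare_value,
                                    cmpxchg_memory_order order) {
  if (order != memory_order_relaxed) full_fence();
  int observed = compare_value;
  dest->compare_exchange_strong(observed, exchange_value, std::memory_order_relaxed);
  if (order != memory_order_relaxed) full_fence();
  return observed;  // old value, whether or not the exchange happened
}

// Trailing fence before the "exit label": skipped when the compare fails.
inline int cmpxchg_fence_success_only(int exchange_value,
                                      std::atomic<int>* dest,
                                      int compare_value,
                                      cmpxchg_memory_order order) {
  if (order != memory_order_relaxed) full_fence();
  int observed = compare_value;
  bool exchanged = dest->compare_exchange_strong(observed, exchange_value,
                                                 std::memory_order_relaxed);
  if (exchanged && order != memory_order_relaxed) full_fence();
  return observed;
}

}  // namespace sketch
```

In this sketch, the first variant models the placement the review describes for webrev.02 (post-membar after the exit label). The second models the original placement before the label.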
------------------------------------------------------------------------------ src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp [pre-existing] The cmpxchg asm sequences all originally looked like /* fence */ strasm_sync ... /* acquire */ strasm_sync ... So they were using strasm_sync (the full fence) in both places, even though the comments suggest it could/should have been strasm_fence ... strasm_acquire However, the description in runtime/atomic.hpp seems to indicate something stronger than "acquire" is required here, so the second comment seems wrong. Maybe it's a good thing the comments are being removed by the proposed change. Similarly in other corresponding places in other files. ------------------------------------------------------------------------------ src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp 138 return (void *) cmpxchg_ptr((intptr_t) exchange_value, 139 (volatile intptr_t*) dest, 140 (intptr_t) compare_value, order); I'd prefer the order argument be placed on its own line, rather than added to an existing line where it's kind of hiding. Similarly in other corresponding places in other files. ------------------------------------------------------------------------------ src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp 271 inline jint Atomic::cmpxchg(jint exchange_value, 272 volatile jint* dest, 273 jint compare_value, cmpxchg_memory_order order) { and similarly elsewhere, I'd prefer the order parameter be on its own line like the other parameters. Similarly in other corresponding places here and in other files.
------------------------------------------------------------------------------ src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp 306 inline void* Atomic::cmpxchg_ptr(void* exchange_value, 307 volatile void* dest, 308 void* compare_value, cmpxchg_memory_order order) { 309 310 return (void *) cmpxchg_ptr((intptr_t) exchange_value, 311 (volatile intptr_t*) dest, 312 (intptr_t) compare_value); 313 } Inner cmpxchg_ptr is missing the order argument. This will discard an outer relaxed order. (atomic_linux_zero is OK.) ------------------------------------------------------------------------------ src/share/vm/runtime/atomic.inline.hpp The unspecialized Atomic::cmpxchg for jbyte isn't passing the order argument through to cmpxchg_general. Of course, then we might want to figure out what cmpxchg_general should be doing with the order. ------------------------------------------------------------------------------ From martin.doerr at sap.com Wed May 18 10:12:24 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 18 May 2016 10:12:24 +0000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> Message-ID: <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> Hi Kim, thank you very much for the detailed review. I agree with your comments and I have made all your requested changes here: http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now. Advantage is that the new implementation is maximum conservative by default. 
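The forwarding problem in the bsd_zero `cmpxchg_ptr` can be illustrated with a small sketch. The names are hypothetical, and `std::atomic<std::intptr_t>` stands in for the `volatile` destination. The pointer overload has to hand `order` to the integer overload explicitly. If the integer overload had a default for `order`, leaving the argument out would still compile and would quietly turn a relaxed request into the default.

```cpp
#include <atomic>
#include <cstdint>

namespace fwd_sketch {

enum cmpxchg_memory_order { memory_order_relaxed, memory_order_conservative = 8 };

// Records the order seen by the innermost call, so a test can check it was forwarded.
inline cmpxchg_memory_order& last_order_seen() {
  static cmpxchg_memory_order seen = memory_order_conservative;
  return seen;
}

inline std::intptr_t cmpxchg_ptr(std::intptr_t exchange_value,
                                 std::atomic<std::intptr_t>* dest,
                                 std::intptr_t compare_value,
                                 cmpxchg_memory_order order) {
  last_order_seen() = order;
  std::intptr_t observed = compare_value;
  dest->compare_exchange_strong(observed, exchange_value,
                                order == memory_order_relaxed ? std::memory_order_relaxed
                                                              : std::memory_order_seq_cst);
  return observed;  // old value
}

// The void* overload passes `order` through explicitly, on its own line.
inline void* cmpxchg_ptr(void* exchange_value,
                         std::atomic<std::intptr_t>* dest,
                         void* compare_value,
                         cmpxchg_memory_order order) {
  return reinterpret_cast<void*>(cmpxchg_ptr(reinterpret_cast<std::intptr_t>(exchange_value),
                                             dest,
                                             reinterpret_cast<std::intptr_t>(compare_value),
                                             order));
}

}  // namespace fwd_sketch
```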
I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. For performance optimization, we should better use (or introduce additional) enum values. Thanks and best regards, Martin -----Original Message----- From: Kim Barrett [mailto:kim.barrett at oracle.com] Sent: Mittwoch, 18. Mai 2016 03:26 To: Doerr, Martin Cc: Hiroshi H Horii ; David Holmes ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > On May 10, 2016, at 10:27 AM, Doerr, Martin wrote: > > Hello everybody, > > thanks for finding this issue. New webrev is here: > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/ > ------------------------------------------------------------------------------ src/share/vm/runtime/atomic.hpp 30 typedef enum cmpxchg_cmpxchg_memory_order { 31 memory_order_relaxed, 32 // Use value which doesn't interfere with C++2011. We need to be more conservative. 33 memory_order_conservative = 8 34 } cmpxchg_memory_order; This is C++, where enum tag names are types, so we don't need a typedef here. Just use "enum cmpxchg_memory_order { ... };". ------------------------------------------------------------------------------ src/share/vm/runtime/atomic.cpp 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, 60 volatile unsigned int* dest, unsigned int compare_value, 61 cmpxchg_memory_order order) { Misaligned parameters. I'm surprised this was ever out-of-line. But with this change it's quite bad to be out-of-line, as that's going to kill the constant propogation of the order value. ------------------------------------------------------------------------------ src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp In each use, the cmpxchg_post_membar's are after the exit label, whereas the acquire fences they are replacing were before the exit label. 
This means we'll be fencing on failure exit, where we weren't doing so before. It's not clear whether this change is intentional. Note that this change is consistent with the C++11 one-order forms of cmpxchg, where the single order argument is used as the sucess order and (with potentially some modification) as the failure order. [I was going to suggest the asm goto syntax might be used to obtain the original ordering, but "An asm goto statement cannot have outputs." So some non-trivial restructuring will probably be needed to get the original ordering.] Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. ------------------------------------------------------------------------------ src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp These repeated comments need updating: 315 // Note that cmpxchg guarantees a two-way memory barrier across 316 // the cmpxchg, so it's really a a 'fence_cmpxchg_acquire' 317 // (see atomic.hpp). Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. ------------------------------------------------------------------------------ src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp [pre-existing] The cmpxchg asm sequences all originally looked like /* fence */ strasm_sync ... /* acquire */ strasm_sync ... So they were using strasm_sync (the full fence) in both places, even though the comments suggest it could/should have been strasm_fence ... strasm_acquire However, the description in runtime/atomic.hpp seems to indicate something stronger than "acquire" is required here, so the second comment seems wrong. Maybe its a good thing the comments are being removed by the proposed change. Similarly in other corresponding places in other files. 
------------------------------------------------------------------------------ src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp 138 return (void *) cmpxchg_ptr((intptr_t) exchange_value, 139 (volatile intptr_t*) dest, 140 (intptr_t) compare_value, order); I'd prefer the order argument be placed on its own line, rather than added to an existing line where it's kind of hiding. Similarly in other corresponding places in other files. ------------------------------------------------------------------------------ src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp 271 inline jint Atomic::cmpxchg(jint exchange_value, 272 volatile jint* dest, 273 jint compare_value, cmpxchg_memory_order order) { and similarly elsewhere, I'd prefer the order parameter be on its own line like the other parameters. Similarly in other corresponding places here and in other files. ------------------------------------------------------------------------------ src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp 306 inline void* Atomic::cmpxchg_ptr(void* exchange_value, 307 volatile void* dest, 308 void* compare_value, cmpxchg_memory_order order) { 309 310 return (void *) cmpxchg_ptr((intptr_t) exchange_value, 311 (volatile intptr_t*) dest, 312 (intptr_t) compare_value); 313 } Inner cmpxchg_ptr is missing the order argument. This will discard an outer relaxed order. (atomic_linux_zero is OK.) ------------------------------------------------------------------------------ src/share/vm/runtime/atomic.inline.hpp The unspecialized Atomic::cmpxchg for jbyte isn't passing the order argument through to cmpxchg_general. Of course, then we might want to figure out what cmpxchg_general should be doing with the order.
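Taken together, Kim's points on the enum, parameter layout, inlining, and the dropped order argument suggest an API shape along the following lines. This is a hypothetical sketch, not the webrev code: it lives in an invented namespace atomic_sketch, substitutes std::atomic<std::intptr_t> for HotSpot's volatile intptr_t* destinations, approximates "conservative" with seq_cst, and adds a last_order_seen() hook purely so that the forwarding can be checked. The point is that the void* wrapper passes order through (on its own line) instead of silently falling back to the inner default, and that both functions are inline so a constant order can be folded at each call site.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

namespace atomic_sketch {  // hypothetical namespace for this sketch

// In C++ the enum tag is already a type name, so no typedef is needed.
enum cmpxchg_memory_order {
  memory_order_relaxed,
  // Use a value that does not collide with the C++11 std::memory_order values.
  memory_order_conservative = 8
};

// Test hook (hypothetical): records the order the innermost call received.
inline cmpxchg_memory_order& last_order_seen() {
  static cmpxchg_memory_order seen = memory_order_conservative;
  return seen;
}

// Assumption: "conservative" is approximated by seq_cst on both paths.
inline std::memory_order to_std(cmpxchg_memory_order order) {
  assert(order == memory_order_relaxed || order == memory_order_conservative);
  return order == memory_order_relaxed ? std::memory_order_relaxed
                                       : std::memory_order_seq_cst;
}

// Inline, so a constant `order` at the call site can be propagated.
// Returns the previous value of *dest, like HotSpot's cmpxchg.
inline std::intptr_t cmpxchg_ptr(std::intptr_t exchange_value,
                                 std::atomic<std::intptr_t>* dest,
                                 std::intptr_t compare_value,
                                 cmpxchg_memory_order order = memory_order_conservative) {
  last_order_seen() = order;
  std::intptr_t expected = compare_value;
  // On failure, `expected` is overwritten with the current value of *dest.
  dest->compare_exchange_strong(expected, exchange_value, to_std(order));
  return expected;
}

// Pointer wrapper: `order` is forwarded, on its own line, rather than dropped.
inline void* cmpxchg_ptr(void* exchange_value,
                         std::atomic<std::intptr_t>* dest,
                         void* compare_value,
                         cmpxchg_memory_order order = memory_order_conservative) {
  return reinterpret_cast<void*>(
      cmpxchg_ptr(reinterpret_cast<std::intptr_t>(exchange_value),
                  dest,
                  reinterpret_cast<std::intptr_t>(compare_value),
                  order));
}

}  // namespace atomic_sketch
```

With this shape, a caller that asks the void* wrapper for memory_order_relaxed reaches the inner function with relaxed. The bsd_zero version quoted above would have used the inner default instead.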
------------------------------------------------------------------------------ From david.holmes at oracle.com Wed May 18 10:52:03 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 18 May 2016 20:52:03 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> Message-ID: <95a96fb1-f35b-2e04-9f57-af39e2da4c9c@oracle.com> On 18/05/2016 8:12 PM, Doerr, Martin wrote: > Hi Kim, > > thank you very much for the detailed review. > > I agree with your comments and I have made all your requested changes here: > http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ > > It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now. > Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. What further specification are you looking for: // All of the atomic operations that imply a read-modify-write action // guarantee a two-way memory barrier across that operation. ?? David > For performance optimization, we should better use (or introduce additional) enum values. > > Thanks and best regards, > Martin > > > -----Original Message----- > From: Kim Barrett [mailto:kim.barrett at oracle.com] > Sent: Mittwoch, 18. 
Mai 2016 03:26 > To: Doerr, Martin > Cc: Hiroshi H Horii ; David Holmes ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > >> On May 10, 2016, at 10:27 AM, Doerr, Martin wrote: >> >> Hello everybody, >> >> thanks for finding this issue. New webrev is here: >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/ >> > > ------------------------------------------------------------------------------ > src/share/vm/runtime/atomic.hpp > 30 typedef enum cmpxchg_cmpxchg_memory_order { > 31 memory_order_relaxed, > 32 // Use value which doesn't interfere with C++2011. We need to be more conservative. > 33 memory_order_conservative = 8 > 34 } cmpxchg_memory_order; > > This is C++, where enum tag names are types, so we don't need a > typedef here. Just use "enum cmpxchg_memory_order { ... };". > > ------------------------------------------------------------------------------ > src/share/vm/runtime/atomic.cpp > 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, > 60 volatile unsigned int* dest, unsigned int compare_value, > 61 cmpxchg_memory_order order) { > > Misaligned parameters. > > I'm surprised this was ever out-of-line. But with this change it's > quite bad to be out-of-line, as that's going to kill the constant > propogation of the order value. > > ------------------------------------------------------------------------------ > src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > > In each use, the cmpxchg_post_membar's are after the exit label, > whereas the acquire fences they are replacing were before the exit > label. This means we'll be fencing on failure exit, where we weren't > doing so before. > > It's not clear whether this change is intentional. 
Note that this > change is consistent with the C++11 one-order forms of cmpxchg, where > the single order argument is used as the sucess order and (with > potentially some modification) as the failure order. > > [I was going to suggest the asm goto syntax might be used to obtain > the original ordering, but "An asm goto statement cannot have > outputs." So some non-trivial restructuring will probably be needed to > get the original ordering.] > > Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. > > ------------------------------------------------------------------------------ > src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > > These repeated comments need updating: > 315 // Note that cmpxchg guarantees a two-way memory barrier across > 316 // the cmpxchg, so it's really a a 'fence_cmpxchg_acquire' > 317 // (see atomic.hpp). > > Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. > > ------------------------------------------------------------------------------ > src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > > [pre-existing] > > The cmpxchg asm sequences all originally looked like > > /* fence */ > strasm_sync > ... > /* acquire */ > strasm_sync > ... > > So they were using strasm_sync (the full fence) in both places, even > though the comments suggest it could/should have been > strasm_fence ... strasm_acquire > However, the description in runtime/atomic.hpp seems to indicate > something stronger than "acquire" is required here, so the second > comment seems wrong. Maybe its a good thing the comments are being > removed by the proposed change. > > Similarly in other corresponding places in other files. 
> > ------------------------------------------------------------------------------ > src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp > 138 return (void *) cmpxchg_ptr((intptr_t) exchange_value, > 139 (volatile intptr_t*) dest, > 140 (intptr_t) compare_value, order); > > I'd prefer the order argument be placed on its own line, rather than > added to an existing line where it's kind of hiding. > > Similarly in other corresponding places in other files. > > ------------------------------------------------------------------------------ > src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp > 271 inline jint Atomic::cmpxchg(jint exchange_value, > 272 volatile jint* dest, > 273 jint compare_value, cmpxchg_memory_order order) { > > and similarly elsewhere, I'd prefer the order parameter be on it's own > line like the other parameters. > > Similarly in other corresponding places here and in other files. > > ------------------------------------------------------------------------------ > src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp > 306 inline void* Atomic::cmpxchg_ptr(void* exchange_value, > 307 volatile void* dest, > 308 void* compare_value, cmpxchg_memory_order order) { > 309 > 310 return (void *) cmpxchg_ptr((intptr_t) exchange_value, > 311 (volatile intptr_t*) dest, > 312 (intptr_t) compare_value); > 313 } > > Inner cmpxchg_ptr is missing the order argument. This will discard an > outer relaxed order. (atomic_linux_zero is OK.) > > ------------------------------------------------------------------------------ > src/share/vm/runtime/atomic.inline.hpp > > The unspecialized Atomic::cmpxchg for jbyte isn't passing the order > argument through to cmpxchg_general. Of course, then we might want to > figure out what cmpxchg_general should be doing with the order. 
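David's reading of atomic.hpp (a two-way barrier across every read-modify-write, with no distinction between success and failure) can be approximated in portable C++ by bracketing a relaxed CAS with full fences that execute on both exit paths. The sketch below is illustrative only: CasOrder and cmpxchg_sketch are hypothetical names, and this is not HotSpot's PPC implementation. On PPC64, compilers typically emit a seq_cst fence as sync, so the conservative path corresponds roughly to the "sync, lwarx/stwcx. loop, sync" shape discussed in this thread.

```cpp
#include <atomic>
#include <cassert>

enum class CasOrder { relaxed, conservative };  // hypothetical stand-in

// Sketch of a "two-way barrier" CAS: in conservative mode a full fence runs
// before the operation and another after it, on the success and the failure
// path alike. Returns the value observed in dest (the old value).
inline int cmpxchg_sketch(int exchange_value,
                          std::atomic<int>& dest,
                          int compare_value,
                          CasOrder order) {
  if (order == CasOrder::conservative) {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // leading barrier
  }
  int observed = compare_value;
  dest.compare_exchange_strong(observed, exchange_value,
                               std::memory_order_relaxed);
  if (order == CasOrder::conservative) {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // trailing barrier
  }
  return observed;
}

// Single-threaded check of the returned values; the ordering itself is not
// observable without concurrent threads.
inline bool demo_values() {
  std::atomic<int> d(10);
  int old1 = cmpxchg_sketch(11, d, 10, CasOrder::conservative);  // succeeds
  int old2 = cmpxchg_sketch(12, d, 10, CasOrder::relaxed);       // fails
  assert(d.load() == 11);
  return old1 == 10 && old2 == 11;
}
```

Because the trailing fence sits after the point where success and failure rejoin, a failed conservative CAS pays for both barriers, which is the behavior Martin describes for webrev.03.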
> > ------------------------------------------------------------------------------ > From martin.doerr at sap.com Wed May 18 11:08:52 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 18 May 2016 11:08:52 +0000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <95a96fb1-f35b-2e04-9f57-af39e2da4c9c@oracle.com> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <95a96fb1-f35b-2e04-9f57-af39e2da4c9c@oracle.com> Message-ID: <13d10493b45840b4912274f68fedc855@DEWDFE13DE14.global.corp.sap> Hi David, in comparison to C++11 or JEP 193, the hotspot C++ semantics are somewhat imprecise for PPC64. That's the reason why we use 2 sync instructions, which is the most conservative implementation. C++11 and JEP 193 specify which barriers are needed in case cmpxchg fails. In addition, I think one could also implement "two-way memory barrier across that operation" on PPC64 as, for example, lwsync+cmpxchg+sync. But this wouldn't be multi-copy-atomic. It's unclear whether this property is needed. Best regards, Martin -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Wednesday, 18 May 2016 12:52 To: Doerr, Martin ; Kim Barrett Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg On 18/05/2016 8:12 PM, Doerr, Martin wrote: > Hi Kim, > > thank you very much for the detailed review.
> > I agree with your comments and I have made all your requested changes here: > http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ > > It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now. > Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. What further specification are you looking for: // All of the atomic operations that imply a read-modify-write action // guarantee a two-way memory barrier across that operation. ?? David > For performance optimization, we should better use (or introduce additional) enum values. > > Thanks and best regards, > Martin > > > -----Original Message----- > From: Kim Barrett [mailto:kim.barrett at oracle.com] > Sent: Mittwoch, 18. Mai 2016 03:26 > To: Doerr, Martin > Cc: Hiroshi H Horii ; David Holmes ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > >> On May 10, 2016, at 10:27 AM, Doerr, Martin wrote: >> >> Hello everybody, >> >> thanks for finding this issue. New webrev is here: >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/ >> > > ------------------------------------------------------------------------------ > src/share/vm/runtime/atomic.hpp > 30 typedef enum cmpxchg_cmpxchg_memory_order { > 31 memory_order_relaxed, > 32 // Use value which doesn't interfere with C++2011. We need to be more conservative. > 33 memory_order_conservative = 8 > 34 } cmpxchg_memory_order; > > This is C++, where enum tag names are types, so we don't need a > typedef here. Just use "enum cmpxchg_memory_order { ... };". 
> > ------------------------------------------------------------------------------ > src/share/vm/runtime/atomic.cpp > 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, > 60 volatile unsigned int* dest, unsigned int compare_value, > 61 cmpxchg_memory_order order) { > > Misaligned parameters. > > I'm surprised this was ever out-of-line. But with this change it's > quite bad to be out-of-line, as that's going to kill the constant > propogation of the order value. > > ------------------------------------------------------------------------------ > src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > > In each use, the cmpxchg_post_membar's are after the exit label, > whereas the acquire fences they are replacing were before the exit > label. This means we'll be fencing on failure exit, where we weren't > doing so before. > > It's not clear whether this change is intentional. Note that this > change is consistent with the C++11 one-order forms of cmpxchg, where > the single order argument is used as the sucess order and (with > potentially some modification) as the failure order. > > [I was going to suggest the asm goto syntax might be used to obtain > the original ordering, but "An asm goto statement cannot have > outputs." So some non-trivial restructuring will probably be needed to > get the original ordering.] > > Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. > > ------------------------------------------------------------------------------ > src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > > These repeated comments need updating: > 315 // Note that cmpxchg guarantees a two-way memory barrier across > 316 // the cmpxchg, so it's really a a 'fence_cmpxchg_acquire' > 317 // (see atomic.hpp). > > Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. 
> > ------------------------------------------------------------------------------ > src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp > > [pre-existing] > > The cmpxchg asm sequences all originally looked like > > /* fence */ > strasm_sync > ... > /* acquire */ > strasm_sync > ... > > So they were using strasm_sync (the full fence) in both places, even > though the comments suggest it could/should have been > strasm_fence ... strasm_acquire > However, the description in runtime/atomic.hpp seems to indicate > something stronger than "acquire" is required here, so the second > comment seems wrong. Maybe its a good thing the comments are being > removed by the proposed change. > > Similarly in other corresponding places in other files. > > ------------------------------------------------------------------------------ > src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp > 138 return (void *) cmpxchg_ptr((intptr_t) exchange_value, > 139 (volatile intptr_t*) dest, > 140 (intptr_t) compare_value, order); > > I'd prefer the order argument be placed on its own line, rather than > added to an existing line where it's kind of hiding. > > Similarly in other corresponding places in other files. > > ------------------------------------------------------------------------------ > src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp > 271 inline jint Atomic::cmpxchg(jint exchange_value, > 272 volatile jint* dest, > 273 jint compare_value, cmpxchg_memory_order order) { > > and similarly elsewhere, I'd prefer the order parameter be on it's own > line like the other parameters. > > Similarly in other corresponding places here and in other files. 
> > ------------------------------------------------------------------------------ > src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp > 306 inline void* Atomic::cmpxchg_ptr(void* exchange_value, > 307 volatile void* dest, > 308 void* compare_value, cmpxchg_memory_order order) { > 309 > 310 return (void *) cmpxchg_ptr((intptr_t) exchange_value, > 311 (volatile intptr_t*) dest, > 312 (intptr_t) compare_value); > 313 } > > Inner cmpxchg_ptr is missing the order argument. This will discard an > outer relaxed order. (atomic_linux_zero is OK.) > > ------------------------------------------------------------------------------ > src/share/vm/runtime/atomic.inline.hpp > > The unspecialized Atomic::cmpxchg for jbyte isn't passing the order > argument through to cmpxchg_general. Of course, then we might want to > figure out what cmpxchg_general should be doing with the order. > > ------------------------------------------------------------------------------ > From david.holmes at oracle.com Wed May 18 11:52:41 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 18 May 2016 21:52:41 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <13d10493b45840b4912274f68fedc855@DEWDFE13DE14.global.corp.sap> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <95a96fb1-f35b-2e04-9f57-af39e2da4c9c@oracle.com> <13d10493b45840b4912274f68fedc855@DEWDFE13DE14.global.corp.sap> Message-ID: <7628ef5d-4e59-ef05-0550-07d5838f4570@oracle.com> On 18/05/2016 9:08 PM, Doerr, Martin wrote: > Hi David, > > in comparison to C++11 or JEP 193, the hotspot C++ semantics are kind of unprecise for PPC64. 
That's the reason why we use 2 sync instructions which is the maximum conservative implementation. > > C++11 and JEP 193 specify which barriers are needed in case cmpxchg fails. The hotspot semantics are quite simple - no difference between success or failure - just two-way barriers around the operations. > In addition, I think one could implement " two-way memory barrier across that operation" for example as lwsync+cmpxchg+sync on PPC64 as well. But this wouldn't be multi-copy-atomic. It's unclear if this property is needed. Yes multi-copy-atomicity is implicit in the "two way barriers" - nothing can be reordered in relation to the operation, so implicitly all observers see the same thing at the same time. This may well be stronger than required by actual algorithms using the operations but as the comment block continues: // these semantics reflect the strength of atomic operations that are // provided on SPARC/X86. We assume that strength is necessary unless // we can prove that a weaker form is sufficiently safe. Cheers, David ----- > Best regards, > Martin > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Mittwoch, 18. Mai 2016 12:52 > To: Doerr, Martin ; Kim Barrett > Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > On 18/05/2016 8:12 PM, Doerr, Martin wrote: >> Hi Kim, >> >> thank you very much for the detailed review. >> >> I agree with your comments and I have made all your requested changes here: >> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ >> >> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now. >> Advantage is that the new implementation is maximum conservative by default. 
I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. > > What further specification are you looking for: > > // All of the atomic operations that imply a read-modify-write action > // guarantee a two-way memory barrier across that operation. > > ?? > > David > > >> For performance optimization, we should better use (or introduce additional) enum values. >> >> Thanks and best regards, >> Martin >> >> >> -----Original Message----- >> From: Kim Barrett [mailto:kim.barrett at oracle.com] >> Sent: Mittwoch, 18. Mai 2016 03:26 >> To: Doerr, Martin >> Cc: Hiroshi H Horii ; David Holmes ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >> >>> On May 10, 2016, at 10:27 AM, Doerr, Martin wrote: >>> >>> Hello everybody, >>> >>> thanks for finding this issue. New webrev is here: >>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/ >>> >> >> ------------------------------------------------------------------------------ >> src/share/vm/runtime/atomic.hpp >> 30 typedef enum cmpxchg_cmpxchg_memory_order { >> 31 memory_order_relaxed, >> 32 // Use value which doesn't interfere with C++2011. We need to be more conservative. >> 33 memory_order_conservative = 8 >> 34 } cmpxchg_memory_order; >> >> This is C++, where enum tag names are types, so we don't need a >> typedef here. Just use "enum cmpxchg_memory_order { ... };". >> >> ------------------------------------------------------------------------------ >> src/share/vm/runtime/atomic.cpp >> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, >> 60 volatile unsigned int* dest, unsigned int compare_value, >> 61 cmpxchg_memory_order order) { >> >> Misaligned parameters. >> >> I'm surprised this was ever out-of-line. 
But with this change it's >> quite bad to be out-of-line, as that's going to kill the constant >> propogation of the order value. >> >> ------------------------------------------------------------------------------ >> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> >> In each use, the cmpxchg_post_membar's are after the exit label, >> whereas the acquire fences they are replacing were before the exit >> label. This means we'll be fencing on failure exit, where we weren't >> doing so before. >> >> It's not clear whether this change is intentional. Note that this >> change is consistent with the C++11 one-order forms of cmpxchg, where >> the single order argument is used as the sucess order and (with >> potentially some modification) as the failure order. >> >> [I was going to suggest the asm goto syntax might be used to obtain >> the original ordering, but "An asm goto statement cannot have >> outputs." So some non-trivial restructuring will probably be needed to >> get the original ordering.] >> >> Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. >> >> ------------------------------------------------------------------------------ >> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> >> These repeated comments need updating: >> 315 // Note that cmpxchg guarantees a two-way memory barrier across >> 316 // the cmpxchg, so it's really a a 'fence_cmpxchg_acquire' >> 317 // (see atomic.hpp). >> >> Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. >> >> ------------------------------------------------------------------------------ >> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> >> [pre-existing] >> >> The cmpxchg asm sequences all originally looked like >> >> /* fence */ >> strasm_sync >> ... >> /* acquire */ >> strasm_sync >> ... >> >> So they were using strasm_sync (the full fence) in both places, even >> though the comments suggest it could/should have been >> strasm_fence ... 
strasm_acquire >> However, the description in runtime/atomic.hpp seems to indicate >> something stronger than "acquire" is required here, so the second >> comment seems wrong. Maybe its a good thing the comments are being >> removed by the proposed change. >> >> Similarly in other corresponding places in other files. >> >> ------------------------------------------------------------------------------ >> src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp >> 138 return (void *) cmpxchg_ptr((intptr_t) exchange_value, >> 139 (volatile intptr_t*) dest, >> 140 (intptr_t) compare_value, order); >> >> I'd prefer the order argument be placed on its own line, rather than >> added to an existing line where it's kind of hiding. >> >> Similarly in other corresponding places in other files. >> >> ------------------------------------------------------------------------------ >> src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >> 271 inline jint Atomic::cmpxchg(jint exchange_value, >> 272 volatile jint* dest, >> 273 jint compare_value, cmpxchg_memory_order order) { >> >> and similarly elsewhere, I'd prefer the order parameter be on it's own >> line like the other parameters. >> >> Similarly in other corresponding places here and in other files. >> >> ------------------------------------------------------------------------------ >> src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp >> 306 inline void* Atomic::cmpxchg_ptr(void* exchange_value, >> 307 volatile void* dest, >> 308 void* compare_value, cmpxchg_memory_order order) { >> 309 >> 310 return (void *) cmpxchg_ptr((intptr_t) exchange_value, >> 311 (volatile intptr_t*) dest, >> 312 (intptr_t) compare_value); >> 313 } >> >> Inner cmpxchg_ptr is missing the order argument. This will discard an >> outer relaxed order. (atomic_linux_zero is OK.) 
>> >> ------------------------------------------------------------------------------ >> src/share/vm/runtime/atomic.inline.hpp >> >> The unspecialized Atomic::cmpxchg for jbyte isn't passing the order >> argument through to cmpxchg_general. Of course, then we might want to >> figure out what cmpxchg_general should be doing with the order. >> >> ------------------------------------------------------------------------------ >> From martin.doerr at sap.com Wed May 18 12:32:15 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Wed, 18 May 2016 12:32:15 +0000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <7628ef5d-4e59-ef05-0550-07d5838f4570@oracle.com> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <95a96fb1-f35b-2e04-9f57-af39e2da4c9c@oracle.com> <13d10493b45840b4912274f68fedc855@DEWDFE13DE14.global.corp.sap> <7628ef5d-4e59-ef05-0550-07d5838f4570@oracle.com> Message-ID: Hi David, ok, this comment specifies it clearly enough: // these semantics reflect the strength of atomic operations that are // provided on SPARC/X86. So my change does the right thing :-) Thanks for your quick response and especially for pushing this change forward. Best regards, Martin -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Wednesday, 18 May 2016 13:53 To: Doerr, Martin ; Kim Barrett Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg On 18/05/2016 9:08 PM, Doerr, Martin wrote: > Hi David, > > in comparison to C++11 or JEP 193, the hotspot C++ semantics are kind of unprecise for PPC64. That's the reason why we use 2 sync instructions which is the maximum conservative implementation. > > C++11 and JEP 193 specify which barriers are needed in case cmpxchg fails. The hotspot semantics are quite simple - no difference between success or failure - just two-way barriers around the operations. > In addition, I think one could implement " two-way memory barrier across that operation" for example as lwsync+cmpxchg+sync on PPC64 as well. But this wouldn't be multi-copy-atomic. It's unclear if this property is needed. Yes, multi-copy-atomicity is implicit in the "two way barriers" - nothing can be reordered in relation to the operation, so implicitly all observers see the same thing at the same time. This may well be stronger than required by actual algorithms using the operations but as the comment block continues: // these semantics reflect the strength of atomic operations that are // provided on SPARC/X86. We assume that strength is necessary unless // we can prove that a weaker form is sufficiently safe. Cheers, David ----- > Best regards, > Martin > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Mittwoch, 18. Mai 2016 12:52 > To: Doerr, Martin ; Kim Barrett > Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > On 18/05/2016 8:12 PM, Doerr, Martin wrote: >> Hi Kim, >> >> thank you very much for the detailed review.
>> >> I agree with your comments and I have made all your requested changes here: >> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ >> >> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now. >> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. > > What further specification are you looking for: > > // All of the atomic operations that imply a read-modify-write action > // guarantee a two-way memory barrier across that operation. > > ?? > > David > > >> For performance optimization, we should better use (or introduce additional) enum values. >> >> Thanks and best regards, >> Martin >> >> >> -----Original Message----- >> From: Kim Barrett [mailto:kim.barrett at oracle.com] >> Sent: Mittwoch, 18. Mai 2016 03:26 >> To: Doerr, Martin >> Cc: Hiroshi H Horii ; David Holmes ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >> >>> On May 10, 2016, at 10:27 AM, Doerr, Martin wrote: >>> >>> Hello everybody, >>> >>> thanks for finding this issue. New webrev is here: >>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/ >>> >> >> ------------------------------------------------------------------------------ >> src/share/vm/runtime/atomic.hpp >> 30 typedef enum cmpxchg_cmpxchg_memory_order { >> 31 memory_order_relaxed, >> 32 // Use value which doesn't interfere with C++2011. We need to be more conservative. >> 33 memory_order_conservative = 8 >> 34 } cmpxchg_memory_order; >> >> This is C++, where enum tag names are types, so we don't need a >> typedef here. Just use "enum cmpxchg_memory_order { ... };". 
>> >> ------------------------------------------------------------------------------ >> src/share/vm/runtime/atomic.cpp >> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, >> 60 volatile unsigned int* dest, unsigned int compare_value, >> 61 cmpxchg_memory_order order) { >> >> Misaligned parameters. >> >> I'm surprised this was ever out-of-line. But with this change it's >> quite bad to be out-of-line, as that's going to kill the constant >> propogation of the order value. >> >> ------------------------------------------------------------------------------ >> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> >> In each use, the cmpxchg_post_membar's are after the exit label, >> whereas the acquire fences they are replacing were before the exit >> label. This means we'll be fencing on failure exit, where we weren't >> doing so before. >> >> It's not clear whether this change is intentional. Note that this >> change is consistent with the C++11 one-order forms of cmpxchg, where >> the single order argument is used as the sucess order and (with >> potentially some modification) as the failure order. >> >> [I was going to suggest the asm goto syntax might be used to obtain >> the original ordering, but "An asm goto statement cannot have >> outputs." So some non-trivial restructuring will probably be needed to >> get the original ordering.] >> >> Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. >> >> ------------------------------------------------------------------------------ >> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp >> >> These repeated comments need updating: >> 315 // Note that cmpxchg guarantees a two-way memory barrier across >> 316 // the cmpxchg, so it's really a a 'fence_cmpxchg_acquire' >> 317 // (see atomic.hpp). >> >> Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. 
>>
>> ------------------------------------------------------------------------------
>> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>>
>> [pre-existing]
>>
>> The cmpxchg asm sequences all originally looked like
>>
>> /* fence */
>> strasm_sync
>> ...
>> /* acquire */
>> strasm_sync
>> ...
>>
>> So they were using strasm_sync (the full fence) in both places, even
>> though the comments suggest it could/should have been
>> strasm_fence ... strasm_acquire
>> However, the description in runtime/atomic.hpp seems to indicate
>> something stronger than "acquire" is required here, so the second
>> comment seems wrong. Maybe it's a good thing the comments are being
>> removed by the proposed change.
>>
>> Similarly in other corresponding places in other files.
>>
>> ------------------------------------------------------------------------------
>> src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>> 138 return (void *) cmpxchg_ptr((intptr_t) exchange_value,
>> 139 (volatile intptr_t*) dest,
>> 140 (intptr_t) compare_value, order);
>>
>> I'd prefer the order argument be placed on its own line, rather than
>> added to an existing line where it's kind of hiding.
>>
>> Similarly in other corresponding places in other files.
>>
>> ------------------------------------------------------------------------------
>> src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>> 271 inline jint Atomic::cmpxchg(jint exchange_value,
>> 272 volatile jint* dest,
>> 273 jint compare_value, cmpxchg_memory_order order) {
>>
>> and similarly elsewhere, I'd prefer the order parameter be on its own
>> line like the other parameters.
>>
>> Similarly in other corresponding places here and in other files.
>>
>> ------------------------------------------------------------------------------
>> src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>> 306 inline void* Atomic::cmpxchg_ptr(void* exchange_value,
>> 307 volatile void* dest,
>> 308 void* compare_value, cmpxchg_memory_order order) {
>> 309
>> 310 return (void *) cmpxchg_ptr((intptr_t) exchange_value,
>> 311 (volatile intptr_t*) dest,
>> 312 (intptr_t) compare_value);
>> 313 }
>>
>> Inner cmpxchg_ptr is missing the order argument. This will discard an
>> outer relaxed order. (atomic_linux_zero is OK.)
>>
>> ------------------------------------------------------------------------------
>> src/share/vm/runtime/atomic.inline.hpp
>>
>> The unspecialized Atomic::cmpxchg for jbyte isn't passing the order
>> argument through to cmpxchg_general. Of course, then we might want to
>> figure out what cmpxchg_general should be doing with the order.
>>
>> ------------------------------------------------------------------------------

From david.holmes at oracle.com Thu May 19 00:03:46 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 19 May 2016 10:03:46 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
Message-ID: <743059fb-1b8a-1105-493b-b0071e53cbf8@oracle.com>

On 18/05/2016 8:12 PM, Doerr, Martin wrote:
> Hi Kim,
>
> thank you very much for the detailed review.
>
> I agree with your comments and I have made all your requested changes here:
> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/

This looks fine to me.
I make no comments on the PPC implementation.

Thanks,
David

> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we now also execute the sync instruction.
> The advantage is that the new implementation is maximally conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>
> For performance optimization, we should rather use (or introduce additional) enum values.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Kim Barrett [mailto:kim.barrett at oracle.com]
> Sent: Wednesday, 18 May 2016 03:26
> To: Doerr, Martin
> Cc: Hiroshi H Horii ; David Holmes ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>
>> On May 10, 2016, at 10:27 AM, Doerr, Martin wrote:
>>
>> Hello everybody,
>>
>> thanks for finding this issue. New webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/
>>
>
> ------------------------------------------------------------------------------
> src/share/vm/runtime/atomic.hpp
> 30 typedef enum cmpxchg_cmpxchg_memory_order {
> 31 memory_order_relaxed,
> 32 // Use value which doesn't interfere with C++2011. We need to be more conservative.
> 33 memory_order_conservative = 8
> 34 } cmpxchg_memory_order;
>
> This is C++, where enum tag names are types, so we don't need a
> typedef here. Just use "enum cmpxchg_memory_order { ... };".
>
> ------------------------------------------------------------------------------
> src/share/vm/runtime/atomic.cpp
> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
> 60 volatile unsigned int* dest, unsigned int compare_value,
> 61 cmpxchg_memory_order order) {
>
> Misaligned parameters.
>
> I'm surprised this was ever out-of-line.
> But with this change it's
> quite bad to be out-of-line, as that's going to kill the constant
> propagation of the order value.
>
> ------------------------------------------------------------------------------
> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>
> In each use, the cmpxchg_post_membar's are after the exit label,
> whereas the acquire fences they are replacing were before the exit
> label. This means we'll be fencing on failure exit, where we weren't
> doing so before.
>
> It's not clear whether this change is intentional. Note that this
> change is consistent with the C++11 one-order forms of cmpxchg, where
> the single order argument is used as the success order and (with
> potentially some modification) as the failure order.
>
> [I was going to suggest the asm goto syntax might be used to obtain
> the original ordering, but "An asm goto statement cannot have
> outputs." So some non-trivial restructuring will probably be needed to
> get the original ordering.]
>
> Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp.
>
> ------------------------------------------------------------------------------
> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>
> These repeated comments need updating:
> 315 // Note that cmpxchg guarantees a two-way memory barrier across
> 316 // the cmpxchg, so it's really a a 'fence_cmpxchg_acquire'
> 317 // (see atomic.hpp).
>
> Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp.
>
> ------------------------------------------------------------------------------
> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>
> [pre-existing]
>
> The cmpxchg asm sequences all originally looked like
>
> /* fence */
> strasm_sync
> ...
> /* acquire */
> strasm_sync
> ...
>
> So they were using strasm_sync (the full fence) in both places, even
> though the comments suggest it could/should have been
> strasm_fence ...
> strasm_acquire
> However, the description in runtime/atomic.hpp seems to indicate
> something stronger than "acquire" is required here, so the second
> comment seems wrong. Maybe it's a good thing the comments are being
> removed by the proposed change.
>
> Similarly in other corresponding places in other files.
>
> ------------------------------------------------------------------------------
> src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
> 138 return (void *) cmpxchg_ptr((intptr_t) exchange_value,
> 139 (volatile intptr_t*) dest,
> 140 (intptr_t) compare_value, order);
>
> I'd prefer the order argument be placed on its own line, rather than
> added to an existing line where it's kind of hiding.
>
> Similarly in other corresponding places in other files.
>
> ------------------------------------------------------------------------------
> src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> 271 inline jint Atomic::cmpxchg(jint exchange_value,
> 272 volatile jint* dest,
> 273 jint compare_value, cmpxchg_memory_order order) {
>
> and similarly elsewhere, I'd prefer the order parameter be on its own
> line like the other parameters.
>
> Similarly in other corresponding places here and in other files.
>
> ------------------------------------------------------------------------------
> src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> 306 inline void* Atomic::cmpxchg_ptr(void* exchange_value,
> 307 volatile void* dest,
> 308 void* compare_value, cmpxchg_memory_order order) {
> 309
> 310 return (void *) cmpxchg_ptr((intptr_t) exchange_value,
> 311 (volatile intptr_t*) dest,
> 312 (intptr_t) compare_value);
> 313 }
>
> Inner cmpxchg_ptr is missing the order argument. This will discard an
> outer relaxed order. (atomic_linux_zero is OK.)
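The fix for the bsd_zero excerpt is simply to forward `order`. The sketch below assumes, as the remark about discarding a relaxed order implies, that the inner `intptr_t` overload has a conservative default argument, which is why the omission compiles silently. The inner body is a single-threaded stand-in, and `last_order_seen` is a hypothetical probe added only so the forwarding can be observed.

```cpp
#include <cassert>
#include <cstdint>

enum cmpxchg_memory_order { memory_order_relaxed, memory_order_conservative = 8 };

// Hypothetical probe so the forwarding can be checked.
static cmpxchg_memory_order last_order_seen = memory_order_relaxed;

// Inner overload with an (assumed) conservative default argument: a caller
// that forgets 'order' still compiles but silently gets the default.
// Single-threaded stand-in body, not an atomic instruction.
inline intptr_t cmpxchg_ptr(intptr_t exchange_value, volatile intptr_t* dest,
                            intptr_t compare_value,
                            cmpxchg_memory_order order = memory_order_conservative) {
  last_order_seen = order;
  intptr_t old = *dest;
  if (old == compare_value) {
    *dest = exchange_value;
  }
  return old;
}

// Corrected void* overload: 'order' is forwarded, on its own line.
inline void* cmpxchg_ptr(void* exchange_value, volatile void* dest,
                         void* compare_value, cmpxchg_memory_order order) {
  return (void*) cmpxchg_ptr((intptr_t) exchange_value,
                             (volatile intptr_t*) dest,
                             (intptr_t) compare_value,
                             order);
}
```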
>
> ------------------------------------------------------------------------------
> src/share/vm/runtime/atomic.inline.hpp
>
> The unspecialized Atomic::cmpxchg for jbyte isn't passing the order
> argument through to cmpxchg_general. Of course, then we might want to
> figure out what cmpxchg_general should be doing with the order.
>
> ------------------------------------------------------------------------------
>

From david.holmes at oracle.com Thu May 19 11:56:58 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 19 May 2016 21:56:58 +1000
Subject: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: <573D8D20.3080008@redhat.com>
References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> <5711ED18.7000706@oracle.com> <201604180215.u3I2FUZi001650@d19av07.sagamino.japan.ibm.com> <571464DF.3070706@oracle.com> <5714E416.6030300@redhat.com> <571577EB.1080907@oracle.com> <573D8D20.3080008@redhat.com>
Message-ID: <95abf1f5-0464-0df1-7de2-c8f7c493703e@oracle.com>

On 19/05/2016 7:53 PM, Andrew Haley wrote:
> The AArch64 code for this isn't ideal, of course. I'll submit
> an AArch64 version as soon as this goes in. Do I need a different
> bug ID?

There are now two bugs for this:

8155949: Support relaxed semantics in cmpxchg

is adding the new API, with a relaxed implementation for PPC.

8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

is for the actual GC code changes to use the relaxed cmpxchg API.

I will be shepherding 8155949 through the post FC process (once we know what it is). If you can provide the Aarch64 code before that completes then it can go in with the other changes under this bug. Otherwise it will need a separate bug.

Cheers,
David

> Andrew.
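Regarding the jbyte case: one common way to emulate a one-byte compare-and-swap (and, per Martin's later note, what the PPC port does with lwarx/stwcx) is to run a word-sized CAS on the aligned 4-byte word that contains the byte. What follows is a hedged sketch using `std::atomic<uint32_t>`. `cmpxchg_byte_in_word` and its `byte_index` parameter are hypothetical, the sequentially consistent success order is only illustrative, and mapping a byte address to its bit position (which depends on endianness) is left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Emulate a 1-byte CAS with a 4-byte CAS on the containing aligned word.
// byte_index selects bits [8*byte_index, 8*byte_index + 7] of the word value.
inline uint8_t cmpxchg_byte_in_word(uint8_t exchange_value,
                                    std::atomic<uint32_t>* word,
                                    unsigned byte_index,
                                    uint8_t compare_value) {
  const unsigned shift = 8 * byte_index;
  const uint32_t mask = uint32_t(0xFF) << shift;
  uint32_t cur = word->load(std::memory_order_relaxed);
  for (;;) {
    const uint8_t cur_byte = uint8_t((cur & mask) >> shift);
    if (cur_byte != compare_value) {
      return cur_byte;  // failure: report the byte actually found
    }
    const uint32_t next = (cur & ~mask) | (uint32_t(exchange_value) << shift);
    // If another byte of the word changed (or the weak CAS failed spuriously),
    // 'cur' is refreshed and the loop re-checks the target byte.
    if (word->compare_exchange_weak(cur, next, std::memory_order_seq_cst,
                                    std::memory_order_relaxed)) {
      return compare_value;  // success: the old byte equalled compare_value
    }
  }
}
```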
>

From david.holmes at oracle.com Thu May 19 20:08:20 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 20 May 2016 06:08:20 +1000
Subject: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: <573DD60F.9030005@redhat.com>
References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com> <5711ED18.7000706@oracle.com> <201604180215.u3I2FUZi001650@d19av07.sagamino.japan.ibm.com> <571464DF.3070706@oracle.com> <5714E416.6030300@redhat.com> <573DD60F.9030005@redhat.com>
Message-ID: <3fdffecd-75d0-fe77-0ed9-038b24bbf007@oracle.com>

Andrew,

Can you post this to the actual review thread for 8155949 please.

Thanks,
David

On 20/05/2016 1:04 AM, Andrew Haley wrote:
> There is one significant problem with this approach.
>
> Atomic::cmpxchg(jint) is defined like this in atomic.cpp:
>
> unsigned Atomic::cmpxchg(unsigned int exchange_value,
> volatile unsigned int* dest, unsigned int compare_value,
> cmpxchg_memory_order order) {
> assert(sizeof(unsigned int) == sizeof(jint), "more work to do");
> return (unsigned int)Atomic::cmpxchg((jint)exchange_value, (volatile jint*)dest,
> (jint)compare_value, order);
> }
>
> Because this is in atomic.cpp, there is a *runtime* test on the memory
> order: the compiler can't constant propagate it. If we're adding the
> cmpxchg_memory_order I think we should move Atomic::cmpxchg(jint) to
> atomic.inline.hpp.
>
> Andrew.
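Andrew's point can be illustrated with a small sketch; the names and the fence-counting probe are assumptions, not HotSpot code. If the `unsigned` overload is defined inline in a header and simply forwards to the `jint` version, a caller that passes a literal `memory_order_relaxed` gives the compiler a constant it can propagate through both functions, letting it fold away the fence branches. Defined out-of-line in a .cpp file, `order` is only known at run time unless link-time optimization intervenes. The `jint` body is a single-threaded stand-in for the real atomic instruction.

```cpp
#include <atomic>
#include <cassert>

typedef int jint;  // assumption: jint is a 32-bit int, as on HotSpot's platforms
enum cmpxchg_memory_order { memory_order_relaxed, memory_order_conservative = 8 };

static int fences_emitted = 0;  // hypothetical probe for the test below

inline void full_fence() {
  std::atomic_thread_fence(std::memory_order_seq_cst);
  ++fences_emitted;
}

// jint version: single-threaded stand-in for the real atomic instruction.
inline jint cmpxchg(jint exchange_value, volatile jint* dest,
                    jint compare_value, cmpxchg_memory_order order) {
  if (order != memory_order_relaxed) full_fence();
  jint old = *dest;
  if (old == compare_value) *dest = exchange_value;
  if (order != memory_order_relaxed) full_fence();
  return old;
}

// unsigned version defined inline (as proposed for atomic.inline.hpp) rather
// than in atomic.cpp, so a constant 'order' can propagate into the jint version.
inline unsigned cmpxchg(unsigned exchange_value, volatile unsigned* dest,
                        unsigned compare_value, cmpxchg_memory_order order) {
  static_assert(sizeof(unsigned) == sizeof(jint), "more work to do");
  return (unsigned) cmpxchg((jint) exchange_value, (volatile jint*) dest,
                            (jint) compare_value, order);
}
```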
>

From kim.barrett at oracle.com Thu May 19 20:17:53 2016
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 19 May 2016 16:17:53 -0400
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <7628ef5d-4e59-ef05-0550-07d5838f4570@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <95a96fb1-f35b-2e04-9f57-af39e2da4c9c@oracle.com> <13d10493b45840b4912274f68fedc855@DEWDFE13DE14.global.corp.sap> <7628ef5d-4e59-ef05-0550-07d5838f4570@oracle.com>
Message-ID: <227CE7D4-E731-4F1A-AED5-C29135208BC5@oracle.com>

> On May 18, 2016, at 7:52 AM, David Holmes wrote:
>
> On 18/05/2016 9:08 PM, Doerr, Martin wrote:
>> Hi David,
>>
>> in comparison to C++11 or JEP 193, the hotspot C++ semantics are kind of imprecise for PPC64. That's the reason why we use 2 sync instructions, which is the most conservative implementation.
>>
>> C++11 and JEP 193 specify which barriers are needed in case cmpxchg fails.
>
> The hotspot semantics are quite simple - no difference between success or failure - just two-way barriers around the operations.

The current ppc ports have no post-barrier in the failure case. That does seem at variance with the documented hotspot semantics. I'm not sure about aarch64, since it uses a compiler intrinsic (__sync_val_compare_and_swap) whose expansion I don't know where to find. sparc and x86 look fine, not surprisingly.
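For comparison with the C++11 semantics Martin and Kim refer to: `std::atomic::compare_exchange_strong` accepts separate success and failure orders, and in its one-order form the failure order is derived from the single argument, with `acq_rel` weakened to `acquire` and `release` weakened to `relaxed`. The helper below, which has a hypothetical name, spells out that derivation rule.

```cpp
#include <atomic>
#include <cassert>

// Failure order used by the one-order form of compare_exchange_* in C++11:
// the single order, except acq_rel becomes acquire and release becomes relaxed
// (a failed CAS performs no store, so there is nothing to release).
constexpr std::memory_order failure_order_for(std::memory_order order) {
  return order == std::memory_order_acq_rel ? std::memory_order_acquire
       : order == std::memory_order_release ? std::memory_order_relaxed
       : order;
}

// Two-order form with the derived failure order written out explicitly.
inline bool cas_acq_rel(std::atomic<int>& a, int& expected, int desired) {
  return a.compare_exchange_strong(expected, desired,
                                   std::memory_order_acq_rel,
                                   failure_order_for(std::memory_order_acq_rel));
}
```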
From kim.barrett at oracle.com Thu May 19 22:03:02 2016
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 19 May 2016 18:03:02 -0400
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
Message-ID:

> On May 18, 2016, at 6:12 AM, Doerr, Martin wrote:
>
> Hi Kim,
>
> thank you very much for the detailed review.
>
> I agree with your comments and I have made all your requested changes here:
> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>
> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we now also execute the sync instruction.
> The advantage is that the new implementation is maximally conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>
> For performance optimization, we should rather use (or introduce additional) enum values.

------------------------------------------------------------------------------
There doesn't seem to have been any change for this earlier comment.

src/share/vm/runtime/atomic.cpp
59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
60 volatile unsigned int* dest, unsigned int compare_value,
61 cmpxchg_memory_order order) {

I'm surprised this was ever out-of-line. But with this change it's
quite bad to be out-of-line, as that's going to kill the constant
propagation of the order value.

------------------------------------------------------------------------------

Other than that, looks good.
From gromero at linux.vnet.ibm.com Thu May 19 23:46:05 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Thu, 19 May 2016 20:46:05 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To:
References: <56FEDBB3.5030106@linux.vnet.ibm.com> <57339EE1.2040500@linux.vnet.ibm.com>
Message-ID: <201605192346.u4JNhcxs015627@mx0a-001b2d01.pphosted.com>

Hi Martin

Thank you for reviewing the webrev.

> We could use a static variable for the default dscr value. It could be modified in VM_Version::config_dscr() and used by your restore code (load_const_optimized(tmp1, ...) instead of li(tmp1, 0)).

Absolutely, resetting DSCR to the default value (zero) is not right.

I did as you suggested and created a static variable modified and
initialized from VM_Version::config_dscr(). Then I used it to get the
current value of DSCR, set only the pre-fetch as deepest, and restore
its previous value.


> - The PPC-elf64abi-1.9 says: "Functions must ensure that the appropriate bits in the vrsave register are set for any vector registers they use. ...". I think not touching vrsave is the right thing for AIX and ppc64le, but I think we will either have to skip the optimization on ppc64 big endian or handle vrsave. Do you agree?

About the VRSAVE register, you are right, but there is some confusion here,
and it's my fault: I'm not using the VMX registers.

In my code I've used the VSX load/store instructions with a
VectorRegister type, i.e. VR0 and VR1. It's OK if we look at the
assembled instructions because, in the end, VR0 and VR1 will be
converted to target (or source) registers number 0 and 1. But they are VSX
registers 0 and 1 (VSR0 and VSR1), not VMX (aka Altivec) registers
0 and 1 (VR0 and VR1).

There is indeed a relationship between VSR and VR registers, as
we can see in the following diagram adapted from [1]:

       .----------------------------------.
VSR( 0)|     FPR(0)      |                |
VSR( 1)|     FPR(1)      |                |
  ...  |      ...        |                |
  ...  |      ...        |                |
VSR(30)|     FPR(30)     |                |
VSR(31)|     FPR(31)     |                |
VSR(32)|              VR(0)               |
VSR(33)|              VR(1)               |
  ...  |               ...                |
  ...  |               ...                |
VSR(62)|              VR(30)              |
VSR(63)|              VR(31)              |
       '----------------------------------'
        0                              127

However, VMX registers VR0-31 are mapped to VSX registers VSR32-63,
and so we can use VSR0 and VSR1 (although they are also mapped to FPRs,
FPR0-13 are volatile). So in my code I was actually using VSR0 and
VSR1, not VR0 and VR1. Since VRSAVE only covers the
VMX/Altivec registers (VR0-VR31), there is no need to take care of
VRSAVE. I fixed the register names/types in this new webrev.

I noticed that the VSR registers were not implemented, so I
implemented them. Now the VSX load/store instructions use the VectorSRegister
type. I'm using the VSR0 and VSR1 registers in the stub, respecting the
ABI.

Webrev:
http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v2/

Best regards,
Gustavo

[1] Power Architecture 64-Bit ELF V2 ABI https://goo.gl/LLXRwN, p. 43-44

> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> Sent: Wednesday, 11 May 2016 23:07
> To: Volker Simonis
> Cc: Doerr, Martin ; Simonis, Volker ; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; brenohl at br.ibm.com
> Subject: Re: PPC64 VSX load/store instructions in stubs
> Importance: High
>
> Hi Volker, Hi Martin
>
> Sincere apologies for the long delay.
>
> My initial approach to test the VSX load/store was from an
> extracted snippet regarding just the mass copy loop "grafted" inside an inline
> asm, performing isolated tests with "perf" tool focused only on aligned source and
> destination (best case).
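The overlay in the diagram can be captured in a few lines. This sketch, which uses hypothetical names and assumes the ELF V2 ABI layout described above, maps a VSX register number to the FPR or VR it aliases. It also answers the question behind the VRSAVE discussion: only VSR32-63, which are VR0-31, fall under VRSAVE.

```cpp
#include <cassert>

// What a VSX register number aliases, per the diagram: VSR0-31 share
// bits 0-63 (IBM numbering) with FPR0-31, and VSR32-63 are VR0-31.
struct VsrAlias {
  bool is_vmx;  // true: a VMX/Altivec register (VR); false: an FPR
  int index;    // the FPR or VR number
};

inline VsrAlias vsr_alias(int vsr) {
  assert(vsr >= 0 && vsr < 64);
  return vsr < 32 ? VsrAlias{false, vsr} : VsrAlias{true, vsr - 32};
}

// VRSAVE only describes VR0-VR31, i.e. VSR32-VSR63.
inline bool covered_by_vrsave(int vsr) {
  return vsr_alias(vsr).is_vmx;
}
```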
>
> The extracted code, called "Original" in the plot below (black line), is here:
> https://github.com/gromero/arraycopy/blob/2pairs/arraycopy.c#L27-L36
>
> That extracted code, after some experiments, evolved into this one that employs VSX
> load/store, Data Stream deepest pre-fetch, d-cache touch, and backbranch aligned
> to 32-byte:
> https://github.com/gromero/arraycopy/blob/2pairs/arraycopy_vsx.c#L27-L41
>
> All runs were "pinned" using `numactl --cpunodebind --membind` to avoid any
> scheduler decision that could add noise to the measurements.
>
> VSX, deepest data pre-fetch, d-cache touch, and 32-byte alignment proved to be better
> in the isolated code (red line) in comparison to the original extracted code
> (black line):
> http://gromero.github.io/openjdk/original_vsx_non_pf_vsx_pf_deepest.pdf
>
> So I proceeded to implement the VSX loop in OpenJDK based on the best case
> result (VSX, pre-fetch deepest, d-cache touch, and backbranch target align -
> goetz TODO note).
>
> OpenJDK 8 webrev:
> http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/8/
>
> OpenJDK 9 webrev:
> http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/9/
>
> I've tested the change on OpenJDK 8 using this script that calls
> System.arraycopy() on shorts:
> https://goo.gl/8UWtLm
>
> The results for all data alignment cases:
> http://gromero.github.io/openjdk/src_0_dst_0.pdf
> http://gromero.github.io/openjdk/src_1_dst_0.pdf
> http://gromero.github.io/openjdk/src_0_dst_1.pdf
> http://gromero.github.io/openjdk/src_1_dst_1.pdf
>
> Martin, I added the vsx test to the feature-string. Regarding the ABI, I'm just
> using two VSRs: vsr0 and vsr1, both volatile.
>
> Volker, as the loop unrolling was removed, the loop now copies 16 elements at a time,
> like the non-VSX loop, and not 32 elements. I just verified the change on little
> endian. Sorry, I didn't understand your question regarding "instructions for
> aligned load/stores". Did you mean instructions for unaligned load/stores?
> I think both fixed-point (ld/std) and VSX instructions will do load/store slower in
> the unaligned scenario. However, VMX load/store is different and expects aligned
> operands. Thank you very much for opening the bug
> https://bugs.openjdk.java.net/browse/JDK-8154156
>
> I don't have the profiling per function for each SPEC{jbb,jvm} benchmark
> in order to determine which one would stress the proposed change better.
> Could I use a better benchmark?
>
> Thank you!
>
> Best regards,
> Gustavo
>
> On 05-04-2016 14:23, Volker Simonis wrote:
>> Hi Gustavo,
>>
>> thanks a lot for your contribution.
>>
>> Can you please describe if you've run benchmarks and which performance
>> improvements you saw?
>>
>> With your change if we're running on Power 8, we will only use the
>> fast path for arrays with at least 32 elements. For smaller arrays, we
>> will fall back to copying only 2 elements at a time which will be
>> slower than the initial version which copied 4 at a time in that case.
>>
>> Did you verify your changes on both little and big endian?
>>
>> And what about unaligned memory accesses? As far as I read,
>> lxvd2x/stxvd2x still work, but may be slower. I saw there also exist
>> instructions for aligned load/stores. Would it make sense
>> (performance-wise) to use them for the cases where we can be sure that
>> we have aligned memory accesses?
>>
>> Thank you and best regards,
>> Volker
>>
>>
>> On Fri, Apr 1, 2016 at 10:36 PM, Gustavo Romero
>> wrote:
>>> Hi Martin, Hi Volker
>>>
>>> Currently VSX load/store instructions are not being used in PPC64 stubs,
>>> particularly in arraycopy stubs inside generate_arraycopy_stubs() like,
>>> but not limited to, generate_disjoint_{byte,short,int,long}_copy.
>>>
>>> We can speed up mass copy using VSX (Vector-Scalar Extension) load/store
>>> instructions on processors >= POWER8, the same way it's already done for
>>> libc memcpy().
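A rough C++ model of the loop shape being discussed may help when weighing Volker's questions about small arrays and alignment. It is not the generated stub. `std::memcpy` of 16-byte chunks stands in for the (presumably two) lxvd2x/stxvd2x pairs per iteration. Each iteration moves 16 jshorts, and a scalar tail handles the remainder, so arrays shorter than 16 elements never reach the vector path. `disjoint_short_copy` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Model only: 16 jshorts (32 bytes) per iteration as two 16-byte chunks,
// standing in for two vector load/store pairs; std::memcpy tolerates
// unaligned addresses. A scalar tail copies the remaining elements.
inline void disjoint_short_copy(const int16_t* src, int16_t* dst, size_t count) {
  size_t i = 0;
  for (; i + 16 <= count; i += 16) {
    unsigned char v0[16], v1[16];  // two 16-byte "vector registers"
    std::memcpy(v0, src + i, 16);
    std::memcpy(v1, src + i + 8, 16);
    std::memcpy(dst + i, v0, 16);
    std::memcpy(dst + i + 8, v1, 16);
  }
  for (; i < count; ++i) {  // scalar tail
    dst[i] = src[i];
  }
}
```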
>>>
>>> This is an initial patch just for jshort_disjoint_arraycopy() VSX vector
>>> load/store:
>>>
>>> http://81.de.7a9f.ip4.static.sl-reverse.com/202539/webrev
>>>
>>> What are your thoughts on that? Is there any impediment to using VSX
>>> instructions in OpenJDK at the moment?
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Gustavo
>>>
>>
>

From gromero at linux.vnet.ibm.com Fri May 20 16:20:05 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 20 May 2016 13:20:05 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <201605192346.u4JNhdUm028414@mx0a-001b2d01.pphosted.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com> <57339EE1.2040500@linux.vnet.ibm.com> <201605192346.u4JNhdUm028414@mx0a-001b2d01.pphosted.com>
Message-ID: <201605201620.u4KGEXCM042745@mx0a-001b2d01.pphosted.com>

> Hi Martin
>
> Thank you for reviewing the webrev.
>
>> We could use a static variable for the default dscr value. It could be modified in VM_Version::config_dscr() and used by your restore code (load_const_optimized(tmp1, ...) instead of li(tmp1, 0)).
>
> Absolutely, resetting DSCR to the default value (zero) is not right.
>
> I did as you suggested and created a static variable modified and
> initialized from VM_Version::config_dscr(). Then I used it to get the
> current value of DSCR, set only the pre-fetch as deepest, and restore
> its previous value.
>
>
>> - The PPC-elf64abi-1.9 says: "Functions must ensure that the appropriate bits in the vrsave register are set for any vector registers they use. ...". I think not touching vrsave is the right thing for AIX and ppc64le, but I think we will either have to skip the optimization on ppc64 big endian or handle vrsave. Do you agree?
>
> About the VRSAVE register, you are right, but there is some confusion here,
> and it's my fault: I'm not using the VMX registers.
>
> In my code I've used the VSX load/store instructions with a
> VectorRegister type, i.e. VR0 and VR1.
> It's OK if we look at the
> assembled instructions because, in the end, VR0 and VR1 will be
> converted to target (or source) registers number 0 and 1. But they are VSX
> registers 0 and 1 (VSR0 and VSR1), not VMX (aka Altivec) registers
> 0 and 1 (VR0 and VR1).
>
> There is indeed a relationship between VSR and VR registers, as
> we can see in the following diagram adapted from [1]:
>
>        .----------------------------------.
> VSR( 0)|     FPR(0)      |                |
> VSR( 1)|     FPR(1)      |                |
>   ...  |      ...        |                |
>   ...  |      ...        |                |
> VSR(30)|     FPR(30)     |                |
> VSR(31)|     FPR(31)     |                |
> VSR(32)|              VR(0)               |
> VSR(33)|              VR(1)               |
>   ...  |               ...                |
>   ...  |               ...                |
> VSR(62)|              VR(30)              |
> VSR(63)|              VR(31)              |
>        '----------------------------------'
>         0                              127
>
> However, VMX registers VR0-31 are mapped to VSX registers VSR32-63,
> and so we can use VSR0 and VSR1 (although they are also mapped to FPRs,
> FPR0-13 are volatile). So in my code I was actually using VSR0 and
> VSR1, not VR0 and VR1. Since VRSAVE only covers the
> VMX/Altivec registers (VR0-VR31), there is no need to take care of
> VRSAVE. I fixed the register names/types in this new webrev.
>
> I noticed that the VSR registers were not implemented, so I
> implemented them. Now the VSX load/store instructions use the VectorSRegister
> type. I'm using the VSR0 and VSR1 registers in the stub, respecting the
> ABI.
>
> Webrev:
> http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v2/
>
> Best regards,
> Gustavo
>
> [1] Power Architecture 64-Bit ELF V2 ABI https://goo.gl/LLXRwN, p. 43-44
>

Hi Martin

The previous change was not restoring the DSCR value. Here is the webrev with the fix included:

http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v3/

Thank you!
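The DSCR handling agreed on in this thread can be modeled as follows: save the default in a static set from `VM_Version::config_dscr()`, change only the prefetch-depth field, and restore the saved value afterwards rather than writing 0. Everything here is an assumption for illustration. `default_dscr_value` and the helpers are hypothetical, the register is simulated by a plain variable instead of mfspr/mtspr, and the DPFD field is assumed to be the low-order three bits, with 7 meaning deepest.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: DPFD (default prefetch depth) = low-order 3 bits, 7 = deepest.
constexpr uint64_t kDpfdMask = 0x7;
constexpr uint64_t kDpfdDeepest = 0x7;

// Hypothetical static, set once at startup (cf. VM_Version::config_dscr()).
static uint64_t default_dscr_value = 0;

// Change only the prefetch-depth field, keeping every other DSCR bit.
constexpr uint64_t with_deepest_prefetch(uint64_t dscr) {
  return (dscr & ~kDpfdMask) | kDpfdDeepest;
}

// 'dscr_reg' simulates the special-purpose register (no mfspr/mtspr here).
template <typename Body>
void run_with_deepest_prefetch(uint64_t& dscr_reg, Body body) {
  dscr_reg = with_deepest_prefetch(default_dscr_value);
  body();
  dscr_reg = default_dscr_value;  // restore the saved default, not 0
}
```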
Best regards,
Gustavo

From david.holmes at oracle.com Fri May 20 23:09:31 2016
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 21 May 2016 09:09:31 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To:
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
Message-ID: <267a624c-626f-4238-0166-baa14ff4b412@oracle.com>

Hi Martin,

Are you in a position to make the change now suggested by both Kim and Andrew?

Can you also include the Aarch64 code that Andrew provided:

http://cr.openjdk.java.net/~aph/8154736

I'd like to get this finalized so it is ready to push as soon as the process allows it to.

Thanks,
David

On 20/05/2016 8:03 AM, Kim Barrett wrote:
>> On May 18, 2016, at 6:12 AM, Doerr, Martin wrote:
>>
>> Hi Kim,
>>
>> thank you very much for the detailed review.
>>
>> I agree with your comments and I have made all your requested changes here:
>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>>
>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we now also execute the sync instruction.
>> The advantage is that the new implementation is maximally conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>>
>> For performance optimization, we should rather use (or introduce additional) enum values.
>
> ------------------------------------------------------------------------------
> There doesn't seem to have been any change for this earlier comment.
>
> src/share/vm/runtime/atomic.cpp
> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
> 60 volatile unsigned int* dest, unsigned int compare_value,
> 61 cmpxchg_memory_order order) {
>
> I'm surprised this was ever out-of-line. But with this change it's
> quite bad to be out-of-line, as that's going to kill the constant
> propagation of the order value.
>
> ------------------------------------------------------------------------------
>
> Other than that, looks good.
>
>
>
>

From martin.doerr at sap.com Mon May 23 09:29:42 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 23 May 2016 09:29:42 +0000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
Message-ID:

Hi David,

here's the new webrev:
http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/

Btw.: The jbyte version of cmpxchg can be implemented on aarch64 like on ppc, where we emulate the byte access by a 4-byte access (lwarx/stwcx). But that would be better done in a separate change.

Thanks for your time and your support.

Best regards,
Martin

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com]
Sent: Saturday, 21 May 2016 01:10
To: Doerr, Martin
Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg

Hi Martin,

Are you in a position to make the change now suggested by both Kim and Andrew?
Can you also include the Aarch64 code that Andrew provided:

http://cr.openjdk.java.net/~aph/8154736

I'd like to get this finalized so it is ready to push as soon as the process allows it to.

Thanks,
David

On 20/05/2016 8:03 AM, Kim Barrett wrote:
>> On May 18, 2016, at 6:12 AM, Doerr, Martin wrote:
>>
>> Hi Kim,
>>
>> thank you very much for the detailed review.
>>
>> I agree with your comments and I have made all your requested changes here:
>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>>
>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we now also execute the sync instruction.
>> The advantage is that the new implementation is maximally conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>>
>> For performance optimization, we should rather use (or introduce additional) enum values.
>
> ------------------------------------------------------------------------------
> There doesn't seem to have been any change for this earlier comment.
>
> src/share/vm/runtime/atomic.cpp
> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
> 60 volatile unsigned int* dest, unsigned int compare_value,
> 61 cmpxchg_memory_order order) {
>
> I'm surprised this was ever out-of-line. But with this change it's
> quite bad to be out-of-line, as that's going to kill the constant
> propagation of the order value.
>
> ------------------------------------------------------------------------------
>
> Other than that, looks good.
>
>
>
>

From gromero at linux.vnet.ibm.com Mon May 23 14:22:16 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 23 May 2016 11:22:16 -0300
Subject: RFR(M): PPC64: improve array copy stubs by using vector instructions
Message-ID: <201605231422.u4NEIiBQ005583@mx0a-001b2d01.pphosted.com>

Hi Martin

Could you please host and review this webrev?
Summary: * Add VSR registers to be used with VSX instruction set; * Add VSX load/store instructions (lxvd2x/stxvd2x) to mass copy in the stub for disjoint short copy in order to improve it. http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v4/ Thank you! Best regards, Gustavo From martin.doerr at sap.com Mon May 23 15:51:41 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 23 May 2016 15:51:41 +0000 Subject: RFR(M): PPC64: improve array copy stubs by using vector instructions In-Reply-To: <201605231422.u4NEIb1g013944@mx0a-001b2d01.pphosted.com> References: <201605231422.u4NEIb1g013944@mx0a-001b2d01.pphosted.com> Message-ID: <25cb11dfe7624a4a8848d049626413e7@DEWDFE13DE14.global.corp.sap> Hi Gustavo, thanks for implementing it and taking care of my concerns. Looks good, now. I will run tests and I can sponsor it after it was reviewed. Best regards, Martin -----Original Message----- From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] Sent: Montag, 23. Mai 2016 16:22 To: Doerr, Martin ; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net Cc: Simonis, Volker ; brenohl at br.ibm.com Subject: RFR(M): PPC64: improve array copy stubs by using vector instructions Hi Martin Could you please host and review this webrev? Summary: * Add VSR registers to be used with VSX instruction set; * Add VSX load/store instructions (lxvd2x/stxvd2x) to mass copy in the stub for disjoint short copy in order to improve it. http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v4/ Thank you! 
Best regards, Gustavo From gromero at linux.vnet.ibm.com Mon May 23 15:53:45 2016 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 23 May 2016 12:53:45 -0300 Subject: RFR(M): PPC64: improve array copy stubs by using vector instructions In-Reply-To: <25cb11dfe7624a4a8848d049626413e7@DEWDFE13DE14.global.corp.sap> References: <201605231422.u4NEIb1g013944@mx0a-001b2d01.pphosted.com> <25cb11dfe7624a4a8848d049626413e7@DEWDFE13DE14.global.corp.sap> Message-ID: <201605231553.u4NFn9EW016604@mx0a-001b2d01.pphosted.com> Hi Martin Thank you for reviewing the change. Best regards, Gustavo On 23-05-2016 12:51, Doerr, Martin wrote: > Hi Gustavo, > > thanks for implementing it and taking care of my concerns. Looks good, now. > I will run tests and I can sponsor it after it was reviewed. > > Best regards, > Martin > > -----Original Message----- > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > Sent: Montag, 23. Mai 2016 16:22 > To: Doerr, Martin ; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net > Cc: Simonis, Volker ; brenohl at br.ibm.com > Subject: RFR(M): PPC64: improve array copy stubs by using vector instructions > > Hi Martin > > Could you please host and review this webrev? > > Summary: > > * Add VSR registers to be used with VSX instruction set; > * Add VSX load/store instructions (lxvd2x/stxvd2x) to mass copy in > the stub for disjoint short copy in order to improve it. > > http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v4/ > > Thank you! 
> > Best regards, > Gustavo > From david.holmes at oracle.com Tue May 24 03:49:49 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 24 May 2016 13:49:49 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> Message-ID: <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> Hi Martin, On 23/05/2016 7:29 PM, Doerr, Martin wrote: > Hi David, > > here's the new webrev: > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/ There seems to be some confusion. You've moved the jbyte Atomic::cmpxchg_general from the .cpp file to the .inline.hpp file, but the comments from Andrew and Kim were about moving the unsigned Atomic::cmpxchg version. ?? Aside: In the changeset, contributors have to be specified by "email address" or "name "; OpenJDK user names are not accepted. I think Andrew should also be listed there for the AArch64 component. Thanks, David > Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change. > > Thanks for your time and your support. > > Best regards, > Martin > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Samstag, 21.
Mai 2016 01:10 > To: Doerr, Martin > Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > Hi Martin, > > Are you in a position to make the change now suggested by both Kim and > Andrew? Can you also include the Aarch64 code that Andrew provided: > > http://cr.openjdk.java.net/~aph/8154736 > > I'd like to get this finalized so it is ready to push as soon as the > process allows it to. > > Thanks, > David > > On 20/05/2016 8:03 AM, Kim Barrett wrote: >>> On May 18, 2016, at 6:12 AM, Doerr, Martin wrote: >>> >>> Hi Kim, >>> >>> thank you very much for the detailed review. >>> >>> I agree with your comments and I have made all your requested changes here: >>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ >>> >>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now. >>> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. >>> >>> For performance optimization, we should better use (or introduce additional) enum values. >> >> ------------------------------------------------------------------------------ >> There doesn't seem to have been any change for this earlier comment. >> >> src/share/vm/runtime/atomic.cpp >> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, >> 60 volatile unsigned int* dest, unsigned int compare_value, >> 61 cmpxchg_memory_order order) { >> >> I'm surprised this was ever out-of-line. But with this change it's >> quite bad to be out-of-line, as that's going to kill the constant >> propogation of the order value. >> >> ------------------------------------------------------------------------------ >> >> Other than that, looks good. 
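Martin's remark about emulating the jbyte cmpxchg with a 4-byte access can be illustrated with a portable sketch. This is not HotSpot code: cmpxchg_byte_lane is a hypothetical name, std::atomic<uint32_t> stands in for the lwarx/stwcx reservation loop, and the byte is selected by a lane index rather than by a real address. A real implementation would have to derive the lane from the address alignment and the platform's endianness.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not HotSpot's API): compare-and-exchange one byte lane
// of a 32-bit word using only 32-bit CAS. Lane i covers bits [8*i, 8*i + 8).
// Returns the previous value of the byte, like Atomic::cmpxchg does.
uint8_t cmpxchg_byte_lane(std::atomic<uint32_t>& word, unsigned lane,
                          uint8_t exchange_value, uint8_t compare_value) {
  const unsigned shift = 8u * lane;
  const uint32_t mask = UINT32_C(0xFF) << shift;
  uint32_t cur = word.load(std::memory_order_relaxed);
  for (;;) {
    uint8_t cur_byte = static_cast<uint8_t>((cur & mask) >> shift);
    if (cur_byte != compare_value) {
      return cur_byte;  // target byte differs: fail without writing anything
    }
    // Splice the new byte into the current word, keeping the other three bytes.
    uint32_t desired =
        (cur & ~mask) | (static_cast<uint32_t>(exchange_value) << shift);
    // On failure compare_exchange_weak reloads cur; a concurrent change to a
    // neighbouring byte (or a spurious failure) simply causes a retry.
    if (word.compare_exchange_weak(cur, desired, std::memory_order_seq_cst,
                                   std::memory_order_relaxed)) {
      return compare_value;  // success: the old byte equalled compare_value
    }
  }
}
```

The retry matters because the 4-byte CAS can fail due to a write to a neighbouring byte. That failure says nothing about the target byte, so the loop re-examines it instead of reporting failure.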
>> >> >> >> From goetz.lindenmaier at sap.com Tue May 24 08:29:41 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 24 May 2016 08:29:41 +0000 Subject: RFR(M): PPC64: improve array copy stubs by using vector instructions In-Reply-To: <201605231553.u4NFq1Rr022712@mx0a-001b2d01.pphosted.com> References: <201605231422.u4NEIb1g013944@mx0a-001b2d01.pphosted.com> <25cb11dfe7624a4a8848d049626413e7@DEWDFE13DE14.global.corp.sap> <201605231553.u4NFq1Rr022712@mx0a-001b2d01.pphosted.com> Message-ID: <8b32b8882e964fd0b2ac0f22c94e389a@DEWDFE13DE09.global.corp.sap> Hi Gustavo, thanks for contributing this optimization to the ppc port! The change looks good, nice work. Next time, please use correct subject in the RFR mail, the bugID is missing. Also, address the RFR to everybody. This one you addressed to Martin. In general, you need several reviews. Martin, thanks for reviewing though! Martin, I think you can push this as it's ppc-only. Best regards, Goetz. > -----Original Message----- > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > Sent: Montag, 23. Mai 2016 17:54 > To: Doerr, Martin ; ppc-aix-port- > dev at openjdk.java.net; hotspot-dev at openjdk.java.net > Cc: Simonis, Volker ; brenohl at br.ibm.com; > Lindenmaier, Goetz > Subject: Re: RFR(M): PPC64: improve array copy stubs by using vector > instructions > > Hi Martin > > Thank you for reviewing the change. > > Best regards, > Gustavo > > On 23-05-2016 12:51, Doerr, Martin wrote: > > Hi Gustavo, > > > > thanks for implementing it and taking care of my concerns. Looks good, > now. > > I will run tests and I can sponsor it after it was reviewed. > > > > Best regards, > > Martin > > > > -----Original Message----- > > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] > > Sent: Montag, 23. 
Mai 2016 16:22 > > To: Doerr, Martin ; ppc-aix-port- > dev at openjdk.java.net; hotspot-dev at openjdk.java.net > > Cc: Simonis, Volker ; brenohl at br.ibm.com > > Subject: RFR(M): PPC64: improve array copy stubs by using vector > instructions > > > > Hi Martin > > > > Could you please host and review this webrev? > > > > Summary: > > > > * Add VSR registers to be used with VSX instruction set; > > * Add VSX load/store instructions (lxvd2x/stxvd2x) to mass copy in > > the stub for disjoint short copy in order to improve it. > > > > http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v4/ > > > > Thank you! > > > > Best regards, > > Gustavo > > From martin.doerr at sap.com Tue May 24 09:37:50 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 24 May 2016 09:37:50 +0000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> Message-ID: Hi David and Andrew, sorry for missing this one. There were too many emails. After moving the jint version as well, there was not much left of atomic.cpp. I think it doesn't make any sense to keep a couple of trivial functions in the cpp file. Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/ Best regards, Martin -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Dienstag, 24. 
Mai 2016 05:50 To: Doerr, Martin Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg Hi Martin, On 23/05/2016 7:29 PM, Doerr, Martin wrote: > Hi David, > > here's the new webrev: > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/ There seems to be some confusion. You've moved the jbyte Atomic::cmpxchg_general from the .cpp file to the .inline/hpp file, but the comments from Andrew and Kim were about moving the unsigned Atomic::cmpxchg version. ?? Aside: In the changeset contributor's have to be specified by "email address" or "name ", OpenJDK user names are not accepted. I think Andrew should also be listed there for the Aarch64 component. Thanks, David > Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change. > > Thanks for your time and your support. > > Best regards, > Martin > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Samstag, 21. Mai 2016 01:10 > To: Doerr, Martin > Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > Hi Martin, > > Are you in a position to make the change now suggested by both Kim and > Andrew? Can you also include the Aarch64 code that Andrew provided: > > http://cr.openjdk.java.net/~aph/8154736 > > I'd like to get this finalized so it is ready to push as soon as the > process allows it to. > > Thanks, > David > > On 20/05/2016 8:03 AM, Kim Barrett wrote: >>> On May 18, 2016, at 6:12 AM, Doerr, Martin wrote: >>> >>> Hi Kim, >>> >>> thank you very much for the detailed review. 
>>> >>> I agree with your comments and I have made all your requested changes here: >>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ >>> >>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now. >>> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. >>> >>> For performance optimization, we should better use (or introduce additional) enum values. >> >> ------------------------------------------------------------------------------ >> There doesn't seem to have been any change for this earlier comment. >> >> src/share/vm/runtime/atomic.cpp >> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, >> 60 volatile unsigned int* dest, unsigned int compare_value, >> 61 cmpxchg_memory_order order) { >> >> I'm surprised this was ever out-of-line. But with this change it's >> quite bad to be out-of-line, as that's going to kill the constant >> propogation of the order value. >> >> ------------------------------------------------------------------------------ >> >> Other than that, looks good. 
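Kim's point about constant propagation of the order argument can be sketched under simplifying assumptions. sketch_memory_order is a hypothetical two-value enum, not HotSpot's cmpxchg_memory_order. Mapping the conservative order onto std::memory_order_seq_cst only approximates HotSpot's fence-before-and-after semantics. When the definition is visible inline, a call with a literal order lets the compiler resolve the branch at compile time. An out-of-line definition in atomic.cpp typically keeps it as a run-time test, unless link-time optimization intervenes.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for HotSpot's cmpxchg_memory_order enum.
enum sketch_memory_order { sketch_relaxed, sketch_conservative };

// Defined inline (as in an .inline.hpp header): a caller passing a constant
// order allows the compiler to drop the untaken branch entirely.
inline unsigned cmpxchg_sketch(unsigned exchange_value,
                               std::atomic<unsigned>* dest,
                               unsigned compare_value,
                               sketch_memory_order order) {
  unsigned expected = compare_value;
  if (order == sketch_relaxed) {
    dest->compare_exchange_strong(expected, exchange_value,
                                  std::memory_order_relaxed);
  } else {
    dest->compare_exchange_strong(expected, exchange_value,
                                  std::memory_order_seq_cst);
  }
  // expected now holds the old value, whether or not the exchange happened.
  return expected;
}
```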
>> >> >> >> From david.holmes at oracle.com Tue May 24 10:03:56 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 24 May 2016 20:03:56 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> Message-ID: <275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com> On 24/05/2016 7:37 PM, Doerr, Martin wrote: > Hi David and Andrew, > > sorry for missing this one. There were too many emails. > > After moving the jint version as well, there was not much left of atomic.cpp. > I think it doesn't make any sense to keep a couple of trivial functions in the cpp file. > Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file. Sorry I don't understand why the jbyte cmpxchg_general was moved to the .inline.hpp file - it seems far too big to be inlined. David > Webrev is here: > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/ > > Best regards, > Martin > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Dienstag, 24. Mai 2016 05:50 > To: Doerr, Martin > Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > Hi Martin, > > On 23/05/2016 7:29 PM, Doerr, Martin wrote: >> Hi David, >> >> here's the new webrev: >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/ > > There seems to be some confusion. 
You've moved the jbyte > Atomic::cmpxchg_general from the .cpp file to the .inline/hpp file, but > the comments from Andrew and Kim were about moving the unsigned > Atomic::cmpxchg version. ?? > > Aside: In the changeset contributor's have to be specified by "email > address" or "name ", OpenJDK user names are not accepted. > I think Andrew should also be listed there for the Aarch64 component. > > Thanks, > David > >> Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change. >> >> Thanks for your time and your support. >> >> Best regards, >> Martin >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Samstag, 21. Mai 2016 01:10 >> To: Doerr, Martin >> Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >> >> Hi Martin, >> >> Are you in a position to make the change now suggested by both Kim and >> Andrew? Can you also include the Aarch64 code that Andrew provided: >> >> http://cr.openjdk.java.net/~aph/8154736 >> >> I'd like to get this finalized so it is ready to push as soon as the >> process allows it to. >> >> Thanks, >> David >> >> On 20/05/2016 8:03 AM, Kim Barrett wrote: >>>> On May 18, 2016, at 6:12 AM, Doerr, Martin wrote: >>>> >>>> Hi Kim, >>>> >>>> thank you very much for the detailed review. >>>> >>>> I agree with your comments and I have made all your requested changes here: >>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ >>>> >>>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now. >>>> Advantage is that the new implementation is maximum conservative by default. 
I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. >>>> >>>> For performance optimization, we should better use (or introduce additional) enum values. >>> >>> ------------------------------------------------------------------------------ >>> There doesn't seem to have been any change for this earlier comment. >>> >>> src/share/vm/runtime/atomic.cpp >>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, >>> 60 volatile unsigned int* dest, unsigned int compare_value, >>> 61 cmpxchg_memory_order order) { >>> >>> I'm surprised this was ever out-of-line. But with this change it's >>> quite bad to be out-of-line, as that's going to kill the constant >>> propogation of the order value. >>> >>> ------------------------------------------------------------------------------ >>> >>> Other than that, looks good. >>> >>> >>> >>> From martin.doerr at sap.com Tue May 24 10:21:59 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 24 May 2016 10:21:59 +0000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> <275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com> Message-ID: <9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap> Hi David, it was moved for the same reason as the jint version of cmpxchg: It passes the memory order to the jint version. It may look large in terms of C++ code, but there's not much substantial content. 
I can only see a loop which calls the jint version + a bunch of very simple operations. Why shouldn't we give compilers a chance to inline and possibly optimize some of the simple operations and especially to eliminate the order check? Best regards, Martin -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Dienstag, 24. Mai 2016 12:04 To: Doerr, Martin ; Andrew Haley (aph at redhat.com) Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg On 24/05/2016 7:37 PM, Doerr, Martin wrote: > Hi David and Andrew, > > sorry for missing this one. There were too many emails. > > After moving the jint version as well, there was not much left of atomic.cpp. > I think it doesn't make any sense to keep a couple of trivial functions in the cpp file. > Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file. Sorry I don't understand why the jbyte cmpxchg_general was moved to the .inline.hpp file - it seems far too big to be inlined. David > Webrev is here: > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/ > > Best regards, > Martin > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Dienstag, 24. Mai 2016 05:50 > To: Doerr, Martin > Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > Hi Martin, > > On 23/05/2016 7:29 PM, Doerr, Martin wrote: >> Hi David, >> >> here's the new webrev: >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/ > > There seems to be some confusion. 
You've moved the jbyte > Atomic::cmpxchg_general from the .cpp file to the .inline/hpp file, but > the comments from Andrew and Kim were about moving the unsigned > Atomic::cmpxchg version. ?? > > Aside: In the changeset contributor's have to be specified by "email > address" or "name ", OpenJDK user names are not accepted. > I think Andrew should also be listed there for the Aarch64 component. > > Thanks, > David > >> Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change. >> >> Thanks for your time and your support. >> >> Best regards, >> Martin >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Samstag, 21. Mai 2016 01:10 >> To: Doerr, Martin >> Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >> >> Hi Martin, >> >> Are you in a position to make the change now suggested by both Kim and >> Andrew? Can you also include the Aarch64 code that Andrew provided: >> >> http://cr.openjdk.java.net/~aph/8154736 >> >> I'd like to get this finalized so it is ready to push as soon as the >> process allows it to. >> >> Thanks, >> David >> >> On 20/05/2016 8:03 AM, Kim Barrett wrote: >>>> On May 18, 2016, at 6:12 AM, Doerr, Martin wrote: >>>> >>>> Hi Kim, >>>> >>>> thank you very much for the detailed review. >>>> >>>> I agree with your comments and I have made all your requested changes here: >>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ >>>> >>>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now. >>>> Advantage is that the new implementation is maximum conservative by default. 
I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. >>>> >>>> For performance optimization, we should better use (or introduce additional) enum values. >>> >>> ------------------------------------------------------------------------------ >>> There doesn't seem to have been any change for this earlier comment. >>> >>> src/share/vm/runtime/atomic.cpp >>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, >>> 60 volatile unsigned int* dest, unsigned int compare_value, >>> 61 cmpxchg_memory_order order) { >>> >>> I'm surprised this was ever out-of-line. But with this change it's >>> quite bad to be out-of-line, as that's going to kill the constant >>> propogation of the order value. >>> >>> ------------------------------------------------------------------------------ >>> >>> Other than that, looks good. >>> >>> >>> >>> From david.holmes at oracle.com Tue May 24 12:26:31 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 24 May 2016 22:26:31 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> <275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com> <9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap> Message-ID: Hi Martin, On 24/05/2016 8:21 PM, Doerr, Martin wrote: > Hi David, > > it was moved for the same reason as the jint version of cmpxchg: It passes the memory order to the jint version. 
> It may look large in terms of C++ code, but there's not much substantial content. > I can only see a loop which calls the jint version + a bunch of very simple operations. > Why shouldn't we give compilers a chance to inline and possibly optimize some of the simple operations and especially to eliminate the order check? I think this forces the compiler to inline it, not just "gives it a chance". But I'll leave it to those more knowledgeable about the compiler side of this to comment. But if we're making these changes can you delete the Atomic::add(jlong) - it is unused and incorrect as discussed here: http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-February/021620.html Thanks, David > Best regards, > Martin > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Dienstag, 24. Mai 2016 12:04 > To: Doerr, Martin ; Andrew Haley (aph at redhat.com) > Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > On 24/05/2016 7:37 PM, Doerr, Martin wrote: >> Hi David and Andrew, >> >> sorry for missing this one. There were too many emails. >> >> After moving the jint version as well, there was not much left of atomic.cpp. >> I think it doesn't make any sense to keep a couple of trivial functions in the cpp file. >> Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file. > > Sorry I don't understand why the jbyte cmpxchg_general was moved to the > .inline.hpp file - it seems far too big to be inlined. > > David > >> Webrev is here: >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/ >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Dienstag, 24. 
Mai 2016 05:50 >> To: Doerr, Martin >> Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >> >> Hi Martin, >> >> On 23/05/2016 7:29 PM, Doerr, Martin wrote: >>> Hi David, >>> >>> here's the new webrev: >>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/ >> >> There seems to be some confusion. You've moved the jbyte >> Atomic::cmpxchg_general from the .cpp file to the .inline/hpp file, but >> the comments from Andrew and Kim were about moving the unsigned >> Atomic::cmpxchg version. ?? >> >> Aside: In the changeset contributor's have to be specified by "email >> address" or "name ", OpenJDK user names are not accepted. >> I think Andrew should also be listed there for the Aarch64 component. >> >> Thanks, >> David >> >>> Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change. >>> >>> Thanks for your time and your support. >>> >>> Best regards, >>> Martin >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Samstag, 21. Mai 2016 01:10 >>> To: Doerr, Martin >>> Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>> >>> Hi Martin, >>> >>> Are you in a position to make the change now suggested by both Kim and >>> Andrew? Can you also include the Aarch64 code that Andrew provided: >>> >>> http://cr.openjdk.java.net/~aph/8154736 >>> >>> I'd like to get this finalized so it is ready to push as soon as the >>> process allows it to. 
>>> >>> Thanks, >>> David >>> >>> On 20/05/2016 8:03 AM, Kim Barrett wrote: >>>>> On May 18, 2016, at 6:12 AM, Doerr, Martin wrote: >>>>> >>>>> Hi Kim, >>>>> >>>>> thank you very much for the detailed review. >>>>> >>>>> I agree with your comments and I have made all your requested changes here: >>>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ >>>>> >>>>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now. >>>>> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. >>>>> >>>>> For performance optimization, we should better use (or introduce additional) enum values. >>>> >>>> ------------------------------------------------------------------------------ >>>> There doesn't seem to have been any change for this earlier comment. >>>> >>>> src/share/vm/runtime/atomic.cpp >>>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, >>>> 60 volatile unsigned int* dest, unsigned int compare_value, >>>> 61 cmpxchg_memory_order order) { >>>> >>>> I'm surprised this was ever out-of-line. But with this change it's >>>> quite bad to be out-of-line, as that's going to kill the constant >>>> propogation of the order value. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> Other than that, looks good. 
>>>> >>>> >>>> >>>> From martin.doerr at sap.com Tue May 24 13:06:45 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 24 May 2016 13:06:45 +0000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> <275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com> <9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap> Message-ID: Hi David, unfortunately, Atomic::add(jlong) is used by mallocTracker.hpp (e.g. line 56). Removing it breaks the build. But I could change it as follows:

inline jlong Atomic::add(jlong add_value, volatile jlong* dest) {
#ifdef _LP64
  return (jlong) add_ptr((intptr_t) add_value, (volatile intptr_t*) dest);
#else
  jlong old = load(dest);
  jlong new_value = old + add_value;
  while (old != cmpxchg(new_value, dest, old)) {
    old = load(dest);
    new_value = old + add_value;
  }
  return new_value;
#endif
}

Best regards, Martin -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Dienstag, 24. Mai 2016 14:27 To: Doerr, Martin ; Andrew Haley (aph at redhat.com) Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg Hi Martin, On 24/05/2016 8:21 PM, Doerr, Martin wrote: > Hi David, > > it was moved for the same reason as the jint version of cmpxchg: It passes the memory order to the jint version. > It may look large in terms of C++ code, but there's not much substantial content.
> I can only see a loop which calls the jint version + a bunch of very simple operations. > Why shouldn't we give compilers a chance to inline and possibly optimize some of the simple operations and especially to eliminate the order check? I think this forces the compiler to inline it, not just "gives it a chance". But I'll leave it to those more knowledgeable about the compiler side of this to comment. But if we're making these changes can you delete the Atomic::add(jlong) - it is unused and incorrect as discussed here: http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-February/021620.html Thanks, David > Best regards, > Martin > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Dienstag, 24. Mai 2016 12:04 > To: Doerr, Martin ; Andrew Haley (aph at redhat.com) > Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > On 24/05/2016 7:37 PM, Doerr, Martin wrote: >> Hi David and Andrew, >> >> sorry for missing this one. There were too many emails. >> >> After moving the jint version as well, there was not much left of atomic.cpp. >> I think it doesn't make any sense to keep a couple of trivial functions in the cpp file. >> Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file. > > Sorry I don't understand why the jbyte cmpxchg_general was moved to the > .inline.hpp file - it seems far too big to be inlined. > > David > >> Webrev is here: >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/ >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Dienstag, 24. 
Mai 2016 05:50 >> To: Doerr, Martin >> Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >> >> Hi Martin, >> >> On 23/05/2016 7:29 PM, Doerr, Martin wrote: >>> Hi David, >>> >>> here's the new webrev: >>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/ >> >> There seems to be some confusion. You've moved the jbyte >> Atomic::cmpxchg_general from the .cpp file to the .inline/hpp file, but >> the comments from Andrew and Kim were about moving the unsigned >> Atomic::cmpxchg version. ?? >> >> Aside: In the changeset contributor's have to be specified by "email >> address" or "name ", OpenJDK user names are not accepted. >> I think Andrew should also be listed there for the Aarch64 component. >> >> Thanks, >> David >> >>> Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change. >>> >>> Thanks for your time and your support. >>> >>> Best regards, >>> Martin >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Samstag, 21. Mai 2016 01:10 >>> To: Doerr, Martin >>> Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>> >>> Hi Martin, >>> >>> Are you in a position to make the change now suggested by both Kim and >>> Andrew? Can you also include the Aarch64 code that Andrew provided: >>> >>> http://cr.openjdk.java.net/~aph/8154736 >>> >>> I'd like to get this finalized so it is ready to push as soon as the >>> process allows it to. 
>>> >>> Thanks, >>> David >>> >>> On 20/05/2016 8:03 AM, Kim Barrett wrote: >>>>> On May 18, 2016, at 6:12 AM, Doerr, Martin wrote: >>>>> >>>>> Hi Kim, >>>>> >>>>> thank you very much for the detailed review. >>>>> >>>>> I agree with your comments and I have made all your requested changes here: >>>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ >>>>> >>>>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now. >>>>> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. >>>>> >>>>> For performance optimization, we should better use (or introduce additional) enum values. >>>> >>>> ------------------------------------------------------------------------------ >>>> There doesn't seem to have been any change for this earlier comment. >>>> >>>> src/share/vm/runtime/atomic.cpp >>>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, >>>> 60 volatile unsigned int* dest, unsigned int compare_value, >>>> 61 cmpxchg_memory_order order) { >>>> >>>> I'm surprised this was ever out-of-line. But with this change it's >>>> quite bad to be out-of-line, as that's going to kill the constant >>>> propogation of the order value. >>>> >>>> ------------------------------------------------------------------------------ >>>> >>>> Other than that, looks good. 
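Martin's proposed 32-bit fallback for `Atomic::add(jlong)` is a standard compare-and-swap retry loop. The sketch below expresses the same loop with `std::atomic<int64_t>`. `add_int64` is a hypothetical name, and the sketch uses the `std::atomic` default (sequentially consistent) ordering rather than whatever HotSpot's `Atomic::load`/`Atomic::cmpxchg` provide. The loop can only be as atomic as the 64-bit compare-and-swap underneath it; on some 32-bit targets `std::atomic<int64_t>` is not lock-free, which `is_lock_free()` reports.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: CAS retry loop in the style of the proposed
// Atomic::add(jlong) fallback. Returns the updated value, as Atomic::add does.
inline int64_t add_int64(int64_t add_value, std::atomic<int64_t>* dest) {
  int64_t old_value = dest->load();
  int64_t new_value = old_value + add_value;
  // If another thread changed *dest in between, compare_exchange_weak fails,
  // writes the value it saw into old_value, and the sum is recomputed.
  // The weak form may also fail spuriously, which the loop tolerates.
  while (!dest->compare_exchange_weak(old_value, new_value)) {
    new_value = old_value + add_value;
  }
  return new_value;
}
```

Because `compare_exchange_weak` refreshes `old_value` on failure, the explicit reload inside Martin's loop is folded into the CAS call itself.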
>>>> >>>> >>>> >>>> From gromero at linux.vnet.ibm.com Tue May 24 13:59:58 2016 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Tue, 24 May 2016 10:59:58 -0300 Subject: RFR(M): PPC64: improve array copy stubs by using vector instructions In-Reply-To: <8b32b8882e964fd0b2ac0f22c94e389a@DEWDFE13DE09.global.corp.sap> References: <201605231422.u4NEIb1g013944@mx0a-001b2d01.pphosted.com> <25cb11dfe7624a4a8848d049626413e7@DEWDFE13DE14.global.corp.sap> <201605231553.u4NFq1Rr022712@mx0a-001b2d01.pphosted.com> <8b32b8882e964fd0b2ac0f22c94e389a@DEWDFE13DE09.global.corp.sap> Message-ID: <201605241400.u4ODtLeu022907@mx0a-001b2d01.pphosted.com> Hi Goetz I'm happy to be contributing to the ppc port! Sorry, I didn't realize that the bugID was missing from the subject. Next time I'll pay attention to that and also address the RFR to everybody, sure. Thanks for pointing that out. Thanks a lot for reviewing the change. Best regards, Gustavo On 24-05-2016 05:29, Lindenmaier, Goetz wrote: > Hi Gustavo, > > thanks for contributing this optimization to the ppc port! > > The change looks good, nice work. > > Next time, please use correct subject in the RFR mail, the bugID is missing. > Also, address the RFR to everybody. This one you addressed to Martin. > In general, you need several reviews. > Martin, thanks for reviewing though! > > Martin, I think you can push this as it's ppc-only. > > Best regards, > Goetz. > > > >> -----Original Message----- >> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >> Sent: Montag, 23. Mai 2016 17:54 >> To: Doerr, Martin ; ppc-aix-port- >> dev at openjdk.java.net; hotspot-dev at openjdk.java.net >> Cc: Simonis, Volker ; brenohl at br.ibm.com; >> Lindenmaier, Goetz >> Subject: Re: RFR(M): PPC64: improve array copy stubs by using vector >> instructions >> >> Hi Martin >> >> Thank you for reviewing the change.
>> >> Best regards, >> Gustavo >> >> On 23-05-2016 12:51, Doerr, Martin wrote: >>> Hi Gustavo, >>> >>> thanks for implementing it and taking care of my concerns. Looks good, >> now. >>> I will run tests and I can sponsor it after it was reviewed. >>> >>> Best regards, >>> Martin >>> >>> -----Original Message----- >>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] >>> Sent: Montag, 23. Mai 2016 16:22 >>> To: Doerr, Martin ; ppc-aix-port- >> dev at openjdk.java.net; hotspot-dev at openjdk.java.net >>> Cc: Simonis, Volker ; brenohl at br.ibm.com >>> Subject: RFR(M): PPC64: improve array copy stubs by using vector >> instructions >>> >>> Hi Martin >>> >>> Could you please host and review this webrev? >>> >>> Summary: >>> >>> * Add VSR registers to be used with VSX instruction set; >>> * Add VSX load/store instructions (lxvd2x/stxvd2x) to mass copy in >>> the stub for disjoint short copy in order to improve it. >>> >>> http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v4/ >>> >>> Thank you! >>> >>> Best regards, >>> Gustavo >>> > From ENOMIKI at jp.ibm.com Tue May 24 15:15:13 2016 From: ENOMIKI at jp.ibm.com (Miki M Enoki) Date: Wed, 25 May 2016 00:15:13 +0900 Subject: PPC64 VSX load/store instructions in stubs In-Reply-To: <573A034C.9060602@br.ibm.com> References: <56FEDBB3.5030106@linux.vnet.ibm.com><57339EE1.2040500@linux.vnet.ibm.com> <573A034C.9060602@br.ibm.com> Message-ID: <201605241515.u4OFFOkQ016415@d19av06.sagamino.japan.ibm.com> Hi Breno, Thank you for your reply. >The same mechanism could be used to copy arrays of short elements, as Gustavo was >working on. Do you agree? I think the mechanism differs by type (byte, short, int, long...). Gustavo will apply a patch with VSX for short array copy, so it would be reasonable to use VSX instructions for long array copy, too. My coworker is also creating byte and int arraycopy with VSX. He will post an email to this mailing list.
I would appreciate it if our patch for byte, int and long copy were applied to OpenJDK. Best regards, Miki From: Breno Leitao To: Miki M Enoki/Japan/IBM at IBMJP, "Doerr, Martin" , Cc: Gustavo Romero , Volker Simonis , "Simonis, Volker" , "ppc-aix-port-dev at openjdk.java.net" , "hotspot-dev at openjdk.java.net" Date: 2016/05/17 02:29 Subject: Re: PPC64 VSX load/store instructions in stubs Hi Miki, On 05/16/2016 02:53 AM, Miki M Enoki wrote: > I also implemented VSX disjoint long arraycopy. > I appreciate it if it is applied to OpenJDK, too. Thanks for the summarized information, this is helpful. Based on your plot, I understand we can split the whole scenario in two: * Array size smaller than 4k, and then use VSX instructions to perform copy * Array size bigger than 4k, and then use VMX instructions to perform copy The same mechanism could be used to copy arrays of short elements, as Gustavo was working on. Do you agree? That said, I understand that a new patch should be generated that contemplates both cases in a single patch, ready to be applied on OpenJDK 9 source code. Hence a webrev should be generated mapping to bug id https://bugs.openjdk.java.net/browse/JDK-8154156 If you need any help on the webrev[1] creation and hosting, Gustavo might help, since he did this process already. [1] http://openjdk.java.net/guide/webrevHelp.html
From david.holmes at oracle.com Tue May 24 20:18:49 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 25 May 2016 06:18:49 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> <275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com> <9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap> Message-ID: On 24/05/2016 11:06 PM, Doerr, Martin wrote: > Hi David, > > unfortunately, Atomic::add(jlong) is used by mallocTracker.hpp (e.g. line 56). Removing it breaks the build. Yeah, I only discovered that this morning when I checked my test build results. That in itself is a bug as Zhengyu has noted. > But I could change it as follows: No - thanks - let's just leave this part for another day. Thanks, David
> inline jlong Atomic::add(jlong add_value, volatile jlong* dest) {
> #ifdef _LP64
>   return (jlong) add_ptr((intptr_t) add_value, (volatile intptr_t*) dest);
> #else
>   jlong old = load(dest);
>   jlong new_value = old + add_value;
>   while (old != cmpxchg(new_value, dest, old)) {
>     old = load(dest);
>     new_value = old + add_value;
>   }
>   return new_value;
> #endif
> }
> > Best regards, > Martin > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Dienstag, 24.
Mai 2016 14:27 > To: Doerr, Martin ; Andrew Haley (aph at redhat.com) > Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > Hi Martin, > > On 24/05/2016 8:21 PM, Doerr, Martin wrote: >> Hi David, >> >> it was moved for the same reason as the jint version of cmpxchg: It passes the memory order to the jint version. >> It may look large in terms of C++ code, but there's not much substantial content. >> I can only see a loop which calls the jint version + a bunch of very simple operations. >> Why shouldn't we give compilers a chance to inline and possibly optimize some of the simple operations and especially to eliminate the order check? > > I think this forces the compiler to inline it, not just "gives it a > chance". But I'll leave it to those more knowledgeable about the > compiler side of this to comment. > > But if we're making these changes can you delete the Atomic::add(jlong) > - it is unused and incorrect as discussed here: > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-February/021620.html > > Thanks, > David > >> Best regards, >> Martin >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Dienstag, 24. Mai 2016 12:04 >> To: Doerr, Martin ; Andrew Haley (aph at redhat.com) >> Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >> >> On 24/05/2016 7:37 PM, Doerr, Martin wrote: >>> Hi David and Andrew, >>> >>> sorry for missing this one. There were too many emails. >>> >>> After moving the jint version as well, there was not much left of atomic.cpp. >>> I think it doesn't make any sense to keep a couple of trivial functions in the cpp file. 
>>> Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file. >> >> Sorry I don't understand why the jbyte cmpxchg_general was moved to the >> .inline.hpp file - it seems far too big to be inlined. >> >> David >> >>> Webrev is here: >>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/ >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Dienstag, 24. Mai 2016 05:50 >>> To: Doerr, Martin >>> Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>> >>> Hi Martin, >>> >>> On 23/05/2016 7:29 PM, Doerr, Martin wrote: >>>> Hi David, >>>> >>>> here's the new webrev: >>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/ >>> >>> There seems to be some confusion. You've moved the jbyte >>> Atomic::cmpxchg_general from the .cpp file to the .inline/hpp file, but >>> the comments from Andrew and Kim were about moving the unsigned >>> Atomic::cmpxchg version. ?? >>> >>> Aside: In the changeset contributor's have to be specified by "email >>> address" or "name ", OpenJDK user names are not accepted. >>> I think Andrew should also be listed there for the Aarch64 component. >>> >>> Thanks, >>> David >>> >>>> Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change. >>>> >>>> Thanks for your time and your support. >>>> >>>> Best regards, >>>> Martin >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Samstag, 21. 
Mai 2016 01:10 >>>> To: Doerr, Martin >>>> Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>>> >>>> Hi Martin, >>>> >>>> Are you in a position to make the change now suggested by both Kim and >>>> Andrew? Can you also include the Aarch64 code that Andrew provided: >>>> >>>> http://cr.openjdk.java.net/~aph/8154736 >>>> >>>> I'd like to get this finalized so it is ready to push as soon as the >>>> process allows it to. >>>> >>>> Thanks, >>>> David >>>> >>>> On 20/05/2016 8:03 AM, Kim Barrett wrote: >>>>>> On May 18, 2016, at 6:12 AM, Doerr, Martin wrote: >>>>>> >>>>>> Hi Kim, >>>>>> >>>>>> thank you very much for the detailed review. >>>>>> >>>>>> I agree with your comments and I have made all your requested changes here: >>>>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ >>>>>> >>>>>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now. >>>>>> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. >>>>>> >>>>>> For performance optimization, we should better use (or introduce additional) enum values. >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> There doesn't seem to have been any change for this earlier comment. >>>>> >>>>> src/share/vm/runtime/atomic.cpp >>>>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, >>>>> 60 volatile unsigned int* dest, unsigned int compare_value, >>>>> 61 cmpxchg_memory_order order) { >>>>> >>>>> I'm surprised this was ever out-of-line. But with this change it's >>>>> quite bad to be out-of-line, as that's going to kill the constant >>>>> propogation of the order value. 
>>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> Other than that, looks good. >>>>> >>>>> >>>>> >>>>> From david.holmes at oracle.com Tue May 24 20:19:30 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 25 May 2016 06:19:30 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <57447B34.3080608@redhat.com> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> <275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com> <9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap> <5744758A.3080405@redhat.com> <57447B34.3080608@redhat.com> Message-ID: <78cd5827-0b3d-da2b-21ea-8a01edde0405@oracle.com> On 25/05/2016 2:03 AM, Zhengyu Gu wrote: > > On 05/24/2016 11:38 AM, Zhengyu Gu wrote: >> >> >> On 05/24/2016 09:06 AM, Doerr, Martin wrote: >>> Hi David, >>> >>> unfortunately, Atomic::add(jlong) is used by mallocTracker.hpp (e.g. >>> line 56). Removing it breaks the build. >> It should be replaced with size_t version in mallocTracker.hpp. >> > I created https://bugs.openjdk.java.net/browse/JDK-8157709 for this. Thanks Zhengyu. 
David > -Zhengyu > >> -Zhengyu >> >> >> >>> >>> But I could change it as follows: >>> inline jlong Atomic::add(jlong add_value, volatile jlong* dest) { >>> #ifdef _LP64 >>> return (jlong) add_ptr((intptr_t) add_value, (volatile intptr_t*) >>> dest); >>> #else >>> jlong old = load(dest); >>> jlong new_value = old + add_value; >>> while (old != cmpxchg(new_value, dest, old)) { >>> old = load(dest); >>> new_value = old + add_value; >>> } >>> return new_value; >>> #endif >>> } >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Dienstag, 24. Mai 2016 14:27 >>> To: Doerr, Martin ; Andrew Haley >>> (aph at redhat.com) >>> Cc: Hiroshi H Horii ; Tim Ellison >>> ; ppc-aix-port-dev at openjdk.java.net; >>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>> >>> Hi Martin, >>> >>> On 24/05/2016 8:21 PM, Doerr, Martin wrote: >>>> Hi David, >>>> >>>> it was moved for the same reason as the jint version of cmpxchg: It >>>> passes the memory order to the jint version. >>>> It may look large in terms of C++ code, but there's not much >>>> substantial content. >>>> I can only see a loop which calls the jint version + a bunch of very >>>> simple operations. >>>> Why shouldn't we give compilers a chance to inline and possibly >>>> optimize some of the simple operations and especially to eliminate >>>> the order check? >>> I think this forces the compiler to inline it, not just "gives it a >>> chance". But I'll leave it to those more knowledgeable about the >>> compiler side of this to comment. 
>>> >>> But if we're making these changes can you delete the Atomic::add(jlong) >>> - it is unused and incorrect as discussed here: >>> >>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-February/021620.html >>> >>> >>> Thanks, >>> David >>> >>>> Best regards, >>>> Martin >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Dienstag, 24. Mai 2016 12:04 >>>> To: Doerr, Martin ; Andrew Haley >>>> (aph at redhat.com) >>>> Cc: Hiroshi H Horii ; Tim Ellison >>>> ; ppc-aix-port-dev at openjdk.java.net; >>>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>>> >>>> On 24/05/2016 7:37 PM, Doerr, Martin wrote: >>>>> Hi David and Andrew, >>>>> >>>>> sorry for missing this one. There were too many emails. >>>>> >>>>> After moving the jint version as well, there was not much left of >>>>> atomic.cpp. >>>>> I think it doesn't make any sense to keep a couple of trivial >>>>> functions in the cpp file. >>>>> Therefore, I have removed atomic.cpp and moved the remaining small >>>>> functions into the inline file. >>>> Sorry I don't understand why the jbyte cmpxchg_general was moved to the >>>> .inline.hpp file - it seems far too big to be inlined. >>>> >>>> David >>>> >>>>> Webrev is here: >>>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/ >>>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>> Sent: Dienstag, 24. 
Mai 2016 05:50 >>>>> To: Doerr, Martin >>>>> Cc: Hiroshi H Horii ; Tim Ellison >>>>> ; ppc-aix-port-dev at openjdk.java.net; >>>>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>>>> >>>>> Hi Martin, >>>>> >>>>> On 23/05/2016 7:29 PM, Doerr, Martin wrote: >>>>>> Hi David, >>>>>> >>>>>> here's the new webrev: >>>>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/ >>>>> There seems to be some confusion. You've moved the jbyte >>>>> Atomic::cmpxchg_general from the .cpp file to the .inline/hpp file, >>>>> but >>>>> the comments from Andrew and Kim were about moving the unsigned >>>>> Atomic::cmpxchg version. ?? >>>>> >>>>> Aside: In the changeset contributor's have to be specified by "email >>>>> address" or "name ", OpenJDK user names are not >>>>> accepted. >>>>> I think Andrew should also be listed there for the Aarch64 component. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Btw.: The jbyte version of cmpxchg can be implemented on aarch >>>>>> like on ppc where we emulate the byte access by a 4 byte access >>>>>> (lwarx/stwcx). But that should better be done in a separate change. >>>>>> >>>>>> Thanks for your time and your support. >>>>>> >>>>>> Best regards, >>>>>> Martin >>>>>> >>>>>> -----Original Message----- >>>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>>> Sent: Samstag, 21. Mai 2016 01:10 >>>>>> To: Doerr, Martin >>>>>> Cc: Hiroshi H Horii ; Tim Ellison >>>>>> ; ppc-aix-port-dev at openjdk.java.net; >>>>>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>>>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>>>>> >>>>>> Hi Martin, >>>>>> >>>>>> Are you in a position to make the change now suggested by both Kim >>>>>> and >>>>>> Andrew? 
Can you also include the Aarch64 code that Andrew provided: >>>>>> >>>>>> http://cr.openjdk.java.net/~aph/8154736 >>>>>> >>>>>> I'd like to get this finalized so it is ready to push as soon as the >>>>>> process allows it to. >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 20/05/2016 8:03 AM, Kim Barrett wrote: >>>>>>>> On May 18, 2016, at 6:12 AM, Doerr, Martin >>>>>>>> wrote: >>>>>>>> >>>>>>>> Hi Kim, >>>>>>>> >>>>>>>> thank you very much for the detailed review. >>>>>>>> >>>>>>>> I agree with your comments and I have made all your requested >>>>>>>> changes here: >>>>>>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ >>>>>>>> >>>>>>>> >>>>>>>> It's correct that the change changes the semantics of the >>>>>>>> conservative cmpxchg. In case of failure, we also execute the >>>>>>>> sync instruction, now. >>>>>>>> Advantage is that the new implementation is maximum conservative >>>>>>>> by default. I think this makes sense as long as the semantics of >>>>>>>> the hotspot C++ cmpxchg are not clearly specified. >>>>>>>> >>>>>>>> For performance optimization, we should better use (or introduce >>>>>>>> additional) enum values. >>>>>>> ------------------------------------------------------------------------------ >>>>>>> >>>>>>> There doesn't seem to have been any change for this earlier comment. >>>>>>> >>>>>>> src/share/vm/runtime/atomic.cpp >>>>>>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, >>>>>>> 60 volatile unsigned int* dest, >>>>>>> unsigned int compare_value, >>>>>>> 61 cmpxchg_memory_order order) { >>>>>>> >>>>>>> I'm surprised this was ever out-of-line. But with this change it's >>>>>>> quite bad to be out-of-line, as that's going to kill the constant >>>>>>> propogation of the order value. >>>>>>> >>>>>>> ------------------------------------------------------------------------------ >>>>>>> >>>>>>> >>>>>>> Other than that, looks good. 
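Martin's aside about emulating a byte cmpxchg with a 4-byte `lwarx`/`stwcx.` sequence can be sketched portably as a CAS loop on the containing 32-bit word. This is not HotSpot's `cmpxchg_general`. `cas_byte` is a hypothetical helper that takes the word plus a byte index (0 = least significant byte) instead of a byte address. That avoids the alignment and endianness handling a real address-based version would need.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: one-byte compare-and-swap built from a 4-byte CAS on
// the containing word. Returns the byte value observed before the operation.
inline uint8_t cas_byte(uint8_t exchange_value, std::atomic<uint32_t>* word,
                        unsigned index, uint8_t compare_value) {
  const unsigned shift = index * 8;  // index must be 0..3
  const uint32_t mask = uint32_t{0xFF} << shift;
  uint32_t cur = word->load();
  for (;;) {
    const uint8_t cur_byte = static_cast<uint8_t>((cur & mask) >> shift);
    if (cur_byte != compare_value) {
      return cur_byte;  // target byte differs: fail without storing
    }
    const uint32_t new_word =
        (cur & ~mask) | (static_cast<uint32_t>(exchange_value) << shift);
    // A failed word-level CAS refreshes `cur`. Retry, because another thread
    // may only have changed one of the neighbouring bytes.
    if (word->compare_exchange_weak(cur, new_word)) {
      return compare_value;  // success: the old byte was compare_value
    }
  }
}
```

Returning the previously observed byte follows the usual cmpxchg convention, so a caller detects success by comparing the result with `compare_value`. The retry on a neighbouring-byte change is what distinguishes this emulation from a plain word CAS.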
>>>>>>> >>>>>>> >>>>>>> >>>>>>> >> > From zgu at redhat.com Tue May 24 15:38:50 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 24 May 2016 11:38:50 -0400 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> <275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com> <9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap> Message-ID: <5744758A.3080405@redhat.com> On 05/24/2016 09:06 AM, Doerr, Martin wrote: > Hi David, > > unfortunately, Atomic::add(jlong) is used by mallocTracker.hpp (e.g. line 56). Removing it breaks the build. It should be replaced with size_t version in mallocTracker.hpp. -Zhengyu > > But I could change it as follows: > inline jlong Atomic::add(jlong add_value, volatile jlong* dest) { > #ifdef _LP64 > return (jlong) add_ptr((intptr_t) add_value, (volatile intptr_t*) dest); > #else > jlong old = load(dest); > jlong new_value = old + add_value; > while (old != cmpxchg(new_value, dest, old)) { > old = load(dest); > new_value = old + add_value; > } > return new_value; > #endif > } > > Best regards, > Martin > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Dienstag, 24. 
Mai 2016 14:27 > To: Doerr, Martin ; Andrew Haley (aph at redhat.com) > Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg > > Hi Martin, > > On 24/05/2016 8:21 PM, Doerr, Martin wrote: >> Hi David, >> >> it was moved for the same reason as the jint version of cmpxchg: It passes the memory order to the jint version. >> It may look large in terms of C++ code, but there's not much substantial content. >> I can only see a loop which calls the jint version + a bunch of very simple operations. >> Why shouldn't we give compilers a chance to inline and possibly optimize some of the simple operations and especially to eliminate the order check? > I think this forces the compiler to inline it, not just "gives it a > chance". But I'll leave it to those more knowledgeable about the > compiler side of this to comment. > > But if we're making these changes can you delete the Atomic::add(jlong) > - it is unused and incorrect as discussed here: > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-February/021620.html > > Thanks, > David > >> Best regards, >> Martin >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Dienstag, 24. Mai 2016 12:04 >> To: Doerr, Martin ; Andrew Haley (aph at redhat.com) >> Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >> >> On 24/05/2016 7:37 PM, Doerr, Martin wrote: >>> Hi David and Andrew, >>> >>> sorry for missing this one. There were too many emails. >>> >>> After moving the jint version as well, there was not much left of atomic.cpp. >>> I think it doesn't make any sense to keep a couple of trivial functions in the cpp file. 
>>> Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file. >> Sorry I don't understand why the jbyte cmpxchg_general was moved to the >> .inline.hpp file - it seems far too big to be inlined. >> >> David >> >>> Webrev is here: >>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/ >>> >>> Best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Dienstag, 24. Mai 2016 05:50 >>> To: Doerr, Martin >>> Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>> >>> Hi Martin, >>> >>> On 23/05/2016 7:29 PM, Doerr, Martin wrote: >>>> Hi David, >>>> >>>> here's the new webrev: >>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/ >>> There seems to be some confusion. You've moved the jbyte >>> Atomic::cmpxchg_general from the .cpp file to the .inline/hpp file, but >>> the comments from Andrew and Kim were about moving the unsigned >>> Atomic::cmpxchg version. ?? >>> >>> Aside: In the changeset contributor's have to be specified by "email >>> address" or "name ", OpenJDK user names are not accepted. >>> I think Andrew should also be listed there for the Aarch64 component. >>> >>> Thanks, >>> David >>> >>>> Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change. >>>> >>>> Thanks for your time and your support. >>>> >>>> Best regards, >>>> Martin >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Samstag, 21. 
Mai 2016 01:10 >>>> To: Doerr, Martin >>>> Cc: Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>>> >>>> Hi Martin, >>>> >>>> Are you in a position to make the change now suggested by both Kim and >>>> Andrew? Can you also include the Aarch64 code that Andrew provided: >>>> >>>> http://cr.openjdk.java.net/~aph/8154736 >>>> >>>> I'd like to get this finalized so it is ready to push as soon as the >>>> process allows it to. >>>> >>>> Thanks, >>>> David >>>> >>>> On 20/05/2016 8:03 AM, Kim Barrett wrote: >>>>>> On May 18, 2016, at 6:12 AM, Doerr, Martin wrote: >>>>>> >>>>>> Hi Kim, >>>>>> >>>>>> thank you very much for the detailed review. >>>>>> >>>>>> I agree with your comments and I have made all your requested changes here: >>>>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ >>>>>> >>>>>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we now also execute the sync instruction. >>>>>> The advantage is that the new implementation is maximally conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified. >>>>>> >>>>>> For performance optimization, we should rather use (or introduce additional) enum values. >>>>> ------------------------------------------------------------------------------ >>>>> There doesn't seem to have been any change for this earlier comment. >>>>> >>>>> src/share/vm/runtime/atomic.cpp >>>>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, >>>>> 60 volatile unsigned int* dest, unsigned int compare_value, >>>>> 61 cmpxchg_memory_order order) { >>>>> >>>>> I'm surprised this was ever out-of-line. But with this change it's >>>>> quite bad to be out-of-line, as that's going to kill the constant >>>>> propagation of the order value.
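Kim's point about constant propagation can be illustrated with a portable sketch. It does not use HotSpot's Atomic API: the enum `CasOrder` and the function `cas_u32` are hypothetical names, `std::atomic` stands in for the platform assembly, and a `seq_cst` fence only approximates PPC's sync. Because the function is defined inline, a call with a literal `CasOrder` lets the compiler drop the untaken branch; the same definition moved out of line into a .cpp file would keep the runtime comparison. The conservative path also fences on failure, matching the behavior Martin describes for webrev.03.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ordering parameter discussed in this thread.
enum class CasOrder { relaxed, conservative };

// Header-style inline definition: with a literal `order` at the call site the
// optimizer can drop the untaken branch. An out-of-line definition in a .cpp
// file would have to keep the runtime comparison.
inline std::uint32_t cas_u32(std::uint32_t exchange_value,
                             std::atomic<std::uint32_t>* dest,
                             std::uint32_t compare_value,
                             CasOrder order = CasOrder::conservative) {
  std::uint32_t expected = compare_value;
  if (order == CasOrder::relaxed) {
    // Atomicity only; no ordering with respect to surrounding accesses.
    dest->compare_exchange_strong(expected, exchange_value,
                                  std::memory_order_relaxed);
  } else {
    // Conservative: full fences before and after, on success and on failure.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    dest->compare_exchange_strong(expected, exchange_value,
                                  std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  // On failure compare_exchange_strong writes the observed value into
  // `expected`; on success `expected` still equals the old value.
  return expected;
}
```

Like HotSpot's cmpxchg, the sketch returns the old value in both cases, so callers detect success by comparing the result with `compare_value`.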
>>>>> >>>>> ------------------------------------------------------------------------------ >>>>> >>>>> Other than that, looks good. >>>>> >>>>> >>>>> >>>>> From zgu at redhat.com Tue May 24 16:03:00 2016 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 24 May 2016 12:03:00 -0400 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: <5744758A.3080405@redhat.com> References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> <275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com> <9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap> <5744758A.3080405@redhat.com> Message-ID: <57447B34.3080608@redhat.com> On 05/24/2016 11:38 AM, Zhengyu Gu wrote: > > > On 05/24/2016 09:06 AM, Doerr, Martin wrote: >> Hi David, >> >> unfortunately, Atomic::add(jlong) is used by mallocTracker.hpp (e.g. >> line 56). Removing it breaks the build. > It should be replaced with size_t version in mallocTracker.hpp. > I created https://bugs.openjdk.java.net/browse/JDK-8157709 for this. -Zhengyu > -Zhengyu > > > >> >> But I could change it as follows: >> inline jlong Atomic::add(jlong add_value, volatile jlong* dest) { >> #ifdef _LP64 >> return (jlong) add_ptr((intptr_t) add_value, (volatile intptr_t*) >> dest); >> #else >> jlong old = load(dest); >> jlong new_value = old + add_value; >> while (old != cmpxchg(new_value, dest, old)) { >> old = load(dest); >> new_value = old + add_value; >> } >> return new_value; >> #endif >> } >> >> Best regards, >> Martin >> >> >> -----Original Message----- >> From: David Holmes [mailto:david.holmes at oracle.com] >> Sent: Dienstag, 24. 
Mai 2016 14:27 >> To: Doerr, Martin ; Andrew Haley >> (aph at redhat.com) >> Cc: Hiroshi H Horii ; Tim Ellison >> ; ppc-aix-port-dev at openjdk.java.net; >> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >> >> Hi Martin, >> >> On 24/05/2016 8:21 PM, Doerr, Martin wrote: >>> Hi David, >>> >>> it was moved for the same reason as the jint version of cmpxchg: It >>> passes the memory order to the jint version. >>> It may look large in terms of C++ code, but there's not much >>> substantial content. >>> I can only see a loop which calls the jint version + a bunch of very >>> simple operations. >>> Why shouldn't we give compilers a chance to inline and possibly >>> optimize some of the simple operations and especially to eliminate >>> the order check? >> I think this forces the compiler to inline it, not just "gives it a >> chance". But I'll leave it to those more knowledgeable about the >> compiler side of this to comment. >> >> But if we're making these changes can you delete the Atomic::add(jlong) >> - it is unused and incorrect as discussed here: >> >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-February/021620.html >> >> >> Thanks, >> David >> >>> Best regards, >>> Martin >>> >>> -----Original Message----- >>> From: David Holmes [mailto:david.holmes at oracle.com] >>> Sent: Dienstag, 24. Mai 2016 12:04 >>> To: Doerr, Martin ; Andrew Haley >>> (aph at redhat.com) >>> Cc: Hiroshi H Horii ; Tim Ellison >>> ; ppc-aix-port-dev at openjdk.java.net; >>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>> >>> On 24/05/2016 7:37 PM, Doerr, Martin wrote: >>>> Hi David and Andrew, >>>> >>>> sorry for missing this one. There were too many emails. >>>> >>>> After moving the jint version as well, there was not much left of >>>> atomic.cpp. 
>>>> I think it doesn't make any sense to keep a couple of trivial >>>> functions in the cpp file. >>>> Therefore, I have removed atomic.cpp and moved the remaining small >>>> functions into the inline file. >>> Sorry, I don't understand why the jbyte cmpxchg_general was moved to the >>> .inline.hpp file - it seems far too big to be inlined. >>> >>> David >>> >>>> Webrev is here: >>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/ >>>> >>>> Best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>> Sent: Dienstag, 24. Mai 2016 05:50 >>>> To: Doerr, Martin >>>> Cc: Hiroshi H Horii ; Tim Ellison >>>> ; ppc-aix-port-dev at openjdk.java.net; >>>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>>> >>>> Hi Martin, >>>> >>>> On 23/05/2016 7:29 PM, Doerr, Martin wrote: >>>>> Hi David, >>>>> >>>>> here's the new webrev: >>>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/ >>>> There seems to be some confusion. You've moved the jbyte >>>> Atomic::cmpxchg_general from the .cpp file to the .inline.hpp file, >>>> but >>>> the comments from Andrew and Kim were about moving the unsigned >>>> Atomic::cmpxchg version. ?? >>>> >>>> Aside: In the changeset, contributors have to be specified by "email >>>> address" or "name "; OpenJDK user names are not >>>> accepted. >>>> I think Andrew should also be listed there for the Aarch64 component. >>>> >>>> Thanks, >>>> David >>>> >>>>> Btw.: The jbyte version of cmpxchg can be implemented on aarch64 >>>>> like on ppc, where we emulate the byte access by a 4-byte access >>>>> (lwarx/stwcx). But that would better be done in a separate change. >>>>> >>>>> Thanks for your time and your support.
>>>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> -----Original Message----- >>>>> From: David Holmes [mailto:david.holmes at oracle.com] >>>>> Sent: Samstag, 21. Mai 2016 01:10 >>>>> To: Doerr, Martin >>>>> Cc: Hiroshi H Horii ; Tim Ellison >>>>> ; ppc-aix-port-dev at openjdk.java.net; >>>>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net >>>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg >>>>> >>>>> Hi Martin, >>>>> >>>>> Are you in a position to make the change now suggested by both Kim >>>>> and >>>>> Andrew? Can you also include the Aarch64 code that Andrew provided: >>>>> >>>>> http://cr.openjdk.java.net/~aph/8154736 >>>>> >>>>> I'd like to get this finalized so it is ready to push as soon as the >>>>> process allows it to. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 20/05/2016 8:03 AM, Kim Barrett wrote: >>>>>>> On May 18, 2016, at 6:12 AM, Doerr, Martin >>>>>>> wrote: >>>>>>> >>>>>>> Hi Kim, >>>>>>> >>>>>>> thank you very much for the detailed review. >>>>>>> >>>>>>> I agree with your comments and I have made all your requested >>>>>>> changes here: >>>>>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ >>>>>>> >>>>>>> >>>>>>> It's correct that the change changes the semantics of the >>>>>>> conservative cmpxchg. In case of failure, we now also execute the >>>>>>> sync instruction. >>>>>>> The advantage is that the new implementation is maximally conservative >>>>>>> by default. I think this makes sense as long as the semantics of >>>>>>> the hotspot C++ cmpxchg are not clearly specified. >>>>>>> >>>>>>> For performance optimization, we should rather use (or introduce >>>>>>> additional) enum values. >>>>>> ------------------------------------------------------------------------------ >>>>>> >>>>>> There doesn't seem to have been any change for this earlier comment.
>>>>>> >>>>>> src/share/vm/runtime/atomic.cpp >>>>>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value, >>>>>> 60 volatile unsigned int* dest, >>>>>> unsigned int compare_value, >>>>>> 61 cmpxchg_memory_order order) { >>>>>> >>>>>> I'm surprised this was ever out-of-line. But with this change it's >>>>>> quite bad to be out-of-line, as that's going to kill the constant >>>>>> propagation of the order value. >>>>>> >>>>>> ------------------------------------------------------------------------------ >>>>>> >>>>>> >>>>>> Other than that, looks good. >>>>>> >>>>>> >>>>>> >>>>>> > From kim.barrett at oracle.com Thu May 26 00:04:53 2016 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 25 May 2016 20:04:53 -0400 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> Message-ID: > On May 24, 2016, at 5:37 AM, Doerr, Martin wrote: > > Hi David and Andrew, > > sorry for missing this one. There were too many emails. > > After moving the jint version as well, there was not much left of atomic.cpp. > I think it doesn't make any sense to keep a couple of trivial functions in the cpp file. > Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file.
> > Webrev is here: > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/ ------------------------------------------------------------------------------ 100 inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte *dest, jbyte comparand, cmpxchg_memory_order order) The addition of the order option makes it a bit more obvious that this does not, and never has, executed any fences in the immediate failure case, e.g. when 111 while (cur_as_bytes[offset] == comparand) { is false on the first iteration. This seems like a bug. Assuming it is, I'm not sure whether this should be dealt with as part of this changeset, or moved to a separate bug for this (pre-existing) issue. I think only ARM targets (and zero?) are lacking specialized cmpxchg on bytes and so use this version? Sorry I didn't notice this previously. ------------------------------------------------------------------------------ Other than that and the already mentioned (pre-existing) Atomic::add for jlong return value problem, this looks good. From david.holmes at oracle.com Thu May 26 01:09:17 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 26 May 2016 11:09:17 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> Message-ID: Hi Kim, On 26/05/2016 10:04 AM, Kim Barrett wrote: >> On May 24, 2016, at 5:37 AM, Doerr, Martin wrote: >> >> Hi David and Andrew, >> >> sorry for missing this one. There were too many emails. 
>> >> After moving the jint version as well, there was not much left of atomic.cpp. >> I think it doesn't make any sense to keep a couple of trivial functions in the cpp file. >> Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file. >> >> Webrev is here: >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/ > > ------------------------------------------------------------------------------ > 100 inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte *dest, jbyte comparand, cmpxchg_memory_order order) > > The addition of the order option makes it a bit more obvious that this > does not, and never has, executed any fences in the immediate failure > case, e.g. when > > 111 while (cur_as_bytes[offset] == comparand) { > > is false on the first iteration. This seems like a bug. Assuming it > is, I'm not sure whether this should be dealt with as part of this > changeset, or moved to a separate bug for this (pre-existing) issue. > I think only ARM targets (and zero?) are lacking specialized cmpxchg > on bytes and so use this version? I'll file a separate bug for that. Thanks, David ------ > Sorry I didn't notice this previously. > > ------------------------------------------------------------------------------ > > Other than that and the already mentioned (pre-existing) Atomic::add > for jlong return value problem, this looks good. 
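Kim's observation concerns the fallback that emulates a byte cmpxchg with a 4-byte CAS on the containing word. Below is a minimal portable sketch of that technique, not HotSpot's `Atomic::cmpxchg_general`. It assumes the byte is addressed as an 8-bit lane of a 32-bit `std::atomic` word (lane 0 is the least significant byte, which sidesteps endianness), and `cmpxchg_byte` is a hypothetical name. Unlike the code Kim describes, it runs the trailing fence on every exit path, including when the byte already mismatches on the first read (the case later filed as JDK-8157904).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: emulate a byte-wide cmpxchg with a 32-bit CAS on the
// word that contains the byte. `lane` selects bits [8*lane, 8*lane+8) of the
// word (lane 0 = least significant byte), which sidesteps endianness.
inline std::uint8_t cmpxchg_byte(std::uint8_t exchange_value,
                                 std::atomic<std::uint32_t>* word,
                                 unsigned lane, std::uint8_t compare_value) {
  assert(lane < 4);
  const unsigned shift = 8 * lane;
  const std::uint32_t mask = std::uint32_t{0xFF} << shift;

  std::atomic_thread_fence(std::memory_order_seq_cst);  // leading fence
  std::uint32_t cur = word->load(std::memory_order_relaxed);
  std::uint8_t observed;
  for (;;) {
    observed = static_cast<std::uint8_t>((cur & mask) >> shift);
    if (observed != compare_value) {
      break;  // failure, possibly on the very first read
    }
    const std::uint32_t desired =
        (cur & ~mask) | (static_cast<std::uint32_t>(exchange_value) << shift);
    if (word->compare_exchange_weak(cur, desired, std::memory_order_relaxed)) {
      break;  // success: observed == compare_value
    }
    // CAS failed (another byte changed, or a spurious failure): `cur` now
    // holds the current word, so loop and re-check our byte.
  }
  std::atomic_thread_fence(std::memory_order_seq_cst);  // trailing fence on every path
  return observed;
}
```

Placing the trailing fence after the loop, rather than inside the success branch, is what makes the immediate-failure path as conservative as the success path.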
> From david.holmes at oracle.com Thu May 26 01:29:58 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 26 May 2016 11:29:58 +1000 Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg In-Reply-To: References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com> <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com> <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com> <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com> <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap> <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap> <267a624c-626f-4238-0166-baa14ff4b412@oracle.com> <9cff0b75-e234-e789-910d-d86154bba834@oracle.com> Message-ID: <17598d6f-6729-65c1-0af0-60ba93b4c003@oracle.com> Filed: https://bugs.openjdk.java.net/browse/JDK-8157904 Atomic::cmpxchg_general for jbyte is missing a fence on initial failure David On 26/05/2016 11:09 AM, David Holmes wrote: > Hi Kim, > > On 26/05/2016 10:04 AM, Kim Barrett wrote: >>> On May 24, 2016, at 5:37 AM, Doerr, Martin wrote: >>> >>> Hi David and Andrew, >>> >>> sorry for missing this one. There were too many emails. >>> >>> After moving the jint version as well, there was not much left of >>> atomic.cpp. >>> I think it doesn't make any sense to keep a couple of trivial >>> functions in the cpp file. >>> Therefore, I have removed atomic.cpp and moved the remaining small >>> functions into the inline file. >>> >>> Webrev is here: >>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/ >> >> ------------------------------------------------------------------------------ >> >> 100 inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte >> *dest, jbyte comparand, cmpxchg_memory_order order) >> >> The addition of the order option makes it a bit more obvious that this >> does not, and never has, executed any fences in the immediate failure >> case, e.g. when >> >> 111 while (cur_as_bytes[offset] == comparand) { >> >> is false on the first iteration. 
>> This seems like a bug. Assuming it >> is, I'm not sure whether this should be dealt with as part of this >> changeset, or moved to a separate bug for this (pre-existing) issue. >> I think only ARM targets (and zero?) are lacking specialized cmpxchg >> on bytes and so use this version? > > I'll file a separate bug for that. > > Thanks, > David > ------ > > >> Sorry I didn't notice this previously. >> >> ------------------------------------------------------------------------------ >> >> >> Other than that and the already mentioned (pre-existing) Atomic::add >> for jlong return value problem, this looks good. >> From HORIE at jp.ibm.com Mon May 30 01:42:31 2016 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Mon, 30 May 2016 10:42:31 +0900 Subject: PPC64 VSX load/store instructions in stubs In-Reply-To: References: <56FEDBB3.5030106@linux.vnet.ibm.com><57339EE1.2040500@linux.vnet.ibm.com> <573A034C.9060602@br.ibm.com> Message-ID: <201605300143.u4U1cXVI000652@mx0a-001b2d01.pphosted.com> Dear Breno, Gustavo, Volker, and Martin, I am a coworker of Miki. I implemented VSX disjoint arraycopy functions for byte, int, and long. Although Miki had implemented VSX disjoint long arraycopy, we found a couple of bugs, so I fixed them. Would you please review the patches? Microbenchmarks for byte and int are attached. (The one for long is the same one Miki attached earlier.) (See attached file: ArrayCopyTest_byte.java)(See attached file: ArrayCopyTest_int.java) Results are as follows. (For the short result, I used Gustavo's code.)
(See attached file: result_disjoint-arraycopy_vsx-max.jpg) Patch for Java8: (See attached file: hotspot_jdk8.diff) Patch for Java9: (See attached file: hotspot_jdk9.diff) Best regards, -- Michihiro Horie, IBM Research - Tokyo From: Miki M Enoki/Japan/IBM To: Breno Leitao Cc: Gustavo Romero , "hotspot-dev at openjdk.java.net" , "Doerr, Martin" , "ppc-aix-port-dev at openjdk.java.net" , "Simonis, Volker" , Volker Simonis Date: 2016/05/25 00:15 Subject: Re: PPC64 VSX load/store instructions in stubs Hi Breno, Thank you for your reply. >The same mechanism could be used to copy arrays of short elements, as Gustavo was >working on. Do you agree? I think the mechanism differs by element type (byte, short, int, long...). Gustavo will apply a patch with VSX for short array copy, so it would be reasonable to use VSX instructions for long array copy, too. My coworker is also implementing byte and int arraycopy with VSX. He will post an email to this mailing list. I would appreciate it if our patch for byte, int and long copy were applied to OpenJDK. Best regards, Miki From: Breno Leitao To: Miki M Enoki/Japan/IBM at IBMJP, "Doerr, Martin" , Cc: Gustavo Romero , Volker Simonis , "Simonis, Volker" , "ppc-aix-port-dev at openjdk.java.net" , "hotspot-dev at openjdk.java.net" Date: 2016/05/17 02:29 Subject: Re: PPC64 VSX load/store instructions in stubs Hi Miki, On 05/16/2016 02:53 AM, Miki M Enoki wrote: > I also implemented VSX disjoint long arraycopy. > I would appreciate it if it were applied to OpenJDK, too. Thanks for the summarized information; this is helpful. Based on your plot, I understand we can split the whole scenario in two: * Array size smaller than 4k: use VSX instructions to perform the copy * Array size bigger than 4k: use VMX instructions to perform the copy The same mechanism could be used to copy arrays of short elements, which Gustavo was working on. Do you agree?
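The size-based split Breno describes can be sketched as a small dispatch function. This is a hypothetical illustration, not the stub code in the attached patches. The threshold `kVsxMaxBytes` and its unit (bytes) are assumptions read off "4k", and the boolean parameters stand in for CPU feature checks such as `has_vsx()`. The feature check guards the length comparison, so the length is only tested once VSX is known to be available.

```cpp
#include <cassert>
#include <cstddef>

// Which copy loop a disjoint-arraycopy stub would branch to (illustrative).
enum class CopyLoop { Scalar, VSX, VMX };

// Assumed threshold: "4k" read as 4096 bytes. Not a tuned constant.
constexpr std::size_t kVsxMaxBytes = 4096;

// has_vsx / has_vmx stand in for CPU feature checks such as has_vsx().
inline CopyLoop choose_copy_loop(bool has_vsx, bool has_vmx, std::size_t bytes) {
  if (has_vsx) {
    // The length comparison only matters once VSX is known to be available.
    if (bytes < kVsxMaxBytes) return CopyLoop::VSX;
  }
  if (has_vmx) return CopyLoop::VMX;
  return CopyLoop::Scalar;
}
```

In a real stub the same decision would be emitted as compare-and-branch instructions, but the ordering of the checks is the same.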
That said, I understand that a single new patch covering both cases should be generated, ready to be applied to the OpenJDK 9 source code. Hence a webrev should be generated for bug id https://bugs.openjdk.java.net/browse/JDK-8154156 If you need any help with creating and hosting the webrev[1], Gustavo might help, since he has done this already. [1] http://openjdk.java.net/guide/webrevHelp.html -------------- next part -------------- A non-text attachment was scrubbed... Name: ArrayCopyTest_byte.java Type: application/octet-stream Size: 219239 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ArrayCopyTest_int.java Type: application/octet-stream Size: 14652 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: result_disjoint-arraycopy_vsx-max.jpg Type: image/jpeg Size: 30481 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hotspot_jdk8.diff Type: application/octet-stream Size: 10689 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: hotspot_jdk9.diff Type: application/octet-stream Size: 9729 bytes Desc: not available URL: From sgehwolf at redhat.com Mon May 30 08:55:50 2016 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Mon, 30 May 2016 10:55:50 +0200 Subject: RFR: JDK-8157336: Generation of classlists at build time should be configurable In-Reply-To: <57456E94.9030200@oracle.com> References: <574456F8.7000506@oracle.com> <097cad5f-c4a9-6f79-3ac3-717d512e81ba@oracle.com> <57456E94.9030200@oracle.com> Message-ID: <1464598550.3804.14.camel@redhat.com> cc'ing PPC folks for input. On Wed, 2016-05-25 at 11:21 +0200, Erik Joelsson wrote: > Thanks! > > When building zero, the JVM_VARIANT is "zero" so this addresses that > problem automatically too. I have verified that. > > There are some other peculiarities with zero in that it ends up in the > "server" directory so I understand that it's confusing. > > /Erik > > On 2016-05-24 22:35, David Holmes wrote: > > > > Hi Erik, > > > > On 24/05/2016 11:28 PM, Erik Joelsson wrote: > > > > > > Generating a classlist at build time is not supported on all JVM > > > configurations. This patch adds a configure flag to control this build > > > step: --disable-generate-classlist. The default is to be enabled if > > > either a client or server JVM Variant is being built. > > > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8157336 > > > Webrev: > > > http://cr.openjdk.java.net/~erikj/8157336/webrev.top.01/index.html > > This looks okay to me. It addresses the "minimal VM only" problem > > automatically which is good. I'm unclear if the Zero case is > > automatically handled as I'm not sure how the VM variants are > > expressed - but having the option is good enough I think. The Zero case is handled as noted in another branch of this thread, but I wonder if this works for "server" JVMs on PPC64? Is -Xshare:dump working on JDK 9 and PPC64? AFAIR, in JDK 8 on PPC64 it was not supported at some point.
Cheers, Severin From martin.doerr at sap.com Mon May 30 09:56:25 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 30 May 2016 09:56:25 +0000 Subject: PPC64 VSX load/store instructions in stubs In-Reply-To: <201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com> References: <56FEDBB3.5030106@linux.vnet.ibm.com><57339EE1.2040500@linux.vnet.ibm.com> <573A034C.9060602@br.ibm.com> <201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com> Message-ID: <4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap> Hi Michihiro, thanks for implementing the VSX versions. Gustavo's change "8154156: PPC64: improve array copy stubs by using vector instructions" has been pushed to hs-comp, so your change needs to be adapted: - The vm_version and assembler parts are already there. - Vector-scalar load/store instructions now use VectorSRegisters. The byte and int versions look good to me. I think the long version should be implemented in a similar way: the check for has_vsx() is necessary, and the length comparison should be done inside that block. Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Montag, 30. Mai 2016 03:43 To: Miki M Enoki Cc: Breno Leitao ; Gustavo Romero ; hotspot-dev at openjdk.java.net; Doerr, Martin ; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker ; Volker Simonis Subject: Re: PPC64 VSX load/store instructions in stubs Dear Breno, Gustavo, Volker, and Martin, I am a coworker of Miki. I implemented VSX disjoint arraycopy functions for byte, int, and long. Although Miki had implemented VSX disjoint long arraycopy, we found a couple of bugs, so I fixed them. Would you please review the patches? Microbenchmarks for byte and int are attached. (The one for long is the same one Miki attached earlier.) (See attached file: ArrayCopyTest_byte.java)(See attached file: ArrayCopyTest_int.java) Results are as follows. (For the short result, I used Gustavo's code.)
(See attached file: result_disjoint-arraycopy_vsx-max.jpg) Patch for Java8: (See attached file: hotspot_jdk8.diff) Patch for Java9: (See attached file: hotspot_jdk9.diff) Best regards, -- Michihiro Horie, IBM Research - Tokyo From: Miki M Enoki/Japan/IBM To: Breno Leitao Cc: Gustavo Romero, "hotspot-dev at openjdk.java.net", "Doerr, Martin", "ppc-aix-port-dev at openjdk.java.net", "Simonis, Volker", Volker Simonis Date: 2016/05/25 00:15 Subject: Re: PPC64 VSX load/store instructions in stubs ________________________________ Hi Breno, Thank you for your reply. >The same mechanism could be used to copy arrays of short elements, as Gustavo was >working on. Do you agree? I think the mechanism differs by element type (byte, short, int, long...). Gustavo will apply a patch with VSX for short array copy, so it would be reasonable to use VSX instructions for long array copy, too. My coworker is also implementing byte and int arraycopy with VSX. He will post an email to this mailing list. I would appreciate it if our patch for byte, int and long copy were applied to OpenJDK. Best regards, Miki From: Breno Leitao To: Miki M Enoki/Japan/IBM at IBMJP, "Doerr, Martin", Cc: Gustavo Romero, Volker Simonis, "Simonis, Volker", "ppc-aix-port-dev at openjdk.java.net", "hotspot-dev at openjdk.java.net" Date: 2016/05/17 02:29 Subject: Re: PPC64 VSX load/store instructions in stubs ________________________________ Hi Miki, On 05/16/2016 02:53 AM, Miki M Enoki wrote: > I also implemented VSX disjoint long arraycopy. > I would appreciate it if it were applied to OpenJDK, too.
Thanks for the summarized information; this is helpful. Based on your plot, I understand we can split the whole scenario in two: * Array size smaller than 4k: use VSX instructions to perform the copy * Array size bigger than 4k: use VMX instructions to perform the copy The same mechanism could be used to copy arrays of short elements, which Gustavo was working on. Do you agree? That said, I understand that a single new patch covering both cases should be generated, ready to be applied to the OpenJDK 9 source code. Hence a webrev should be generated for bug id https://bugs.openjdk.java.net/browse/JDK-8154156 If you need any help with creating and hosting the webrev[1], Gustavo might help, since he has done this already. [1] http://openjdk.java.net/guide/webrevHelp.html From gromero at linux.vnet.ibm.com Tue May 31 01:31:10 2016 From: gromero at linux.vnet.ibm.com (Gustavo Romero) Date: Mon, 30 May 2016 22:31:10 -0300 Subject: SIGILL crashes JVM on PPC64 LE In-Reply-To: References: <5733B30D.6010201@linux.vnet.ibm.com> Message-ID: <201605310131.u4V1T5Gm040249@mx0a-001b2d01.pphosted.com> Hi Volker, The following test case, isolated by Hiroshi Horii, generates the illegal instruction and crashes the JVM on PPC64 LE: UnalignedUnsafeAccess.java: http://hastebin.com/raw/uqegukific $ javac UnalignedUnsafeAccess.java $ java -Xcomp -Xbatch UnalignedUnsafeAccess The issue can be reproduced on OpenJDK 8 downstream, OpenJDK 8, and OpenJDK 9. hs_err logs: OpenJDK 9, tag 0be6f4f5d186 jdk-9+120: http://hastebin.com/raw/ecuhukutur OpenJDK 8, tag 5aaa43d91c73 tip: http://hastebin.com/raw/ipohoyafos OpenJDK 8 downstream: Ubuntu 16.04 LTS build 1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14 http://hastebin.com/raw/yetizebofo
RHEL 7.2: build 1.8.0_91-b14 http://hastebin.com/raw/irequfawaw The crash happens when an illegal instruction (0xea2f0013) is executed. The backtrace shows: Stack: [0x00003fff56030000,0x00003fff56430000], sp=0x00003fff5642b8d0, free space=4078k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x162104] loadI2LNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x194 V [libjvm.so+0x8ece28] Compile::fill_buffer(CodeBuffer*, unsigned int*)+0x4e8 V [libjvm.so+0x368e08] Compile::Code_Gen()+0x3c8 V [libjvm.so+0x369e04] Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool)+0xf64 V [libjvm.so+0x271380] C2Compiler::compile_method(ciEnv*, ciMethod*, int)+0x1f0 V [libjvm.so+0x3785a4] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xd54 V [libjvm.so+0x379dc8] CompileBroker::compiler_thread_loop()+0x488 V [libjvm.so+0xa5de90] compiler_thread_entry(JavaThread*, Thread*)+0x20 V [libjvm.so+0xa690c8] JavaThread::thread_main_inner()+0x178 V [libjvm.so+0x8c8c10] java_start(Thread*)+0x170 C [libpthread.so.0+0x833c] start_thread+0xfc C [libc.so.6+0x12b014] clone+0xe4 The loadI2LNode class is generated from the following ADL code in the ppc.ad file: instruct loadI2L(iRegLdst dst, memory mem) %{ match(Set dst (ConvI2L (LoadI mem))); predicate(_kids[0]->_leaf->as_Load()->is_unordered()); ins_cost(MEMORY_REF_COST); format %{ "LWA $dst, $mem \t// loadI2L" %} size(4); ins_encode %{ // TODO: PPC port $archOpcode(ppc64Opcode_lwa); int Idisp = $mem$$disp + frame_slots_bias($mem$$base, ra_); __ lwa($dst$$Register, Idisp, $mem$$base$$Register); %} ins_pipe(pipe_class_memory); %} So the generated illegal instruction comes from: lwa 17,17,15 (DS-form: lwa RT, DS, RA) As the displacement must always be 4-byte aligned (i.e.
the DS field is always concatenated with 0b00), a displacement of 17 (the middle operand) cannot be encoded: its low two bits (0b01) are OR'ed into the two-bit XO field, turning lwa's extended opcode 0b10 into 0b11, which is not a defined instruction. The bit-level breakdown of the illegal instruction in question: 11101000000000000000000000000010: LWA (opcode 58, XO 0b10) 00000010001000000000000000000000: 17 (RT) 00000000000000000000000000010001: 17 (displacement) 00000000000011110000000000000000: 15 (RA) -------------------------------- 11101010001011110000000000010011: 0xEA2F0013 => Illegal instruction The following change is proposed to fix the issue; it handles unaligned displacements: OpenJDK 9 webrev: 81.de.7a9f.ip4.static.sl-reverse.com./illegal/9 OpenJDK 8 webrev: 81.de.7a9f.ip4.static.sl-reverse.com./illegal/8 Could we open a JIRA ticket for this issue so that its ID can be included in the webrev? Thank you! Best regards, Gustavo On 12-05-2016 09:39, Volker Simonis wrote: > And I forgot to mention: I've checked and we don't emit vsel > instructions in jdk8 on ppc. So it must be a coincidence that changing > the endianness of the offending instruction yields a valid 'vsel' > instruction. > > > > On Thu, May 12, 2016 at 2:26 PM, Volker Simonis > wrote: >> Hi Gustavo, >> >> thanks for the bug report. The hs_err file you provided indicates that >> this crash happened with Ubuntu's openjdk 8 version. Can you still >> reproduce this with the newest jdk9 builds? >> >> Also, I can see from the hs_err file that the crash happened in the C2 >> compiled method java.util.TimSort.countRunAndMakeAscending which >> doesn't seem to be related to nio and unsafe. >> >> Ideally, you could post an easy test case to reproduce the problem. If >> that's not possible, it would be helpful if you could post the output >> of a failing run with >> "-XX:CompileCommand=print,java.util.TimSort::countRunAndMakeAscending >> -XX:CompileCommand=option,java.util.TimSort::countRunAndMakeAscending,PrintOptoAssembly". >> In order to get the disassembly output for compiled methods you have >> to build the hsdis library from hotspot/src/share/tools/hsdis (it has >> a README with build instructions).
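The encoding arithmetic in Gustavo's analysis, and the vsel coincidence Volker mentions, can both be checked with a small standalone sketch. The helper names below are illustrative and are not HotSpot's Assembler API. `encode_lwa_naive` deliberately reproduces the faulty behavior by OR-ing the raw displacement into the word. `encode_lwa_checked` shows one possible guard that refuses unaligned displacements; how the proposed webrev actually handles them is not assumed here.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helpers (not HotSpot's Assembler API). Field positions follow
// the Power ISA, expressed as shifts from the least significant bit.
constexpr std::uint32_t kOpcode58 = 58u << 26;  // primary opcode of ld/ldu/lwa
constexpr std::uint32_t kXoLwa = 2;             // lwa's extended opcode (low 2 bits)

// Naive DS-form encoding: ORs the raw 16-bit displacement into the word.
// A displacement that is not a multiple of 4 leaks into the XO field.
constexpr std::uint32_t encode_lwa_naive(std::uint32_t rt, std::int32_t disp,
                                         std::uint32_t ra) {
  return kOpcode58 | (rt << 21) | (ra << 16) |
         (static_cast<std::uint32_t>(disp) & 0xFFFFu) | kXoLwa;
}

// Checked variant: refuses displacements DS-form cannot express, so the
// caller can fall back to another instruction sequence.
inline bool encode_lwa_checked(std::uint32_t rt, std::int32_t disp,
                               std::uint32_t ra, std::uint32_t* out) {
  if (disp % 4 != 0 || disp < -32768 || disp > 32767) return false;
  *out = encode_lwa_naive(rt, disp, ra);
  return true;
}

constexpr std::uint32_t xo_field(std::uint32_t insn) { return insn & 0x3u; }

// Reverse the byte order of a 32-bit word.
constexpr std::uint32_t bswap32(std::uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0xFF00u) | ((x << 8) & 0xFF0000u) | (x << 24);
}

// VA-form fields (vsel is primary opcode 4 with XO 42).
struct VaForm {
  std::uint32_t opcode, vrt, vra, vrb, vrc, xo;
};

constexpr VaForm decode_va(std::uint32_t insn) {
  return VaForm{insn >> 26,         (insn >> 21) & 31u, (insn >> 16) & 31u,
                (insn >> 11) & 31u, (insn >> 6) & 31u,  insn & 63u};
}
```

With the operands from the crash, `encode_lwa_naive(17, 17, 15)` yields 0xEA2F0013 with XO = 3, and `encode_lwa_checked` rejects the same displacement. Byte-swapping that word gives 0x13002FEA, which decodes as VA-form opcode 4, XO 42, with fields 24, 0, 5, 31, i.e. `vsel v24,v0,v5,v31`. This is consistent with Volker's view that the vsel reading is a coincidence of byte order.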
>>
>> Regards,
>> Volker
>>
>>
>> On Thu, May 12, 2016 at 12:32 AM, Gustavo Romero
>> wrote:
>>> Hi
>>>
>>> I'm getting a nasty SIGILL that crashes the JVM on PPC64 LE.
>>>
>>> hs_err log:
>>> http://hastebin.com/raw/fovagunaci
>>>
>>> The application uses methods from both the java.nio.ByteBuffer and
>>> sun.misc.Unsafe classes to write to and read from an allocated buffer.
>>>
>>> Here is the instruction that caused the SIGILL, found while debugging:
>>>
>>> 0x3fff902839a4: cmpwi cr6,r17,0
>>> 0x3fff902839a8: beq cr6,0x3fff90283ae4
>>> 0x3fff902839ac: .long 0xea2f0013 <============ illegal instruction
>>> 0x3fff902839b0: add r15,r15,r17
>>> 0x3fff902839b4: add r14,r17,r14
>>>
>>> Interestingly, when its endianness is changed, it turns out to be a valid
>>> instruction: vsel v24,v0,v5,v31
>>>
>>> However, I'm still unable to determine whether it's an application issue,
>>> something in the JVM's Unsafe interface code, or something else.
>>>
>>> Any clue on how to narrow down this SIGILL?
>>>
>>> Thank you!
>>>
>>> Regards,
>>> Gustavo
>>>
>

From gromero at linux.vnet.ibm.com  Tue May 31 01:49:34 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 30 May 2016 22:49:34 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
References: <56FEDBB3.5030106@linux.vnet.ibm.com> <57339EE1.2040500@linux.vnet.ibm.com>
 <573A034C.9060602@br.ibm.com> <201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
 <4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
Message-ID: <201605310149.u4V1mmv5012147@mx0a-001b2d01.pphosted.com>

Hi Michihiro

Thanks a lot for providing a result summary for byte, short, int, and
long.

Using VSR0, 1, 2, and 3 (instead of the VR registers) will not violate
the ABI, so you can use them as Martin suggested.

Martin, should we use the same BugID (8154156: https://goo.gl/z2eGLi)
for byte, short, int, and long webrevs or open a new one?

Thank you.

Best regards,
Gustavo

On 30-05-2016 06:56, Doerr, Martin wrote:
> Hi Michihiro,
>
> thanks for implementing the VSX versions.
>
> Gustavo's change "8154156: PPC64: improve array copy stubs by using vector instructions" is pushed into hs-comp.
> Your change needs to be adapted:
>
> - The vm_version and assembler parts are already there.
>
> - Vector-scalar load/store instructions use VectorSRegisters now.
>
> The byte and int versions look good to me. I think the long version should be implemented in a similar way: the check for has_vsx() is necessary, and the length comparison should be done inside of the block.
>
> Best regards,
> Martin
>
>
> From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
> Sent: Montag, 30. Mai 2016 03:43
> To: Miki M Enoki
> Cc: Breno Leitao ; Gustavo Romero ; hotspot-dev at openjdk.java.net; Doerr, Martin ; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker ; Volker Simonis
> Subject: Re: PPC64 VSX load/store instructions in stubs
>
>
> Dear Breno, Gustavo, Volker, and Martin,
> I am a coworker of Miki.
>
> I implemented VSX disjoint arraycopy functions for byte, int, and long. Although Miki had implemented VSX disjoint long arraycopy, we found a couple of bugs, so I fixed them. Would you please review them?
>
> Micro benchmarks for byte and int are as follows. (The one for long is the same as Miki's, which Miki attached earlier.)
> (See attached file: ArrayCopyTest_byte.java)(See attached file: ArrayCopyTest_int.java)
>
> Results are as follows. (For the short result, I used Gustavo's code.)
> (See attached file: result_disjoint-arraycopy_vsx-max.jpg)
>
> Patch for Java8:
> (See attached file: hotspot_jdk8.diff)
>
> Patch for Java9:
> (See attached file: hotspot_jdk9.diff)
>
> Best regards,
> --
> Michihiro Horie,
> IBM Research - Tokyo
>
>
> From: Miki M Enoki/Japan/IBM
> To: Breno Leitao
> Cc: Gustavo Romero, "hotspot-dev at openjdk.java.net", "Doerr, Martin", "ppc-aix-port-dev at openjdk.java.net", "Simonis, Volker", Volker Simonis
> Date: 2016/05/25 00:15
> Subject: Re: PPC64 VSX load/store instructions in stubs
>
>
> Hi Breno,
>
> Thank you for your reply.
>
>> The same mechanism could be used to copy arrays of short elements, as Gustavo was
>> working on. Do you agree?
>
> I think the mechanism differs by type (byte, short, int, long...).
> Gustavo will apply a patch with VSX for short array copy, so it would be reasonable to use VSX instructions for long array copy, too.
>
> My coworker is also creating byte and int arraycopy with VSX. He will post an email to this mailing list.
> I would appreciate it if our patch for byte, int and long copy were applied to OpenJDK.
>
>
> Best regards,
> Miki
>
>
> From: Breno Leitao
> To: Miki M Enoki/Japan/IBM at IBMJP, "Doerr, Martin"
> Cc: Gustavo Romero, Volker Simonis, "Simonis, Volker", "ppc-aix-port-dev at openjdk.java.net", "hotspot-dev at openjdk.java.net"
> Date: 2016/05/17 02:29
> Subject: Re: PPC64 VSX load/store instructions in stubs
>
>
> Hi Miki,
>
> On 05/16/2016 02:53 AM, Miki M Enoki wrote:
>> I also implemented VSX disjoint long arraycopy.
>> I would appreciate it if it were applied to OpenJDK, too.
>
> Thanks for the summarized information, this is helpful.
> Based on your plot, I understand we can split the whole scenario in two:
>
> * Array size smaller than 4k: use VSX instructions to perform the copy
> * Array size bigger than 4k: use VMX instructions to perform the copy
>
> The same mechanism could be used to copy arrays of short elements, as Gustavo was
> working on. Do you agree?
>
> That said, I understand that a single new patch covering both cases should be
> generated, ready to be applied to the OpenJDK 9 source code. Hence
> a webrev should be generated mapping to bug id
> https://bugs.openjdk.java.net/browse/JDK-8154156
>
> If you need any help with webrev[1] creation and hosting, Gustavo might help,
> since he has already gone through this process.
>
> [1] http://openjdk.java.net/guide/webrevHelp.html

From goetz.lindenmaier at sap.com  Tue May 31 07:24:58 2016
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 31 May 2016 07:24:58 +0000
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <201605310149.u4V1mmv5012147@mx0a-001b2d01.pphosted.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com> <57339EE1.2040500@linux.vnet.ibm.com>
 <573A034C.9060602@br.ibm.com> <201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
 <4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
 <201605310149.u4V1mmv5012147@mx0a-001b2d01.pphosted.com>
Message-ID:

Hi Gustavo,

you need a new bug ID, as the change with the other one has already been
pushed by Martin. You can't use the same bug ID for two different
changes.
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/f8f067457966

Best regards,
  Goetz.

From martin.doerr at sap.com  Tue May 31 10:17:28 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 31 May 2016 10:17:28 +0000
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <201605310149.u4V1mlAA012138@mx0a-001b2d01.pphosted.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com> <57339EE1.2040500@linux.vnet.ibm.com>
 <573A034C.9060602@br.ibm.com> <201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
 <4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
 <201605310149.u4V1mlAA012138@mx0a-001b2d01.pphosted.com>
Message-ID:

Hello everybody,

I have created a new bug: JDK-8158232

We will need a webrev and a request for review mail to hotspot-dev:
"RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions"

Thanks and best regards,
Martin

From HORIE at jp.ibm.com  Tue May 31 11:50:37 2016
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 31 May 2016 20:50:37 +0900
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To:
References: <56FEDBB3.5030106@linux.vnet.ibm.com> <57339EE1.2040500@linux.vnet.ibm.com>
 <573A034C.9060602@br.ibm.com> <201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
 <4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
 <201605310149.u4V1mlAA012138@mx0a-001b2d01.pphosted.com>
Message-ID: <201605311151.u4VBnUY9039336@mx0a-001b2d01.pphosted.com>

Hi Martin, Gustavo,

Thank you very much for your comments.
I used VectorSRegisters, inserted an if-statement with has_vsx() in the long arraycopy, and moved the length comparison inside the if-statement.
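The control structure Martin asked for and Michihiro describes (test has_vsx() first, then do the length comparison inside that block, and fall through to scalar code otherwise) can be modeled in plain C++. This is only a sketch of the branching, not the stub generator code from the attached diff: CpuFeatures, kVsxMinLongs and disjoint_long_copy are hypothetical names, the threshold value is made up, and a 16-byte memcpy stands in for one VSX vector load/store pair.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the CPU feature query (VM_Version::has_vsx() in HotSpot).
struct CpuFeatures {
  bool has_vsx;
};

// Hypothetical threshold in elements, made up for illustration.
constexpr std::size_t kVsxMinLongs = 16;

// Copies 'count' longs from 'from' to 'to' (non-overlapping) and returns how
// many elements were moved by the 16-byte "vector" loop.
std::size_t disjoint_long_copy(const std::int64_t* from, std::int64_t* to,
                               std::size_t count, const CpuFeatures& cpu) {
  std::size_t i = 0;
  if (cpu.has_vsx) {              // feature check guards the whole vector path
    if (count >= kVsxMinLongs) {  // length comparison done inside that block
      // Each step stands in for one 16-byte vector load/store (two longs).
      for (; i + 2 <= count; i += 2) {
        std::memcpy(to + i, from + i, 16);
      }
    }
  }
  const std::size_t vector_elems = i;
  for (; i < count; ++i) {        // scalar path: odd tail, short arrays, or no VSX
    to[i] = from[i];
  }
  return vector_elems;
}
```

With has_vsx set and 37 elements, the vector loop moves 36 elements and the scalar loop copies the last one; without VSX, or with fewer than 16 elements, only the scalar loop runs.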

Diff from jdk9 hs-comp hotspot:
(See attached file: hotspot_jdk9_hscomp.diff)

Best regards,
--
Michihiro Horie,
IBM Research - Tokyo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hotspot_jdk9_hscomp.diff
Type: application/octet-stream
Size: 5729 bytes
Desc: not available

From gromero at linux.vnet.ibm.com  Tue May 31 12:53:44 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 31 May 2016 09:53:44 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To:
References: <56FEDBB3.5030106@linux.vnet.ibm.com> <57339EE1.2040500@linux.vnet.ibm.com>
 <573A034C.9060602@br.ibm.com> <201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
 <4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
 <201605310149.u4V1mmv5012147@mx0a-001b2d01.pphosted.com>
Message-ID: <201605311253.u4VCnhtE023173@mx0a-001b2d01.pphosted.com>

Hi Goetz

Got it. Thanks for clarifying it.

Best regards,
Gustavo

From gromero at linux.vnet.ibm.com  Tue May 31 12:57:14 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 31 May 2016 09:57:14 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To:
References: <56FEDBB3.5030106@linux.vnet.ibm.com> <57339EE1.2040500@linux.vnet.ibm.com>
 <573A034C.9060602@br.ibm.com> <201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
 <4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
 <201605310149.u4V1mlAA012138@mx0a-001b2d01.pphosted.com>
Message-ID: <201605311257.u4VCsQHD036253@mx0a-001b2d01.pphosted.com>

Hi Martin!

Thanks for creating a new BugID.

Regards,
Gustavo

On 31-05-2016 07:17, Doerr, Martin wrote:
> Hello everybody,
>
> I have created a new bug: JDK-8158232
>
> We will need a webrev and a request for review mail to hotspot-dev:
> "RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions"
>
> Thanks and best regards,
> Martin
>
> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> Sent: Dienstag, 31. Mai 2016 03:50
> To: Doerr, Martin ; Michihiro Horie ; Miki M Enoki
> Cc: Breno Leitao ; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker ; Volker Simonis ; Breno Leitao
> Subject: Re: PPC64 VSX load/store instructions in stubs
>
> Hi Michihiro
>
> Thanks a lot for providing a result summary for byte, short, int, and
> long.
>
> Using VSR0, 1, 2, and 3 (instead of the VR registers) will not violate
> the ABI, so you can use them as Martin suggested.
>
> Martin, should we use the same BugID (8154156: https://goo.gl/z2eGLi)
> for byte, short, int, and long webrevs or open a new one?
>
> Thank you.
>
> Best regards,
> Gustavo
>
> On 30-05-2016 06:56, Doerr, Martin wrote:
>> Hi Michihiro,
>>
>> thanks for implementing the VSX versions.
>> >> Gustavo's change "8154156: PPC64: improve array copy stubs by using vector instructions" is pushed into hs-comp. >> Your change needs to get adapted: >> >> - The vm_version and assembler parts are already there. >> >> - Vector-scalar load/store instructions use VectorSRegisters, now. >> >> The byte and int version look good to me. I think the long version should be implemented in a similar way: check for has_vsx() is necessary, the length comparison should be done inside of the block. >> >> Best regards, >> Martin >> >> >> From: Michihiro Horie [mailto:HORIE at jp.ibm.com] >> Sent: Montag, 30. Mai 2016 03:43 >> To: Miki M Enoki >> Cc: Breno Leitao; Gustavo Romero; hotspot-dev at openjdk.java.net; Doerr, Martin; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker; Volker Simonis >> Subject: Re: PPC64 VSX load/store instructions in stubs >> >> >> Dear Breno, Gustavo, Volker, and Martin, >> I am a coworker of Miki. >> >> I implemented VSX disjoint arraycopy functions for byte, int, and long. Although Miki had implemented VSX disjoint long arraycopy, we found a couple of bugs, so I fixed them. Would you please review them? >> >> Micro benchmarks for byte and int are as follows. (The one for long is the same as Miki's, which was attached before by Miki) >> (See attached file: ArrayCopyTest_byte.java)(See attached file: ArrayCopyTest_int.java) >> >> Results are as follows. (For the short result, I used Gustavo's code.) >> (See attached file: result_disjoint-arraycopy_vsx-max.jpg) >> >> Patch for Java8: >> (See attached file: hotspot_jdk8.diff) >> >> Patch for Java9: >> (See attached file: hotspot_jdk9.diff) >> >> Best regards, >> -- >> Michihiro Horie, >> IBM Research - Tokyo >>
>> >> From: Miki M Enoki/Japan/IBM >> To: Breno Leitao >> Cc: Gustavo Romero, hotspot-dev at openjdk.java.net, "Doerr, Martin", ppc-aix-port-dev at openjdk.java.net, "Simonis, Volker", Volker Simonis >> Date: 2016/05/25 00:15 >> Subject: Re: PPC64 VSX load/store instructions in stubs >> >> ________________________________ >> >> >> Hi Breno, >> >> Thank you for your reply. >> >>> The same mechanism could be used to copy arrays of short elements, as Gustavo was >>> working on. Do you agree? >> >> I think the mechanism is different with type (byte, short, int, long...). >> Gustavo will apply a patch with VSX for short array copy, so it would be reasonable to use VSX instruction for long array copy, too. >> >> My coworker is also creating byte and int arraycopy with VSX. He will post an email to this mailing list. >> I appreciate it if our patch for byte, int and long copy is applied to OpenJDK. >> >> >> Best regards, >> Miki >> >> >> >> >> From: Breno Leitao >> To: Miki M Enoki/Japan/IBM at IBMJP, "Doerr, Martin" >> Cc: Gustavo Romero, Volker Simonis, "Simonis, Volker", ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net >> Date: 2016/05/17 02:29 >> Subject: Re: PPC64 VSX load/store instructions in stubs >> ________________________________ >> >> >> >> Hi Miki, >> >> On 05/16/2016 02:53 AM, Miki M Enoki wrote: >>> I also implemented VSX disjoint long arraycopy. >>> I appreciate it if it is applied to OpenJDK, too. >> >> Thanks for the summarized information, this is helpful.
Based on your plot, I >> understand we can split the whole scenario in two: >> >> * Array size smaller than 4k, and then use VSX instructions to perform copy >> * Array size bigger than 4k, and then use VMX instructions to perform copy >> >> The same mechanism could be used to copy arrays of short elements, as Gustavo was >> working on. Do you agree? >> >> That said, I understand that a new patch should be generated that contemplates >> both cases on a single patch, ready to be applied on OpenJDK 9 source code. Hence >> a webrev should be generated mapping to bug id >> https://bugs.openjdk.java.net/browse/JDK-8154156 >> >> If you need any help on the webrev[1] creation and hosting, Gustavo might help, >> since he did this process already. >> >> [1] http://openjdk.java.net/guide/webrevHelp.html >> >> > From martin.doerr at sap.com Tue May 31 14:27:04 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 31 May 2016 14:27:04 +0000 Subject: PPC64 VSX load/store instructions in stubs In-Reply-To: <201605311151.u4VBnUpB001747@mx0a-001b2d01.pphosted.com> References: <56FEDBB3.5030106@linux.vnet.ibm.com> <57339EE1.2040500@linux.vnet.ibm.com> <573A034C.9060602@br.ibm.com> <201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com> <4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap> <201605310149.u4V1mlAA012138@mx0a-001b2d01.pphosted.com> <201605311151.u4VBnUpB001747@mx0a-001b2d01.pphosted.com> Message-ID: <0a783f90ba244080b20723c4982dc37e@DEWDFE13DE14.global.corp.sap> Hi Michihiro, I have uploaded a webrev here: http://cr.openjdk.java.net/~mdoerr/8158232_PPC_vsx_copy/webrev.00/ I had to change the formatting a little bit. There was a "src/cpu/ppc/vm/stubGenerator_ppc.cpp:1896: Trailing whitespace" which is not allowed. 
Please send out a request for review with the following subject and point to the webrev: "RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions" Best regards, Martin From: Michihiro Horie [mailto:HORIE at jp.ibm.com] Sent: Dienstag, 31. Mai 2016 13:51 To: Doerr, Martin Cc: Gustavo Romero; Miki M Enoki; Breno Leitao; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker; Volker Simonis; Breno Leitao Subject: RE: PPC64 VSX load/store instructions in stubs Hi Martin, Gustavo, Thank you very much for your comments. I used VectorSRegisters, inserted an if-statement with has_vsx() in long arraycopy, and moved the length comparison inside the if-statement. Diff from jdk9 hs-comp hotspot: (See attached file: hotspot_jdk9_hscomp.diff) Best regards, -- Michihiro Horie, IBM Research - Tokyo From HORIE at jp.ibm.com Tue May 31 15:36:38 2016 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Wed, 1 Jun 2016 00:36:38 +0900 Subject: RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions Message-ID: <201605311537.u4VFYZWQ029737@mx0a-001b2d01.pphosted.com> Dear all, Could you please review the following webrev? http://cr.openjdk.java.net/~mdoerr/8158232_PPC_vsx_copy/webrev.00/ This change improves performance of disjoint arraycopy of byte, int, and long by using VSX load/store instructions. Discussion started from: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002483.html Performance improvement with micro benchmarks is shown in: http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002531.html Thank you very much, Best regards, -- Michihiro Horie, IBM Research - Tokyo From HORIE at jp.ibm.com Tue May 31 15:46:35 2016 From: HORIE at jp.ibm.com (Michihiro Horie) Date: Wed, 1 Jun 2016 00:46:35 +0900 Subject: PPC64 VSX load/store instructions in stubs In-Reply-To: <0a783f90ba244080b20723c4982dc37e@DEWDFE13DE14.global.corp.sap> References: <56FEDBB3.5030106@linux.vnet.ibm.com> <57339EE1.2040500@linux.vnet.ibm.com> <573A034C.9060602@br.ibm.com> <201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com> <4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap> <201605310149.u4V1mlAA012138@mx0a-001b2d01.pphosted.com> <201605311151.u4VBnUpB001747@mx0a-001b2d01.pphosted.com> <0a783f90ba244080b20723c4982dc37e@DEWDFE13DE14.global.corp.sap> Message-ID: <201605311547.u4VFdbTm041151@mx0a-001b2d01.pphosted.com> Hi Martin, Thank you for fixing my code and uploading webrev. I sent a request with the given title.
Best regards, -- Michihiro Horie, IBM Research - Tokyo
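Martin's review asks that the long-array copy check has_vsx() first and perform the length comparison inside that guarded block, which Michihiro reports having done. The following is only a portable C++ illustration of that control-flow shape, not the code in stubGenerator_ppc.cpp (which emits PPC64 machine code). `cpu_has_vsx`, `kVsxMinLongs`, and `disjoint_long_copy` are hypothetical names, the threshold value is an assumption, and each 16-byte `std::memcpy` merely stands in for a VSX vector load/store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the has_vsx() CPU feature query discussed in this thread.
static bool cpu_has_vsx = true;

// Hypothetical minimum element count for taking the vector path (assumed value).
constexpr std::size_t kVsxMinLongs = 16;

// Copies `count` 64-bit elements between two non-overlapping arrays.
void disjoint_long_copy(const std::int64_t* from, std::int64_t* to, std::size_t count) {
  // Precondition for a "disjoint" copy: source and destination ranges do not overlap.
  assert(count == 0 ||
         reinterpret_cast<std::uintptr_t>(from + count) <= reinterpret_cast<std::uintptr_t>(to) ||
         reinterpret_cast<std::uintptr_t>(to + count) <= reinterpret_cast<std::uintptr_t>(from));

  std::size_t i = 0;
  if (cpu_has_vsx) {
    // The length comparison sits inside the feature-guarded block.
    if (count >= kVsxMinLongs) {
      // Move two longs (16 bytes) per iteration; each memcpy stands in
      // for one 16-byte VSX load/store pair.
      for (; i + 2 <= count; i += 2) {
        std::memcpy(to + i, from + i, 2 * sizeof(std::int64_t));
      }
    }
  }
  // Scalar path: any leftover element after the vector loop, or every
  // element when VSX is unavailable or the array is short.
  for (; i < count; ++i) {
    to[i] = from[i];
  }
}
```

In this sketch, keeping the length test inside the `cpu_has_vsx` branch means a CPU without VSX goes straight to the scalar loop and never evaluates the vector-path threshold.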