From david.holmes at oracle.com  Mon May  2 23:04:24 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 3 May 2016 09:04:24 +1000
Subject: RFR(S): 8153892: Handle unsafe access error directly in signal
	handler instead of going through a stub
In-Reply-To: <e533e9bd-0cc9-8fdb-5a47-12d5c382fb71@oracle.com>
References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com>
	<570C417F.20600@oracle.com>
	<CAA-vtUw5s3zZVND4_4K228+rOK4_XbrZ90m+1e9WKHAb1RpBJA@mail.gmail.com>
	<571FB50E.6090108@oracle.com>
	<CAA-vtUxOt2PfxoYbFZGguMzUsHAydO8gGPgNrs0dsqinNzYREg@mail.gmail.com>
	<5720E0C8.608@oracle.com> <57220515.6080309@oracle.com>
	<e533e9bd-0cc9-8fdb-5a47-12d5c382fb71@oracle.com>
Message-ID: <71ac3531-8c7b-44dc-23ac-09c9867f8c20@oracle.com>

On 29/04/2016 1:44 AM, Mikael Vidstedt wrote:
>
>
> On 4/28/2016 5:41 AM, David Holmes wrote:
>> Hi Mikael,
>>
>> On 28/04/2016 1:54 AM, Mikael Vidstedt wrote:
>>>
>>>
>>> On 4/27/2016 12:24 AM, Thomas Stüfe wrote:
>>>> Hi Mikael,
>>>>
>>>> On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt
>>>> <mikael.vidstedt at oracle.com> wrote:
>>>>
>>>>
>>>>
>>>>     On 4/12/2016 2:15 AM, Thomas Stüfe wrote:
>>>>>     Hi Mikael, David,
>>>>>
>>>>>     On Tue, Apr 12, 2016 at 2:29 AM, David Holmes
>>>>>     <david.holmes at oracle.com> wrote:
>>>>>
>>>>>         On 11/04/2016 10:57 AM, David Holmes wrote:
>>>>>
>>>>>             Hi Mikael,
>>>>>
>>>>>             I think we need to be able to answer the question as to
>>>>>             why the stubbed and stubless forms of this code exist to
>>>>>             ensure that converting all platforms to the same form is
>>>>>             appropriate.
>>>>>
>>>>>
>>>>>         The more I look at this the more the stubs make no sense :)
>>>>>         AIUI a stub is generated when we need runtime code that may
>>>>>         be different to that which we could write directly for
>>>>>         compiling at build time - i.e. to use CPU specific features of
>>>>>         the actual CPU. But I see nothing here that suggests any such
>>>>>         usage.
>>>>>
>>>>>         So I agree with removing the stubs.
>>>>>
>>>>>             I'm still going through this but my initial reaction is
>>>>>             to wonder why we don't use the same form of
>>>>>             handle_unsafe_access on all platforms and always pass in
>>>>>             npc? (That seems to be the only difference in code that
>>>>>             otherwise seems platform independent.)
>>>>>
>>>>>
>>>>>         Further to this and Thomas's comments I think
>>>>>         handle_unsafe_access(thread, pc, npc) can be defined in
>>>>>         shared code (where? not sure). Further, if we always pass in
>>>>>         npc then we don't need to pass in pc as it is unused (seems
>>>>>         unused in original code too for sparc).
>>>>>
>>>>>
>>>>>     I agree. We commonized ucontext_set_pc for all Posix platforms,
>>>>>     so we can make a common function "handle_unsafe_access(thread,
>>>>>     npc)" and inside use os::Posix::ucontext_set_pc to modify the
>>>>>     context. Then we can get rid of the special handling in the
>>>>>     signal handlers inside os_aix_ppc.cpp and os_linux_ppc.cpp (for
>>>>>     both the compiled and the interpreted case).
>>>>
>>>>     There is definitely room for unification and simplification here.
>>>>     Right now the signal handling code is, sadly, different on all the
>>>>     different platforms, despite the fact that in many cases it should
>>>>     be similar or the exact same. That said, as much as a
>>>>     refactoring/rewrite of the signal handler code is needed, it will
>>>>     very quickly turn into a much larger effort...
>>>>
>>>>     In this specific case, it would probably make more sense to pass
>>>>     in the full context to the handle_unsafe_access method and have it
>>>>     do whatever it feels is necessary to update it. However, a lot of
>>>>     the signal handler code assumes that a "stub" variable gets set up
>>>>     and only at the end of the main signal handler function does the
>>>>     actual context get updated. Changing how that works only for this
>>>>     specific case is obviously not a good idea, which means it's back
>>>>     to the full scale refactoring and out of scope for the bug fix.
>>>>
>>>>     So to me the fact that the method prototypes differ depending on
>>>>     the exact platform is just a reflection of how the contexts
>>>>     differ. Lacking the full context, the handler method needs to
>>>>     take whatever parts of the context are needed to do its job. I
>>>>     could of course change the handler method to only take a single
>>>>     "next_pc" argument, but to me that feels like putting a key part
>>>>     of the logic which handles the unsafe access (specifically, the
>>>>     part which calculates the next pc) in the wrong place - IMHO that
>>>>     should really be tightly coupled with the rest of the logic needed
>>>>     to handle an unsafe access (updating the thread state etc.), and
>>>>     hence I feel that it really belongs in the handle_unsafe_access
>>>>     method itself. Happy to hear your thoughts, but I hope we can
>>>>     agree that the suggested fix, even in its current state, is still
>>>>     significantly better than what is there now.
>>>>
>>>>
>>>>     Unless somebody has a better suggestion, I'm going to be moving
>>>>     the implementations of the handle_unsafe_access methods to
>>>>     sharedRuntime (instead of stubRoutines) and will send out a new
>>>>     webrev shortly.
>>>>
>>>>
>>>> I am unhappy with the fact that we factor unsafe handling out for x86
>>>> and sparc but do it inline for ppc. I know that was done before your
>>>> change as well but would be happy if that could be improved. I would
>>>> prefer either one of:
>>>
>>> Fully agree - this is an example of the more general problem of logic
>>> which is /almost/ the same across different platforms, but which has
>>> been effectively copy/pasted and drifted apart over time.
>>>
>>>>
>>>> 1) flatten out the coding into the signal handlers like it is done in
>>>> os_linux_ppc.cpp and os_aix_ppc.cpp or
>>>> 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case
>>>>
>>>> I would actually prefer (1) even though this would multiply the code
>>>> out for all os cases into <os_cpu>; we are only talking about 1-2
>>>> lines of additional coding, and it would improve the readability of
>>>> the signal handlers.
>>>>
>>>> But this is only my personal opinion, and I do not have strong
>>>> emotions. I agree with you that a full cleanup of the signal coding is
>>>> out of scope for this issue.
>>>
>>> I spent yesterday going back and forth on the various alternatives and
>>> the only thing I can say with certainty now is that apart from
>>> refactoring the whole thing everything else is ugly... For example, I
>>> agree that consistency is an important goal here, but since there's
>>> little to no consistency there today it's really hard to make a relevant
>>> dent in it. :(
>>>
>>> Flattening it out is an alternative (and a good one), but that is not
>>> something I'm willing to do as part of this change because only
>>> flattening this specific case/return will actually add to the
>>> inconsistency... So ultimately yesterday I chose to do something closer to
>>> your alternative 2). Is it still ugly? Yes; lipstick on a pig and all of
>>> that. Have a look at it and see how you feel about it. I try to keep in
>>> mind that what is there today is (more) broken. :)
>>>
>>> Webrev:
>>>
>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/
>>>
>>
>> Now I see this in code form I really don't understand why next_pc is
>> passed in, unused and then returned ??
>
> Given that the full context can't easily be passed in and updated (which
> I still think is the right, long term way of doing this), I chose to do
> it this way instead. It is a signal to a caller that *only* calling
> handle_unsafe_access is not enough, there's more to it in that the
> context also needs to be updated. I see it as a way to make sure the
> actual next_pc calculation and update is not forgotten, and make it
> clear that it goes hand in hand with updating the thread state. It's
> obviously not perfect, but I do feel like it ever so slightly helps
> clarify how the access fault needs to be handled.

Okay.

Thanks,
David
-----
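
The next_pc pass-through discussed above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
code in the webrev: SimThread, SimContext and simulated_signal_handler
are hypothetical stand-ins, and the real handler works on an OS-provided
ucontext rather than a plain struct.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are not HotSpot types.
using address = std::uintptr_t;

struct SimThread {
  bool pending_unsafe_access_error = false;
};

struct SimContext {
  address pc;
};

// Records the fault on the thread and hands next_pc straight back.
// Returning the value is a reminder to the caller that the context
// still has to be updated with it.
address handle_unsafe_access(SimThread* thread, address next_pc) {
  thread->pending_unsafe_access_error = true;
  return next_pc;
}

// Mimics the convention described in the thread: a "stub" variable is set
// up in the middle of the handler and the context is only updated at the end.
void simulated_signal_handler(SimThread* thread, SimContext* ctx,
                              address faulting_pc, unsigned insn_size) {
  address stub = 0;

  // Platform-specific part: work out where the next instruction starts.
  address next_pc = faulting_pc + insn_size;
  stub = handle_unsafe_access(thread, next_pc);

  // End of the handler: resume execution at the chosen address.
  if (stub != 0) {
    ctx->pc = stub;
  }
}
```

With a 4-byte instruction faulting at 0x1000, the simulated context would
end up at 0x1004 with the pending flag set on the thread.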


>>
>> Otherwise in src/share/vm/runtime/sharedRuntime.cpp in the comment
>> block - capitals after periods please :)
>
> Fixed, I'll not post a new webrev for it though :)
>
> Cheers,
> Mikael
>
>>
>> Stub removal seems fine.
>>
>> Thanks,
>> David
>>
>>> Incremental from webrev.01:
>>>
>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/
>>>
>>>
>>> Cheers,
>>> Mikael
>>>
>>>>
>>>>
>>>>     Cheers,
>>>>     Mikael
>>>>
>>>>>
>>>>>
>>>>>         BTW I found this comment somewhat unfathomable (both now and
>>>>>         in original code):
>>>>>
>>>>>         +   // pc is the instruction which we must emulate
>>>>>         +   // doing a no-op is fine:  return garbage from the load
>>>>>
>>>>>         but finally realized that it means that after the load that
>>>>>         raised the signal the native code proceeds normally but the
>>>>>         value apparently loaded is just garbage/arbitrary, and the
>>>>>         only sign something went wrong is the setting of the pending
>>>>>         unsafe-access-error bit. This would be a potential source of
>>>>>         bugs I think, except that when we hit the Java level, we
>>>>>         throw the exception and so never actually "return" the
>>>>>         garbage value. But it does mean we would have to be careful
>>>>>         if calling the unsafe routines from native code.
>>>>>
>>>>>
>>>>>     I admit I do not understand fully how the
>>>>>     _special_runtime_exit_condition flag is processed later, at least
>>>>>     not for all cases: If I have a Java method A using
>>>>>     sun.misc.Unsafe, which gets compiled, the sun.misc.Unsafe
>>>>>     intrinsic gets inlined into that method. So, the whole method A
>>>>>     gets marked as "has unsafe access"? So, any SIGBUS happening
>>>>>     inside this method - which may be larger than the inlined
>>>>>     sun.misc.Unsafe call - will yield an InternalError? And when is
>>>>>     the flag checked if that method A is called from another Java
>>>>>     method B?
>>>>>
>>>>>     Sorry if the questions are stupid, I am not a JIT expert, but I
>>>>>     try to understand how much can happen between the SIGBUS and the
>>>>>     InternalError getting thrown.
>>>>
>>>>     No questions are stupid here. As you may have seen in the other
>>>>     thread, I filed JDK-8154592[1] to cover making the handling of the
>>>>     faults synchronous. Hope that helps.
>>>>
>>>>
>>>> Thank you!
>>>>
>>>> Kind Regards, Thomas
>>>>
>>>>     Cheers,
>>>>     Mikael
>>>>
>>>>     [1] https://bugs.openjdk.java.net/browse/JDK-8154592
>>>>
>>>>
>>>>>
>>>>>     Thanks, Thomas
>>>>>
>>>>>         Thanks,
>>>>>         David
>>>>>
>>>>>
>>>>>             Thanks,
>>>>>             David
>>>>>
>>>>>             On 9/04/2016 8:33 AM, Mikael Vidstedt wrote:
>>>>>
>>>>>
>>>>>                 Please review:
>>>>>
>>>>>                 Bug: https://bugs.openjdk.java.net/browse/JDK-8153892
>>>>>                 Webrev:
>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                 * Note: this is patch 2 in a set of 3 all aiming to
>>>>>                 clean up and unify the unsafe memory getters/setters,
>>>>>                 along with the handling of unsafe access errors. The
>>>>>                 other two issues are:
>>>>>
>>>>>                 https://bugs.openjdk.java.net/browse/JDK-8153890 -
>>>>>                 Handle unsafe access error as an asynchronous exception
>>>>>                 https://bugs.openjdk.java.net/browse/JDK-8150921 -
>>>>>                 Update Unsafe getters/setters to use double-register
>>>>>                 variants
>>>>>
>>>>>
>>>>>                 * Summary (copied from the bug description)
>>>>>
>>>>>
>>>>>                 In certain cases, such as accessing a region of a memory
>>>>>                 mapped file which has been truncated on unix-style
>>>>>                 operating systems, a SIGBUS signal will be raised and
>>>>>                 the VM will process it in the signal handler.
>>>>>
>>>>>                 How the signal is processed differs depending on the
>>>>>                 operating system and/or CPU architecture, with two major
>>>>>                 alternatives:
>>>>>
>>>>>                 * "stubless"
>>>>>
>>>>>                 Do the necessary thread state updates directly in the
>>>>>                 signal handler, and modify the context so that the
>>>>>                 signal handler returns to the place where the execution
>>>>>                 should continue
>>>>>
>>>>>                 * Using a stub
>>>>>
>>>>>                 Update the context so that when the signal handler
>>>>>                 returns the thread will continue execution in a
>>>>>                 generated stub, which in turn will call some native code
>>>>>                 in the VM to update the thread state and figure out
>>>>>                 where execution should continue. The stub will then jump
>>>>>                 to that new place.
>>>>>
>>>>>
>>>>>                 It should be noted that the work of updating the thread
>>>>>                 state is very small - it sets a flag or two in the
>>>>>                 thread structure, and figures out where the next
>>>>>                 instruction starts. It should also be noted that the
>>>>>                 generated stubs today are broken, because they do not
>>>>>                 preserve all the live registers over the call into the
>>>>>                 VM. There are two ways to address this:
>>>>>
>>>>>                 * Preserve all the necessary registers
>>>>>
>>>>>                 This would mean implementing, in macro assembly, the
>>>>>                 necessary logic for preserving all the live registers,
>>>>>                 including, but not limited to, floating point registers,
>>>>>                 flag registers, etc. It quickly becomes obvious that
>>>>>                 this is platform specific and error prone.
>>>>>
>>>>>                 * Leverage the fact that the operating system already
>>>>>                 does this as part of the signal handling
>>>>>
>>>>>                 Do the necessary work in the signal handler instead,
>>>>>                 removing the need for the stub altogether
>>>>>
>>>>>                 As mentioned, on some platforms the latter model is
>>>>>                 already in use. It is dramatically easier and all
>>>>>                 platforms should be updated to do it the same way.
>>>>>
>>>>>
>>>>>                 * Testing
>>>>>
>>>>>                 Just as mentioned in the RFR for JDK-8153890, a new test
>>>>>                 was developed to test this code path:
>>>>>
>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java
>>>>>
>>>>>
>>>>>                 In fact, it was when running this test I found the
>>>>>                 register preservation issue. JPRT also passes. Much like
>>>>>                 JDK-8153890 I wanted to get some feedback on this before
>>>>>                 running additional tests.
>>>>>
>>>>>
>>>>>                 Cheers,
>>>>>                 Mikael
>>>>>
>>>>>
>>>>
>>>>
>>>
>
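
The "return garbage from the load" behaviour quoted above, where the
faulting load completes with an arbitrary value and the only trace is a
pending flag that is turned into an exception later, might be pictured
like this. It is a simulation under assumptions, not VM code: SimThread,
simulated_unsafe_load and checked_load are hypothetical names, a null
pointer stands in for the truncated mapping, and std::runtime_error
stands in for the InternalError thrown at the Java level.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the per-thread state; not a HotSpot type.
struct SimThread {
  bool pending_unsafe_access_error = false;
};

// Simulated load. A null address stands in for the access that would
// raise SIGBUS: the "handler" path sets the flag, and the load carries on
// as a no-op that yields an arbitrary value.
long simulated_unsafe_load(SimThread* thread, const long* addr) {
  if (addr == nullptr) {
    thread->pending_unsafe_access_error = true;
    return 0xDEAD;  // Garbage: callers must not trust this value.
  }
  return *addr;
}

// Stand-in for the later check at the boundary back to Java code: if the
// flag is set, the garbage value is discarded and an error is raised.
long checked_load(SimThread* thread, const long* addr) {
  long value = simulated_unsafe_load(thread, addr);
  if (thread->pending_unsafe_access_error) {
    thread->pending_unsafe_access_error = false;
    throw std::runtime_error("InternalError: unsafe access fault");
  }
  return value;
}
```

A native caller that used simulated_unsafe_load directly, without the
check, would see the arbitrary value, which is the hazard David points
out for calling the unsafe routines from native code.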

From mikael.vidstedt at oracle.com  Tue May  3 04:12:09 2016
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Mon, 2 May 2016 21:12:09 -0700
Subject: RFR(S): 8153892: Handle unsafe access error directly in signal
	handler instead of going through a stub
In-Reply-To: <71ac3531-8c7b-44dc-23ac-09c9867f8c20@oracle.com>
References: <570831BD.7080005@oracle.com> <570AF68B.9090707@oracle.com>
	<570C417F.20600@oracle.com>
	<CAA-vtUw5s3zZVND4_4K228+rOK4_XbrZ90m+1e9WKHAb1RpBJA@mail.gmail.com>
	<571FB50E.6090108@oracle.com>
	<CAA-vtUxOt2PfxoYbFZGguMzUsHAydO8gGPgNrs0dsqinNzYREg@mail.gmail.com>
	<5720E0C8.608@oracle.com> <57220515.6080309@oracle.com>
	<e533e9bd-0cc9-8fdb-5a47-12d5c382fb71@oracle.com>
	<71ac3531-8c7b-44dc-23ac-09c9867f8c20@oracle.com>
Message-ID: <50fd5d01-2186-8b07-20bf-72ad382eb6d6@oracle.com>


On 5/2/2016 4:04 PM, David Holmes wrote:
> Okay.
>
> Thanks,
> David
> -----

Thomas/David - thank you very much for the feedback and reviews!

Cheers,
Mikael

>
>
>>>
>>> Otherwise in src/share/vm/runtime/sharedRuntime.cpp in the comment
>>> block - capitals after periods please :)
>>
>> Fixed, I'll not post a new webrev for it though :)
>>
>> Cheers,
>> Mikael
>>
>>>
>>> Stub removal seems fine.
>>>
>>> Thanks,
>>> David
>>>
>>>> Incremental from webrev.01:
>>>>
>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/ 
>>>>
>>>>
>>>>
>>>> Cheers,
>>>> Mikael
>>>>
>>>>>
>>>>>
>>>>>     Cheers,
>>>>>     Mikael
>>>>>
>>>>>>
>>>>>>
>>>>>>         BTW I found this comment somewhat unfathomable (both now and
>>>>>>         in original code):
>>>>>>
>>>>>>         +   // pc is the instruction which we must emulate
>>>>>>         +   // doing a no-op is fine:  return garbage from the load
>>>>>>
>>>>>>         but finally realized that it means that after the load that
>>>>>>         raised the signal the native code proceeds normally but the
>>>>>>         value apparently loaded is just garbage/arbitrary, and the
>>>>>>         only sign something went wrong is the setting of the pending
>>>>>>         unsafe-access-error bit. This would be a potential source of
>>>>>>         bugs I think, except that when we hit the Java level, we
>>>>>>         throw the exception and so never actually "return" the
>>>>>>         garbage value. But it does mean we would have to be careful
>>>>>>         if calling the unsafe routines from native code.
>>>>>>
>>>>>>
>>>>>>     I admit I do not understand fully how the
>>>>>>     _special_runtime_exit_condition flag is processed later, at 
>>>>>> least
>>>>>>     not for all cases: If I have a java method A using
>>>>>>     sun.misc.unsafe, which gets compiled, the sun.misc.unsafe
>>>>>>     intrinsic gets inlined into that method. So, the whole method A
>>>>>>     gets marked as "has unsafe access"? So, any SIGBUS happening
>>>>>>     inside this method - which may be larger than the inlined
>>>>>>     sun.misc.unsafe call - will yield an InternalError? And when is
>>>>>>     the flag checked if that method A is called from another java
>>>>>>     method B?
>>>>>>
>>>>>>     Sorry if the questions are stupid, I am not a JIT expert, but I
>>>>>>     try to understand how much can happen between the SIGBUS and the
>>>>>>     InternalError getting thrown.
>>>>>
>>>>>     No questions are stupid here. As you may have seen in the other
>>>>>     thread, I filed JDK-8154592[1] to cover making the handling of 
>>>>> the
>>>>>     faults synchronous. Hope that helps.
>>>>>
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Kind Regards, Thomas
>>>>>
>>>>>     Cheers,
>>>>>     Mikael
>>>>>
>>>>>     [1] https://bugs.openjdk.java.net/browse/JDK-8154592
>>>>>
>>>>>
>>>>>>
>>>>>>     Thanks, Thomas
>>>>>>
>>>>>>         Thanks,
>>>>>>         David
>>>>>>
>>>>>>
>>>>>>             Thanks,
>>>>>>             David
>>>>>>
>>>>>>             On 9/04/2016 8:33 AM, Mikael Vidstedt wrote:
>>>>>>
>>>>>>
>>>>>>                 Please review:
>>>>>>
>>>>>>                 Bug: 
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153892
>>>>>>                 Webrev:
>>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/ 
>>>>>>
>>>>>>
>>>>>> <http://cr.openjdk.java.net/%7Emikael/webrevs/8153892/webrev.01/hotspot/webrev/> 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>                 * Note: this is patch 2 in a set of 3 all aiming to
>>>>>>                 clean up and unify
>>>>>>                 the unsafe memory getters/setters, along with the
>>>>>>                 handling of unsafe
>>>>>>                 access errors. The other two issues are:
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153890 -
>>>>>>                 Handle unsafe access
>>>>>>                 error as an asynchronous exception
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8150921 -
>>>>>>                 Update Unsafe
>>>>>>                 getters/setters to use double-register variants
>>>>>>
>>>>>>
>>>>>>                 * Summary (copied from the bug description)
>>>>>>
>>>>>>
>>>>>>                 In certain cases, such as accessing a region of a
>>>>>>                 memory mapped file
>>>>>>                 which has been truncated on unix-style operating
>>>>>>                 systems, a SIGBUS
>>>>>>                 signal will be raised and the VM will process it in
>>>>>>                 the signal handler.
>>>>>>
>>>>>>                 How the signal is processed differs depending on the
>>>>>>                 operating system
>>>>>>                 and/or CPU architecture, with two major 
>>>>>> alternatives:
>>>>>>
>>>>>>                 * "stubless"
>>>>>>
>>>>>>                 Do the necessary thread state updates directly in 
>>>>>> the
>>>>>>                 signal handler,
>>>>>>                 and modify the context so that the signal handler
>>>>>>                 returns to the place
>>>>>>                 where the execution should continue
>>>>>>
>>>>>>                 * Using a stub
>>>>>>
>>>>>>                 Update the context so that when the signal handler
>>>>>>                 returns the thread
>>>>>>                 will continue execution in a generated stub, 
>>>>>> which in
>>>>>>                 turn will call
>>>>>>                 some native code in the VM to update the thread 
>>>>>> state
>>>>>>                 and figure out
>>>>>>                 where execution should continue. The stub will then
>>>>>>                 jump to that new
>>>>>>                 place.
>>>>>>
>>>>>>
>>>>>>                 It should be noted that the work of updating the
>>>>>>                 thread state is very
>>>>>>                 small - it's setting a flag or two in the thread
>>>>>>                 structure, and figures
>>>>>>                 out where the next instruction starts. It should 
>>>>>> also
>>>>>>                 be noted that the
>>>>>>                 generated stubs today are broken, because they do 
>>>>>> not
>>>>>>                 preserve all the
>>>>>>                 live registers over the call into the VM. There are
>>>>>>                 two ways to address
>>>>>>                 this:
>>>>>>
>>>>>>                 * Preserve all the necessary registers
>>>>>>
>>>>>>                 This would mean implementing, in macro assembly, the
>>>>>>                 necessary logic for
>>>>>>                 preserving all the live registers, including, but 
>>>>>> not
>>>>>>                 limited to,
>>>>>>                 floating point registers, flag registers, etc. It
>>>>>>                 quickly becomes
>>>>>>                 obvious that this platform specific and error prone.
>>>>>>
>>>>>>                 * Leverage the fact that the operating system already
>>>>>>                 does this as part of the signal handling
>>>>>>
>>>>>>                 Do the necessary work in the signal handler instead,
>>>>>>                 removing the need for the stub altogether
>>>>>>
>>>>>>                 As mentioned, on some platforms the latter model is
>>>>>>                 already in use. It
>>>>>>                 is dramatically easier and all platforms should be
>>>>>>                 updated to do it the
>>>>>>                 same way.
>>>>>>
>>>>>>
>>>>>>                 * Testing
>>>>>>
>>>>>>                 Just as mentioned in the RFR for JDK-8153890, a new
>>>>>>                 test was developed
>>>>>>                 to test this code path:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java 
>>>>>>
>>>>>>
>>>>>>                 In fact, it was when running this test I found the
>>>>>>                 register preservation
>>>>>>                 issue. JPRT also passes. Much like JDK-8153890 I
>>>>>>                 wanted to get some
>>>>>>                 feedback on this before running additional tests.
>>>>>>
>>>>>>
>>>>>>                 Cheers,
>>>>>>                 Mikael
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>


From david.holmes at oracle.com  Wed May  4 05:55:29 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 4 May 2016 15:55:29 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<571A1FA3.9030006@oracle.com>
	<201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com>
Message-ID: <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com>

Hi Hiroshi,

Sorry for the delay on getting back to this.

On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
> Hi David,
>
> Thank you for your comments and questions.
>
>> 1. Are the current cmpxchg semantics exactly the same as
>> memory_order_seq_cst?
>
> This is a very good question.
>
> I guess cmpxchg needs a more conservative memory-ordering constraint
> than C++11, which is why it adds a sync after the compare-and-exchange
> operation.
>
> Could someone give comments or thoughts?

I don't want to comment on the comparison with C++11. What I would 
prefer to see is an additional memory_order value (such as 
memory_order_ignored) which is the default for all methods declared to 
take a memory_order parameter. That way existing implementations are 
clearly ignoring the memory_order attribute and there is no potential 
for confusion as to whether the existing implementations equate to 
memory_order_seq_cst or not.

That said, I'm not sure it makes sense to add the memory_order parameter 
to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark, 
oopDesc::cas_forward_to, unless those methods can sensibly be called 
with any value for memory_order - which seems highly unlikely. Perhaps 
those methods should identify the weakest form of memory_order they 
support and that should be hard-wired into them?
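
To make the last suggestion concrete, here is a minimal sketch of a CAS
helper whose ordering is hard-wired rather than passed in. It is written
against std::atomic purely for illustration: the Header type and the
sketch namespace are hypothetical stand-ins, not HotSpot's oopDesc, and
the choice of relaxed ordering is an assumption that an analysis of the
GC code would still have to justify.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical stand-in for an object header word (not HotSpot's oopDesc).
struct Header {
  std::atomic<std::uintptr_t> mark{0};

  // The ordering is fixed inside the method instead of being a parameter.
  // Returns the value observed in the mark word before the attempt:
  // equal to `expected` on success, the competing value on failure.
  std::uintptr_t cas_forward_to(std::uintptr_t expected, std::uintptr_t forwardee) {
    assert(forwardee != 0);  // a forwarding pointer is never null in this sketch
    std::uintptr_t observed = expected;
    // Assumption for illustration only: relaxed is the weakest order supported.
    mark.compare_exchange_strong(observed, forwardee,
                                 std::memory_order_relaxed,
                                 std::memory_order_relaxed);
    return observed;
  }
};

}  // namespace sketch
```

With this shape a caller never sees a memory_order argument, so it cannot
request an ordering the method was not designed for.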

Thanks,
David

> memory_order_seq_cst is defined as
>     "Any operation with this memory order is both an acquire operation and
>      a release operation, plus a single total order exists in which all
>      threads observe all modifications (see below) in the same order."
> (http://en.cppreference.com/w/cpp/atomic/memory_order)
>
> In my environment, g++ and xlc generate the following assembly on ppc64le.
> (Interestingly, they generate the same assembly for any memory_order.)
>
> g++ (4.9.2)
>     100008a4:   ac 04 00 7c     sync
>     100008a8:   28 50 20 7d     lwarx   r9,0,r10
>     100008ac:   00 18 09 7c     cmpw    r9,r3
>     100008b0:   0c 00 c2 40     bne-    100008bc
>     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
>     100008b8:   f0 ff c2 40     bne-    100008a8
>     100008bc:   2c 01 00 4c     isync
>
> xlc (13.1.3)
>     10000888:   ac 04 00 7c     sync
>     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
>     10000890:   40 00 26 7c     cmpld   r6,r0
>     10000894:   0c 00 82 40     bne     100008a0
>     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
>     1000089c:   f0 ff e2 40     bne+    1000088c
>     100008a0:   2c 01 00 4c     isync
>
> On the other hand, the current OpenJDK generates the following assembly.
>
>     508:   ac 04 00 7c     sync
>     50c:   00 00 5c e9     ld      r10,0(r28)
>     510:   00 50 3b 7c     cmpd    r27,r10
>     514:   1c 00 c2 40     bne-    530
>     518:   a8 40 5c 7d     ldarx   r10,r28,r8
>     51c:   00 50 3b 7c     cmpd    r27,r10
>     520:   10 00 c2 40     bne-    530
>     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
>     528:   f0 ff c2 40     bne-    518
>     52c:   ac 04 00 7c     sync
>     530:   00 50 bb 7f     ...
>
> Though we can ignore 50c-514 (because they are a duplicated guard
> condition), the last sync instruction (52c) makes cmpxchg stricter than
> memory_order_seq_cst.
>
> In some cases, the last sync is necessary: when this thread must be able
> to read all of the changes made by the other threads while it executes
> 508 to 530 (the compare-and-exchange sequence).
>
>> 2. Has there been a discussion already, establishing that the modified
>> GC code can indeed use memory_order_relaxed? Otherwise who is
>> postulating that and based on what evidence?
>
> Volker and his colleagues have investigated the current GC code
> with respect to this.
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html
> However, I believe we need comments from other GC experts before changing
> the shared code.
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 21:57:07:
>
>> From: David Holmes <david.holmes at oracle.com>
>> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
>> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
>> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
> ppc-aix-port-dev at openjdk.java.net
>> Date: 04/22/2016 21:58
>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> copy_to_survivor for ppc64
>>
>> Hi Hiroshi,
>>
>> Two initial questions:
>>
>> 1. Are the current cmpxchg semantics exactly the same as
>> memory_order_seq_cst?
>>
>> 2. Has there been a discussion already, establishing that the modified
>> GC code can indeed use memory_order_relaxed? Otherwise who is
>> postulating that and based on what evidence?
>>
>> Missing memory barriers have caused very difficult to track down bugs in
>> the past - very rare race conditions. So any relaxation here has to be
>> done with extreme confidence.
>>
>> Thanks,
>> David
>>
>> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
>> > Dear all:
>> >
>> > Can I please request reviews for the following change?
>> >
>> > Code change:
>> > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
>> > (I created the initial version and Martin enhanced it considerably)
>> >
>> > This change follows the discussion that started with this mail.
>> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html
>> >
>> > Description:
>> > This change provides a relaxed compare-and-exchange by introducing
>> > an enum memory_order with semantics similar to the C++ atomic memory
>> > orders. As described in atomic_linux_ppc.inline.hpp, the current
>> > implementation of cmpxchg is fence_cmpxchg_acquire. This implementation
>> > is useful for general purposes because the two sync calls, before and
>> > after cmpxchg, provide strict consistency. However, they sometimes cause
>> > overhead because sync instructions are very expensive in the current
>> > POWER chip design. In addition, on other platforms, such as aarch64,
>> > these strict semantics may cause some overhead (according to Andrew's
>> > mail).
>> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019073.html
>> >
>> > With this change, callers can explicitly specify memory ordering
>> > constraints for cmpxchg with an additional parameter, memory_order order.
>> >
>> > typedef enum memory_order {
>> >    memory_order_relaxed,
>> >    memory_order_consume,
>> >    memory_order_acquire,
>> >    memory_order_release,
>> >    memory_order_acq_rel,
>> >    memory_order_seq_cst
>> > } memory_order;
>> >
>> > Because the default value of the parameter is memory_order_seq_cst,
>> > existing code gets the same cmpxchg semantics without any
>> > modification. The relaxed cmpxchg is implemented only on ppc
>> > in this changeset. Therefore, the behavior on the other platforms will
>> > not be changed by this changeset.
>> >
>> > In addition, with the new parameter of cmpxchg, this change improves
>> > the performance of copy_to_survivor in the parallel GC.
>> > copy_to_survivor changes forward pointers by using cmpxchg. This
>> > operation doesn't require any sync instructions. A pointer is changed
>> > at most once in a GC and, when cmpxchg fails, the latest pointer is
>> > available to the caller. cas_set_mark and cas_forward_to are extended
>> > with an additional memory_order parameter, like cmpxchg, and
>> > copy_to_survivor uses memory_order_relaxed to modify the forward pointers.
>> >
>> > Summary of source code changes:
>> >
>> > * src/share/vm/runtime/atomic.hpp
>> >       - Defines enum memory_order and adds a parameter to cmpxchg.
>> >
>> > * src/share/vm/runtime/atomic.cpp
>> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
>> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
>> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
>> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
>> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
>> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
>> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>> >       - Added a parameter for each cmpxchg function to follow
>> >          the change of atomic.hpp. Their implementations are not
>> >          changed.
>> >
>> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
>> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>> >       - Added a parameter for each cmpxchg function to follow
>> >          the change of atomic.hpp. In addition, implementations
>> >          are changed corresponding to the specified memory_order.
>> >
>> > * src/share/vm/oops/oop.hpp
>> > * src/share/vm/oops/oop.inline.hpp
>> >       - Add a memory_order parameter to use relaxed cmpxchg in
>> >          cas_set_mark and cas_forward_to.
>> >
>> > * src/share/vm/gc/parallel/psPromotionManager.cpp
>> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
>> >
>> > Martin tested this changeset on linuxx86_64, linuxppc64le and
>> > darwinintel64.
>> > Though more time is needed to test on the other platforms, we would
>> > like to ask for reviews and start discussion on this changeset.
>> > I also tested this changeset with SPECjbb2013 and confirmed that GC
>> > pause time is reduced.
>> >
>> > Regards,
>> > Hiroshi
>> > -----------------------
>> > Hiroshi Horii, Ph.D.
>> > IBM Research - Tokyo
>> >
>> >
>>
>

From HORII at jp.ibm.com  Fri May  6 10:11:24 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Fri, 6 May 2016 19:11:24 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<571A1FA3.9030006@oracle.com>
	<201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com>
	<1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com>
Message-ID: <201605061011.u46ABbAa024898@d19av08.sagamino.japan.ibm.com>

Hi David,

Thank you for your comments.

As Martin suggested, I would like to split this proposal into
  - relaxing the memory order of cmpxchg
  - improving copy_to_survivor with the relaxed cmpxchg
and discuss the former first.

Martin kindly created a new webrev that includes the cmpxchg change.
http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
He has already tested it on AIX, linuxx86_64, linuxppc64le and
darwinintel64.
(Please tell me if I need to send a new mail for this RFR.)

> What I would prefer to see is an additional memory_order value (such as 
> memory_order_ignored) which is the default for all methods declared to 
> take a memory_order parameter. 

We added a simple enum to specify the memory order in atomic.hpp as follows.

typedef enum cmpxchg_cmpxchg_memory_order {
  memory_order_relaxed,
  memory_order_conservative
} cmpxchg_memory_order;

All cmpxchg functions take a cmpxchg_memory_order argument with a
default value of memory_order_conservative, which keeps the same
semantics as the existing cmpxchg and requires no change to the
existing callers. If you think "memory_order_ignored" is better than
"memory_order_conservative", I will be happy to modify this change.
(I just thought "ignored" may resemble "relaxed" and may confuse
people who are familiar with C++11's memory semantics.
I would like to know the thoughts of native speakers.)
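
As a rough illustration of how the defaulted argument behaves for callers,
the following is a minimal portable sketch built on std::atomic. It is not
the code in the webrev: the free function, the sketch namespace, and the
mapping of "conservative" onto full fences around a seq_cst CAS are all
assumptions made for this example, and the enum is written with a
simplified tag name.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

// Two-value enum as described above (tag name simplified for the sketch).
enum cmpxchg_memory_order {
  memory_order_relaxed,
  memory_order_conservative
};

// Hypothetical free function standing in for a platform cmpxchg.
// Returns the value that was in *dest before the operation.
inline long cmpxchg(long exchange_value, std::atomic<long>* dest, long compare_value,
                    cmpxchg_memory_order order = memory_order_conservative) {
  assert(dest != nullptr);
  long observed = compare_value;
  if (order == memory_order_relaxed) {
    // No ordering beyond the atomicity of the CAS itself.
    dest->compare_exchange_strong(observed, exchange_value,
                                  std::memory_order_relaxed,
                                  std::memory_order_relaxed);
  } else {
    // Assumed approximation of the existing behavior: a full fence
    // before and after the CAS.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    dest->compare_exchange_strong(observed, exchange_value,
                                  std::memory_order_seq_cst,
                                  std::memory_order_seq_cst);
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  return observed;
}

}  // namespace sketch
```

Existing three-argument calls compile unchanged and take the conservative
path; only a caller that passes memory_order_relaxed explicitly opts out
of the fences.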

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


David Holmes <david.holmes at oracle.com> wrote on 05/04/2016 14:55:29:

> From: David Holmes <david.holmes at oracle.com>
> To: Hiroshi H Horii/Japan/IBM at IBMJP
> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime-
> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison
> <Tim_Ellison at uk.ibm.com>, Volker Simonis <volker.simonis at gmail.com>,
> "Doerr, Martin" <martin.doerr at sap.com>, "Lindenmaier, Goetz" 
> <goetz.lindenmaier at sap.com>
> Date: 05/04/2016 14:57
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and 
> copy_to_survivor for ppc64
> 
> Hi Hiroshi,
> 
> Sorry for the delay on getting back to this.
> 
> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
> > Hi David,
> >
> > Thank you for your comments and questions.
> >
> >> 1. Are the current cmpxchg semantics exactly the same as
> >> memory_order_seq_cst?
> >
> > This is a very good question.
> >
> > I guess cmpxchg needs a more conservative memory-ordering constraint
> > than C++11, which is why it adds a sync after the compare-and-exchange
> > operation.
> >
> > Could someone give comments or thoughts?
> 
> I don't want to comment on the comparison with C++11. What I would 
> prefer to see is an additional memory_order value (such as 
> memory_order_ignored) which is the default for all methods declared to 
> take a memory_order parameter. That way existing implementations are 
> clearly ignoring the memory_order attribute and there is no potential 
> for confusion as to whether the existing implementations equate to 
> memory_order_seq_cst or not.
> 
> That said, I'm not sure it makes sense to add the memory_order parameter
> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark,
> oopDesc::cas_forward_to, unless those methods can sensibly be called 
> with any value for memory_order - which seems highly unlikely. Perhaps 
> those methods should identify the weakest form of memory_order they 
> support and that should be hard-wired into them?
> 
> Thanks,
> David
> 
> > memory_order_seq_cst is defined as
> >     "Any operation with this memory order is both an acquire operation
> >      and a release operation, plus a single total order exists in which
> >      all threads observe all modifications (see below) in the same order."
> > (http://en.cppreference.com/w/cpp/atomic/memory_order)
> >
> > In my environment, g++ and xlc generate the following assembly on
> > ppc64le.
> > (Interestingly, they generate the same assembly for any memory_order.)
> >
> > g++ (4.9.2)
> >     100008a4:   ac 04 00 7c     sync
> >     100008a8:   28 50 20 7d     lwarx   r9,0,r10
> >     100008ac:   00 18 09 7c     cmpw    r9,r3
> >     100008b0:   0c 00 c2 40     bne-    100008bc
> >     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
> >     100008b8:   f0 ff c2 40     bne-    100008a8
> >     100008bc:   2c 01 00 4c     isync
> >
> > xlc (13.1.3)
> >     10000888:   ac 04 00 7c     sync
> >     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
> >     10000890:   40 00 26 7c     cmpld   r6,r0
> >     10000894:   0c 00 82 40     bne     100008a0
> >     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
> >     1000089c:   f0 ff e2 40     bne+    1000088c
> >     100008a0:   2c 01 00 4c     isync
> >
> > On the other hand, the current OpenJDK generates the following assembly.
> >
> >     508:   ac 04 00 7c     sync
> >     50c:   00 00 5c e9     ld      r10,0(r28)
> >     510:   00 50 3b 7c     cmpd    r27,r10
> >     514:   1c 00 c2 40     bne-    530
> >     518:   a8 40 5c 7d     ldarx   r10,r28,r8
> >     51c:   00 50 3b 7c     cmpd    r27,r10
> >     520:   10 00 c2 40     bne-    530
> >     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
> >     528:   f0 ff c2 40     bne-    518
> >     52c:   ac 04 00 7c     sync
> >     530:   00 50 bb 7f     ...
> >
> > Though we can ignore 50c-514 (because they are a duplicated guard
> > condition), the last sync instruction (52c) makes cmpxchg stricter than
> > memory_order_seq_cst.
> >
> > In some cases, the last sync is necessary: when this thread must be able
> > to read all of the changes made by the other threads while it executes
> > 508 to 530 (the compare-and-exchange sequence).
> >
> >> 2. Has there been a discussion already, establishing that the modified
> >> GC code can indeed use memory_order_relaxed? Otherwise who is
> >> postulating that and based on what evidence?
> >
> > Volker and his colleagues have investigated the current GC code
> > with respect to this.
> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html
> > However, I believe we need comments from other GC experts before changing
> > the shared code.
> >
> > Regards,
> > Hiroshi
> > -----------------------
> > Hiroshi Horii, Ph.D.
> > IBM Research - Tokyo
> >
> >
> > David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 21:57:07:
> >
> >> From: David Holmes <david.holmes at oracle.com>
> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
> >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
> >> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
> > ppc-aix-port-dev at openjdk.java.net
> >> Date: 04/22/2016 21:58
> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >> copy_to_survivor for ppc64
> >>
> >> Hi Hiroshi,
> >>
> >> Two initial questions:
> >>
> >> 1. Are the current cmpxchg semantics exactly the same as
> >> memory_order_seq_cst?
> >>
> >> 2. Has there been a discussion already, establishing that the modified
> >> GC code can indeed use memory_order_relaxed? Otherwise who is
> >> postulating that and based on what evidence?
> >>
> >> Missing memory barriers have caused very difficult to track down bugs in
> >> the past - very rare race conditions. So any relaxation here has to be
> >> done with extreme confidence.
> >>
> >> Thanks,
> >> David
> >>
> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
> >> > Dear all:
> >> >
> >> > Can I please request reviews for the following change?
> >> >
> >> > Code change:
> >> > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
> >> > (I created the initial version and Martin enhanced it considerably)
> >> >
> >> > This change follows the discussion that started with this mail.
> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html
> >> >
> >> > Description:
> >> > This change provides a relaxed compare-and-exchange by introducing
> >> > an enum memory_order with semantics similar to the C++ atomic memory
> >> > orders. As described in atomic_linux_ppc.inline.hpp, the current
> >> > implementation of cmpxchg is fence_cmpxchg_acquire. This
> >> > implementation is useful for general purposes because the two sync
> >> > calls, before and after cmpxchg, provide strict consistency. However,
> >> > they sometimes cause overhead because sync instructions are very
> >> > expensive in the current POWER chip design. In addition, on other
> >> > platforms, such as aarch64, these strict semantics may cause some
> >> > overhead (according to Andrew's mail).
> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019073.html
> >> >
> >> > With this change, callers can explicitly specify memory ordering
> >> > constraints for cmpxchg with an additional parameter, memory_order order.
> >> >
> >> > typedef enum memory_order {
> >> >    memory_order_relaxed,
> >> >    memory_order_consume,
> >> >    memory_order_acquire,
> >> >    memory_order_release,
> >> >    memory_order_acq_rel,
> >> >    memory_order_seq_cst
> >> > } memory_order;
> >> >
> >> > Because the default value of the parameter is memory_order_seq_cst,
> >> > existing code gets the same cmpxchg semantics without any
> >> > modification. The relaxed cmpxchg is implemented only on ppc
> >> > in this changeset. Therefore, the behavior on the other platforms
> >> > will not be changed by this changeset.
> >> >
> >> > In addition, with the new parameter of cmpxchg, this change improves
> >> > the performance of copy_to_survivor in the parallel GC.
> >> > copy_to_survivor changes forward pointers by using cmpxchg. This
> >> > operation doesn't require any sync instructions. A pointer is changed
> >> > at most once in a GC and, when cmpxchg fails, the latest pointer is
> >> > available to the caller. cas_set_mark and cas_forward_to are extended
> >> > with an additional memory_order parameter, like cmpxchg, and
> >> > copy_to_survivor uses memory_order_relaxed to modify the forward
> >> > pointers.
> >> >
> >> > Summary of source code changes:
> >> >
> >> > * src/share/vm/runtime/atomic.hpp
> >> >       - Defines enum memory_order and adds a parameter to cmpxchg.
> >> >
> >> > * src/share/vm/runtime/atomic.cpp
> >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
> >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
> >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
> >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
> >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
> >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
> >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
> >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
> >> >       - Added a parameter for each cmpxchg function to follow
> >> >          the change of atomic.hpp. Their implementations are not
> >> >          changed.
> >> >
> >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
> >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
> >> >       - Added a parameter for each cmpxchg function to follow
> >> >          the change of atomic.hpp. In addition, implementations
> >> >          are changed corresponding to the specified memory_order.
> >> >
> >> > * src/share/vm/oops/oop.hpp
> >> > * src/share/vm/oops/oop.inline.hpp
> >> >       - Add a memory_order parameter to use relaxed cmpxchg in
> >> >          cas_set_mark and cas_forward_to.
> >> >
> >> > * src/share/vm/gc/parallel/psPromotionManager.cpp
> >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> >> >
> >> > Martin tested this changeset on linuxx86_64, linuxppc64le and
> >> > darwinintel64.
> >> > Though more time is needed to test on the other platforms, we would
> >> > like to ask for reviews and start discussion on this changeset.
> >> > I also tested this changeset with SPECjbb2013 and confirmed that GC
> >> > pause time is reduced.
> >> >
> >> > Regards,
> >> > Hiroshi
> >> > -----------------------
> >> > Hiroshi Horii, Ph.D.
> >> > IBM Research - Tokyo
> >> >
> >> >
> >>
> >
> 


From david.holmes at oracle.com  Tue May 10 07:34:32 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 10 May 2016 17:34:32 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<571A1FA3.9030006@oracle.com>
	<201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com>
	<1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com>
	<201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com>
Message-ID: <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com>

Hi Hiroshi,

On 6/05/2016 8:11 PM, Hiroshi H Horii wrote:
> Hi David,
>
> Thank you for your comments.
>
> As Martin suggested, I would like to split this proposal into
>   - relaxing the memory order of cmpxchg
>   - improving copy_to_survivor with the relaxed cmpxchg
> and discuss the former first.
>
> Martin kindly created a new webrev that includes the cmpxchg change.
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
> He has already tested it on AIX, linuxx86_64, linuxppc64le and
> darwinintel64.
> (Please tell me if I need to send a new mail for this RFR.)

Please do as it will be simpler to track that way.

>> What I would prefer to see is an additional memory_order value (such as
>> memory_order_ignored) which is the default for all methods declared to
>> take a memory_order parameter.
>
> We added a simple enum to specify the memory order in atomic.hpp as follows.
>
> typedef enum cmpxchg_cmpxchg_memory_order {
>   memory_order_relaxed,
>   memory_order_conservative
> } cmpxchg_memory_order;
>
> All cmpxchg functions take a cmpxchg_memory_order argument with a
> default value of memory_order_conservative, which keeps the same
> semantics as the existing cmpxchg and requires no change to the
> existing callers. If you think "memory_order_ignored" is better than
> "memory_order_conservative", I will be happy to modify this change.
> (I just thought "ignored" may resemble "relaxed" and may confuse
> people who are familiar with C++11's memory semantics.
> I would like to know the thoughts of native speakers.)

That is fine by me. I don't think "ignored" would be confused with 
"relaxed", but "conservative" is fine.

I will run the patch through our internal build system while you prepare 
the updated RFR. My only concern is "unused argument" warnings from the 
compiler. :)
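
For reference, one common way to keep such a parameter in the signature
without tripping unused-parameter warnings is to leave it unnamed in the
definition. The sketch below is only an illustration of that idiom under
assumed names (the sketch namespace and the std::atomic-based body are
hypothetical); it does not describe what the webrev does on any platform.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

enum cmpxchg_memory_order { memory_order_relaxed, memory_order_conservative };

// Hypothetical platform implementation that always uses its strongest form.
// The last parameter is deliberately unnamed: the signature stays uniform
// across platforms, and there is no named-but-unused argument to warn about.
inline int cmpxchg(int exchange_value, std::atomic<int>* dest, int compare_value,
                   cmpxchg_memory_order /* order */ = memory_order_conservative) {
  assert(dest != nullptr);
  int observed = compare_value;
  dest->compare_exchange_strong(observed, exchange_value);  // seq_cst by default
  return observed;  // value seen in *dest before the attempt
}

}  // namespace sketch
```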

We are quickly running into a hard deadline with Feature Complete 
however - possibly less than 24 hours - for hotspot changes. If this 
doesn't get in in time I will see if I can shepherd it through the 
approval process.

Thanks,
David


> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> David Holmes <david.holmes at oracle.com> wrote on 05/04/2016 14:55:29:
>
>> From: David Holmes <david.holmes at oracle.com>
>> To: Hiroshi H Horii/Japan/IBM at IBMJP
>> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime-
>> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison
>> <Tim_Ellison at uk.ibm.com>, Volker Simonis <volker.simonis at gmail.com>,
>> "Doerr, Martin" <martin.doerr at sap.com>, "Lindenmaier, Goetz"
>> <goetz.lindenmaier at sap.com>
>> Date: 05/04/2016 14:57
>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> copy_to_survivor for ppc64
>>
>> Hi Hiroshi,
>>
>> Sorry for the delay on getting back to this.
>>
>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
>> > Hi David,
>> >
>> > Thank you for your comments and questions.
>> >
>> >> 1. Are the current cmpxchg semantics exactly the same as
>> >> memory_order_seq_cst?
>> >
>> > This is a very good question.
>> >
>> > I guess cmpxchg needs a more conservative memory-ordering constraint
>> > than C++11, which is why it adds a sync after the compare-and-exchange
>> > operation.
>> >
>> > Could someone give comments or thoughts?
>>
>> I don't want to comment on the comparison with C++11. What I would
>> prefer to see is an additional memory_order value (such as
>> memory_order_ignored) which is the default for all methods declared to
>> take a memory_order parameter. That way existing implementations are
>> clearly ignoring the memory_order attribute and there is no potential
>> for confusion as to whether the existing implementations equate to
>> memory_order_seq_cst or not.
>>
>> That said, I'm not sure it makes sense to add the memory_order parameter
>> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark,
>> oopDesc::cas_forward_to, unless those methods can sensibly be called
>> with any value for memory_order - which seems highly unlikely. Perhaps
>> those methods should identify the weakest form of memory_order they
>> support and that should be hard-wired into them?
>>
>> Thanks,
>> David
>>
>> > memory_order_seq_cst is defined as
>> >     "Any operation with this memory order is both an acquire operation
>> >      and a release operation, plus a single total order exists in which
>> >      all threads observe all modifications (see below) in the same order."
>> > (http://en.cppreference.com/w/cpp/atomic/memory_order)
>> >
>> > In my environment, g++ and xlc generate the following assembly on ppc64le.
>> > (Interestingly, they generate the same assembly for any memory_order.)
>> >
>> > g++ (4.9.2)
>> >     100008a4:   ac 04 00 7c     sync
>> >     100008a8:   28 50 20 7d     lwarx   r9,0,r10
>> >     100008ac:   00 18 09 7c     cmpw    r9,r3
>> >     100008b0:   0c 00 c2 40     bne-    100008bc
>> >     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
>> >     100008b8:   f0 ff c2 40     bne-    100008a8
>> >     100008bc:   2c 01 00 4c     isync
>> >
>> > xlc (13.1.3)
>> >     10000888:   ac 04 00 7c     sync
>> >     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
>> >     10000890:   40 00 26 7c     cmpld   r6,r0
>> >     10000894:   0c 00 82 40     bne     100008a0
>> >     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
>> >     1000089c:   f0 ff e2 40     bne+    1000088c
>> >     100008a0:   2c 01 00 4c     isync
>> >
>> > On the other hand, the current OpenJDK generates the following assembly.
>> >
>> >     508:   ac 04 00 7c     sync
>> >     50c:   00 00 5c e9     ld      r10,0(r28)
>> >     510:   00 50 3b 7c     cmpd    r27,r10
>> >     514:   1c 00 c2 40     bne-    530
>> >     518:   a8 40 5c 7d     ldarx   r10,r28,r8
>> >     51c:   00 50 3b 7c     cmpd    r27,r10
>> >     520:   10 00 c2 40     bne-    530
>> >     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
>> >     528:   f0 ff c2 40     bne-    518
>> >     52c:   ac 04 00 7c     sync
>> >     530:   00 50 bb 7f     ...
>> >
>> > Though we can ignore 50c-514 (because they are a duplicated guard
>> > condition), the last sync instruction (52c) makes cmpxchg stricter than
>> > memory_order_seq_cst.
>> >
>> > In some cases, the last sync is necessary: when this thread must be able
>> > to read all of the changes made by the other threads while it executes
>> > 508 to 530 (the compare-and-exchange sequence).
>> >
>> >> 2. Has there been a discussion already, establishing that the modified
>> >> GC code can indeed use memory_order_relaxed? Otherwise who is
>> >> postulating that and based on what evidence?
>> >
>> > Volker and his colleagues have investigated the current GC code
>> > with respect to this.
>> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html
>> > However, I believe we need comments from other GC experts before changing
>> > the shared code.
>> >
>> > Regards,
>> > Hiroshi
>> > -----------------------
>> > Hiroshi Horii, Ph.D.
>> > IBM Research - Tokyo
>> >
>> >
>> > David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 21:57:07:
>> >
>> >> From: David Holmes <david.holmes at oracle.com>
>> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
>> >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
>> >> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
>> > ppc-aix-port-dev at openjdk.java.net
>> >> Date: 04/22/2016 21:58
>> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> >> copy_to_survivor for ppc64
>> >>
>> >> Hi Hiroshi,
>> >>
>> >> Two initial questions:
>> >>
>> >> 1. Are the current cmpxchg semantics exactly the same as
>> >> memory_order_seq_cst?
>> >>
>> >> 2. Has there been a discussion already, establishing that the modified
>> >> GC code can indeed use memory_order_relaxed? Otherwise who is
>> >> postulating that and based on what evidence?
>> >>
>> >> Missing memory barriers have caused very difficult to track down
> bugs in
>> >> the past - very rare race conditions. So any relaxation here has to be
>> >> done with extreme confidence.
>> >>
>> >> Thanks,
>> >> David
>> >>
>> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
>> >> > Dear all:
>> >> >
>> >> > Can I please request reviews for the following change?
>> >> >
>> >> > Code change:
>> >> >
> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
>> >> > (I initially created and Martin enhanced so much)
>> >> >
>> >> > This change follows the discussion started from this mail.
>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html
>> >> >
>> >> > Description:
>> >> > This change provides relaxed compare-and-exchange by introducing
>> >> > similar semantics of C++ atomic memory operators, enum memory_order.
>> >> > As described in atomic_linux_ppc.inline.hpp, the current
>> > implementation of
>> >> > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
>> >> > general purposes because twice calls of sync before and after
>> > cmpxchg will
>> >> > provide strict consistency. However, they sometimes cause overheads
>> >> > because
>> >> > sync instructions are very expensive in the current POWER chip
> design.
>> >> > In addition, for the other platforms, such as aarch64, this strict
>> >> > semantics
>> >> > may cause some overheads (according to the Andrew's mail).
>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019073.html
>> >> >
>> >> > With this change, callers can explicitly specify constraints of
> memory
>> >> > ordering
>> >> > for cmpxchg with an additional parameter, memory_order order.
>> >> >
>> >> > typedef enum memory_order {
>> >> >    memory_order_relaxed,
>> >> >    memory_order_consume,
>> >> >    memory_order_acquire,
>> >> >    memory_order_release,
>> >> >    memory_order_acq_rel,
>> >> >    memory_order_seq_cst
>> >> > } memory_order;
>> >> >
>> >> > Because the default value of the parameter is memory_order_seq_cst,
>> >> > existing codes can use the same semantics of cmpxchg without any
>> >> > modification. The relaxed cmpxchg is implemented only on ppc
>> >> > in this changeset. Therefore, the behavior on the other platforms
> will
>> >> > not be changed with this changeset.
>> >> >
>> >> > In addition, with the new parameter of cmpxchg, this change improves
>> >> > performance of copy_to_survivor in the parallel GC.
>> >> > copy_to_survivor changes forward pointers by using cmpxchg. This
>> >> > operation doesn't require any sync instructions.  A pointer is
> changed
>> >> > at most once in a GC and when cmpxchg fails, the latest pointer is
>> >> > available for the caller. cas_set_mark and cas_forward_to are
> extended
>> >> > with an additional memory_order parameter as cmpxchg and
>> > copy_to_survivor
>> >> > uses memory_order_relaxed to modify the forward pointers.
>> >> >
>> >> > Summary of source code changes:
>> >> >
>> >> > * src/share/vm/runtime/atomic.hpp
>> >> >       - Defines enum memory_order and adds a parameter to cmpxchg.
>> >> >
>> >> > * src/share/vm/runtime/atomic.cpp
>> >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
>> >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>> >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>> >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
>> >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
>> >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
>> >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
>> >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
>> >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>> >> >       - Added a parameter for each cmpxchg function to follow
>> >> >          the change of atomic.hpp. Their implementations are not
>> > changed.
>> >> >
>> >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
>> >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>> >> >       - Added a parameter for each cmpxchg function to follow
>> >> >          the change of atomic.hpp. In addition, implementations
>> >> >          are changed corresponding to the specified memory_order.
>> >> >
>> >> > * src/share/vm/oops/oop.hpp
>> >> > * src/share/vm/oops/oop.inline.hpp
>> >> >       - Add a memory_order parameter to use relaxed cmpxchg in
>> >> >          cas_set_mark and cas_forward_to.
>> >> >
>> >> > * src/share/vm/gc/parallel/psPromotionManager.cpp
>> >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
>> >> >
>> >> > Martin tested this changeset  on linuxx86_64, linuxppc64le and
>> >> > darwinintel64.
>> >> > Though more time is needed to test on the other platform, we would
>> > like to
>> >> > ask
>> >> > reviews and start discussion on this changeset.
>> >> > I also tested this changeset with SPECjbb2013 and confirmed that gc
>> > pause
>> >> > time
>> >> > is reduced.
>> >> >
>> >> > Regards,
>> >> > Hiroshi
>> >> > -----------------------
>> >> > Hiroshi Horii, Ph.D.
>> >> > IBM Research - Tokyo
>> >> >
>> >> >
>> >>
>> >
>>
>
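
The copy_to_survivor description quoted above (a forwarding pointer is
installed at most once per GC, and a failed cmpxchg hands the caller the
pointer that is already there) can be sketched with C++11 atomics. This is
only an illustration of the mechanics: Obj, forwardee and forward_to are
hypothetical names, not the HotSpot ones, and whether relaxed ordering is
actually sufficient for the real GC code is exactly the question the
reviewers raise in this thread.

```cpp
#include <atomic>
#include <cassert>

namespace forward_sketch {

// Hypothetical object header with a single forwarding slot.
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};
};

// Try to install new_copy as the forwarding pointer of obj.
// Returns the pointer that ends up installed: new_copy if this caller
// won the race, otherwise the copy installed by the winner.
inline Obj* forward_to(Obj* obj, Obj* new_copy) {
  Obj* expected = nullptr;
  if (obj->forwardee.compare_exchange_strong(expected, new_copy,
                                             std::memory_order_relaxed,
                                             std::memory_order_relaxed)) {
    return new_copy;
  }
  // A failed compare-exchange stores the current value into 'expected',
  // so the loser learns the winner's pointer without a second load.
  return expected;
}

}  // namespace forward_sketch
```

A caller that loses the race would discard its own copy and use the returned
pointer instead. Any ordering needed before reading through that pointer is
outside the scope of this sketch.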

From david.holmes at oracle.com  Tue May 10 09:11:20 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 10 May 2016 19:11:20 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<571A1FA3.9030006@oracle.com>
	<201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com>
	<1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com>
	<201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com>
	<848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com>
Message-ID: <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com>

The fix seems incomplete for Solaris:

make/Main.gmk:232: recipe for target 'hotspot' failed
"/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp", 
line 124: Error: Too many arguments in call to 
"_Atomic_cmpxchg_long(long, volatile long*, long)".
"/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp", 
line 128: Error: Too many arguments in call to 
"_Atomic_cmpxchg_long(long, volatile long*, long)".

David
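
The errors above say that the call site passes more arguments than the
three-argument _Atomic_cmpxchg_long accepts, presumably because the new
memory-order argument is being forwarded to it. A minimal sketch of the
alternative, where the wrapper accepts the argument but does not pass it on,
could look like this. The names primitive_cmpxchg_long and cmpxchg_long are
hypothetical stand-ins, and std::atomic is used only to keep the sketch
self-contained; this is not the actual HotSpot or Solaris code.

```cpp
#include <atomic>
#include <cassert>

namespace wrapper_sketch {

// Enum values as quoted in this thread.
enum cmpxchg_memory_order {
  memory_order_relaxed,
  memory_order_conservative
};

// Hypothetical stand-in for a three-argument platform primitive.
// Returns the value found at *dest before the operation.
inline long primitive_cmpxchg_long(long exchange_value,
                                   std::atomic<long>* dest,
                                   long compare_value) {
  long expected = compare_value;
  dest->compare_exchange_strong(expected, exchange_value);
  return expected;  // old value on success, current value on failure
}

// Wrapper with the extended signature. The order argument is accepted so
// that all platforms share one signature, but it is not forwarded.
inline long cmpxchg_long(long exchange_value,
                         std::atomic<long>* dest,
                         long compare_value,
                         cmpxchg_memory_order order = memory_order_conservative) {
  (void)order;  // also silences "unused argument" warnings
  return primitive_cmpxchg_long(exchange_value, dest, compare_value);
}

inline void usage_example() {
  std::atomic<long> cell(1);
  assert(cmpxchg_long(2, &cell, 1) == 1);                        // succeeds
  assert(cell.load() == 2);
  assert(cmpxchg_long(3, &cell, 1, memory_order_relaxed) == 2);  // fails
  assert(cell.load() == 2);
}

}  // namespace wrapper_sketch
```

Existing three-argument callers compile unchanged because of the default
value, which matches the intent described in the quoted mail below.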

On 10/05/2016 5:34 PM, David Holmes wrote:
> Hi Hiroshi,
>
> On 6/05/2016 8:11 PM, Hiroshi H Horii wrote:
>> Hi David,
>>
>> Thank you for your comments.
>>
>> As Martin suggested, I would like to split this proposal into
>>   - relaxing the memory order of cmpxchg
>>   - improving copy_to_survivor with the relaxed cmpxchg
>> and discuss the former first.
>>
>> Martin kindly created a new webrev that contains the cmpxchg change.
>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
>> He has already tested it on AIX, linuxx86_64, linuxppc64le and
>> darwinintel64.
>> (Please tell me if I need to send a new mail for this RFR.)
>
> Please do as it will be simpler to track that way.
>
>>> What I would prefer to see is an additional memory_order value (such as
>>> memory_order_ignored) which is the default for all methods declared to
>>> take a memory_order parameter.
>>
>> We added a simple enum to atomic.hpp to specify the memory order, as follows.
>>
>> typedef enum cmpxchg_cmpxchg_memory_order {
>>   memory_order_relaxed,
>>   memory_order_conservative
>> } cmpxchg_memory_order;
>>
>> All cmpxchg functions take a cmpxchg_memory_order argument whose
>> default value, memory_order_conservative, keeps the same semantics as
>> the existing cmpxchg, so existing callers need no change.
>> If you think "memory_order_ignored" is better than
>> "memory_order_conservative", I will be happy to modify this change.
>> (I just thought that "ignored" may resemble "relaxed" and may confuse
>> people who are familiar with C++11's memory semantics.
>> I would like to know what native speakers think.)
>
> That is fine by me. I don't think "ignored" would be confused with
> "relaxed", but "conservative" is fine.
>
> I will run the patch through our internal build system while you prepare
> the updated RFR. My only concern is "unused argument" warnings from the
> compiler. :)
>
> We are quickly running into a hard deadline with Feature Complete
> however - possibly less than 24 hours - for hotspot changes. If this
> doesn't get in in time I will see if I can shepherd it through the
> approval process.
>
> Thanks,
> David
>
>
>> Regards,
>> Hiroshi
>> -----------------------
>> Hiroshi Horii, Ph.D.
>> IBM Research - Tokyo
>>
>>
>> David Holmes <david.holmes at oracle.com> wrote on 05/04/2016 14:55:29:
>>
>>> From: David Holmes <david.holmes at oracle.com>
>>> To: Hiroshi H Horii/Japan/IBM at IBMJP
>>> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime-
>>> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison
>>> <Tim_Ellison at uk.ibm.com>, Volker Simonis <volker.simonis at gmail.com>,
>>> "Doerr, Martin" <martin.doerr at sap.com>, "Lindenmaier, Goetz"
>>> <goetz.lindenmaier at sap.com>
>>> Date: 05/04/2016 14:57
>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>>> copy_to_survivor for ppc64
>>>
>>> Hi Hiroshi,
>>>
>>> Sorry for the delay on getting back to this.
>>>
>>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
>>> > Hi David,
>>> >
>>> > Thank you for your comments and questions.
>>> >
>>> >> 1. Are the current cmpxchg semantics exactly the same as
>>> >> memory_order_seq_cst?
>>> >
>>> > This is very good question..
>>> >
>>> > I guess, cmpxchg needs a more conservative constraint for memory
>> ordering
>>> > than C++11, to add sync after a compare-and-exchange operation.
>>> >
>>> > Could someone give comments or thoughts?
>>>
>>> I don't want to comment on the comparison with C++11. What I would
>>> prefer to see is an additional memory_order value (such as
>>> memory_order_ignored) which is the default for all methods declared to
>>> take a memory_order parameter. That way existing implementations are
>>> clearly ignoring the memory_order attribute and there is no potential
>>> for confusion as to whether the existing implementations equate to
>>> memory_order_seq_cst or not.
>>>
>>> That said, I'm not sure it makes sense to add the memory_order parameter
>>> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark,
>>> oopDesc::cas_forward_to, unless those methods can sensibly be called
>>> with any value for memory_order - which seems highly unlikely. Perhaps
>>> those methods should identify the weakest form of memory_order they
>>> support and that should be hard-wired into them?
>>>
>>> Thanks,
>>> David
>>>
>>> > memory_order_seq_cst is defined as
>>> >     "Any operation with this memory order is both an acquire
>> operation and
>>> >      a release operation, plus a single total order exists in which
>>> all
>>> > threads
>>> >      observe all modifications (see below) in the same order."
>>> > (http://en.cppreference.com/w/cpp/atomic/memory_order)
>>> >
>>> > In my environment, g++ and xlc generate following assemblies on
>>> ppc64le.
>>> > (interestingly, they generates the same assemblies for any
>>> memory_order)
>>> >
>>> > g++ (4.9.2)
>>> >     100008a4:   ac 04 00 7c     sync
>>> >     100008a8:   28 50 20 7d     lwarx   r9,0,r10
>>> >     100008ac:   00 18 09 7c     cmpw    r9,r3
>>> >     100008b0:   0c 00 c2 40     bne-    100008bc
>>> >     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
>>> >     100008b8:   f0 ff c2 40     bne-    100008a8
>>> >     100008bc:   2c 01 00 4c     isync
>>> >
>>> > xlc (13.1.3)
>>> >     10000888:   ac 04 00 7c     sync
>>> >     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
>>> >     10000890:   40 00 26 7c     cmpld   r6,r0
>>> >     10000894:   0c 00 82 40     bne     100008a0
>>> >     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
>>> >     1000089c:   f0 ff e2 40     bne+    1000088c
>>> >     100008a0:   2c 01 00 4c     isync
>>> >
>>> > On the other hand, the current OpenJDK generates following assemblies.
>>> >
>>> >     508:   ac 04 00 7c     sync
>>> >     50c:   00 00 5c e9     ld      r10,0(r28)
>>> >     510:   00 50 3b 7c     cmpd    r27,r10
>>> >     514:   1c 00 c2 40     bne-    530
>>> >     518:   a8 40 5c 7d     ldarx   r10,r28,r8
>>> >     51c:   00 50 3b 7c     cmpd    r27,r10
>>> >     520:   10 00 c2 40     bne-    530
>>> >     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
>>> >     528:   f0 ff c2 40     bne-    518
>>> >     52c:   ac 04 00 7c     sync
>>> >     530:   00 50 bb 7f     ...
>>> >
>>> > Though we can ignore 50c-514 (because they are a duplicated guard
>>> > condition),
>>> > the last sync instruction (52c) makes cmpxchg more strict than
>>> > memory_order_seq_cst.
>>> >
>>> > In some cases, the last sync is necessary when this thread must be
>>> able
>>> > to read
>>> > all of the changes in the other threads while executing from 508 to
>>> 530
>>> > (that processes compare-and-exchange).
>>> >
>>> >> 2. Has there been a discussion already, establishing that the
>>> modified
>>> >> GC code can indeed use memory_order_relaxed? Otherwise who is
>>> >> postulating that and based on what evidence?
>>> >
>>> > Volker and his colleagues have investigated the current GC codes
>>> > according to this.
>>> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html
>>> > However, I believe, we need comments of other GC experts to change
>>> > the shared codes.
>>> >
>>> > Regards,
>>> > Hiroshi
>>> > -----------------------
>>> > Hiroshi Horii, Ph.D.
>>> > IBM Research - Tokyo
>>> >
>>> >
>>> > David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 21:57:07:
>>> >
>>> >> From: David Holmes <david.holmes at oracle.com>
>>> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
>>> >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
>>> >> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
>>> > ppc-aix-port-dev at openjdk.java.net
>>> >> Date: 04/22/2016 21:58
>>> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>>> >> copy_to_survivor for ppc64
>>> >>
>>> >> Hi Hiroshi,
>>> >>
>>> >> Two initial questions:
>>> >>
>>> >> 1. Are the current cmpxchg semantics exactly the same as
>>> >> memory_order_seq_cst?
>>> >>
>>> >> 2. Has there been a discussion already, establishing that the
>>> modified
>>> >> GC code can indeed use memory_order_relaxed? Otherwise who is
>>> >> postulating that and based on what evidence?
>>> >>
>>> >> Missing memory barriers have caused very difficult to track down
>> bugs in
>>> >> the past - very rare race conditions. So any relaxation here has
>>> to be
>>> >> done with extreme confidence.
>>> >>
>>> >> Thanks,
>>> >> David
>>> >>
>>> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
>>> >> > Dear all:
>>> >> >
>>> >> > Can I please request reviews for the following change?
>>> >> >
>>> >> > Code change:
>>> >> >
>> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
>>> >> > (I initially created and Martin enhanced so much)
>>> >> >
>>> >> > This change follows the discussion started from this mail.
>>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html
>>> >> >
>>> >> > Description:
>>> >> > This change provides relaxed compare-and-exchange by introducing
>>> >> > similar semantics of C++ atomic memory operators, enum
>>> memory_order.
>>> >> > As described in atomic_linux_ppc.inline.hpp, the current
>>> > implementation of
>>> >> > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
>>> >> > general purposes because twice calls of sync before and after
>>> > cmpxchg will
>>> >> > provide strict consistency. However, they sometimes cause overheads
>>> >> > because
>>> >> > sync instructions are very expensive in the current POWER chip
>> design.
>>> >> > In addition, for the other platforms, such as aarch64, this strict
>>> >> > semantics
>>> >> > may cause some overheads (according to the Andrew's mail).
>>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019073.html
>>> >> >
>>> >> > With this change, callers can explicitly specify constraints of
>> memory
>>> >> > ordering
>>> >> > for cmpxchg with an additional parameter, memory_order order.
>>> >> >
>>> >> > typedef enum memory_order {
>>> >> >    memory_order_relaxed,
>>> >> >    memory_order_consume,
>>> >> >    memory_order_acquire,
>>> >> >    memory_order_release,
>>> >> >    memory_order_acq_rel,
>>> >> >    memory_order_seq_cst
>>> >> > } memory_order;
>>> >> >
>>> >> > Because the default value of the parameter is memory_order_seq_cst,
>>> >> > existing codes can use the same semantics of cmpxchg without any
>>> >> > modification. The relaxed cmpxchg is implemented only on ppc
>>> >> > in this changeset. Therefore, the behavior on the other platforms
>> will
>>> >> > not be changed with this changeset.
>>> >> >
>>> >> > In addition, with the new parameter of cmpxchg, this change
>>> improves
>>> >> > performance of copy_to_survivor in the parallel GC.
>>> >> > copy_to_survivor changes forward pointers by using cmpxchg. This
>>> >> > operation doesn't require any sync instructions.  A pointer is
>> changed
>>> >> > at most once in a GC and when cmpxchg fails, the latest pointer is
>>> >> > available for the caller. cas_set_mark and cas_forward_to are
>> extended
>>> >> > with an additional memory_order parameter as cmpxchg and
>>> > copy_to_survivor
>>> >> > uses memory_order_relaxed to modify the forward pointers.
>>> >> >
>>> >> > Summary of source code changes:
>>> >> >
>>> >> > * src/share/vm/runtime/atomic.hpp
>>> >> >       - Defines enum memory_order and adds a parameter to cmpxchg.
>>> >> >
>>> >> > * src/share/vm/runtime/atomic.cpp
>>> >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
>>> >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>>> >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>>> >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
>>> >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
>>> >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
>>> >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
>>> >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
>>> >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>>> >> >       - Added a parameter for each cmpxchg function to follow
>>> >> >          the change of atomic.hpp. Their implementations are not
>>> > changed.
>>> >> >
>>> >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
>>> >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>>> >> >       - Added a parameter for each cmpxchg function to follow
>>> >> >          the change of atomic.hpp. In addition, implementations
>>> >> >          are changed corresponding to the specified memory_order.
>>> >> >
>>> >> > * src/share/vm/oops/oop.hpp
>>> >> > * src/share/vm/oops/oop.inline.hpp
>>> >> >       - Add a memory_order parameter to use relaxed cmpxchg in
>>> >> >          cas_set_mark and cas_forward_to.
>>> >> >
>>> >> > * src/share/vm/gc/parallel/psPromotionManager.cpp
>>> >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
>>> >> >
>>> >> > Martin tested this changeset  on linuxx86_64, linuxppc64le and
>>> >> > darwinintel64.
>>> >> > Though more time is needed to test on the other platform, we would
>>> > like to
>>> >> > ask
>>> >> > reviews and start discussion on this changeset.
>>> >> > I also tested this changeset with SPECjbb2013 and confirmed that gc
>>> > pause
>>> >> > time
>>> >> > is reduced.
>>> >> >
>>> >> > Regards,
>>> >> > Hiroshi
>>> >> > -----------------------
>>> >> > Hiroshi Horii, Ph.D.
>>> >> > IBM Research - Tokyo
>>> >> >
>>> >> >
>>> >>
>>> >
>>>
>>

From HORII at jp.ibm.com  Tue May 10 09:35:51 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Tue, 10 May 2016 18:35:51 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<571A1FA3.9030006@oracle.com>
	<201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com>
	<1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com>
	<201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com>
	<848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com>
Message-ID: <201605100936.u4A9a6IP008871@d19av05.sagamino.japan.ibm.com>

Hi David,

> > Martin kindly created a new webrev that contains the cmpxchg change.
> > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
> > He has already tested it on AIX, linuxx86_64, linuxppc64le and
> > darwinintel64.
> > (Please tell me if I need to send a new mail for this RFR.)
> 
> Please do as it will be simpler to track that way.

I will send a new RFR for this cmpxchg change.

> That is fine by me. I don't think "ignored" would be confused with 
> "relaxed", but "conservative" is fine.

Sure.
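
For readers following the naming discussion, the difference between the two
agreed values can be sketched with C++11 atomics. This is only an analogy
under stated assumptions: the "conservative" path is modelled here as a full
fence before and after the compare-and-exchange, following the description
earlier in this thread, and is not claimed to be equivalent to the PPC
implementation. The function name and the std::atomic-based signature are
hypothetical.

```cpp
#include <atomic>
#include <cassert>

namespace order_sketch {

enum cmpxchg_memory_order {
  memory_order_relaxed,
  memory_order_conservative
};

// Returns the value observed at *dest before the operation.
inline long cmpxchg(long exchange_value,
                    std::atomic<long>* dest,
                    long compare_value,
                    cmpxchg_memory_order order = memory_order_conservative) {
  long expected = compare_value;
  if (order == memory_order_conservative) {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // fence before
  }
  // The compare-and-exchange itself carries no ordering in this sketch.
  dest->compare_exchange_strong(expected, exchange_value,
                                std::memory_order_relaxed,
                                std::memory_order_relaxed);
  if (order == memory_order_conservative) {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // fence after
  }
  return expected;
}

}  // namespace order_sketch
```

Callers that pass nothing get the conservative behaviour, so only call sites
that explicitly ask for memory_order_relaxed change their ordering.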

> We are quickly running into a hard deadline with Feature Complete 
> however - possibly less than 24 hours - for hotspot changes. If this 
> doesn't get in in time I will see if I can shepherd it through the 
> approval process.

Thanks.

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


David Holmes <david.holmes at oracle.com> wrote on 05/10/2016 16:34:32:

> From: David Holmes <david.holmes at oracle.com>
> To: Hiroshi H Horii/Japan/IBM at IBMJP
> Cc: "Lindenmaier, Goetz" <goetz.lindenmaier at sap.com>, hotspot-gc-
> dev at openjdk.java.net, hotspot-runtime-dev at openjdk.java.net, "Doerr, 
> Martin" <martin.doerr at sap.com>, ppc-aix-port-dev at openjdk.java.net, 
> Tim Ellison <Tim_Ellison at uk.ibm.com>, Volker Simonis 
> <volker.simonis at gmail.com>
> Date: 05/10/2016 16:35
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and 
> copy_to_survivor for ppc64
> 
> Hi Hiroshi,
> 
> On 6/05/2016 8:11 PM, Hiroshi H Horii wrote:
> > Hi David,
> >
> > Thank you for your comments.
> >
> > As Martin suggested, I would like to split this proposal into
> >   - relaxing the memory order of cmpxchg
> >   - improving copy_to_survivor with the relaxed cmpxchg
> > and discuss the former first.
> >
> > Martin kindly created a new webrev that contains the cmpxchg change.
> > http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
> > He has already tested it on AIX, linuxx86_64, linuxppc64le and
> > darwinintel64.
> > (Please tell me if I need to send a new mail for this RFR.)
> 
> Please do as it will be simpler to track that way.
> 
> >> What I would prefer to see is an additional memory_order value (such as
> >> memory_order_ignored) which is the default for all methods declared to
> >> take a memory_order parameter.
> >
> > We added a simple enum to atomic.hpp to specify the memory order, as follows.
> >
> > typedef enum cmpxchg_cmpxchg_memory_order {
> >   memory_order_relaxed,
> >   memory_order_conservative
> > } cmpxchg_memory_order;
> >
> > All cmpxchg functions take a cmpxchg_memory_order argument whose
> > default value, memory_order_conservative, keeps the same semantics as
> > the existing cmpxchg, so existing callers need no change.
> > If you think "memory_order_ignored" is better than
> > "memory_order_conservative", I will be happy to modify this change.
> > (I just thought that "ignored" may resemble "relaxed" and may confuse
> > people who are familiar with C++11's memory semantics.
> > I would like to know what native speakers think.)
> 
> That is fine by me. I don't think "ignored" would be confused with 
> "relaxed", but "conservative" is fine.
> 
> I will run the patch through our internal build system while you prepare
> the updated RFR. My only concern is "unused argument" warnings from the
> compiler. :)
> 
> We are quickly running into a hard deadline with Feature Complete 
> however - possibly less than 24 hours - for hotspot changes. If this 
> doesn't get in in time I will see if I can shepherd it through the 
> approval process.
> 
> Thanks,
> David
> 
> 
> > Regards,
> > Hiroshi
> > -----------------------
> > Hiroshi Horii, Ph.D.
> > IBM Research - Tokyo
> >
> >
> > David Holmes <david.holmes at oracle.com> wrote on 05/04/2016 14:55:29:
> >
> >> From: David Holmes <david.holmes at oracle.com>
> >> To: Hiroshi H Horii/Japan/IBM at IBMJP
> >> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime-
> >> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison
> >> <Tim_Ellison at uk.ibm.com>, Volker Simonis <volker.simonis at gmail.com>,
> >> "Doerr, Martin" <martin.doerr at sap.com>, "Lindenmaier, Goetz"
> >> <goetz.lindenmaier at sap.com>
> >> Date: 05/04/2016 14:57
> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >> copy_to_survivor for ppc64
> >>
> >> Hi Hiroshi,
> >>
> >> Sorry for the delay on getting back to this.
> >>
> >> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
> >> > Hi David,
> >> >
> >> > Thank you for your comments and questions.
> >> >
> >> >> 1. Are the current cmpxchg semantics exactly the same as
> >> >> memory_order_seq_cst?
> >> >
> >> > This is very good question..
> >> >
> >> > I guess, cmpxchg needs a more conservative constraint for memory
> > ordering
> >> > than C++11, to add sync after a compare-and-exchange operation.
> >> >
> >> > Could someone give comments or thoughts?
> >>
> >> I don't want to comment on the comparison with C++11. What I would
> >> prefer to see is an additional memory_order value (such as
> >> memory_order_ignored) which is the default for all methods declared 
to
> >> take a memory_order parameter. That way existing implementations are
> >> clearly ignoring the memory_order attribute and there is no potential
> >> for confusion as to whether the existing implementations equate to
> >> memory_order_seq_cst or not.
> >>
> >> That said, I'm not sure it makes sense to add the memory_order 
parameter
> >> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark,
> >> oopDesc::cas_forward_to, unless those methods can sensibly be called
> >> with any value for memory_order - which seems highly unlikely. 
Perhaps
> >> those methods should identify the weakest form of memory_order they
> >> support and that should be hard-wired into them?
> >>
> >> Thanks,
> >> David
> >>
> >> > memory_order_seq_cst is defined as
> >> >     "Any operation with this memory order is both an acquire
> > operation and
> >> >      a release operation, plus a single total order exists in which 
all
> >> > threads
> >> >      observe all modifications (see below) in the same order."
> >> > (http://en.cppreference.com/w/cpp/atomic/memory_order)
> >> >
> >> > In my environment, g++ and xlc generate following assemblies on 
ppc64le.
> >> > (interestingly, they generates the same assemblies for any 
memory_order)
> >> >
> >> > g++ (4.9.2)
> >> >     100008a4:   ac 04 00 7c     sync
> >> >     100008a8:   28 50 20 7d     lwarx   r9,0,r10
> >> >     100008ac:   00 18 09 7c     cmpw    r9,r3
> >> >     100008b0:   0c 00 c2 40     bne-    100008bc
> >> >     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
> >> >     100008b8:   f0 ff c2 40     bne-    100008a8
> >> >     100008bc:   2c 01 00 4c     isync
> >> >
> >> > xlc (13.1.3)
> >> >     10000888:   ac 04 00 7c     sync
> >> >     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
> >> >     10000890:   40 00 26 7c     cmpld   r6,r0
> >> >     10000894:   0c 00 82 40     bne     100008a0
> >> >     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
> >> >     1000089c:   f0 ff e2 40     bne+    1000088c
> >> >     100008a0:   2c 01 00 4c     isync
> >> >
> >> > On the other hand, the current OpenJDK generates following 
assemblies.
> >> >
> >> >     508:   ac 04 00 7c     sync
> >> >     50c:   00 00 5c e9     ld      r10,0(r28)
> >> >     510:   00 50 3b 7c     cmpd    r27,r10
> >> >     514:   1c 00 c2 40     bne-    530
> >> >     518:   a8 40 5c 7d     ldarx   r10,r28,r8
> >> >     51c:   00 50 3b 7c     cmpd    r27,r10
> >> >     520:   10 00 c2 40     bne-    530
> >> >     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
> >> >     528:   f0 ff c2 40     bne-    518
> >> >     52c:   ac 04 00 7c     sync
> >> >     530:   00 50 bb 7f     ...
> >> >
> >> > Though we can ignore 50c-514 (because they are a duplicated guard
> >> > condition),
> >> > the last sync instruction (52c) makes cmpxchg more strict than
> >> > memory_order_seq_cst.
> >> >
> >> > In some cases, the last sync is necessary when this thread must be 
able
> >> > to read
> >> > all of the changes in the other threads while executing from 508 to 
530
> >> > (that processes compare-and-exchange).
> >> >
> >> >> 2. Has there been a discussion already, establishing that the 
modified
> >> >> GC code can indeed use memory_order_relaxed? Otherwise who is
> >> >> postulating that and based on what evidence?
> >> >
> >> > Volker and his colleagues have investigated the current GC codes
> >> > according to this.
> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html
> >> > However, I believe, we need comments of other GC experts to change
> >> > the shared codes.
> >> >
> >> > Regards,
> >> > Hiroshi
> >> > -----------------------
> >> > Hiroshi Horii, Ph.D.
> >> > IBM Research - Tokyo
> >> >
> >> >
> >> > David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 
21:57:07:
> >> >
> >> >> From: David Holmes <david.holmes at oracle.com>
> >> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
> >> >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
> >> >> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
> >> > ppc-aix-port-dev at openjdk.java.net
> >> >> Date: 04/22/2016 21:58
> >> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >> >> copy_to_survivor for ppc64
> >> >>
> >> >> Hi Hiroshi,
> >> >>
> >> >> Two initial questions:
> >> >>
> >> >> 1. Are the current cmpxchg semantics exactly the same as
> >> >> memory_order_seq_cst?
> >> >>
> >> >> 2. Has there been a discussion already, establishing that the 
modified
> >> >> GC code can indeed use memory_order_relaxed? Otherwise who is
> >> >> postulating that and based on what evidence?
> >> >>
> >> >> Missing memory barriers have caused very difficult to track down
> > bugs in
> >> >> the past - very rare race conditions. So any relaxation here has 
to be
> >> >> done with extreme confidence.
> >> >>
> >> >> Thanks,
> >> >> David
> >> >>
> >> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
> >> >> > Dear all:
> >> >> >
> >> >> > Can I please request reviews for the following change?
> >> >> >
> >> >> > Code change:
> >> >> >
> > http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
> >> >> > (I initially created and Martin enhanced so much)
> >> >> >
> >> >> > This change follows the discussion started from this mail.
> >> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html
> >> >> >
> >> >> > Description:
> >> >> > This change provides relaxed compare-and-exchange by introducing
> >> >> > similar semantics of C++ atomic memory operators, enum 
memory_order.
> >> >> > As described in atomic_linux_ppc.inline.hpp, the current
> >> > implementation of
> >> >> > cmpxchg is fence_cmpxchg_acquire. This implementation is useful 
for
> >> >> > general purposes because twice calls of sync before and after
> >> > cmpxchg will
> >> >> > provide strict consistency. However, they sometimes cause 
overheads
> >> >> > because
> >> >> > sync instructions are very expensive in the current POWER chip
> > design.
> >> >> > In addition, for the other platforms, such as aarch64, this 
strict
> >> >> > semantics
> >> >> > may cause some overheads (according to the Andrew's mail).
> >> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> >> >> April/019073.html
> >> >> >
> >> >> > With this change, callers can explicitly specify constraints of
> > memory
> >> >> > ordering
> >> >> > for cmpxchg with an additional parameter, memory_order order.
> >> >> >
> >> >> > typedef enum memory_order {
> >> >> >    memory_order_relaxed,
> >> >> >    memory_order_consume,
> >> >> >    memory_order_acquire,
> >> >> >    memory_order_release,
> >> >> >    memory_order_acq_rel,
> >> >> >    memory_order_seq_cst
> >> >> > } memory_order;
> >> >> >
> >> >> > Because the default value of the parameter is memory_order_seq_cst,
> >> >> > existing codes can use the same semantics of cmpxchg without any
> >> >> > modification. The relaxed cmpxchg is implemented only on ppc
> >> >> > in this changeset. Therefore, the behavior on the other platforms will
> >> >> > not be changed with this changeset.
> >> >> >
> >> >> > In addition, with the new parameter of cmpxchg, this change improves
> >> >> > performance of copy_to_survivor in the parallel GC.
> >> >> > copy_to_survivor changes forward pointers by using cmpxchg. This
> >> >> > operation doesn't require any sync instructions.  A pointer is changed
> >> >> > at most once in a GC and when cmpxchg fails, the latest pointer is
> >> >> > available for the caller. cas_set_mark and cas_forward_to are extended
> >> >> > with an additional memory_order parameter as cmpxchg and copy_to_survivor
> >> >> > uses memory_order_relaxed to modify the forward pointers.
> >> >> >
> >> >> > Summary of source code changes:
> >> >> >
> >> >> > * src/share/vm/runtime/atomic.hpp
> >> >> >       - Defines enum memory_order and adds a parameter to cmpxchg.
> >> >> >
> >> >> > * src/share/vm/runtime/atomic.cpp
> >> >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
> >> >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> >> >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
> >> >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
> >> >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
> >> >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
> >> >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
> >> >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
> >> >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
> >> >> >       - Added a parameter for each cmpxchg function to follow
> >> >> >          the change of atomic.hpp. Their implementations are not changed.
> >> >> >
> >> >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
> >> >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
> >> >> >       - Added a parameter for each cmpxchg function to follow
> >> >> >          the change of atomic.hpp. In addition, implementations
> >> >> >          are changed corresponding to the specified memory_order.
> >> >> >
> >> >> > * src/share/vm/oops/oop.hpp
> >> >> > * src/share/vm/oops/oop.inline.hpp
> >> >> >       - Add a memory_order parameter to use relaxed cmpxchg in
> >> >> >          cas_set_mark and cas_forward_to.
> >> >> >
> >> >> > * src/share/vm/gc/parallel/psPromotionManager.cpp
> >> >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> >> >> >
> >> >> > Martin tested this changeset on linuxx86_64, linuxppc64le and
> >> >> > darwinintel64.
> >> >> > Though more time is needed to test on the other platforms, we would
> >> >> > like to ask for reviews and start discussion on this changeset.
> >> >> > I also tested this changeset with SPECjbb2013 and confirmed that GC
> >> >> > pause time is reduced.
> >> >> >
> >> >> > Regards,
> >> >> > Hiroshi
> >> >> > -----------------------
> >> >> > Hiroshi Horii, Ph.D.
> >> >> > IBM Research - Tokyo
> >> >> >
> >> >> >
> >> >>
> >> >
> >>
> >
> 



From martin.doerr at sap.com  Tue May 10 09:41:31 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 10 May 2016 09:41:31 +0000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<571A1FA3.9030006@oracle.com>
	<201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com>
	<1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com>
	<201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com>
	<848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com>
	<347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com>
Message-ID: <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>

Hi David,

thank you very much for testing the other platforms.

Here's an updated webrev:
http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/

Best regards,
Martin

-----Original Message-----
From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of David Holmes
Sent: Tuesday, 10 May 2016 11:11
To: Hiroshi H Horii <HORII at jp.ibm.com>
Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

The fix seems incomplete for solaris:

make/Main.gmk:232: recipe for target 'hotspot' failed
"/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp", 
line 124: Error: Too many arguments in call to 
"_Atomic_cmpxchg_long(long, volatile long*, long)".
"/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp", 
line 128: Error: Too many arguments in call to 
"_Atomic_cmpxchg_long(long, volatile long*, long)".

David
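
For readers following the build log above: a "Too many arguments" error of
this kind is what one would expect when a wrapper gains a new parameter and
forwards it to a helper that still has the old three-argument signature.
Whether that is exactly what happened in the webrev is an assumption. A
minimal sketch, in which helper_cmpxchg_long is a hypothetical stand-in for
_Atomic_cmpxchg_long and is deliberately not atomic:

```cpp
#include <cassert>

// Hypothetical stand-in for the three-argument helper named in the error
// message. This body is NOT atomic; it only illustrates the signature.
extern "C" long helper_cmpxchg_long(long exchange_value,
                                    volatile long* dest,
                                    long compare_value) {
  long old = *dest;
  if (old == compare_value) {
    *dest = exchange_value;
  }
  return old;
}

// Two-value enum as described later in this thread.
enum cmpxchg_memory_order { memory_order_relaxed, memory_order_conservative };

// The wrapper accepts the new parameter, but the helper's signature is
// unchanged, so 'order' must not be forwarded as a fourth argument.
inline long cmpxchg(long exchange_value, volatile long* dest,
                    long compare_value,
                    cmpxchg_memory_order order = memory_order_conservative) {
  (void)order;  // ignored on this (assumed) platform; also silences warnings
  return helper_cmpxchg_long(exchange_value, dest, compare_value);
}
```

Existing three-argument callers compile unchanged because of the default
argument, and callers that pass an order get the same behavior.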

On 10/05/2016 5:34 PM, David Holmes wrote:
> Hi Hiroshi,
>
> On 6/05/2016 8:11 PM, Hiroshi H Horii wrote:
>> Hi David,
>>
>> Thank you for your comments.
>>
>> As Martin suggested, I would like to split this proposal into
>>   - relaxing memory order of cmpxchg
>>   - improvement of copy_to_survivor with relaxed cmpxchg
>> and discuss the former first.
>>
>> Martin kindly created a new webrev that includes the cmpxchg change.
>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
>> He has already tested it with AIX, linuxx86_64, linuxppc64le and
>> darwinintel64.
>> (Please tell me if I need to send a new mail for this RFR)
>
> Please do as it will be simpler to track that way.
>
>>> What I would prefer to see is an additional memory_order value (such as
>>> memory_order_ignored) which is the default for all methods declared to
>>> take a memory_order parameter.
>>
>> We added simple enum to specify memory order in atomic.hpp as follows.
>>
>> typedef enum cmpxchg_cmpxchg_memory_order {
>>   memory_order_relaxed,
>>   memory_order_conservative
>> } cmpxchg_memory_order;
>>
>> All cmpxchg functions take a cmpxchg_memory_order argument
>> with a default value of memory_order_conservative, which keeps the same
>> semantics as the existing cmpxchg and requires no change for the
>> existing
>> callers. If you think "memory_order_ignored" is better than
>> "memory_order_conservative", I will be happy to modify this change.
>> (I just thought "ignored" may resemble "relaxed" and may confuse
>> people who are familiar with C++11's memory semantics.
>> I would like to know the thoughts of native speakers.)
>
> That is fine by me. I don't think "ignored" would be confused with
> "relaxed", but "conservative" is fine.
>
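
The two-value enum with a conservative default, as discussed above, can be
sketched in portable C++11 as follows. This is only an illustration under
stated assumptions: the namespace "sketch", the use of std::atomic, and the
fence / CAS / fence sequence standing in for "sync before and after" are
mine, not the HotSpot implementation.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

enum cmpxchg_memory_order { memory_order_relaxed, memory_order_conservative };

// Returns the value that was in *dest before the operation, which is the
// contract of the cmpxchg functions discussed in this thread.
inline long cmpxchg(long exchange_value, std::atomic<long>* dest,
                    long compare_value,
                    cmpxchg_memory_order order = memory_order_conservative) {
  long observed = compare_value;
  if (order == memory_order_relaxed) {
    // Atomic update only; no ordering with surrounding memory accesses.
    dest->compare_exchange_strong(observed, exchange_value,
                                  std::memory_order_relaxed,
                                  std::memory_order_relaxed);
    return observed;
  }
  // Conservative: a full fence on both sides of the CAS (an assumption
  // approximating the "sync ... cmpxchg ... sync" shape quoted in this mail).
  std::atomic_thread_fence(std::memory_order_seq_cst);
  dest->compare_exchange_strong(observed, exchange_value,
                                std::memory_order_seq_cst,
                                std::memory_order_seq_cst);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return observed;
}

}  // namespace sketch
```

On failure compare_exchange_strong writes the current value into "observed",
and on success "observed" still equals the old value, so both paths return
the previous contents of *dest.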
> I will run the patch through our internal build system while you prepare
> the updated RFR. My only concern is "unused argument" warnings from the
> compiler. :)
>
> We are quickly running into a hard deadline with Feature Complete
> however - possibly less than 24 hours - for hotspot changes. If this
> doesn't get in in time I will see if I can shepherd it through the
> approval process.
>
> Thanks,
> David
>
>
>> Regards,
>> Hiroshi
>> -----------------------
>> Hiroshi Horii, Ph.D.
>> IBM Research - Tokyo
>>
>>
>> David Holmes <david.holmes at oracle.com> wrote on 05/04/2016 14:55:29:
>>
>>> From: David Holmes <david.holmes at oracle.com>
>>> To: Hiroshi H Horii/Japan/IBM at IBMJP
>>> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime-
>>> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison
>>> <Tim_Ellison at uk.ibm.com>, Volker Simonis <volker.simonis at gmail.com>,
>>> "Doerr, Martin" <martin.doerr at sap.com>, "Lindenmaier, Goetz"
>>> <goetz.lindenmaier at sap.com>
>>> Date: 05/04/2016 14:57
>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>>> copy_to_survivor for ppc64
>>>
>>> Hi Hiroshi,
>>>
>>> Sorry for the delay on getting back to this.
>>>
>>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
>>> > Hi David,
>>> >
>>> > Thank you for your comments and questions.
>>> >
>>> >> 1. Are the current cmpxchg semantics exactly the same as
>>> >> memory_order_seq_cst?
>>> >
>>> > This is very good question..
>>> >
>>> > I guess, cmpxchg needs a more conservative constraint for memory
>> ordering
>>> > than C++11, to add sync after a compare-and-exchange operation.
>>> >
>>> > Could someone give comments or thoughts?
>>>
>>> I don't want to comment on the comparison with C++11. What I would
>>> prefer to see is an additional memory_order value (such as
>>> memory_order_ignored) which is the default for all methods declared to
>>> take a memory_order parameter. That way existing implementations are
>>> clearly ignoring the memory_order attribute and there is no potential
>>> for confusion as to whether the existing implementations equate to
>>> memory_order_seq_cst or not.
>>>
>>> That said, I'm not sure it makes sense to add the memory_order parameter
>>> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark,
>>> oopDesc::cas_forward_to, unless those methods can sensibly be called
>>> with any value for memory_order - which seems highly unlikely. Perhaps
>>> those methods should identify the weakest form of memory_order they
>>> support and that should be hard-wired into them?
>>>
>>> Thanks,
>>> David
>>>
>>> > memory_order_seq_cst is defined as
>>> >     "Any operation with this memory order is both an acquire
>> operation and
>>> >      a release operation, plus a single total order exists in which
>>> all
>>> > threads
>>> >      observe all modifications (see below) in the same order."
>>> > (http://en.cppreference.com/w/cpp/atomic/memory_order)
>>> >
>>> > In my environment, g++ and xlc generate following assemblies on
>>> ppc64le.
>>> > (interestingly, they generates the same assemblies for any
>>> memory_order)
>>> >
>>> > g++ (4.9.2)
>>> >     100008a4:   ac 04 00 7c     sync
>>> >     100008a8:   28 50 20 7d     lwarx   r9,0,r10
>>> >     100008ac:   00 18 09 7c     cmpw    r9,r3
>>> >     100008b0:   0c 00 c2 40     bne-    100008bc
>>> >     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
>>> >     100008b8:   f0 ff c2 40     bne-    100008a8
>>> >     100008bc:   2c 01 00 4c     isync
>>> >
>>> > xlc (13.1.3)
>>> >     10000888:   ac 04 00 7c     sync
>>> >     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
>>> >     10000890:   40 00 26 7c     cmpld   r6,r0
>>> >     10000894:   0c 00 82 40     bne     100008a0
>>> >     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
>>> >     1000089c:   f0 ff e2 40     bne+    1000088c
>>> >     100008a0:   2c 01 00 4c     isync
>>> >
>>> > On the other hand, the current OpenJDK generates following assemblies.
>>> >
>>> >     508:   ac 04 00 7c     sync
>>> >     50c:   00 00 5c e9     ld      r10,0(r28)
>>> >     510:   00 50 3b 7c     cmpd    r27,r10
>>> >     514:   1c 00 c2 40     bne-    530
>>> >     518:   a8 40 5c 7d     ldarx   r10,r28,r8
>>> >     51c:   00 50 3b 7c     cmpd    r27,r10
>>> >     520:   10 00 c2 40     bne-    530
>>> >     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
>>> >     528:   f0 ff c2 40     bne-    518
>>> >     52c:   ac 04 00 7c     sync
>>> >     530:   00 50 bb 7f     ...
>>> >
>>> > Though we can ignore 50c-514 (because they are a duplicated guard
>>> > condition), the last sync instruction (52c) makes cmpxchg more strict
>>> > than memory_order_seq_cst.
>>> >
>>> > In some cases, the last sync is necessary when this thread must be able
>>> > to read all of the changes in the other threads while executing from
>>> > 508 to 530 (that processes compare-and-exchange).
>>> >
>>> >> 2. Has there been a discussion already, establishing that the
>>> modified
>>> >> GC code can indeed use memory_order_relaxed? Otherwise who is
>>> >> postulating that and based on what evidence?
>>> >
>>> > Volker and his colleagues have investigated the current GC codes
>>> > according to this.
>>> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>> April/019079.html
>>> > However, I believe, we need comments of other GC experts to change
>>> > the shared codes.
>>> >
>>> > Regards,
>>> > Hiroshi
>>> > -----------------------
>>> > Hiroshi Horii, Ph.D.
>>> > IBM Research - Tokyo
>>> >
>>> >
>>> > David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 21:57:07:
>>> >
>>> >> From: David Holmes <david.holmes at oracle.com>
>>> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
>>> >> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
>>> >> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
>>> > ppc-aix-port-dev at openjdk.java.net
>>> >> Date: 04/22/2016 21:58
>>> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>>> >> copy_to_survivor for ppc64
>>> >>
>>> >> Hi Hiroshi,
>>> >>
>>> >> Two initial questions:
>>> >>
>>> >> 1. Are the current cmpxchg semantics exactly the same as
>>> >> memory_order_seq_cst?
>>> >>
>>> >> 2. Has there been a discussion already, establishing that the
>>> modified
>>> >> GC code can indeed use memory_order_relaxed? Otherwise who is
>>> >> postulating that and based on what evidence?
>>> >>
>>> >> Missing memory barriers have caused very difficult to track down
>> bugs in
>>> >> the past - very rare race conditions. So any relaxation here has
>>> to be
>>> >> done with extreme confidence.
>>> >>
>>> >> Thanks,
>>> >> David
>>> >>
>>> >> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
>>> >> > Dear all:
>>> >> >
>>> >> > Can I please request reviews for the following change?
>>> >> >
>>> >> > Code change:
>>> >> >
>> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
>>> >> > (I initially created and Martin enhanced so much)
>>> >> >
>>> >> > This change follows the discussion started from this mail.
>>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>> >> April/018960.html
>>> >> >
>>> >> > Description:
>>> >> > This change provides relaxed compare-and-exchange by introducing
>>> >> > similar semantics of C++ atomic memory operators, enum
>>> memory_order.
>>> >> > As described in atomic_linux_ppc.inline.hpp, the current
>>> > implementation of
>>> >> > cmpxchg is fence_cmpxchg_acquire. This implementation is useful for
>>> >> > general purposes because twice calls of sync before and after
>>> > cmpxchg will
>>> >> > provide strict consistency. However, they sometimes cause overheads
>>> >> > because
>>> >> > sync instructions are very expensive in the current POWER chip
>> design.
>>> >> > In addition, for the other platforms, such as aarch64, this strict
>>> >> > semantics
>>> >> > may cause some overheads (according to the Andrew's mail).
>>> >> > http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>> >> April/019073.html
>>> >> >
>>> >> > With this change, callers can explicitly specify constraints of
>> memory
>>> >> > ordering
>>> >> > for cmpxchg with an additional parameter, memory_order order.
>>> >> >
>>> >> > typedef enum memory_order {
>>> >> >    memory_order_relaxed,
>>> >> >    memory_order_consume,
>>> >> >    memory_order_acquire,
>>> >> >    memory_order_release,
>>> >> >    memory_order_acq_rel,
>>> >> >    memory_order_seq_cst
>>> >> > } memory_order;
>>> >> >
>>> >> > Because the default value of the parameter is memory_order_seq_cst,
>>> >> > existing codes can use the same semantics of cmpxchg without any
>>> >> > modification. The relaxed cmpxchg is implemented only on ppc
>>> >> > in this changeset. Therefore, the behavior on the other platforms
>> will
>>> >> > not be changed with this changeset.
>>> >> >
>>> >> > In addition, with the new parameter of cmpxchg, this change
>>> improves
>>> >> > performance of copy_to_survivor in the parallel GC.
>>> >> > copy_to_survivor changes forward pointers by using cmpxchg. This
>>> >> > operation doesn't require any sync instructions.  A pointer is
>> changed
>>> >> > at most once in a GC and when cmpxchg fails, the latest pointer is
>>> >> > available for the caller. cas_set_mark and cas_forward_to are
>> extended
>>> >> > with an additional memory_order parameter as cmpxchg and
>>> > copy_to_survivor
>>> >> > uses memory_order_relaxed to modify the forward pointers.
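
The forwarding-pointer argument above ("changed at most once", "when cmpxchg
fails, the latest pointer is available") can be sketched as below. The Obj
type and forward_to function are hypothetical and do not model the real
mark word encoding. Whether relaxed ordering is also enough for the losing
thread to read the contents of the winning copy is the question raised in
this thread, and the sketch does not settle it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object with a single forwarding slot (illustration only).
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};
  int payload = 0;
};

// Tries to install 'copy' as the forwardee of 'from' with a relaxed CAS.
// Returns the copy that won the race: 'copy' itself on success, otherwise
// the pointer another thread installed first.
inline Obj* forward_to(Obj* from, Obj* copy) {
  Obj* expected = nullptr;
  if (from->forwardee.compare_exchange_strong(expected, copy,
                                              std::memory_order_relaxed,
                                              std::memory_order_relaxed)) {
    return copy;
  }
  // A failed CAS stores the currently installed pointer in 'expected'.
  return expected;
}
```

The slot goes from null to non-null exactly once, so a second caller always
receives the first caller's copy and can discard its own.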
>>> >> >
>>> >> > Summary of source code changes:
>>> >> >
>>> >> > * src/share/vm/runtime/atomic.hpp
>>> >> >       - Defines enum memory_order and adds a parameter to cmpxchg.
>>> >> >
>>> >> > * src/share/vm/runtime/atomic.cpp
>>> >> > * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
>>> >> > * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>>> >> > * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>>> >> > * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
>>> >> > * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
>>> >> > * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
>>> >> > * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
>>> >> > * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
>>> >> > * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>>> >> >       - Added a parameter for each cmpxchg function to follow
>>> >> >          the change of atomic.hpp. Their implementations are not
>>> > changed.
>>> >> >
>>> >> > * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
>>> >> > * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>>> >> >       - Added a parameter for each cmpxchg function to follow
>>> >> >          the change of atomic.hpp. In addition, implementations
>>> >> >          are changed corresponding to the specified memory_order.
>>> >> >
>>> >> > * src/share/vm/oops/oop.hpp
>>> >> > * src/share/vm/oops/oop.inline.hpp
>>> >> >       - Add a memory_order parameter to use relaxed cmpxchg in
>>> >> >          cas_set_mark and cas_forward_to.
>>> >> >
>>> >> > * src/share/vm/gc/parallel/psPromotionManager.cpp
>>> >> > * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
>>> >> >
>>> >> > Martin tested this changeset  on linuxx86_64, linuxppc64le and
>>> >> > darwinintel64.
>>> >> > Though more time is needed to test on the other platform, we would
>>> > like to
>>> >> > ask
>>> >> > reviews and start discussion on this changeset.
>>> >> > I also tested this changeset with SPECjbb2013 and confirmed that gc
>>> > pause
>>> >> > time
>>> >> > is reduced.
>>> >> >
>>> >> > Regards,
>>> >> > Hiroshi
>>> >> > -----------------------
>>> >> > Hiroshi Horii, Ph.D.
>>> >> > IBM Research - Tokyo
>>> >> >
>>> >> >
>>> >>
>>> >
>>>
>>

From david.holmes at oracle.com  Tue May 10 10:30:36 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 10 May 2016 20:30:36 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<571A1FA3.9030006@oracle.com>
	<201604250709.u3P79jwN024101@d19av07.sagamino.japan.ibm.com>
	<1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com>
	<201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com>
	<848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com>
	<347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
Message-ID: <f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>

On 10/05/2016 7:41 PM, Doerr, Martin wrote:
> Hi David,
>
> thank you very much for testing the other platforms.
>
> Here's an updated webrev:
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/

Thanks. Second test run on its way.

David
-----

> Best regards,
> Martin
>
> -----Original Message-----
> From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of David Holmes
> Sent: Tuesday, 10 May 2016 11:11
> To: Hiroshi H Horii <HORII at jp.ibm.com>
> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
>
> The fix seems incomplete for solaris:
>
> make/Main.gmk:232: recipe for target 'hotspot' failed
> "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp",
> line 124: Error: Too many arguments in call to
> "_Atomic_cmpxchg_long(long, volatile long*, long)".
> "/opt/jprt/T/P1/073516.daholme/s/hotspot/src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp",
> line 128: Error: Too many arguments in call to
> "_Atomic_cmpxchg_long(long, volatile long*, long)".
>
> David
>
> On 10/05/2016 5:34 PM, David Holmes wrote:
>> Hi Hiroshi,
>>
>> On 6/05/2016 8:11 PM, Hiroshi H Horii wrote:
>>> Hi David,
>>>
>>> Thank you for your comments.
>>>
>>> As Martin suggested, I would like to split this proposal into
>>>   - relaxing memory order of cmpxchg
>>>   - improvement of copy_to_survivor with relaxed cmpxchg
>>> and discuss the former first.
>>>
>>> Martin kindly created a new webrev that includes the cmpxchg change.
>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.00/
>>> He has already tested it with AIX, linuxx86_64, linuxppc64le and
>>> darwinintel64.
>>> (Please tell me if I need to send a new mail for this RFR)
>>
>> Please do as it will be simpler to track that way.
>>
>>>> What I would prefer to see is an additional memory_order value (such as
>>>> memory_order_ignored) which is the default for all methods declared to
>>>> take a memory_order parameter.
>>>
>>> We added simple enum to specify memory order in atomic.hpp as follows.
>>>
>>> typedef enum cmpxchg_cmpxchg_memory_order {
>>>   memory_order_relaxed,
>>>   memory_order_conservative
>>> } cmpxchg_memory_order;
>>>
>>> All cmpxchg functions take a cmpxchg_memory_order argument
>>> with a default value of memory_order_conservative, which keeps the same
>>> semantics as the existing cmpxchg and requires no change for the
>>> existing
>>> callers. If you think "memory_order_ignored" is better than
>>> "memory_order_conservative", I will be happy to modify this change.
>>> (I just thought "ignored" may resemble "relaxed" and may confuse
>>> people who are familiar with C++11's memory semantics.
>>> I would like to know the thoughts of native speakers.)
>>
>> That is fine by me. I don't think "ignored" would be confused with
>> "relaxed", but "conservative" is fine.
>>
>> I will run the patch through our internal build system while you prepare
>> the updated RFR. My only concern is "unused argument" warnings from the
>> compiler. :)
>>
>> We are quickly running into a hard deadline with Feature Complete
>> however - possibly less than 24 hours - for hotspot changes. If this
>> doesn't get in in time I will see if I can shepherd it through the
>> approval process.
>>
>> Thanks,
>> David
>>
>>
>>> Regards,
>>> Hiroshi
>>> -----------------------
>>> Hiroshi Horii, Ph.D.
>>> IBM Research - Tokyo
>>>
>>>
>>> David Holmes <david.holmes at oracle.com> wrote on 05/04/2016 14:55:29:
>>>
>>>> From: David Holmes <david.holmes at oracle.com>
>>>> To: Hiroshi H Horii/Japan/IBM at IBMJP
>>>> Cc: hotspot-gc-dev at openjdk.java.net, hotspot-runtime-
>>>> dev at openjdk.java.net, ppc-aix-port-dev at openjdk.java.net, Tim Ellison
>>>> <Tim_Ellison at uk.ibm.com>, Volker Simonis <volker.simonis at gmail.com>,
>>>> "Doerr, Martin" <martin.doerr at sap.com>, "Lindenmaier, Goetz"
>>>> <goetz.lindenmaier at sap.com>
>>>> Date: 05/04/2016 14:57
>>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>>>> copy_to_survivor for ppc64
>>>>
>>>> Hi Hiroshi,
>>>>
>>>> Sorry for the delay on getting back to this.
>>>>
>>>> On 25/04/2016 5:09 PM, Hiroshi H Horii wrote:
>>>>> Hi David,
>>>>>
>>>>> Thank you for your comments and questions.
>>>>>
>>>>>> 1. Are the current cmpxchg semantics exactly the same as
>>>>>> memory_order_seq_cst?
>>>>>
>>>>> This is very good question..
>>>>>
>>>>> I guess, cmpxchg needs a more conservative constraint for memory
>>> ordering
>>>>> than C++11, to add sync after a compare-and-exchange operation.
>>>>>
>>>>> Could someone give comments or thoughts?
>>>>
>>>> I don't want to comment on the comparison with C++11. What I would
>>>> prefer to see is an additional memory_order value (such as
>>>> memory_order_ignored) which is the default for all methods declared to
>>>> take a memory_order parameter. That way existing implementations are
>>>> clearly ignoring the memory_order attribute and there is no potential
>>>> for confusion as to whether the existing implementations equate to
>>>> memory_order_seq_cst or not.
>>>>
>>>> That said, I'm not sure it makes sense to add the memory_order parameter
>>>> to all methods with "cas" in their name, e.g. oopDesc::cas_set_mark,
>>>> oopDesc::cas_forward_to, unless those methods can sensibly be called
>>>> with any value for memory_order - which seems highly unlikely. Perhaps
>>>> those methods should identify the weakest form of memory_order they
>>>> support and that should be hard-wired into them?
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> memory_order_seq_cst is defined as
>>>>>     "Any operation with this memory order is both an acquire
>>> operation and
>>>>>      a release operation, plus a single total order exists in which
>>>> all
>>>>> threads
>>>>>      observe all modifications (see below) in the same order."
>>>>> (http://en.cppreference.com/w/cpp/atomic/memory_order)
>>>>>
>>>>> In my environment, g++ and xlc generate following assemblies on
>>>> ppc64le.
>>>>> (interestingly, they generates the same assemblies for any
>>>> memory_order)
>>>>>
>>>>> g++ (4.9.2)
>>>>>     100008a4:   ac 04 00 7c     sync
>>>>>     100008a8:   28 50 20 7d     lwarx   r9,0,r10
>>>>>     100008ac:   00 18 09 7c     cmpw    r9,r3
>>>>>     100008b0:   0c 00 c2 40     bne-    100008bc
>>>>>     100008b4:   2d 51 80 7c     stwcx.  r4,0,r10
>>>>>     100008b8:   f0 ff c2 40     bne-    100008a8
>>>>>     100008bc:   2c 01 00 4c     isync
>>>>>
>>>>> xlc (13.1.3)
>>>>>     10000888:   ac 04 00 7c     sync
>>>>>     1000088c:   28 28 c0 7c     lwarx   r6,0,r5
>>>>>     10000890:   40 00 26 7c     cmpld   r6,r0
>>>>>     10000894:   0c 00 82 40     bne     100008a0
>>>>>     10000898:   2d 29 80 7c     stwcx.  r4,0,r5
>>>>>     1000089c:   f0 ff e2 40     bne+    1000088c
>>>>>     100008a0:   2c 01 00 4c     isync
>>>>>
>>>>> On the other hand, the current OpenJDK generates following assemblies.
>>>>>
>>>>>     508:   ac 04 00 7c     sync
>>>>>     50c:   00 00 5c e9     ld      r10,0(r28)
>>>>>     510:   00 50 3b 7c     cmpd    r27,r10
>>>>>     514:   1c 00 c2 40     bne-    530
>>>>>     518:   a8 40 5c 7d     ldarx   r10,r28,r8
>>>>>     51c:   00 50 3b 7c     cmpd    r27,r10
>>>>>     520:   10 00 c2 40     bne-    530
>>>>>     524:   ad 41 3c 7d     stdcx.  r9,r28,r8
>>>>>     528:   f0 ff c2 40     bne-    518
>>>>>     52c:   ac 04 00 7c     sync
>>>>>     530:   00 50 bb 7f     ...
>>>>>
>>>>> Though we can ignore 50c-514 (because they are a duplicated guard
>>>>> condition),
>>>>> the last sync instruction (52c) makes cmpxchg more strict than
>>>>> memory_order_seq_cst.
>>>>>
>>>>> In some cases, the last sync is necessary when this thread must be
>>>> able
>>>>> to read
>>>>> all of the changes in the other threads while executing from 508 to
>>>> 530
>>>>> (that processes compare-and-exchange).
>>>>>
>>>>>> 2. Has there been a discussion already, establishing that the
>>>> modified
>>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is
>>>>>> postulating that and based on what evidence?
>>>>>
>>>>> Volker and his colleagues have investigated the current GC codes
>>>>> according to this.
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>>> April/019079.html
>>>>> However, I believe, we need comments of other GC experts to change
>>>>> the shared codes.
>>>>>
>>>>> Regards,
>>>>> Hiroshi
>>>>> -----------------------
>>>>> Hiroshi Horii, Ph.D.
>>>>> IBM Research - Tokyo
>>>>>
>>>>>
>>>>> David Holmes <david.holmes at oracle.com> wrote on 04/22/2016 21:57:07:
>>>>>
>>>>>> From: David Holmes <david.holmes at oracle.com>
>>>>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-runtime-
>>>>>> dev at openjdk.java.net, hotspot-gc-dev at openjdk.java.net
>>>>>> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>,
>>>>> ppc-aix-port-dev at openjdk.java.net
>>>>>> Date: 04/22/2016 21:58
>>>>>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>>>>>> copy_to_survivor for ppc64
>>>>>>
>>>>>> Hi Hiroshi,
>>>>>>
>>>>>> Two initial questions:
>>>>>>
>>>>>> 1. Are the current cmpxchg semantics exactly the same as
>>>>>> memory_order_seq_cst?
>>>>>>
>>>>>> 2. Has there been a discussion already, establishing that the
>>>> modified
>>>>>> GC code can indeed use memory_order_relaxed? Otherwise who is
>>>>>> postulating that and based on what evidence?
>>>>>>
>>>>>> Missing memory barriers have caused very difficult to track down
>>> bugs in
>>>>>> the past - very rare race conditions. So any relaxation here has
>>>> to be
>>>>>> done with extreme confidence.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On 22/04/2016 10:28 PM, Hiroshi H Horii wrote:
>>>>>>> Dear all:
>>>>>>>
>>>>>>> Can I please request reviews for the following change?
>>>>>>>
>>>>>>> Code change:
>>>>>>>
>>> http://cr.openjdk.java.net/~mdoerr/8154736_copy_to_survivor/webrev.00/
>>>>>>> (I initially created and Martin enhanced so much)
>>>>>>>
>>>>>>> This change follows the discussion started from this mail.
>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>>>>> April/018960.html
>>>>>>>
>>>>>>> Description:
>>>>>>> This change provides a relaxed compare-and-exchange by introducing
>>>>>>> semantics similar to the C++ atomic memory orderings (enum
>>>>>>> memory_order). As described in atomic_linux_ppc.inline.hpp, the current
>>>>>>> implementation of cmpxchg is fence_cmpxchg_acquire. This implementation
>>>>>>> is useful for general purposes because the two sync instructions, one
>>>>>>> before and one after cmpxchg, provide strict consistency. However, they
>>>>>>> sometimes cause overhead because sync instructions are very expensive
>>>>>>> in the current POWER chip design. In addition, on other platforms such
>>>>>>> as aarch64, these strict semantics may also cause some overhead
>>>>>>> (according to Andrew's mail).
>>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019073.html
>>>>>>>
>>>>>>> With this change, callers can explicitly specify the memory-ordering
>>>>>>> constraints for cmpxchg with an additional parameter, memory_order order.
>>>>>>>
>>>>>>> typedef enum memory_order {
>>>>>>>    memory_order_relaxed,
>>>>>>>    memory_order_consume,
>>>>>>>    memory_order_acquire,
>>>>>>>    memory_order_release,
>>>>>>>    memory_order_acq_rel,
>>>>>>>    memory_order_seq_cst
>>>>>>> } memory_order;
>>>>>>>
>>>>>>> Because the default value of the parameter is memory_order_seq_cst,
>>>>>>> existing code keeps the same cmpxchg semantics without any
>>>>>>> modification. The relaxed cmpxchg is implemented only on ppc
>>>>>>> in this changeset, so the behavior on the other platforms is
>>>>>>> not changed.
>>>>>>>
>>>>>>> In addition, with the new parameter of cmpxchg, this change improves
>>>>>>> the performance of copy_to_survivor in the parallel GC.
>>>>>>> copy_to_survivor changes forward pointers by using cmpxchg. This
>>>>>>> operation doesn't require any sync instructions: a pointer is changed
>>>>>>> at most once in a GC, and when cmpxchg fails, the latest pointer is
>>>>>>> available to the caller. cas_set_mark and cas_forward_to are extended
>>>>>>> with an additional memory_order parameter, like cmpxchg, and
>>>>>>> copy_to_survivor uses memory_order_relaxed to modify the forward
>>>>>>> pointers.
>>>>>>>
>>>>>>> Summary of source code changes:
>>>>>>>
>>>>>>> * src/share/vm/runtime/atomic.hpp
>>>>>>>       - Defines enum memory_order and adds a parameter to cmpxchg.
>>>>>>>
>>>>>>> * src/share/vm/runtime/atomic.cpp
>>>>>>> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
>>>>>>> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>>>>>>> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>>>>>>> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
>>>>>>> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
>>>>>>> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
>>>>>>> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
>>>>>>> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
>>>>>>> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>>>>>>>       - Added a parameter for each cmpxchg function to follow
>>>>>>>          the change of atomic.hpp. Their implementations are not changed.
>>>>>>>
>>>>>>> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
>>>>>>> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>>>>>>>       - Added a parameter for each cmpxchg function to follow
>>>>>>>          the change of atomic.hpp. In addition, implementations
>>>>>>>          are changed corresponding to the specified memory_order.
>>>>>>>
>>>>>>> * src/share/vm/oops/oop.hpp
>>>>>>> * src/share/vm/oops/oop.inline.hpp
>>>>>>>       - Add a memory_order parameter to use relaxed cmpxchg in
>>>>>>>          cas_set_mark and cas_forward_to.
>>>>>>>
>>>>>>> * src/share/vm/gc/parallel/psPromotionManager.cpp
>>>>>>> * src/share/vm/gc/parallel/psPromotionManager.inline.hpp
>>>>>>>
>>>>>>> Martin tested this changeset on linuxx86_64, linuxppc64le and
>>>>>>> darwinintel64.
>>>>>>> Though more time is needed to test on the other platforms, we would like
>>>>>>> to ask for reviews and start discussion on this changeset.
>>>>>>> I also tested this changeset with SPECjbb2013 and confirmed that GC pause
>>>>>>> time is reduced.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Hiroshi
>>>>>>> -----------------------
>>>>>>> Hiroshi Horii, Ph.D.
>>>>>>> IBM Research - Tokyo
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>

From HORII at jp.ibm.com  Tue May 10 10:44:36 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Tue, 10 May 2016 19:44:36 +0900
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
Message-ID: <201605101044.u4AAinRB025272@d19av05.sagamino.japan.ibm.com>

Hi All,

Can I please request reviews for the following change?

Code change:
http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/

This change follows the discussion started from these mails.
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019148.html
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-May/019320.html

Description:
This change provides a relaxed compare-and-exchange by introducing a
relaxed memory order. As described in atomic_linux_ppc.inline.hpp,
the current implementation of cmpxchg is fence_cmpxchg_acquire.
This implementation is useful for general purposes because the two
sync instructions, one before and one after cmpxchg, provide strict
consistency. However, they sometimes cause overhead because sync
instructions are very expensive in the current POWER chip design.
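
To illustrate the description above, here is a minimal sketch, not the
code from the webrev, of how an order parameter could control the
barriers around a compare-and-exchange. It uses std::atomic and
std::atomic_thread_fence as portable stand-ins for the PPC sync
instructions; the names sketch_memory_order and sketch_cmpxchg are
hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names for illustration only; not taken from the webrev.
enum sketch_memory_order {
  sketch_order_relaxed,       // no additional ordering around the CAS
  sketch_order_conservative   // full fence before and after the CAS
};

// Returns the value observed at *dest before the operation, in the
// style of cmpxchg(exchange_value, dest, compare_value).
inline int sketch_cmpxchg(int exchange_value,
                          std::atomic<int>* dest,
                          int compare_value,
                          sketch_memory_order order = sketch_order_conservative) {
  if (order == sketch_order_conservative) {
    // Stand-in for the leading sync.
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  int observed = compare_value;
  // On failure, compare_exchange_strong stores the current value in
  // 'observed'; on success 'observed' still equals compare_value.
  dest->compare_exchange_strong(observed, exchange_value,
                                std::memory_order_relaxed,
                                std::memory_order_relaxed);
  if (order == sketch_order_conservative) {
    // Stand-in for the trailing sync.
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  return observed;
}
```

Callers that do not pass an order get the conservative behaviour, so
existing call sites would not need to change under this assumption.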

We confirmed that this change improves the performance of copy_to_survivor
in the parallel GC. However, the GC part needs more investigation
by more experts, so we would like to request a review of the cmpxchg
change first (as Martin requested).
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019188.html
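
For context on the copy_to_survivor case: the earlier RFR for 8154736
argued that a forward pointer is changed at most once in a GC and that a
failed cmpxchg hands the latest pointer back to the caller. The sketch
below only illustrates that pattern with std::atomic; SketchObject and
sketch_forward are hypothetical names, and whether relaxed ordering is
actually sufficient for the GC is the open question that still needs
review.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an object header holding a forward pointer.
struct SketchObject {
  std::atomic<std::uintptr_t> forwardee{0};  // 0 means "not yet forwarded"
};

// Tries to install new_copy as the forward pointer with a relaxed CAS.
// Returns the pointer that ended up installed: new_copy if this caller
// won the race, otherwise the value another caller installed earlier.
inline std::uintptr_t sketch_forward(SketchObject* obj, std::uintptr_t new_copy) {
  std::uintptr_t expected = 0;
  if (obj->forwardee.compare_exchange_strong(expected, new_copy,
                                             std::memory_order_relaxed,
                                             std::memory_order_relaxed)) {
    return new_copy;
  }
  // The failed CAS stored the current forward pointer in 'expected'.
  return expected;
}
```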

Summary of source code changes:

* src/share/vm/runtime/atomic.hpp 
     - Defines enum memory_order and adds a parameter to cmpxchg.

* src/share/vm/runtime/atomic.cpp
* src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
* src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
* src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
* src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
* src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
* src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
* src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
* src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
* src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
     - Added a parameter for each cmpxchg function to follow
        the change of atomic.hpp. Their implementations are not changed.

* src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
* src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
     - Added a parameter for each cmpxchg function to follow
        the change of atomic.hpp. In addition, implementations 
        are changed corresponding to the specified memory_order.

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo



From david.holmes at oracle.com  Tue May 10 11:04:15 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 10 May 2016 21:04:15 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
Message-ID: <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>

Hi Hiroshi,

On 10/05/2016 8:44 PM, Hiroshi H Horii wrote:
> Hi All,
>
> Can I please request reviews for the following change?
>
> Code change:
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/

Changes look good. I'm currently running them through our internal build 
system. I will sponsor this and push the change through JPRT.

Just need another reviewer to chime in - given you and Martin are both 
contributors. Or are you the main contributor with Martin being a reviewer?

Thanks,
David

PS. It's my night now so I'll be signing off and will pick this up in 
the morning.

> This change follows the discussion started from these mails.
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019148.html
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-May/019320.html
>
> Description:
> This change provides relaxed compare-and-exchange by introducing
> relaxed memory order. As described in atomic_linux_ppc.inline.hpp,
> the current implementation of cmpxchg is fence_cmpxchg_acquire.
> This implementation is useful for general purposes because twice calls of
> sync before and after cmpxchg will provide strict consistency.
> However, they sometimes cause overheads because sync instructions are
> very expensive in the current POWER chip design.
>
> We confirmed this change improves performance of copy_to_survivor
> in the parallel GC. However, we will need more investigation of GC
> by more experts. So, We would like to request a review of the change
> of cmpxchg first (as Martin requested).
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019188.html
>
> Summary of source code changes:
>
> * src/share/vm/runtime/atomic.hpp
>      - Defines enum memory_order and adds a parameter to cmpxchg.
>
> * src/share/vm/runtime/atomic.cpp
> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>      - Added a parameter for each cmpxchg function to follow
>         the change of atomic.hpp. Their implementations are not changed.
>
> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>      - Added a parameter for each cmpxchg function to follow
>         the change of atomic.hpp. In addition, implementations
>         are changed corresponding to the specified memory_order.
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>

From david.holmes at oracle.com  Tue May 10 12:29:53 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 10 May 2016 22:29:53 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
Message-ID: <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>

On 10/05/2016 9:04 PM, David Holmes wrote:
> Hi Hiroshi,
>
> On 10/05/2016 8:44 PM, Hiroshi H Horii wrote:
>> Hi All,
>>
>> Can I please request reviews for the following change?
>>
>> Code change:
>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/
>
> Changes look good. I'm currently running them through our internal build
> system. I will sponsor this and push the change through JPRT.

Still a problem on Solaris sparc:

"/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/atomic.inline.hpp",
line 96: Error: Could not find a match for static Atomic::cmpxchg(signed char, volatile signed char*, signed char).
1 Error(s) detected.

Needs this patch:

diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp
--- a/src/share/vm/runtime/atomic.inline.hpp
+++ b/src/share/vm/runtime/atomic.inline.hpp
@@ -92,7 +92,7 @@

  #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE
  // See comment in atomic.cpp how to override.
-inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte *dest, jbyte comparand)
+inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte *dest, jbyte comparand, cmpxchg_memory_order order)
  {
    return cmpxchg_general(exchange_value, dest, comparand);
  }
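
The reason for the "Could not find a match" error can be shown with a
small self-contained sketch, assuming the class declaration already
carries the new parameter. SketchAtomic and sketch_order are hypothetical
names, and the body is a non-atomic placeholder rather than the real
cmpxchg_general.

```cpp
#include <cassert>

// Hypothetical names for illustration only.
enum sketch_order { sketch_relaxed, sketch_conservative };

struct SketchAtomic {
  // The declaration has the extra parameter, with a default value.
  static signed char cmpxchg(signed char exchange_value,
                             volatile signed char* dest,
                             signed char compare_value,
                             sketch_order order = sketch_conservative);
};

// The out-of-class definition must repeat the full parameter list
// (without the default). A definition that omits 'order' would not match
// any declared member, which is the kind of error reported above.
inline signed char SketchAtomic::cmpxchg(signed char exchange_value,
                                         volatile signed char* dest,
                                         signed char compare_value,
                                         sketch_order /* order */) {
  // Placeholder logic only: this is NOT atomic.
  signed char old_value = *dest;
  if (old_value == compare_value) {
    *dest = exchange_value;
  }
  return old_value;
}
```

Three-argument call sites keep compiling because the default argument
lives on the declaration.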

David
-----

> Just need another reviewer to chime in - given you and Martin are both
> contributors. Or are you the main contributor with Martin being a reviewer?
>
> Thanks,
> David
>
> PS. It's my night now so I'll be signing off and will pick this up in
> the morning.
>
>> This change follows the discussion started from these mails.
>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/018960.html
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019148.html
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-May/019320.html
>>
>>
>> Description:
>> This change provides relaxed compare-and-exchange by introducing
>> relaxed memory order. As described in atomic_linux_ppc.inline.hpp,
>> the current implementation of cmpxchg is fence_cmpxchg_acquire.
>> This implementation is useful for general purposes because twice calls of
>> sync before and after cmpxchg will provide strict consistency.
>> However, they sometimes cause overheads because sync instructions are
>> very expensive in the current POWER chip design.
>>
>> We confirmed this change improves performance of copy_to_survivor
>> in the parallel GC. However, we will need more investigation of GC
>> by more experts. So, We would like to request a review of the change
>> of cmpxchg first (as Martin requested).
>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019188.html
>>
>>
>> Summary of source code changes:
>>
>> * src/share/vm/runtime/atomic.hpp
>>      - Defines enum memory_order and adds a parameter to cmpxchg.
>>
>> * src/share/vm/runtime/atomic.cpp
>> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
>> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
>> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
>> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
>> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
>> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
>> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>>      - Added a parameter for each cmpxchg function to follow
>>         the change of atomic.hpp. Their implementations are not changed.
>>
>> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
>> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>>      - Added a parameter for each cmpxchg function to follow
>>         the change of atomic.hpp. In addition, implementations
>>         are changed corresponding to the specified memory_order.
>>
>> Regards,
>> Hiroshi
>> -----------------------
>> Hiroshi Horii, Ph.D.
>> IBM Research - Tokyo
>>

From HORII at jp.ibm.com  Tue May 10 13:17:46 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Tue, 10 May 2016 22:17:46 +0900
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
Message-ID: <201605101318.u4ADI0TG028528@d19av07.sagamino.japan.ibm.com>

Hi David,

> Just need another reviewer to chime in - given you and Martin are both
> contributors. Or are you the main contributor with Martin being a reviewer?

Martin and I are both contributors to this change.

> Still a problem on Solaris sparc:

Martin, could you create a new webrev that includes the patch David sent?

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo



From martin.doerr at sap.com  Tue May 10 14:27:52 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 10 May 2016 14:27:52 +0000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
Message-ID: <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>

Hello everybody,

Thanks for finding this issue. The new webrev is here:
http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/

Best regards,
Martin


From david.holmes at oracle.com  Tue May 10 20:56:09 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 11 May 2016 06:56:09 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
Message-ID: <b5434cda-58d2-eae1-57de-b09a46da438e@oracle.com>

On 11/05/2016 12:27 AM, Doerr, Martin wrote:
> Hello everybody,
>
> thanks for finding this issue. New webrev is here:
>
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/

Unfortunately my test run hit a crash on Solaris sparc:

# Problematic frame:
# V  [libjvm.so+0xcc35c4]  markOopDesc*markOopDesc::displaced_mark_helper()const+0x64

I'm going to have to do some more testing to see if that is actually 
related to the change. I know it should not be, but given we CAS marks I 
have to wonder if there's some subtle bad interaction here. :(

David
-----

>
>
> Best regards,
>
> Martin
>
>
>
> *From:*Hiroshi H Horii [mailto:HORII at jp.ibm.com]
> *Sent:* Dienstag, 10. Mai 2016 15:18
> *To:* David Holmes <david.holmes at oracle.com>
> *Cc:* hotspot-gc-dev at openjdk.java.net;
> hotspot-runtime-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net;
> Tim Ellison <Tim_Ellison at uk.ibm.com>; Doerr, Martin <martin.doerr at sap.com>
> *Subject:* Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>
>
>
> Hi David,
>
>> Just need another reviewer to chime in - given you and Martin are both
>> contributors. Or are you the main contributor with Martin being a reviewer?
>
> Martin and I are contributors of this change.
>
>> Still a problem on Solaris sparc:
>
> Martin, could you create a new change in webrev with the patch that
> David sent?
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> David Holmes <david.holmes at oracle.com <mailto:david.holmes at oracle.com>>
> wrote on 05/10/2016 21:29:53:
>
>> From: David Holmes <david.holmes at oracle.com <mailto:david.holmes at oracle.com>>
>> To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-runtime-
>> dev at openjdk.java.net <mailto:dev at openjdk.java.net>"
> <hotspot-runtime-dev at openjdk.java.net
> <mailto:hotspot-runtime-dev at openjdk.java.net>>
>> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com <mailto:Tim_Ellison at uk.ibm.com>>, "ppc-aix-port-
>> dev at openjdk.java.net <mailto:dev at openjdk.java.net>"
> <ppc-aix-port-dev at openjdk.java.net
> <mailto:ppc-aix-port-dev at openjdk.java.net>>, "hotspot-
>> gc-dev at openjdk.java.net <mailto:gc-dev at openjdk.java.net>"
> <hotspot-gc-dev at openjdk.java.net <mailto:hotspot-gc-dev at openjdk.java.net>>
>> Date: 05/10/2016 21:31
>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>
>> On 10/05/2016 9:04 PM, David Holmes wrote:
>> > Hi Hiroshi,
>> >
>> > On 10/05/2016 8:44 PM, Hiroshi H Horii wrote:
>> >> Hi All,
>> >>
>> >> Can I please request reviews for the following change?
>> >>
>> >> Code change:
>> >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/
>> >
>> > Changes look good. I'm currently running them through our internal build
>> > system. I will sponsor this and push the change through JPRT.
>>
>> Still a problem on Solaris sparc:
>>
>> "/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/
>> atomic.inline.hpp",
>> line 96: Error: Could not find a match for static Atomic::cmpxchg(signed
>> char, volatile signed char*, signed char).
>> 1 Error(s) detected.
>>
>> Needs this patch:
>>
>> diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp
>> --- a/src/share/vm/runtime/atomic.inline.hpp
>> +++ b/src/share/vm/runtime/atomic.inline.hpp
>> @@ -92,7 +92,7 @@
>>
>>   #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE
>>   // See comment in atomic.cpp how to override.
>> -inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte
>> *dest, jbyte comparand)
>> +inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte
>> *dest, jbyte comparand, cmpxchg_memory_order order)
>>   {
>>     return cmpxchg_general(exchange_value, dest, comparand);
>>   }
>>
>> David
>> -----
>>
>> > Just need another reviewer to chime in - given you and Martin are both
>> > contributors. Or are you the main contributor with Martin being a reviewer?
>> >
>> > Thanks,
>> > David
>> >
>> > PS. It's my night now so I'll be signing off and will pick this up in
>> > the morning.
>> >
>> >> This change follows the discussion started from these mails.
>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>> April/018960.html
>> >>
>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>> April/019148.html
>> >>
>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>> May/019320.html
>> >>
>> >>
>> >> Description:
>> >> This change provides relaxed compare-and-exchange by introducing
>> >> relaxed memory order. As described in atomic_linux_ppc.inline.hpp,
>> >> the current implementation of cmpxchg is fence_cmpxchg_acquire.
>> >> This implementation is useful for general purposes because twice calls of
>> >> sync before and after cmpxchg will provide strict consistency.
>> >> However, they sometimes cause overheads because sync instructions are
>> >> very expensive in the current POWER chip design.
>> >>
>> >> We confirmed this change improves performance of copy_to_survivor
>> >> in the parallel GC. However, we will need more investigation of GC
>> >> by more experts. So, We would like to request a review of the change
>> >> of cmpxchg first (as Martin requested).
>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>> April/019188.html
>> >>
>> >>
>> >> Summary of source code changes:
>> >>
>> >> * src/share/vm/runtime/atomic.hpp
>> >>      - Defines enum memory_order and adds a parameter to cmpxchg.
>> >>
>> >> * src/share/vm/runtime/atomic.cpp
>> >> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
>> >> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>> >> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>> >> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
>> >> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
>> >> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
>> >> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
>> >> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
>> >> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>> >>      - Added a parameter for each cmpxchg function to follow
>> >>         the change of atomic.hpp. Their implementations are not changed.
>> >>
>> >> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
>> >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>> >>      - Added a parameter for each cmpxchg function to follow
>> >>         the change of atomic.hpp. In addition, implementations
>> >>         are changed corresponding to the specified memory_order.
>> >>
>> >> Regards,
>> >> Hiroshi
>> >> -----------------------
>> >> Hiroshi Horii, Ph.D.
>> >> IBM Research - Tokyo
>> >>
>>
>

From david.holmes at oracle.com  Wed May 11 04:41:06 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 11 May 2016 14:41:06 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <b5434cda-58d2-eae1-57de-b09a46da438e@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<b5434cda-58d2-eae1-57de-b09a46da438e@oracle.com>
Message-ID: <b375616e-df42-a0c4-7b22-579df6418f8a@oracle.com>

Adding hotspot-dev to cc to expand scope of reviewer pool :)

On 11/05/2016 6:56 AM, David Holmes wrote:
> On 11/05/2016 12:27 AM, Doerr, Martin wrote:
>> Hello everybody,
>>
>> thanks for finding this issue. New webrev is here:
>>
>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/
>
> Unfortunately my test run hit a crash on Solaris sparc:
>
> # Problematic frame:
> # V  [libjvm.so+0xcc35c4]
> markOopDesc*markOopDesc::displaced_mark_helper()const+0x64
>
> I'm going to have to do some more testing to see if that is actually
> related to the change. I know it should not be, but given we CAS marks I
> have to wonder if there's some subtle bad interaction here. :(

Further testing has not shown any failures on Solaris sparc, and the 
same testing showed some spurious failures on other platforms even 
without these changes. So while I will file a bug for this crash I think 
it unlikely to be related to the current changes.

So on that note we just need a second hotspot reviewer to sign off on this.

Thanks,
David


> David
> -----
>
>>
>>
>> Best regards,
>>
>> Martin
>>
>>
>>
>> *From:*Hiroshi H Horii [mailto:HORII at jp.ibm.com]
>> *Sent:* Dienstag, 10. Mai 2016 15:18
>> *To:* David Holmes <david.holmes at oracle.com>
>> *Cc:* hotspot-gc-dev at openjdk.java.net;
>> hotspot-runtime-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net;
>> Tim Ellison <Tim_Ellison at uk.ibm.com>; Doerr, Martin
>> <martin.doerr at sap.com>
>> *Subject:* Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>
>>
>>
>> Hi David,
>>
>>> Just need another reviewer to chime in - given you and Martin are both
>>> contributors. Or are you the main contributor with Martin being a
>>> reviewer?
>>
>> Martin and I are contributors of this change.
>>
>>> Still a problem on Solaris sparc:
>>
>> Martin, could you create a new change in webrev with the patch that
>> David sent?
>>
>> Regards,
>> Hiroshi
>> -----------------------
>> Hiroshi Horii, Ph.D.
>> IBM Research - Tokyo
>>
>>
>> David Holmes <david.holmes at oracle.com <mailto:david.holmes at oracle.com>>
>> wrote on 05/10/2016 21:29:53:
>>
>>> From: David Holmes <david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>>
>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-runtime-
>>> dev at openjdk.java.net <mailto:dev at openjdk.java.net>"
>> <hotspot-runtime-dev at openjdk.java.net
>> <mailto:hotspot-runtime-dev at openjdk.java.net>>
>>> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com
>>> <mailto:Tim_Ellison at uk.ibm.com>>, "ppc-aix-port-
>>> dev at openjdk.java.net <mailto:dev at openjdk.java.net>"
>> <ppc-aix-port-dev at openjdk.java.net
>> <mailto:ppc-aix-port-dev at openjdk.java.net>>, "hotspot-
>>> gc-dev at openjdk.java.net <mailto:gc-dev at openjdk.java.net>"
>> <hotspot-gc-dev at openjdk.java.net
>> <mailto:hotspot-gc-dev at openjdk.java.net>>
>>> Date: 05/10/2016 21:31
>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>
>>> On 10/05/2016 9:04 PM, David Holmes wrote:
>>> > Hi Hiroshi,
>>> >
>>> > On 10/05/2016 8:44 PM, Hiroshi H Horii wrote:
>>> >> Hi All,
>>> >>
>>> >> Can I please request reviews for the following change?
>>> >>
>>> >> Code change:
>>> >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/
>>> >
>>> > Changes look good. I'm currently running them through our internal
>>> build
>>> > system. I will sponsor this and push the change through JPRT.
>>>
>>> Still a problem on Solaris sparc:
>>>
>>> "/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/
>>> atomic.inline.hpp",
>>> line 96: Error: Could not find a match for static Atomic::cmpxchg(signed
>>> char, volatile signed char*, signed char).
>>> 1 Error(s) detected.
>>>
>>> Needs this patch:
>>>
>>> diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp
>>> --- a/src/share/vm/runtime/atomic.inline.hpp
>>> +++ b/src/share/vm/runtime/atomic.inline.hpp
>>> @@ -92,7 +92,7 @@
>>>
>>>   #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE
>>>   // See comment in atomic.cpp how to override.
>>> -inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte
>>> *dest, jbyte comparand)
>>> +inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte
>>> *dest, jbyte comparand, cmpxchg_memory_order order)
>>>   {
>>>     return cmpxchg_general(exchange_value, dest, comparand);
>>>   }
>>>
>>> David
>>> -----
>>>
>>> > Just need another reviewer to chime in - given you and Martin are both
>>> > contributors. Or are you the main contributor with Martin being a
>>> reviewer?
>>> >
>>> > Thanks,
>>> > David
>>> >
>>> > PS. It's my night now so I'll be signing off and will pick this up in
>>> > the morning.
>>> >
>>> >> This change follows the discussion started from these mails.
>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>> April/018960.html
>>> >>
>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>> April/019148.html
>>> >>
>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>> May/019320.html
>>> >>
>>> >>
>>> >> Description:
>>> >> This change provides relaxed compare-and-exchange by introducing
>>> >> relaxed memory order. As described in atomic_linux_ppc.inline.hpp,
>>> >> the current implementation of cmpxchg is fence_cmpxchg_acquire.
>>> >> This implementation is useful for general purposes because twice
>>> calls of
>>> >> sync before and after cmpxchg will provide strict consistency.
>>> >> However, they sometimes cause overheads because sync instructions are
>>> >> very expensive in the current POWER chip design.
>>> >>
>>> >> We confirmed this change improves performance of copy_to_survivor
>>> >> in the parallel GC. However, we will need more investigation of GC
>>> >> by more experts. So, We would like to request a review of the change
>>> >> of cmpxchg first (as Martin requested).
>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>> April/019188.html
>>> >>
>>> >>
>>> >> Summary of source code changes:
>>> >>
>>> >> * src/share/vm/runtime/atomic.hpp
>>> >>      - Defines enum memory_order and adds a parameter to cmpxchg.
>>> >>
>>> >> * src/share/vm/runtime/atomic.cpp
>>> >> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
>>> >> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>>> >> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>>> >> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
>>> >> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
>>> >> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
>>> >> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
>>> >> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
>>> >> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>>> >>      - Added a parameter for each cmpxchg function to follow
>>> >>         the change of atomic.hpp. Their implementations are not
>>> changed.
>>> >>
>>> >> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
>>> >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>>> >>      - Added a parameter for each cmpxchg function to follow
>>> >>         the change of atomic.hpp. In addition, implementations
>>> >>         are changed corresponding to the specified memory_order.
>>> >>
>>> >> Regards,
>>> >> Hiroshi
>>> >> -----------------------
>>> >> Hiroshi Horii, Ph.D.
>>> >> IBM Research - Tokyo
>>> >>
>>>
>>

From gromero at linux.vnet.ibm.com  Wed May 11 21:06:41 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 11 May 2016 18:06:41 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
	<CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
Message-ID: <57339EE1.2040500@linux.vnet.ibm.com>

Hi Volker, Hi Martin

Sincere apologies for the long delay.

My initial approach to testing the VSX load/store was to extract just the
mass copy loop, "graft" it into an inline asm block, and run isolated tests with
the "perf" tool, focusing only on aligned source and destination (best case).

The extracted code, called "Original" in the plot below (black line), is here:
https://github.com/gromero/arraycopy/blob/2pairs/arraycopy.c#L27-L36

That extracted code, after some experiments, evolved into this version, which
employs VSX load/store, deepest Data Stream pre-fetch, d-cache touch, and a
backbranch target aligned to 32 bytes:
https://github.com/gromero/arraycopy/blob/2pairs/arraycopy_vsx.c#L27-L41

All runs were "pinned" using `numactl --cpunodebind --membind` to avoid any
scheduler decision that could add noise to the measurements.

VSX, deepest data pre-fetch, d-cache touch, and 32-byte alignment proved to be
better in the isolated code (red line) than in the original extracted code
(black line):
http://gromero.github.io/openjdk/original_vsx_non_pf_vsx_pf_deepest.pdf

So I proceeded to implement the VSX loop in OpenJDK based on the best case
result (VSX, pre-fetch deepest, d-cache touch, and backbranch target align -
goetz TODO note).

OpenJDK 8 webrev:
http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/8/

OpenJDK 9 webrev:
http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/9/

I've tested the change on OpenJDK 8 using this script that calls
System.arraycopy() on shorts:
https://goo.gl/8UWtLm

The results for all data alignment cases:
http://gromero.github.io/openjdk/src_0_dst_0.pdf
http://gromero.github.io/openjdk/src_1_dst_0.pdf
http://gromero.github.io/openjdk/src_0_dst_1.pdf
http://gromero.github.io/openjdk/src_1_dst_1.pdf

Martin, I added the vsx test to the feature-string. Regarding the ABI, I'm just
using two VSR: vsr0 and vsr1, both volatile.

Volker, as the loop unrolling was removed, the loop now copies 16 elements at a
time, like the non-VSX loop, and not 32 elements. I just verified the change on
little endian. Sorry, I didn't understand your question regarding "instructions
for aligned load/stores". Did you mean instructions for unaligned load/stores? I
think both fixed-point (ld/std) and VSX instructions will do load/store slower in
the unaligned scenario. However, VMX load/store is different and expects aligned
operands. Thank you very much for opening the bug:
https://bugs.openjdk.java.net/browse/JDK-8154156
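
For readers who want the loop shape spelled out, here is a minimal C++ sketch.
It is not the stub itself, which is emitted by the PPC64 assembler. It only
mirrors the structure described above: a main loop moving 16 short elements
(32 bytes) per iteration as two 16-byte chunks, standing in for the two VSRs,
followed by a scalar tail. The name copy_shorts_16 and the use of memcpy for
the 16-byte moves are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not HotSpot code: copies `count` 16-bit elements
// between non-overlapping (disjoint) buffers.
void copy_shorts_16(const int16_t* src, int16_t* dst, std::size_t count) {
  std::size_t i = 0;
  // Main loop: 16 elements (32 bytes) per iteration, as two 16-byte moves.
  for (; i + 16 <= count; i += 16) {
    unsigned char lo[16];
    unsigned char hi[16];
    std::memcpy(lo, src + i, 16);      // stands in for the first vector load
    std::memcpy(hi, src + i + 8, 16);  // stands in for the second vector load
    std::memcpy(dst + i, lo, 16);      // first vector store
    std::memcpy(dst + i + 8, hi, 16);  // second vector store
  }
  // Tail: the remaining 0..15 elements are copied one at a time.
  for (; i < count; ++i) {
    dst[i] = src[i];
  }
}
```

Arrays shorter than 16 elements never enter the main loop, so the tail path
decides their cost; that is the case Volker's question about small arrays
was concerned with.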

I don't have the profiling per function for each SPEC{jbb,jvm} benchmark
in order to determine which one would stress the proposed change better.
Could I use a better benchmark?

Thank you!

Best regards,
Gustavo

On 05-04-2016 14:23, Volker Simonis wrote:
> Hi Gustavo,
> 
> thanks a lot for your contribution.
> 
> Can you please describe if you've run benchmarks and which performance
> improvements you saw?
> 
> With your change if we're running on Power 8, we will only use the
> fast path for arrays with at least 32 elements. For smaller arrays, we
> will fall-back to copying only 2 elements at a time which will be
> slower than the initial version which copied 4 at a time in that case.
> 
> Did you verify your changes on both little and big endian?
> 
> And what about unaligned memory accesses? As far as I read,
> lxvd2x/stxvd2x still work, but may be slower. I saw there also exist
> instructions for aligned load/stores. Would it make sense
> (performance-wise) to use them for the cases where we can be sure that
> we have aligned memory accesses?
> 
> Thank you and best regards,
> Volker
> 
> 
> On Fri, Apr 1, 2016 at 10:36 PM, Gustavo Romero
> <gromero at linux.vnet.ibm.com> wrote:
>> Hi Martin, Hi Volker
>>
>> Currently VSX load/store instructions are not being used in PPC64 stubs,
>> particularly in arraycopy stubs inside generate_arraycopy_stubs() like,
>> but not limited to, generate_disjoint_{byte,short,int,long}_copy.
>>
>> We can speed up mass copy using VSX (Vector-Scalar Extension) load/store
>> instruction in processors >= POWER8, the same way it's already done for
>> libc memcpy().
>>
>> This is an initial patch just for jshort_disjoint_arraycopy() VSX vector
>> load/store:
>>
>> http://81.de.7a9f.ip4.static.sl-reverse.com/202539/webrev
>>
>> What are your thoughts on that? Is there any impediment to use VSX
>> instructions in OpenJDK at the moment?
>>
>> Thank you.
>>
>> Best regards,
>> Gustavo
>>
> 


From gromero at linux.vnet.ibm.com  Wed May 11 21:26:43 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 11 May 2016 18:26:43 -0300
Subject: JVM 24.95 SIGSEGV on C2 Compiler Thread
In-Reply-To: <56EAC8EB.9020609@linux.vnet.ibm.com>
References: <56EAB89B.9050206@linux.vnet.ibm.com>
	<56EABF93.2070707@oracle.com> <56EAC8EB.9020609@linux.vnet.ibm.com>
Message-ID: <5733A393.8010005@linux.vnet.ibm.com>

Hi Tobias

Your wild guess on
https://bugs.openjdk.java.net/browse/JDK-6675699
was correct. Loop peeling issue was the culprit.

I was also finally able to test the code exhaustively on OpenJDK 8
and could not reproduce it. Fixed on 8, as you said.

Thanks a lot.

Best regards,
Gustavo


On 17-03-2016 12:10, Gustavo Romero wrote:
> Hi Tobias,
> 
> I'll try to reproduce it with 8 on PPC64 and let you know about the
> result.
> 
> Thank you.
> 
> Regards,
> Gustavo
> 
> On 17-03-2016 11:30, Tobias Hartmann wrote:
>> Hi Gustavo,
>>
>> just a wild guess, but this could be one of
>> https://bugs.openjdk.java.net/browse/JDK-6675699
>> https://bugs.openjdk.java.net/browse/JDK-8027388
>>
>> Both were not backported to 7. Did you try to reproduce this with 8?
>>
>> Best regards,
>> Tobias
>>
>> On 17.03.2016 15:00, Gustavo Romero wrote:
>>> Hi Martin,
>>>
>>> I'm facing a problem with a JVM 24.95 when running an application.
>>> However it's being hard to reproduce it, ie many times C2 will
>>> optimize the method fine and the application terminates fine. Just
>>> after many complete runs, one of them will crash. Apparently it is
>>> related to https://bugs.openjdk.java.net/browse/JDK-7068051 bug,
>>> but it was fixed in hs22 and as I could not isolate it on PPC64,
>>> I can't tell if it still exists upstream on PPC64.
>>>
>>> Do you have any clue on how to isolate/debug this problem?
>>>
>>> Hotspot error log:
>>> http://hastebin.com/raw/pepajuwepu
>>>
>>> Backtrace from the thread that caused the segfault:
>>> http://hastebin.com/raw/zirelokuto
>>>
>>> Thank you!
>>>
>>> Best regards,
>>> Gustavo
>>>
>>
> 


From gromero at linux.vnet.ibm.com  Wed May 11 22:32:45 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Wed, 11 May 2016 19:32:45 -0300
Subject: SIGILL crashes JVM on PPC64 LE
Message-ID: <5733B30D.6010201@linux.vnet.ibm.com>

Hi

I'm getting a nasty SIGILL that crashes the JVM on PPC64 LE.

hs_err log:
http://hastebin.com/raw/fovagunaci

The application employs methods from both java.nio.ByteBuffer and
sun.misc.Unsafe classes in order to write and read from an allocated buffer.

An interesting thing is that, after debugging the instruction that caused the
said SIGILL:

   0x3fff902839a4:	cmpwi   cr6,r17,0
   0x3fff902839a8:	beq     cr6,0x3fff90283ae4
   0x3fff902839ac:	.long 0xea2f0013 <============ illegal instruction
   0x3fff902839b0:	add     r15,r15,r17
   0x3fff902839b4:	add     r14,r17,r14

I found that when its endianness is changed it turns out to be a valid
instruction: vsel v24,v0,v5,v31
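
The endianness check can be reproduced with a few lines of C++. This is only a
sketch: bswap32, VAForm and decode_va are hypothetical names, and the field
layout assumed here is the Power ISA VA-form (6-bit primary opcode, four 5-bit
register fields, 6-bit extended opcode), which is the form vsel uses.

```cpp
#include <cstdint>

// Reverse the byte order of a 32-bit instruction word.
constexpr uint32_t bswap32(uint32_t w) {
  return (w >> 24) | ((w >> 8) & 0x0000ff00u) |
         ((w << 8) & 0x00ff0000u) | (w << 24);
}

// Fields of a VA-form instruction (hypothetical struct, for illustration).
struct VAForm {
  uint32_t opcode;  // primary opcode
  uint32_t vrt;     // target vector register
  uint32_t vra;     // first source register
  uint32_t vrb;     // second source register
  uint32_t vrc;     // third source register (the selector for vsel)
  uint32_t xo;      // extended opcode
};

// Split a 32-bit word into VA-form fields, most significant field first.
constexpr VAForm decode_va(uint32_t insn) {
  return VAForm{
    (insn >> 26) & 0x3fu,
    (insn >> 21) & 0x1fu,
    (insn >> 16) & 0x1fu,
    (insn >> 11) & 0x1fu,
    (insn >> 6) & 0x1fu,
    insn & 0x3fu
  };
}
```

Swapping 0xea2f0013 gives 0x13002fea, and decoding that word yields primary
opcode 4 with register fields 24, 0, 5 and 31, which matches the
vsel v24,v0,v5,v31 reading above.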

However, I'm still unable to determine if it's an application issue, something
with JVM unsafe interface code, or something else.

Any clue on how to narrow down this SIGILL?

Thank you!

Regards,
Gustavo


From david.holmes at oracle.com  Wed May 11 22:50:21 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 12 May 2016 08:50:21 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <b375616e-df42-a0c4-7b22-579df6418f8a@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<b5434cda-58d2-eae1-57de-b09a46da438e@oracle.com>
	<b375616e-df42-a0c4-7b22-579df6418f8a@oracle.com>
Message-ID: <53b7bb82-89f0-bac5-5e5a-3234ffecdb50@oracle.com>

This has about 3 hours to be reviewed and pushed to make the FC deadline.

David

On 11/05/2016 2:41 PM, David Holmes wrote:
> Adding hotspot-dev to cc to expand scope of reviewer pool :)
>
> On 11/05/2016 6:56 AM, David Holmes wrote:
>> On 11/05/2016 12:27 AM, Doerr, Martin wrote:
>>> Hello everybody,
>>>
>>> thanks for finding this issue. New webrev is here:
>>>
>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/
>>
>> Unfortunately my test run hit a crash on Solaris sparc:
>>
>> # Problematic frame:
>> # V  [libjvm.so+0xcc35c4]
>> markOopDesc*markOopDesc::displaced_mark_helper()const+0x64
>>
>> I'm going to have to do some more testing to see if that is actually
>> related to the change. I know it should not be, but given we CAS marks I
>> have to wonder if there's some subtle bad interaction here. :(
>
> Further testing has not shown any failures on Solaris sparc, and the
> same testing showed some spurious failures on other platforms even
> without these changes. So while I will file a bug for this crash I think
> it unlikely to be related to the current changes.
>
> So on that note we just need a second hotspot reviewer to sign off on this.
>
> Thanks,
> David
>
>
>> David
>> -----
>>
>>>
>>>
>>> Best regards,
>>>
>>> Martin
>>>
>>>
>>>
>>> *From:*Hiroshi H Horii [mailto:HORII at jp.ibm.com]
>>> *Sent:* Dienstag, 10. Mai 2016 15:18
>>> *To:* David Holmes <david.holmes at oracle.com>
>>> *Cc:* hotspot-gc-dev at openjdk.java.net;
>>> hotspot-runtime-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net;
>>> Tim Ellison <Tim_Ellison at uk.ibm.com>; Doerr, Martin
>>> <martin.doerr at sap.com>
>>> *Subject:* Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>
>>>
>>>
>>> Hi David,
>>>
>>>> Just need another reviewer to chime in - given you and Martin are both
>>>> contributors. Or are you the main contributor with Martin being a
>>>> reviewer?
>>>
>>> Martin and I are contributors of this change.
>>>
>>>> Still a problem on Solaris sparc:
>>>
>>> Martin, could you create a new change in webrev with the patch that
>>> David sent?
>>>
>>> Regards,
>>> Hiroshi
>>> -----------------------
>>> Hiroshi Horii, Ph.D.
>>> IBM Research - Tokyo
>>>
>>>
>>> David Holmes <david.holmes at oracle.com <mailto:david.holmes at oracle.com>>
>>> wrote on 05/10/2016 21:29:53:
>>>
>>>> From: David Holmes <david.holmes at oracle.com
>>>> <mailto:david.holmes at oracle.com>>
>>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-runtime-
>>>> dev at openjdk.java.net <mailto:dev at openjdk.java.net>"
>>> <hotspot-runtime-dev at openjdk.java.net
>>> <mailto:hotspot-runtime-dev at openjdk.java.net>>
>>>> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com
>>>> <mailto:Tim_Ellison at uk.ibm.com>>, "ppc-aix-port-
>>>> dev at openjdk.java.net <mailto:dev at openjdk.java.net>"
>>> <ppc-aix-port-dev at openjdk.java.net
>>> <mailto:ppc-aix-port-dev at openjdk.java.net>>, "hotspot-
>>>> gc-dev at openjdk.java.net <mailto:gc-dev at openjdk.java.net>"
>>> <hotspot-gc-dev at openjdk.java.net
>>> <mailto:hotspot-gc-dev at openjdk.java.net>>
>>>> Date: 05/10/2016 21:31
>>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>>
>>>> On 10/05/2016 9:04 PM, David Holmes wrote:
>>>> > Hi Hiroshi,
>>>> >
>>>> > On 10/05/2016 8:44 PM, Hiroshi H Horii wrote:
>>>> >> Hi All,
>>>> >>
>>>> >> Can I please request reviews for the following change?
>>>> >>
>>>> >> Code change:
>>>> >> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/
>>>> >
>>>> > Changes look good. I'm currently running them through our internal
>>>> build
>>>> > system. I will sponsor this and push the change through JPRT.
>>>>
>>>> Still a problem on Solaris sparc:
>>>>
>>>> "/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/
>>>> atomic.inline.hpp",
>>>> line 96: Error: Could not find a match for static
>>>> Atomic::cmpxchg(signed
>>>> char, volatile signed char*, signed char).
>>>> 1 Error(s) detected.
>>>>
>>>> Needs this patch:
>>>>
>>>> diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp
>>>> --- a/src/share/vm/runtime/atomic.inline.hpp
>>>> +++ b/src/share/vm/runtime/atomic.inline.hpp
>>>> @@ -92,7 +92,7 @@
>>>>
>>>>   #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE
>>>>   // See comment in atomic.cpp how to override.
>>>> -inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte
>>>> *dest, jbyte comparand)
>>>> +inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte
>>>> *dest, jbyte comparand, cmpxchg_memory_order order)
>>>>   {
>>>>     return cmpxchg_general(exchange_value, dest, comparand);
>>>>   }
>>>>
>>>> David
>>>> -----
>>>>
>>>> > Just need another reviewer to chime in - given you and Martin are
>>>> both
>>>> > contributors. Or are you the main contributor with Martin being a
>>>> reviewer?
>>>> >
>>>> > Thanks,
>>>> > David
>>>> >
>>>> > PS. It's my night now so I'll be signing off and will pick this up in
>>>> > the morning.
>>>> >
>>>> >> This change follows the discussion started from these mails.
>>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>>> April/018960.html
>>>> >>
>>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>>> April/019148.html
>>>> >>
>>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>>> May/019320.html
>>>> >>
>>>> >>
>>>> >> Description:
>>>> >> This change provides relaxed compare-and-exchange by introducing
>>>> >> relaxed memory order. As described in atomic_linux_ppc.inline.hpp,
>>>> >> the current implementation of cmpxchg is fence_cmpxchg_acquire.
>>>> >> This implementation is useful for general purposes because twice
>>>> calls of
>>>> >> sync before and after cmpxchg will provide strict consistency.
>>>> >> However, they sometimes cause overheads because sync instructions
>>>> are
>>>> >> very expensive in the current POWER chip design.
>>>> >>
>>>> >> We confirmed this change improves performance of copy_to_survivor
>>>> >> in the parallel GC. However, we will need more investigation of GC
>>>> >> by more experts. So, We would like to request a review of the change
>>>> >> of cmpxchg first (as Martin requested).
>>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
>>>> April/019188.html
>>>> >>
>>>> >>
>>>> >> Summary of source code changes:
>>>> >>
>>>> >> * src/share/vm/runtime/atomic.hpp
>>>> >>      - Defines enum memory_order and adds a parameter to cmpxchg.
>>>> >>
>>>> >> * src/share/vm/runtime/atomic.cpp
>>>> >> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
>>>> >> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>>>> >> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>>>> >> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
>>>> >> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
>>>> >> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
>>>> >> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
>>>> >> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
>>>> >> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
>>>> >>      - Added a parameter for each cmpxchg function to follow
>>>> >>         the change of atomic.hpp. Their implementations are not
>>>> changed.
>>>> >>
>>>> >> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
>>>> >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>>>> >>      - Added a parameter for each cmpxchg function to follow
>>>> >>         the change of atomic.hpp. In addition, implementations
>>>> >>         are changed corresponding to the specified memory_order.
>>>> >>
>>>> >> Regards,
>>>> >> Hiroshi
>>>> >> -----------------------
>>>> >> Hiroshi Horii, Ph.D.
>>>> >> IBM Research - Tokyo
>>>> >>
>>>>
>>>

From tobias.hartmann at oracle.com  Thu May 12 06:36:00 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 12 May 2016 08:36:00 +0200
Subject: JVM 24.95 SIGSEGV on C2 Compiler Thread
In-Reply-To: <5733A393.8010005@linux.vnet.ibm.com>
References: <56EAB89B.9050206@linux.vnet.ibm.com>
	<56EABF93.2070707@oracle.com> <56EAC8EB.9020609@linux.vnet.ibm.com>
	<5733A393.8010005@linux.vnet.ibm.com>
Message-ID: <57342450.5050901@oracle.com>

Hi Gustavo,

On 11.05.2016 23:26, Gustavo Romero wrote:
> Hi Tobias
> 
> Your wild guess on
> https://bugs.openjdk.java.net/browse/JDK-6675699
> was correct. Loop peeling issue was the culprit.
> 
> Also I finally was able to test exhaustively the code on OpenJDK 8
> and could not reproduce it. Fixed on 8 as you said.

Good, thanks for the update!

[CC'ing hotspot-dev for the record]

Best regards,
Tobias

> Thanks a lot.
> 
> Best regards,
> Gustavo
> 
> 
> On 17-03-2016 12:10, Gustavo Romero wrote:
>> Hi Tobias,
>>
>> I'll try to reproduce it with 8 on PPC64 and let you know about the
>> result.
>>
>> Thank you.
>>
>> Regards,
>> Gustavo
>>
>> On 17-03-2016 11:30, Tobias Hartmann wrote:
>>> Hi Gustavo,
>>>
>>> just a wild guess, but this could be one of
>>> https://bugs.openjdk.java.net/browse/JDK-6675699
>>> https://bugs.openjdk.java.net/browse/JDK-8027388
>>>
>>> Both were not backported to 7. Did you try to reproduce this with 8?
>>>
>>> Best regards,
>>> Tobias
>>>
>>> On 17.03.2016 15:00, Gustavo Romero wrote:
>>>> Hi Martin,
>>>>
>>>> I'm facing a problem with a JVM 24.95 when running an application.
>>>> However it's being hard to reproduce it, ie many times C2 will
>>>> optimize the method fine and the application terminates fine. Just
>>>> after many complete runs, one of them will crash. Apparently it is
>>>> related to https://bugs.openjdk.java.net/browse/JDK-7068051 bug,
>>>> but it was fixed in hs22 and as I could not isolate it on PPC64,
>>>> I can't tell if it still exists upstream on PPC64.
>>>>
>>>> Do you have any clue on how to isolate/debug this problem?
>>>>
>>>> Hotspot error log:
>>>> http://hastebin.com/raw/pepajuwepu
>>>>
>>>> Backtrace from the thread that caused the segfault:
>>>> http://hastebin.com/raw/zirelokuto
>>>>
>>>> Thank you!
>>>>
>>>> Best regards,
>>>> Gustavo
>>>>
>>>
>>
> 

From goetz.lindenmaier at sap.com  Thu May 12 08:50:09 2016
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Thu, 12 May 2016 08:50:09 +0000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <53b7bb82-89f0-bac5-5e5a-3234ffecdb50@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<b5434cda-58d2-eae1-57de-b09a46da438e@oracle.com>
	<b375616e-df42-a0c4-7b22-579df6418f8a@oracle.com>
	<53b7bb82-89f0-bac5-5e5a-3234ffecdb50@oracle.com>
Message-ID: <d6aa7a27c24b47b5a04a567906654130@DEWDFE13DE09.global.corp.sap>

Hi,

atomic_bsd_zero.inline.hpp:303
The order argument is not passed on to the inner cmpxchg_ptr call.
But I guess this is not really relevant as the argument is not used 
anyways.  (This method should be moved to the shared atomic.inline.hpp 
file, but not in this change.)
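
To illustrate the forwarding point, here is a minimal C++ sketch. It is not the
HotSpot code: cas_word and cas_ptr are hypothetical names, the enumerators are
assumed, and std::atomic with std::memory_order_seq_cst is used only as the
nearest standard stand-in for the conservative (fenced) semantics.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the enum added by the change; enumerator names
// are assumptions made for this sketch.
enum cmpxchg_memory_order {
  order_relaxed,
  order_conservative
};

// Inner operation: returns the value observed at *dest before the attempt.
inline intptr_t cas_word(intptr_t exchange_value,
                         std::atomic<intptr_t>* dest,
                         intptr_t compare_value,
                         cmpxchg_memory_order order = order_conservative) {
  const std::memory_order mo = (order == order_relaxed)
                                   ? std::memory_order_relaxed
                                   : std::memory_order_seq_cst;
  intptr_t expected = compare_value;
  // On failure, `expected` is overwritten with the current value of *dest;
  // on success it still equals compare_value, which was the old value.
  dest->compare_exchange_strong(expected, exchange_value, mo, mo);
  return expected;
}

// Outer pointer wrapper: `order` is passed on to the inner call rather than
// being dropped, so a relaxed request stays relaxed.
inline void* cas_ptr(void* exchange_value,
                     std::atomic<intptr_t>* dest,
                     void* compare_value,
                     cmpxchg_memory_order order = order_conservative) {
  return reinterpret_cast<void*>(
      cas_word(reinterpret_cast<intptr_t>(exchange_value), dest,
               reinterpret_cast<intptr_t>(compare_value), order));
}
```

If the wrapper omitted the last argument, the default would silently apply and
every caller would get the conservative ordering. That is harmless where the
platform ignores the argument, as noted above, but it would hide the relaxed
request on platforms that honor it.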

Besides that the change looks good.   Reviewed.

In case this now really is too late (http://openjdk.java.net/projects/jdk9/
states 26.5. for dev close, but is hs earlier?), will there be jdk10 repos
soon, or jdk9u?

Best regards,
  Goetz.


> -----Original Message-----
> From: hotspot-gc-dev [mailto:hotspot-gc-dev-bounces at openjdk.java.net]
> On Behalf Of David Holmes
> Sent: Donnerstag, 12. Mai 2016 00:50
> To: Doerr, Martin <martin.doerr at sap.com>; Hiroshi H Horii
> <HORII at jp.ibm.com>
> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-
> dev at openjdk.java.net; hotspot-dev developers <hotspot-
> dev at openjdk.java.net>; hotspot-gc-dev at openjdk.java.net; hotspot-
> runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
> 
> This has about 3 hours to be reviewed and pushed to make the FC deadline.
> 
> David
> 
> On 11/05/2016 2:41 PM, David Holmes wrote:
> > Adding hotspot-dev to cc to expand scope of reviewer pool :)
> >
> > On 11/05/2016 6:56 AM, David Holmes wrote:
> >> On 11/05/2016 12:27 AM, Doerr, Martin wrote:
> >>> Hello everybody,
> >>>
> >>> thanks for finding this issue. New webrev is here:
> >>>
> >>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/
> >>
> >> Unfortunately my test run hit a crash on Solaris sparc:
> >>
> >> # Problematic frame:
> >> # V  [libjvm.so+0xcc35c4]
> >> markOopDesc*markOopDesc::displaced_mark_helper()const+0x64
> >>
> >> I'm going to have to do some more testing to see if that is actually
> >> related to the change. I know it should not be, but given we CAS marks I
> >> have to wonder if there's some subtle bad interaction here. :(
> >
> > Further testing has not shown any failures on Solaris sparc, and the
> > same testing showed some spurious failures on other platforms even
> > without these changes. So while I will file a bug for this crash I think
> > it unlikely to be related to the current changes.
> >
> > So on that note we just need a second hotspot reviewer to sign off on this.
> >
> > Thanks,
> > David
> >
> >
> >> David
> >> -----
> >>
> >>>
> >>>
> >>> Best regards,
> >>>
> >>> Martin
> >>>
> >>>
> >>>
> >>> *From:*Hiroshi H Horii [mailto:HORII at jp.ibm.com]
> >>> *Sent:* Dienstag, 10. Mai 2016 15:18
> >>> *To:* David Holmes <david.holmes at oracle.com>
> >>> *Cc:* hotspot-gc-dev at openjdk.java.net;
> >>> hotspot-runtime-dev at openjdk.java.net; ppc-aix-port-
> dev at openjdk.java.net;
> >>> Tim Ellison <Tim_Ellison at uk.ibm.com>; Doerr, Martin
> >>> <martin.doerr at sap.com>
> >>> *Subject:* Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
> >>>
> >>>
> >>>
> >>> Hi David,
> >>>
> >>>> Just need another reviewer to chime in - given you and Martin are both
> >>>> contributors. Or are you the main contributor with Martin being a
> >>>> reviewer?
> >>>
> >>> Martin and I are contributors of this change.
> >>>
> >>>> Still a problem on Solaris sparc:
> >>>
> >>> Martin, could you create a new change in webrev with the patch that
> >>> David sent?
> >>>
> >>> Regards,
> >>> Hiroshi
> >>> -----------------------
> >>> Hiroshi Horii, Ph.D.
> >>> IBM Research - Tokyo
> >>>
> >>>
> >>> David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>>
> >>> wrote on 05/10/2016 21:29:53:
> >>>
> >>>> From: David Holmes <david.holmes at oracle.com
> >>>> <mailto:david.holmes at oracle.com>>
> >>>> To: Hiroshi H Horii/Japan/IBM at IBMJP, "hotspot-runtime-
> >>>> dev at openjdk.java.net <mailto:dev at openjdk.java.net>"
> >>> <hotspot-runtime-dev at openjdk.java.net
> >>> <mailto:hotspot-runtime-dev at openjdk.java.net>>
> >>>> Cc: Tim Ellison <Tim_Ellison at uk.ibm.com
> >>>> <mailto:Tim_Ellison at uk.ibm.com>>, "ppc-aix-port-
> >>>> dev at openjdk.java.net <mailto:dev at openjdk.java.net>"
> >>> <ppc-aix-port-dev at openjdk.java.net
> >>> <mailto:ppc-aix-port-dev at openjdk.java.net>>, "hotspot-
> >>>> gc-dev at openjdk.java.net <mailto:gc-dev at openjdk.java.net>"
> >>> <hotspot-gc-dev at openjdk.java.net
> >>> <mailto:hotspot-gc-dev at openjdk.java.net>>
> >>>> Date: 05/10/2016 21:31
> >>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
> >>>>
> >>>> On 10/05/2016 9:04 PM, David Holmes wrote:
> >>>> > Hi Hiroshi,
> >>>> >
> >>>> > On 10/05/2016 8:44 PM, Hiroshi H Horii wrote:
> >>>> >> Hi All,
> >>>> >>
> >>>> >> Can I please request reviews for the following change?
> >>>> >>
> >>>> >> Code change:
> >>>> >>
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.01/
> >>>> >
> >>>> > Changes look good. I'm currently running them through our internal
> >>>> build
> >>>> > system. I will sponsor this and push the change through JPRT.
> >>>>
> >>>> Still a problem on Solaris sparc:
> >>>>
> >>>> "/opt/jprt/T/P1/102505.daholme/s/hotspot/src/share/vm/runtime/
> >>>> atomic.inline.hpp",
> >>>> line 96: Error: Could not find a match for static
> >>>> Atomic::cmpxchg(signed
> >>>> char, volatile signed char*, signed char).
> >>>> 1 Error(s) detected.
> >>>>
> >>>> Needs this patch:
> >>>>
> >>>> diff -r 68853ef19be9 src/share/vm/runtime/atomic.inline.hpp
> >>>> --- a/src/share/vm/runtime/atomic.inline.hpp
> >>>> +++ b/src/share/vm/runtime/atomic.inline.hpp
> >>>> @@ -92,7 +92,7 @@
> >>>>
> >>>>   #ifndef VM_HAS_SPECIALIZED_CMPXCHG_BYTE
> >>>>   // See comment in atomic.cpp how to override.
> >>>> -inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte
> >>>> *dest, jbyte comparand)
> >>>> +inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte
> >>>> *dest, jbyte comparand, cmpxchg_memory_order order)
> >>>>   {
> >>>>     return cmpxchg_general(exchange_value, dest, comparand);
> >>>>   }
> >>>>
> >>>> David
> >>>> -----
> >>>>
> >>>> > Just need another reviewer to chime in - given you and Martin are
> >>>> both
> >>>> > contributors. Or are you the main contributor with Martin being a
> >>>> reviewer?
> >>>> >
> >>>> > Thanks,
> >>>> > David
> >>>> >
> >>>> > PS. It's my night now so I'll be signing off and will pick this up in
> >>>> > the morning.
> >>>> >
> >>>> >> This change follows the discussion started from these mails.
> >>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> >>>> April/018960.html
> >>>> >>
> >>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> >>>> April/019148.html
> >>>> >>
> >>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> >>>> May/019320.html
> >>>> >>
> >>>> >>
> >>>> >> Description:
> >>>> >> This change provides relaxed compare-and-exchange by introducing
> >>>> >> relaxed memory order. As described in
> atomic_linux_ppc.inline.hpp,
> >>>> >> the current implementation of cmpxchg is fence_cmpxchg_acquire.
> >>>> >> This implementation is useful for general purposes because twice
> >>>> calls of
> >>>> >> sync before and after cmpxchg will provide strict consistency.
> >>>> >> However, they sometimes cause overheads because sync
> instructions
> >>>> are
> >>>> >> very expensive in the current POWER chip design.
> >>>> >>
> >>>> >> We confirmed this change improves performance of
> copy_to_survivor
> >>>> >> in the parallel GC. However, we will need more investigation of GC
> >>>> >> by more experts. So, We would like to request a review of the
> change
> >>>> >> of cmpxchg first (as Martin requested).
> >>>> >> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-
> >>>> April/019188.html
> >>>> >>
> >>>> >>
> >>>> >> Summary of source code changes:
> >>>> >>
> >>>> >> * src/share/vm/runtime/atomic.hpp
> >>>> >>      - Defines enum memory_order and adds a parameter to
> cmpxchg.
> >>>> >>
> >>>> >> * src/share/vm/runtime/atomic.cpp
> >>>> >> * src/os_cpu/bsd_x86/vm/atomic_bsd_x86.inline.hpp
> >>>> >> * src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
> >>>> >> * src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
> >>>> >> * src/os_cpu/linux_sparc/vm/atomic_linux_sparc.inline.hpp
> >>>> >> * src/os_cpu/linux_x86/vm/atomic_linux_x86.inline.hpp
> >>>> >> * src/os_cpu/linux_zero/vm/atomic_linux_zero.inline.hpp
> >>>> >> * src/os_cpu/solaris_sparc/vm/atomic_solaris_sparc.inline.hpp
> >>>> >> * src/os_cpu/solaris_x86/vm/atomic_solaris_x86.inline.hpp
> >>>> >> * src/os_cpu/windows_x86/vm/atomic_windows_x86.inline.hpp
> >>>> >>      - Added a parameter for each cmpxchg function to follow
> >>>> >>         the change of atomic.hpp. Their implementations are not
> >>>> changed.
> >>>> >>
> >>>> >> * src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp
> >>>> >> * src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
> >>>> >>      - Added a parameter for each cmpxchg function to follow
> >>>> >>         the change of atomic.hpp. In addition, implementations
> >>>> >>         are changed corresponding to the specified memory_order.
> >>>> >>
> >>>> >> Regards,
> >>>> >> Hiroshi
> >>>> >> -----------------------
> >>>> >> Hiroshi Horii, Ph.D.
> >>>> >> IBM Research - Tokyo
> >>>> >>
> >>>>
> >>>

From martin.doerr at sap.com  Thu May 12 09:33:03 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 12 May 2016 09:33:03 +0000
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <57339EE1.2040500@linux.vnet.ibm.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
	<CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
	<57339EE1.2040500@linux.vnet.ibm.com>
Message-ID: <da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>

Hi Gustavo,

thanks for providing the webrevs. The change looks basically good.

I only have the following concerns:
- We basically support configuring dscr by various DSCR switches. Your code resets the value to the hardware default instead of the possibly modified value. We're currently only using default DSCR values, but we may want to play with them in the future.
We could use a static variable for the default dscr value. It could be modified in VM_Version::config_dscr() and used by your restore code (load_const_optimized(tmp1, ...) instead of li(tmp1, 0)).

- The PPC-elf64abi-1.9 says: "Functions must ensure that the appropriate bits in the vrsave register are set for any vector registers they use. ...". I think not touching vrsave is the right thing for AIX and ppc64le, but I think we will either have to skip the optimization on ppc64 big endian or handle vrsave. Do you agree?
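
A minimal sketch of the DSCR suggestion in the first point, written as plain C++ rather than stub-generator code. Everything here is a stand-in: simulated_dscr replaces the real register, VMVersionSketch replaces VM_Version, and the "deepest prefetch" encoding (low three bits set to 7) is an assumption for illustration only.

```cpp
#include <cstdint>

// Stand-in for the hardware DSCR; real stub code would emit register moves.
static std::uint64_t simulated_dscr = 0;

// Hypothetical counterpart of VM_Version that remembers the configured value.
struct VMVersionSketch {
  static std::uint64_t dscr_default;

  // Called once at startup after evaluating the DSCR switches.
  static void config_dscr(std::uint64_t configured) {
    dscr_default = configured;
    simulated_dscr = configured;
  }
};
std::uint64_t VMVersionSketch::dscr_default = 0;

// Assumed encoding of "deepest prefetch" for this sketch.
const std::uint64_t kDeepestPrefetch = 7;

// Returns the DSCR value that was active while the copy loop ran.
std::uint64_t copy_stub_sketch() {
  simulated_dscr = VMVersionSketch::dscr_default | kDeepestPrefetch;
  std::uint64_t during_copy = simulated_dscr;
  // ... the copy loop would run here ...
  // Restore the configured default, not a hard-coded 0.
  simulated_dscr = VMVersionSketch::dscr_default;
  return during_copy;
}
```

With a hard-coded 0 in the restore step, any value set through the DSCR switches would be lost after the first stub call. Reading the static variable keeps the restore correct whatever config_dscr() selected.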

Best regards,
Martin


-----Original Message-----
From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] 
Sent: Mittwoch, 11. Mai 2016 23:07
To: Volker Simonis <volker.simonis at gmail.com>
Cc: Doerr, Martin <martin.doerr at sap.com>; Simonis, Volker <volker.simonis at sap.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; brenohl at br.ibm.com
Subject: Re: PPC64 VSX load/store instructions in stubs
Importance: High

Hi Volker, Hi Martin

Sincere apologies for the long delay.

My initial approach to testing the VSX load/store was to extract just the
mass copy loop, "graft" it into inline asm, and run isolated tests with the
"perf" tool, focused only on aligned source and destination (best case).

The extracted code, called "Original" in the plot below (black line), is here:
https://github.com/gromero/arraycopy/blob/2pairs/arraycopy.c#L27-L36

That extracted code, after some experiments, evolved into this one, which
employs VSX load/store, Data Stream deepest pre-fetch, d-cache touch, and a
backbranch target aligned to 32 bytes:
https://github.com/gromero/arraycopy/blob/2pairs/arraycopy_vsx.c#L27-L41

All runs were "pinned" using `numactl --cpunodebind --membind` to avoid any
scheduler decision that could add noise to the measurement.

VSX, deepest data pre-fetch, d-cache touch, and 32-byte alignment proved to be
better in the isolated code (red line) in comparison to the original extracted
code (black line):
http://gromero.github.io/openjdk/original_vsx_non_pf_vsx_pf_deepest.pdf

So I proceeded to implement the VSX loop in OpenJDK based on the best case
result (VSX, pre-fetch deepest, d-cache touch, and backbranch target align -
goetz TODO note).

OpenJDK 8 webrev:
http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/8/

OpenJDK 9 webrev:
http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/9/

I've tested the change on OpenJDK 8 using this script that calls
System.arraycopy() on shorts:
https://goo.gl/8UWtLm

The results for all data alignment cases:
http://gromero.github.io/openjdk/src_0_dst_0.pdf
http://gromero.github.io/openjdk/src_1_dst_0.pdf
http://gromero.github.io/openjdk/src_0_dst_1.pdf
http://gromero.github.io/openjdk/src_1_dst_1.pdf
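
For readers who want to reproduce the four alignment cases (source and destination offset 0 or 1 element) outside the JVM, here is a minimal C++ sketch. It is not the script linked above: it times std::memcpy on 16-bit elements, and the names timed_short_copy and CopyResult are made up for this example. Absolute numbers from it say nothing about the stub itself.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct CopyResult {
  bool ok;         // destination matches source after the last copy
  double seconds;  // wall-clock time for all repetitions
};

// Copies 'count' 16-bit elements 'repetitions' times, starting src_off
// elements into the source and dst_off elements into the destination.
CopyResult timed_short_copy(std::size_t src_off, std::size_t dst_off,
                            std::size_t count, int repetitions) {
  assert(src_off <= 1 && dst_off <= 1);  // only the 0/1 cases are modelled
  std::vector<std::int16_t> src(count + 1);
  std::vector<std::int16_t> dst(count + 1, 0);
  for (std::size_t i = 0; i < src.size(); ++i) {
    src[i] = static_cast<std::int16_t>(i & 0x7fff);
  }

  const std::size_t bytes = count * sizeof(std::int16_t);
  auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < repetitions; ++r) {
    std::memcpy(dst.data() + dst_off, src.data() + src_off, bytes);
  }
  auto stop = std::chrono::steady_clock::now();

  bool ok = std::memcmp(dst.data() + dst_off, src.data() + src_off, bytes) == 0;
  return CopyResult{ok, std::chrono::duration<double>(stop - start).count()};
}
```

Calling it with (0,0), (1,0), (0,1) and (1,1) for the two offsets corresponds to the four result files listed above. As in the original measurements, pinning the process (for example with numactl) reduces scheduler noise.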

Martin, I added the vsx test to the feature-string. Regarding the ABI, I'm just
using two VSR: vsr0 and vsr1, both volatile.

Volker, as the loop unrolling was removed, the loop now copies 16 elements at a
time, like the non-VSX loop, and not 32 elements. I just verified the change on
little endian. Sorry, I didn't understand your question regarding "instructions
for aligned load/stores". Did you mean instructions for unaligned load/stores? I
think both fixed-point (ld/std) and VSX instructions will load/store more slowly
in the unaligned scenario. However, VMX load/store is different and expects
aligned operands. Thank you very much for opening the bug
https://bugs.openjdk.java.net/browse/JDK-8154156

I don't have the profiling per function for each SPEC{jbb,jvm} benchmark
in order to determine which one would stress the proposed change better.
Could I use a better benchmark?

Thank you!

Best regards,
Gustavo

On 05-04-2016 14:23, Volker Simonis wrote:
> Hi Gustavo,
> 
> thanks a lot for your contribution.
> 
> Can you please describe if you've run benchmarks and which performance
> improvements you saw?
> 
> With your change if we're running on Power 8, we will only use the
> fast path for arrays with at least 32 elements. For smaller arrays, we
> will fall-back to copying only 2 elements at a time which will be
> slower than the initial version which copied 4 at a time in that case.
> 
> Did you verified your changes on both, little and big endian?
> 
> And what about unaligned memory accesses? As far as I read,
> lxvd2x/stxvd2x still work, but may be slower. I saw there also exist
> instructions for aligned load/stores. Would it make sens
> (performance-wise) to use them for the cases where we can be sure that
> we have aligned memory accesses?
> 
> Thank you and best regards,
> Volker
> 
> 
> On Fri, Apr 1, 2016 at 10:36 PM, Gustavo Romero
> <gromero at linux.vnet.ibm.com> wrote:
>> Hi Martin, Hi Volker
>>
>> Currently VSX load/store instructions are not being used in PPC64 stubs,
>> particularly in arraycopy stubs inside generate_arraycopy_stubs() like,
>> but not limited to, generate_disjoint_{byte,short,int,long}_copy.
>>
>> We can speed up mass copy using VSX (Vector-Scalar Extension) load/store
>> instruction in processors >= POWER8, the same way it's already done for
>> libc memcpy().
>>
>> This is an initial patch just for jshort_disjoint_arraycopy() VSX vector
>> load/store:
>>
>> http://81.de.7a9f.ip4.static.sl-reverse.com/202539/webrev
>>
>> What are your thoughts on that? Is there any impediment to use VSX
>> instructions in OpenJDK at the moment?
>>
>> Thank you.
>>
>> Best regards,
>> Gustavo
>>
> 


From david.holmes at oracle.com  Thu May 12 09:52:14 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 12 May 2016 19:52:14 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <d6aa7a27c24b47b5a04a567906654130@DEWDFE13DE09.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<b5434cda-58d2-eae1-57de-b09a46da438e@oracle.com>
	<b375616e-df42-a0c4-7b22-579df6418f8a@oracle.com>
	<53b7bb82-89f0-bac5-5e5a-3234ffecdb50@oracle.com>
	<d6aa7a27c24b47b5a04a567906654130@DEWDFE13DE09.global.corp.sap>
Message-ID: <63c2519a-7909-804c-b30c-bd8ee814a328@oracle.com>

On 12/05/2016 6:50 PM, Lindenmaier, Goetz wrote:
> Hi,
>
> atomic_bsd_zero.inline.hpp:303
> The order argument is not passed on to the inner cmpxchg_ptr call.
> But I guess this is not really relevant as the argument is not used

Right - that pattern is used throughout the changes.

> anyways.  (This method should be moved to the shared atomic.inline.hpp
> file, but not in this change.)
>
> Besides that the change looks good.   Reviewed.
>
> In case this now really is too late (http://openjdk.java.net/projects/jdk9/ states 26.5. for dev close, but hs is more early?)
> will there be jdk10 repos soon, or jdk9u?

Yes hs has to finalize sooner as the FC date is for things to be in 
jdk9/jdk9 and it takes time for changes to get from hs to jdk9.

There will be a process for requesting approval for changes post FC but 
that hasn't yet been announced either.

No word yet on when jdk10 forests will open up.

David
-----

> Best regards,
>   Goetz.
>
>
>
>
>

From volker.simonis at gmail.com  Thu May 12 12:26:59 2016
From: volker.simonis at gmail.com (Volker Simonis)
Date: Thu, 12 May 2016 14:26:59 +0200
Subject: SIGILL crashes JVM on PPC64 LE
In-Reply-To: <5733B30D.6010201@linux.vnet.ibm.com>
References: <5733B30D.6010201@linux.vnet.ibm.com>
Message-ID: <CA+3eh1235gSyijXZ-o+9D0Q+xjCKdk1OBm+2tKbPkCipH8P9EA@mail.gmail.com>

Hi Gustavo,

thanks for the bug report. The hs_err file you provided indicates that
this crash happened with Ubuntu's openjdk 8 version. Can you still
reproduce this with the newest jdk9 builds?

Also, I can see from the hs_err file that the crash happened in the C2
compiled method java.util.TimSort.countRunAndMakeAscending which
doesn't seem to be related to nio and unsafe.

Ideally, you could post an easy test case to reproduce the problem. If
that's not possible, it would be helpful if you could post the output
of a failing run with
"-XX:CompileCommand=print,java.util.TimSort::countRunAndMakeAscending
-XX:CompileCommand=option,java.util.TimSort::countRunAndMakeAscending,PrintOptoAssembly".
In order to get the disassembly output for compiled methods you have
to build the hsdis library from hotspot/src/share/tools/hsdis (it has
a README with build instructions).

Regards,
Volker


On Thu, May 12, 2016 at 12:32 AM, Gustavo Romero
<gromero at linux.vnet.ibm.com> wrote:
> Hi
>
> I'm getting a nasty SIGILL that crashes the JVM on PPC64 LE.
>
> hs_err log:
> http://hastebin.com/raw/fovagunaci
>
> The application employs methods from both java.nio.ByteBuffer and
> sun.misc.Unsafe classes in order to write and read from an allocated buffer.
>
> A interesting thing is that after debugging the instruction that caused the
> said SIGILL:
>
>    0x3fff902839a4:      cmpwi   cr6,r17,0
>    0x3fff902839a8:      beq     cr6,0x3fff90283ae4
>    0x3fff902839ac:      .long 0xea2f0013 <============ illegal instruction
>    0x3fff902839b0:      add     r15,r15,r17
>    0x3fff902839b4:      add     r14,r17,r14
>
> I found that when its endianness is changed it turns out to be a valid
> instruction: vsel v24,v0,v5,v31
>
> However, I'm still unable to determine if it's an application issue, something
> with JVM unsafe interface code, or something else.
>
> Any clue on how to narrow down this SIGILL?
>
> Thank you!
>
> Regards,
> Gustavo
>

From volker.simonis at gmail.com  Thu May 12 12:39:41 2016
From: volker.simonis at gmail.com (Volker Simonis)
Date: Thu, 12 May 2016 14:39:41 +0200
Subject: SIGILL crashes JVM on PPC64 LE
In-Reply-To: <CA+3eh1235gSyijXZ-o+9D0Q+xjCKdk1OBm+2tKbPkCipH8P9EA@mail.gmail.com>
References: <5733B30D.6010201@linux.vnet.ibm.com>
	<CA+3eh1235gSyijXZ-o+9D0Q+xjCKdk1OBm+2tKbPkCipH8P9EA@mail.gmail.com>
Message-ID: <CA+3eh1364X=REW=SbPTQWB-cmZUhnxGmE8mWvVTN4S099NcSxA@mail.gmail.com>

And I forgot to mention: I've checked and we don't emit vsel
instructions in jdk8 on ppc. So it must be a coincidence that changing
the endianness of the offending instruction yields a valid 'vsel'
instruction.
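
The byte swap can be checked with a few lines of C++. This is only a sketch for verifying the arithmetic: the helper names are made up for this example, and the field layout assumed is the Power ISA VA-form (6-bit primary opcode, four 5-bit register fields, 6-bit extended opcode), under which vsel is primary opcode 4 with extended opcode 42.

```cpp
#include <cstdint>

// Reverses the byte order of a 32-bit instruction word.
constexpr std::uint32_t byteswap32(std::uint32_t w) {
  return (w >> 24) | ((w >> 8) & 0x0000ff00u) |
         ((w << 8) & 0x00ff0000u) | (w << 24);
}

// Fields of a VA-form instruction, most significant field first.
struct VAForm {
  std::uint32_t opcode, vrt, vra, vrb, vrc, xo;
};

constexpr VAForm decode_va_form(std::uint32_t insn) {
  return VAForm{insn >> 26,
                (insn >> 21) & 31u,
                (insn >> 16) & 31u,
                (insn >> 11) & 31u,
                (insn >> 6) & 31u,
                insn & 63u};
}
```

Swapping 0xea2f0013 gives 0x13002fea, which decodes to opcode 4, registers 24, 0, 5, 31 and extended opcode 42, matching the "vsel v24,v0,v5,v31" reading. It says nothing about whether the word in the code buffer was ever meant to be that instruction.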



From ENOMIKI at jp.ibm.com  Mon May 16 05:53:48 2016
From: ENOMIKI at jp.ibm.com (Miki M Enoki)
Date: Mon, 16 May 2016 14:53:48 +0900
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
References: <56FEDBB3.5030106@linux.vnet.ibm.com><CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com><57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
Message-ID: <201605160554.u4G5s3nn030257@d19av05.sagamino.japan.ibm.com>

Dear Gustavo, Volker, and Martin

I also implemented VSX disjoint long arraycopy.
I would appreciate it if it could be applied to OpenJDK, too.

The performance was better than the original code in almost all cases.
VSX(max) is the aligned case, while VSX(min) is the unaligned case. In
addition, VMX can be better in the unaligned case.
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/result-0001.jpg

The benchmark code is here.
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/ArrayCopyTest1-0001.java
Server:  8247-22L (POWER8 (3.3GHz 12 cores) x2, 512GB memory), Ubuntu 
Linux 15.04 ppc64LE (kernel: 3.19.0-18-generic),
OpenJDK (build based on 1.9), JVMARGS: "-Xmx40g -Xms40g -Xmn20g"

The patches I created are for Java 9.
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/ppc64le_vsx-0001.diff
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20160417/fb12037e/ppc64le_vmx-0001.diff

I would appreciate your comments. 

Best regards,
Miki


"ppc-aix-port-dev" <ppc-aix-port-dev-bounces at openjdk.java.net> wrote on 
2016/05/12 18:33:03:

> From: "Doerr, Martin" <martin.doerr at sap.com>
> To: Gustavo Romero <gromero at linux.vnet.ibm.com>, Volker Simonis 
> <volker.simonis at gmail.com>
> Cc: "Simonis, Volker" <volker.simonis at sap.com>, "ppc-aix-port-
> dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>, "hotspot-
> dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>, 
> "brenohl at br.ibm.com" <brenohl at br.ibm.com>
> Date: 2016/05/12 18:34
> Subject: RE: PPC64 VSX load/store instructions in stubs
> Sent by: "ppc-aix-port-dev" <ppc-aix-port-dev-bounces at openjdk.java.net>
> 
> Hi Gustavo,
> 
> thanks for providing the webrevs. The change looks basically good.
> 
> I only have the following concerns:
> - We basically support configuring dscr by various DSCR switches. 
> Your code resets the value to hardware default instead of the 
> possibly modified values. We're currently only using default DSCR 
> values, but maybe we may want to play with them in the future.
> We could use a static variable for the default dscr value. It could 
> be modified in VM_Version::config_dscr() and used by your restore 
> code (load_const_optimized(tmp1, ...) instead of li(tmp1, 0)).
> 
> - The PPC-elf64abi-1.9 says: "Functions must ensure that the 
> appropriate bits in the vrsave register are set for any vector 
> registers they use. ...". I think not touching vrsave is the right 
> thing for AIX and ppc64le, but I think we will either have to skip 
> the optimization on ppc64 big endian or handle vrsave. Do you agree?
> 
> Best regards,
> Martin
> 
> 
> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] 
> Sent: Mittwoch, 11. Mai 2016 23:07
> To: Volker Simonis <volker.simonis at gmail.com>
> Cc: Doerr, Martin <martin.doerr at sap.com>; Simonis, Volker 
> <volker.simonis at sap.com>; ppc-aix-port-dev at openjdk.java.net; 
> hotspot-dev at openjdk.java.net; brenohl at br.ibm.com
> Subject: Re: PPC64 VSX load/store instructions in stubs
> Importance: High
> 
> Hi Volker, Hi Martin
> 
> Sincere apologies for the long delay.
> 
> My initial approach to test the VSX load/store was from an
> extracted snippet regarding just the mass copy loop "grafted" inside an
> inline asm, performing isolated tests with the "perf" tool focused only on
> aligned source and destination (best case).
> 
> The extracted code, called "Original" in the plot below (black line), is here:
> https://github.com/gromero/arraycopy/blob/2pairs/arraycopy.c#L27-L36
> 
> That extracted code, after some experiments, evolved into this one that
> employs VSX load/store, Data Stream deepest pre-fetch, d-cache touch, and
> backbranch aligned to 32 bytes:
> https://github.com/gromero/arraycopy/blob/2pairs/arraycopy_vsx.c#L27-L41
> 
> All runs were "pinned" using `numactl --cpunodebind --membind` to avoid any
> scheduler decision that could add noise to the measurement.
> 
> VSX, deepest data pre-fetch, d-cache touch, and 32-byte alignment
> proved to be better in the isolated code (red line) in comparison to the
> original extracted code (black line):
> http://gromero.github.io/openjdk/original_vsx_non_pf_vsx_pf_deepest.pdf
> 
> So I proceeded to implement the VSX loop in OpenJDK based on the best case
> result (VSX, pre-fetch deepest, d-cache touch, and backbranch target align -
> goetz TODO note).
> 
> OpenJDK 8 webrev:
> http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/8/
> 
> OpenJDK 9 webrev:
> http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/9/
> 
> I've tested the change on OpenJDK 8 using this script that calls
> System.arraycopy() on shorts:
> https://goo.gl/8UWtLm
> 
> The results for all data alignment cases:
> http://gromero.github.io/openjdk/src_0_dst_0.pdf
> http://gromero.github.io/openjdk/src_1_dst_0.pdf
> http://gromero.github.io/openjdk/src_0_dst_1.pdf
> http://gromero.github.io/openjdk/src_1_dst_1.pdf
> 
> Martin, I added the vsx test to the feature-string. Regarding the ABI,
> I'm just using two VSRs: vsr0 and vsr1, both volatile.
> 
> Volker, as the loop unrolling was removed, the loop now copies 16
> elements at a time, like the non-VSX loop, and not 32 elements. I just
> verified the change on little endian. Sorry, I didn't understand your
> question regarding "instructions for aligned load/stores". Did you mean
> instructions for unaligned load/stores? I think both fixed-point (ld/std)
> and VSX instructions will do load/store slower in the unaligned scenario.
> However, VMX load/store is different and expects aligned operands.
> Thank you very much for opening the bug
> https://bugs.openjdk.java.net/browse/JDK-8154156
> 
> I don't have the profiling per function for each SPEC{jbb,jvm} benchmark
> in order to determine which one would stress the proposed change better.
> Could I use a better benchmark?
> 
> Thank you!
> 
> Best regards,
> Gustavo
> 
> On 05-04-2016 14:23, Volker Simonis wrote:
> > Hi Gustavo,
> > 
> > thanks a lot for your contribution.
> > 
> > Can you please describe if you've run benchmarks and which performance
> > improvements you saw?
> > 
> > With your change if we're running on Power 8, we will only use the
> > fast path for arrays with at least 32 elements. For smaller arrays, we
> > will fall-back to copying only 2 elements at a time which will be
> > slower than the initial version which copied 4 at a time in that case.
> > 
> > Did you verify your changes on both little and big endian?
> > 
> > And what about unaligned memory accesses? As far as I read,
> > lxvd2x/stxvd2x still work, but may be slower. I saw there also exist
> > instructions for aligned load/stores. Would it make sense
> > (performance-wise) to use them for the cases where we can be sure that
> > we have aligned memory accesses?
> > 
> > Thank you and best regards,
> > Volker
> > 
> > 
> > On Fri, Apr 1, 2016 at 10:36 PM, Gustavo Romero
> > <gromero at linux.vnet.ibm.com> wrote:
> >> Hi Martin, Hi Volker
> >>
> >> Currently VSX load/store instructions are not being used in PPC64 stubs,
> >> particularly in arraycopy stubs inside generate_arraycopy_stubs() like,
> >> but not limited to, generate_disjoint_{byte,short,int,long}_copy.
> >>
> >> We can speed up mass copy using VSX (Vector-Scalar Extension) load/store
> >> instructions in processors >= POWER8, the same way it's already done for
> >> libc memcpy().
> >>
> >> This is an initial patch just for jshort_disjoint_arraycopy() VSX vector
> >> load/store:
> >>
> >> http://81.de.7a9f.ip4.static.sl-reverse.com/202539/webrev
> >>
> >> What are your thoughts on that? Is there any impediment to use VSX
> >> instructions in OpenJDK at the moment?
> >>
> >> Thank you.
> >>
> >> Best regards,
> >> Gustavo
> >>
> > 
> 


From brenohl at br.ibm.com  Mon May 16 17:28:44 2016
From: brenohl at br.ibm.com (Breno Leitao)
Date: Mon, 16 May 2016 14:28:44 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com><CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com><57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
Message-ID: <573A034C.9060602@br.ibm.com>

Hi Miki,

On 05/16/2016 02:53 AM, Miki M Enoki wrote:
> I also implemented VSX disjoint long arraycopy.
> I appreciate it if it is applied to OpenJDK, too.

Thanks for the summarized information, this is helpful. Based on your plot, I 
understand we can split the whole scenario in two:

  * Array size smaller than 4k: use VSX instructions to perform the copy
  * Array size bigger than 4k: use VMX instructions to perform the copy

The same mechanism could be used to copy arrays of short elements, as Gustavo was 
working on. Do you agree?
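
The split proposed above amounts to a size-based dispatch. A minimal sketch, assuming the "4k" threshold is measured in bytes (the message does not state the unit) and using hypothetical names that do not come from any of the patches in this thread:

```cpp
#include <cstddef>

// Hypothetical labels for the two copy strategies discussed above.
enum CopyPath { kUseVsx, kUseVmx };

// "4k" from the message; interpreting it as 4096 bytes is an assumption.
constexpr std::size_t kVsxVmxThreshold = 4 * 1024;

// Below the threshold pick VSX, otherwise VMX. The message does not say
// which path an array of exactly 4k takes; this sketch sends it to VMX.
constexpr CopyPath choose_copy_path(std::size_t size) {
  return size < kVsxVmxThreshold ? kUseVsx : kUseVmx;
}
```

In a real stub the choice would be a compare-and-branch emitted by the stub generator rather than a C++ function; the sketch only shows the decision itself.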

That said, I understand that a new patch should be generated that covers 
both cases in a single patch, ready to be applied to the OpenJDK 9 source code. Hence 
a webrev should be generated mapping to bug id 
https://bugs.openjdk.java.net/browse/JDK-8154156

If you need any help on the webrev[1] creation and hosting, Gustavo might help, 
since he did this process already.

[1] http://openjdk.java.net/guide/webrevHelp.html


From gromero at linux.vnet.ibm.com  Mon May 16 18:25:10 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 16 May 2016 15:25:10 -0300
Subject: SIGILL crashes JVM on PPC64 LE
In-Reply-To: <CA+3eh1364X=REW=SbPTQWB-cmZUhnxGmE8mWvVTN4S099NcSxA@mail.gmail.com>
References: <5733B30D.6010201@linux.vnet.ibm.com>
	<CA+3eh1235gSyijXZ-o+9D0Q+xjCKdk1OBm+2tKbPkCipH8P9EA@mail.gmail.com>
	<CA+3eh1364X=REW=SbPTQWB-cmZUhnxGmE8mWvVTN4S099NcSxA@mail.gmail.com>
Message-ID: <201605161825.u4GIOpXL005912@mx0b-001b2d01.pphosted.com>

Hi Volker

Thanks for inspecting the Hotspot crash log.

At the moment it's not possible, AFAIK - and as far as I could try - to run
Cassandra on OpenJDK 9. It will hit another missing class issue before
it runs into the SIGILL issue.

I'm still trying to reproduce it with an easy test case.

However, I provide the C2 compiled method disasm:

hs_err log:
http://hastebin.com/raw/orufukacos

hs_err method disasm:
http://hastebin.com/raw/owoxamodok

Source for one of the four problematic classes that will crash JVM when
compiled (we can see them in hs_err method disasm comments):
java/org/apache/cassandra/db/rows/NativeCell.java#L133
https://goo.gl/Uefq8Y

Thanks for letting me know that `vsel` is never emitted.

Thank you!

Best regards,
Gustavo

On 12-05-2016 09:39, Volker Simonis wrote:
> And I forgot to mention: I've checked and we don't emit vsel
> instructions in jdk8 on ppc. So it must be a coincidence that changing
> the endianness of the offending instruction yields a valid 'vsel'
> instruction.
> 
> 
> 
> On Thu, May 12, 2016 at 2:26 PM, Volker Simonis
> <volker.simonis at gmail.com> wrote:
>> Hi Gustavo,
>>
>> thanks for the bug report. The hs_err file you provided indicates that
>> this crash happened with Ubuntu's openjdk 8 version. Can you still
>> reproduce this with the newest jdk9 builds?
>>
>> Also, I can see from the hs_err file that the crash happened in the C2
>> compiled method java.util.TimSort.countRunAndMakeAscending which
>> doesn't seem to be related to nio and unsafe.
>>
>> Ideally, you could post an easy test case to reproduce the problem. If
>> that's not possible, it would be helpful if you could post the output
>> of a failing run with
>> "-XX:CompileCommand=print,java.util.TimSort::countRunAndMakeAscending
>> -XX:CompileCommand=option,java.util.TimSort::countRunAndMakeAscending,PrintOptoAssembly".
>> In order to get the disassembly output for compiled methods you have
>> to build the hsdis library from hotspot/src/share/tools/hsdis (it has
>> a README with build instructions).
>>
>> Regards,
>> Volker
>>
>>
>> On Thu, May 12, 2016 at 12:32 AM, Gustavo Romero
>> <gromero at linux.vnet.ibm.com> wrote:
>>> Hi
>>>
>>> I'm getting a nasty SIGILL that crashes the JVM on PPC64 LE.
>>>
>>> hs_err log:
>>> http://hastebin.com/raw/fovagunaci
>>>
>>> The application employs methods from both java.nio.ByteBuffer and
>>> sun.misc.Unsafe classes in order to write and read from an allocated buffer.
>>>
>>> An interesting thing is that after debugging the instruction that caused the
>>> said SIGILL:
>>>
>>>    0x3fff902839a4:      cmpwi   cr6,r17,0
>>>    0x3fff902839a8:      beq     cr6,0x3fff90283ae4
>>>    0x3fff902839ac:      .long 0xea2f0013 <============ illegal instruction
>>>    0x3fff902839b0:      add     r15,r15,r17
>>>    0x3fff902839b4:      add     r14,r17,r14
>>>
>>> I found that when its endianness is changed it turns out to be a valid
>>> instruction: vsel v24,v0,v5,v31
>>>
>>> However, I'm still unable to determine if it's an application issue, something
>>> with JVM unsafe interface code, or something else.
>>>
>>> Any clue on how to narrow down this SIGILL?
>>>
>>> Thank you!
>>>
>>> Regards,
>>> Gustavo
>>>
> 
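
The observation quoted above, that byte-swapping the word 0xea2f0013 gives a valid vsel encoding, can be checked by hand. A minimal sketch, assuming the standard Power ISA VA-form layout (6-bit primary opcode, four 5-bit register fields, 6-bit extended opcode); the helper names are hypothetical:

```cpp
#include <cstdint>

// Reverse the four bytes of a 32-bit instruction word.
constexpr uint32_t byte_swap32(uint32_t w) {
  return (w >> 24) | ((w >> 8) & 0x0000ff00u) |
         ((w << 8) & 0x00ff0000u) | (w << 24);
}

// Fields of a VA-form instruction, from most to least significant bits.
struct VaForm {
  uint32_t opcode;  // primary opcode, 6 bits
  uint32_t vrt;     // target vector register, 5 bits
  uint32_t vra;     // source A, 5 bits
  uint32_t vrb;     // source B, 5 bits
  uint32_t vrc;     // source C, 5 bits
  uint32_t xo;      // extended opcode, 6 bits
};

constexpr VaForm decode_va_form(uint32_t insn) {
  return VaForm{insn >> 26, (insn >> 21) & 31u, (insn >> 16) & 31u,
                (insn >> 11) & 31u, (insn >> 6) & 31u, insn & 63u};
}
```

Swapping 0xea2f0013 yields 0x13002fea, which decodes to primary opcode 4, VRT=24, VRA=0, VRB=5, VRC=31 and extended opcode 42, matching "vsel v24,v0,v5,v31". This only confirms the arithmetic; it says nothing about why the illegal word was emitted.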


From gromero at linux.vnet.ibm.com  Mon May 16 22:09:40 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 16 May 2016 19:09:40 -0300
Subject: SIGILL crashes JVM on PPC64 LE
In-Reply-To: <201605161825.u4GIO8g8023200@mx0a-001b2d01.pphosted.com>
References: <5733B30D.6010201@linux.vnet.ibm.com>
	<CA+3eh1235gSyijXZ-o+9D0Q+xjCKdk1OBm+2tKbPkCipH8P9EA@mail.gmail.com>
	<CA+3eh1364X=REW=SbPTQWB-cmZUhnxGmE8mWvVTN4S099NcSxA@mail.gmail.com>
	<201605161825.u4GIO8g8023200@mx0a-001b2d01.pphosted.com>
Message-ID: <201605162209.u4GM9Usm008246@mx0b-001b2d01.pphosted.com>

Hi Volker

I'm not sure, but it seems that the bytecode i2l is wrong for some
reason. It should be mapped to extsw asm instruction I think. Is
there just one code path that controls this mapping on PPC?

Thank you.

Regards,
Gustavo

On 16-05-2016 15:25, Gustavo Romero wrote:
> Hi Volker
> 
> Thanks for inspecting the Hotspot crash log.
> 
> At the moment it's not possible, AFAIK - and as far as I could try - to run
> Cassandra on OpenJDK 9. It will hit another missing class issue before
> it runs into the SIGILL issue.
> 
> I'm still trying to reproduce it with an easy test case.
> 
> However, I provide the C2 compiled method disasm:
> 
> hs_err log:
> http://hastebin.com/raw/orufukacos
> 
> hs_err method disasm:
> http://hastebin.com/raw/owoxamodok
> 
> Source for one of the four problematic classes that will crash JVM when
> compiled (we can see them in hs_err method disasm comments):
> java/org/apache/cassandra/db/rows/NativeCell.java#L133
> https://goo.gl/Uefq8Y
> 
> Thanks for letting me know that `vsel` is never emitted.
> 
> Thank you!
> 
> Best regards,
> Gustavo
> 
> On 12-05-2016 09:39, Volker Simonis wrote:
>> And I forgot to mention: I've checked and we don't emit vsel
>> instructions in jdk8 on ppc. So it must be a coincidence that changing
>> the endianness of the offending instruction yields a valid 'vsel'
>> instruction.
>>
>>
>>
>> On Thu, May 12, 2016 at 2:26 PM, Volker Simonis
>> <volker.simonis at gmail.com> wrote:
>>> Hi Gustavo,
>>>
>>> thanks for the bug report. The hs_err file you provided indicates that
>>> this crash happened with Ubuntu's openjdk 8 version. Can you still
>>> reproduce this with the newest jdk9 builds?
>>>
>>> Also, I can see from the hs_err file that the crash happened in the C2
>>> compiled method java.util.TimSort.countRunAndMakeAscending which
>>> doesn't seem to be related to nio and unsafe.
>>>
>>> Ideally, you could post an easy test case to reproduce the problem. If
>>> that's not possible, it would be helpful if you could post the output
>>> of a failing run with
>>> "-XX:CompileCommand=print,java.util.TimSort::countRunAndMakeAscending
>>> -XX:CompileCommand=option,java.util.TimSort::countRunAndMakeAscending,PrintOptoAssembly".
>>> In order to get the disassembly output for compiled methods you have
>>> to build the hsdis library from hotspot/src/share/tools/hsdis (it has
>>> a README with build instructions).
>>>
>>> Regards,
>>> Volker
>>>
>>>
>>> On Thu, May 12, 2016 at 12:32 AM, Gustavo Romero
>>> <gromero at linux.vnet.ibm.com> wrote:
>>>> Hi
>>>>
>>>> I'm getting a nasty SIGILL that crashes the JVM on PPC64 LE.
>>>>
>>>> hs_err log:
>>>> http://hastebin.com/raw/fovagunaci
>>>>
>>>> The application employs methods from both java.nio.ByteBuffer and
>>>> sun.misc.Unsafe classes in order to write and read from an allocated buffer.
>>>>
>>>> An interesting thing is that after debugging the instruction that caused the
>>>> said SIGILL:
>>>>
>>>>    0x3fff902839a4:      cmpwi   cr6,r17,0
>>>>    0x3fff902839a8:      beq     cr6,0x3fff90283ae4
>>>>    0x3fff902839ac:      .long 0xea2f0013 <============ illegal instruction
>>>>    0x3fff902839b0:      add     r15,r15,r17
>>>>    0x3fff902839b4:      add     r14,r17,r14
>>>>
>>>> I found that when its endianness is changed it turns out to be a valid
>>>> instruction: vsel v24,v0,v5,v31
>>>>
>>>> However, I'm still unable to determine if it's an application issue, something
>>>> with JVM unsafe interface code, or something else.
>>>>
>>>> Any clue on how to narrow down this SIGILL?
>>>>
>>>> Thank you!
>>>>
>>>> Regards,
>>>> Gustavo
>>>>
>>
> 


From kim.barrett at oracle.com  Wed May 18 01:26:19 2016
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 17 May 2016 21:26:19 -0400
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
Message-ID: <C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>

> On May 10, 2016, at 10:27 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
> 
> Hello everybody,
>  
> thanks for finding this issue. New webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/
>  

------------------------------------------------------------------------------ 
src/share/vm/runtime/atomic.hpp
  30 typedef enum cmpxchg_cmpxchg_memory_order {
  31   memory_order_relaxed,
  32   // Use value which doesn't interfere with C++2011. We need to be more conservative.
  33   memory_order_conservative = 8
  34 } cmpxchg_memory_order;

This is C++, where enum tag names are types, so we don't need a
typedef here. Just use "enum cmpxchg_memory_order { ... };".
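
As an illustration of the point above, a minimal sketch of the suggested form. The enumerators mirror the quoted webrev lines; the helper function is hypothetical and exists only to show the tag used as a type:

```cpp
// Sketch only: mirrors the enumerators quoted above, not the actual header.
enum cmpxchg_memory_order {
  memory_order_relaxed,
  // Value picked in the quoted webrev to stay clear of the C++11 enumerators.
  memory_order_conservative = 8
};

// In C++ the enum tag is itself a type name, so it can be used directly
// as a parameter type without a typedef.
constexpr bool is_conservative(cmpxchg_memory_order order) {
  return order == memory_order_conservative;
}
```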

------------------------------------------------------------------------------ 
src/share/vm/runtime/atomic.cpp
  59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
  60                            volatile unsigned int* dest, unsigned int compare_value,
  61                            cmpxchg_memory_order order) {

Misaligned parameters.

I'm surprised this was ever out-of-line. But with this change it's
quite bad to be out-of-line, as that's going to kill the constant
propagation of the order value.

------------------------------------------------------------------------------ 
src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp 

In each use, the cmpxchg_post_membar's are after the exit label,
whereas the acquire fences they are replacing were before the exit
label.  This means we'll be fencing on failure exit, where we weren't
doing so before.

It's not clear whether this change is intentional.  Note that this
change is consistent with the C++11 one-order forms of cmpxchg, where
the single order argument is used as the success order and (with
potentially some modification) as the failure order.

[I was going to suggest the asm goto syntax might be used to obtain
the original ordering, but "An asm goto statement cannot have
outputs." So some non-trivial restructuring will probably be needed to
get the original ordering.]

Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. 
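
For reference, the C++11 one-order form mentioned above looks like this with std::atomic. This is a sketch of the standard library behaviour only, not of the HotSpot implementation; cas_one_order and its demo are hypothetical names, and the argument order follows the cmpxchg(exchange_value, dest, compare_value) convention only loosely:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: returns the value observed in dest, in the style of
// a cmpxchg that hands back the old value.
inline int cas_one_order(std::atomic<int>& dest, int exchange_value,
                         int compare_value, std::memory_order order) {
  int observed = compare_value;
  // Single-order overload: 'order' is the success order, and the standard
  // derives the failure order from it (acq_rel -> acquire,
  // release -> relaxed, otherwise unchanged).
  dest.compare_exchange_strong(observed, exchange_value, order);
  return observed;
}

// Small single-threaded check of the success and failure paths.
inline void cas_one_order_demo() {
  std::atomic<int> v(5);
  assert(cas_one_order(v, 7, 5, std::memory_order_seq_cst) == 5);  // succeeds
  assert(v.load() == 7);
  assert(cas_one_order(v, 9, 5, std::memory_order_seq_cst) == 7);  // fails
  assert(v.load() == 7);
}
```

With a sequentially consistent order, both the success and the failure path are ordered, which is the analogue of fencing on the failure exit described above.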

------------------------------------------------------------------------------ 
src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp 

These repeated comments need updating:
 315   // Note that cmpxchg guarantees a two-way memory barrier across
 316   // the cmpxchg, so it's really a a 'fence_cmpxchg_acquire'
 317   // (see atomic.hpp).

Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp. 

------------------------------------------------------------------------------ 
src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp 

[pre-existing]

The cmpxchg asm sequences all originally looked like

  /* fence */
  strasm_sync
  ...
  /* acquire */
  strasm_sync
  ...

So they were using strasm_sync (the full fence) in both places, even
though the comments suggest it could/should have been
  strasm_fence ... strasm_acquire
However, the description in runtime/atomic.hpp seems to indicate
something stronger than "acquire" is required here, so the second
comment seems wrong. Maybe it's a good thing the comments are being
removed by the proposed change.

Similarly in other corresponding places in other files.

------------------------------------------------------------------------------ 
src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
 138   return (void *) cmpxchg_ptr((intptr_t) exchange_value,
 139                               (volatile intptr_t*) dest,
 140                               (intptr_t) compare_value, order);

I'd prefer the order argument be placed on its own line, rather than
added to an existing line where it's kind of hiding.

Similarly in other corresponding places in other files.

------------------------------------------------------------------------------ 
src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp 
 271 inline jint Atomic::cmpxchg(jint exchange_value,
 272                             volatile jint* dest,
 273                             jint compare_value, cmpxchg_memory_order order) {

and similarly elsewhere, I'd prefer the order parameter be on its own
line like the other parameters.

Similarly in other corresponding places here and in other files.

------------------------------------------------------------------------------
src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp 
 306 inline void* Atomic::cmpxchg_ptr(void* exchange_value,
 307                                  volatile void* dest,
 308                                  void* compare_value, cmpxchg_memory_order order) {
 309 
 310   return (void *) cmpxchg_ptr((intptr_t) exchange_value,
 311                               (volatile intptr_t*) dest,
 312                               (intptr_t) compare_value);
 313 }

Inner cmpxchg_ptr is missing the order argument. This will discard an
outer relaxed order.  (atomic_linux_zero is OK.)
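
A minimal sketch of how such a dropped argument goes unnoticed. It assumes the inner overload declares a default of memory_order_conservative (which would explain why the quoted code still compiles); all functions here are hypothetical stand-ins, and the "CAS" is deliberately non-atomic because only the argument forwarding is being illustrated:

```cpp
#include <cassert>
#include <cstdint>

enum cmpxchg_memory_order { memory_order_relaxed, memory_order_conservative = 8 };

// Records which order the inner function actually received.
static cmpxchg_memory_order last_order_seen = memory_order_conservative;

// Stand-in for the intptr_t overload; the default argument is an assumption.
inline intptr_t cmpxchg_ptr(intptr_t exchange_value, volatile intptr_t* dest,
                            intptr_t compare_value,
                            cmpxchg_memory_order order = memory_order_conservative) {
  last_order_seen = order;
  intptr_t old = *dest;
  if (old == compare_value) *dest = exchange_value;  // not atomic, demo only
  return old;
}

// Shape of the quoted code: 'order' is accepted but never forwarded.
inline void* cmpxchg_ptr_dropping_order(void* exchange_value, volatile void* dest,
                                        void* compare_value,
                                        cmpxchg_memory_order order) {
  (void) order;  // silently ignored, the default is used instead
  return (void*) cmpxchg_ptr((intptr_t) exchange_value,
                             (volatile intptr_t*) dest,
                             (intptr_t) compare_value);
}

// Fixed shape: the caller's order reaches the inner call.
inline void* cmpxchg_ptr_forwarding_order(void* exchange_value, volatile void* dest,
                                          void* compare_value,
                                          cmpxchg_memory_order order) {
  return (void*) cmpxchg_ptr((intptr_t) exchange_value,
                             (volatile intptr_t*) dest,
                             (intptr_t) compare_value,
                             order);
}

inline void forwarding_demo() {
  intptr_t slot = 1;
  cmpxchg_ptr_dropping_order((void*) 2, &slot, (void*) 1, memory_order_relaxed);
  assert(last_order_seen == memory_order_conservative);  // relaxed was lost
  cmpxchg_ptr_forwarding_order((void*) 3, &slot, (void*) 2, memory_order_relaxed);
  assert(last_order_seen == memory_order_relaxed);       // relaxed preserved
  assert(slot == 3);
}
```

The dropped argument is still correct in the sense of being at least as strong as requested; the cost is that a caller asking for relaxed semantics pays for the conservative barriers.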

------------------------------------------------------------------------------ 
src/share/vm/runtime/atomic.inline.hpp

The unspecialized Atomic::cmpxchg for jbyte isn't passing the order
argument through to cmpxchg_general.  Of course, then we might want to
figure out what cmpxchg_general should be doing with the order.

------------------------------------------------------------------------------


From martin.doerr at sap.com  Wed May 18 10:12:24 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 18 May 2016 10:12:24 +0000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
Message-ID: <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>

Hi Kim,

thank you very much for the detailed review.

I agree with your comments and I have made all your requested changes here:
http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/

It's correct that the change alters the semantics of the conservative cmpxchg: in case of failure, we now also execute the sync instruction.
The advantage is that the new implementation is maximally conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.

For performance optimization, we should use (or introduce additional) enum values.

Thanks and best regards,
Martin


From david.holmes at oracle.com  Wed May 18 10:52:03 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 18 May 2016 20:52:03 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
Message-ID: <95a96fb1-f35b-2e04-9f57-af39e2da4c9c@oracle.com>

On 18/05/2016 8:12 PM, Doerr, Martin wrote:
> Hi Kim,
>
> thank you very much for the detailed review.
>
> I agree with your comments and I have made all your requested changes here:
> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>
> It's correct that the change alters the semantics of the conservative cmpxchg: in case of failure, we now also execute the sync instruction.
> The advantage is that the new implementation is maximally conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.

What further specification are you looking for:

   // All of the atomic operations that imply a read-modify-write action
   // guarantee a two-way memory barrier across that operation.

??

David


> For performance optimization, we should use (or introduce additional) enum values.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Kim Barrett [mailto:kim.barrett at oracle.com]
> Sent: Mittwoch, 18. Mai 2016 03:26
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; David Holmes <david.holmes at oracle.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>
>> On May 10, 2016, at 10:27 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>
>> Hello everybody,
>>
>> thanks for finding this issue. New webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/
>>
>
> ------------------------------------------------------------------------------
> src/share/vm/runtime/atomic.hpp
>   30 typedef enum cmpxchg_cmpxchg_memory_order {
>   31   memory_order_relaxed,
>   32   // Use value which doesn't interfere with C++2011. We need to be more conservative.
>   33   memory_order_conservative = 8
>   34 } cmpxchg_memory_order;
>
> This is C++, where enum tag names are types, so we don't need a
> typedef here. Just use "enum cmpxchg_memory_order { ... };".
>
> ------------------------------------------------------------------------------
> src/share/vm/runtime/atomic.cpp
>   59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>   60                            volatile unsigned int* dest, unsigned int compare_value,
>   61                            cmpxchg_memory_order order) {
>
> Misaligned parameters.
>
> I'm surprised this was ever out-of-line. But with this change it's
> quite bad to be out-of-line, as that's going to kill the constant
> propagation of the order value.
>
> ------------------------------------------------------------------------------
> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>
> In each use, the cmpxchg_post_membar's are after the exit label,
> whereas the acquire fences they are replacing were before the exit
> label.  This means we'll be fencing on failure exit, where we weren't
> doing so before.
>
> It's not clear whether this change is intentional.  Note that this
> change is consistent with the C++11 one-order forms of cmpxchg, where
> the single order argument is used as the success order and (with
> potentially some modification) as the failure order.
>
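For comparison, a small sketch of the C++11 behaviour mentioned above, using std::atomic. The helper names are hypothetical. In the one-order form the failure ordering is derived from the single argument (memory_order_acq_rel becomes memory_order_acquire, memory_order_release becomes memory_order_relaxed), so some ordering still applies when the compare fails.

```cpp
#include <atomic>
#include <cassert>

// One-order form: the failure ordering is derived from 'order'.
inline bool cas_one_order(std::atomic<int>& dest, int compare_value,
                          int exchange_value, std::memory_order order) {
  return dest.compare_exchange_strong(compare_value, exchange_value, order);
}

// Two-order form with the failure ordering spelled out. This matches
// cas_one_order(..., std::memory_order_acq_rel).
inline bool cas_two_orders(std::atomic<int>& dest, int compare_value,
                           int exchange_value) {
  return dest.compare_exchange_strong(compare_value, exchange_value,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire);
}
```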
> [I was going to suggest the asm goto syntax might be used to obtain
> the original ordering, but "An asm goto statement cannot have
> outputs." So some non-trivial restructuring will probably be needed to
> get the original ordering.]
>
> Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp.
>
> ------------------------------------------------------------------------------
> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>
> These repeated comments need updating:
>  315   // Note that cmpxchg guarantees a two-way memory barrier across
>  316   // the cmpxchg, so it's really a a 'fence_cmpxchg_acquire'
>  317   // (see atomic.hpp).
>
> Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp.
>
> ------------------------------------------------------------------------------
> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>
> [pre-existing]
>
> The cmpxchg asm sequences all originally looked like
>
>   /* fence */
>   strasm_sync
>   ...
>   /* acquire */
>   strasm_sync
>   ...
>
> So they were using strasm_sync (the full fence) in both places, even
> though the comments suggest it could/should have been
>   strasm_fence ... strasm_acquire
> However, the description in runtime/atomic.hpp seems to indicate
> something stronger than "acquire" is required here, so the second
> comment seems wrong. Maybe it's a good thing the comments are being
> removed by the proposed change.
>
> Similarly in other corresponding places in other files.
>
> ------------------------------------------------------------------------------
> src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>  138   return (void *) cmpxchg_ptr((intptr_t) exchange_value,
>  139                               (volatile intptr_t*) dest,
>  140                               (intptr_t) compare_value, order);
>
> I'd prefer the order argument be placed on its own line, rather than
> added to an existing line where it's kind of hiding.
>
> Similarly in other corresponding places in other files.
>
> ------------------------------------------------------------------------------
> src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>  271 inline jint Atomic::cmpxchg(jint exchange_value,
>  272                             volatile jint* dest,
>  273                             jint compare_value, cmpxchg_memory_order order) {
>
> and similarly elsewhere, I'd prefer the order parameter be on its own
> line like the other parameters.
>
> Similarly in other corresponding places here and in other files.
>
> ------------------------------------------------------------------------------
> src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>  306 inline void* Atomic::cmpxchg_ptr(void* exchange_value,
>  307                                  volatile void* dest,
>  308                                  void* compare_value, cmpxchg_memory_order order) {
>  309
>  310   return (void *) cmpxchg_ptr((intptr_t) exchange_value,
>  311                               (volatile intptr_t*) dest,
>  312                               (intptr_t) compare_value);
>  313 }
>
> Inner cmpxchg_ptr is missing the order argument. This will discard an
> outer relaxed order.  (atomic_linux_zero is OK.)
>
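A sketch of how this kind of bug compiles silently. It assumes the inner overload declares the order parameter with a default of memory_order_conservative, which is what the remark about discarding a relaxed order suggests. All function names are hypothetical, and the inner function is deliberately not atomic: it only records which order it received.

```cpp
#include <cassert>

enum cmpxchg_memory_order {
  memory_order_relaxed,
  memory_order_conservative = 8
};

// Records the order the innermost function actually received (illustration only).
static cmpxchg_memory_order last_order_seen = memory_order_conservative;

// Stand-in for the inner overload; the default argument is an assumption.
inline int cmpxchg_sketch(int exchange_value, volatile int* dest,
                          int compare_value,
                          cmpxchg_memory_order order = memory_order_conservative) {
  last_order_seen = order;
  int old = *dest;  // not atomic: this sketch only tracks 'order'
  if (old == compare_value) *dest = exchange_value;
  return old;
}

// Buggy wrapper: compiles thanks to the default argument, but drops 'order'.
inline unsigned int cmpxchg_unsigned_buggy(unsigned int exchange_value,
                                           volatile unsigned int* dest,
                                           unsigned int compare_value,
                                           cmpxchg_memory_order order) {
  (void) order;  // silently ignored
  return (unsigned int) cmpxchg_sketch((int) exchange_value,
                                       (volatile int*) dest,
                                       (int) compare_value);
}

// Fixed wrapper: forwards 'order', placed on its own line.
inline unsigned int cmpxchg_unsigned_fixed(unsigned int exchange_value,
                                           volatile unsigned int* dest,
                                           unsigned int compare_value,
                                           cmpxchg_memory_order order) {
  return (unsigned int) cmpxchg_sketch((int) exchange_value,
                                       (volatile int*) dest,
                                       (int) compare_value,
                                       order);
}
```

Dropping the default argument from the inner overload would turn the forgotten argument into a compile error, at the cost of touching every existing call site.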
> ------------------------------------------------------------------------------
> src/share/vm/runtime/atomic.inline.hpp
>
> The unspecialized Atomic::cmpxchg for jbyte isn't passing the order
> argument through to cmpxchg_general.  Of course, then we might want to
> figure out what cmpxchg_general should be doing with the order.
>
> ------------------------------------------------------------------------------
>

From martin.doerr at sap.com  Wed May 18 11:08:52 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 18 May 2016 11:08:52 +0000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <95a96fb1-f35b-2e04-9f57-af39e2da4c9c@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<95a96fb1-f35b-2e04-9f57-af39e2da4c9c@oracle.com>
Message-ID: <13d10493b45840b4912274f68fedc855@DEWDFE13DE14.global.corp.sap>

Hi David,

In comparison to C++11 or JEP 193, the hotspot C++ semantics are rather imprecise for PPC64. That's why we use two sync instructions, which is the most conservative implementation.

C++11 and JEP 193 specify which barriers are needed in case cmpxchg fails.

In addition, I think one could also implement the "two-way memory barrier across that operation" as, for example, lwsync+cmpxchg+sync on PPC64. But this wouldn't be multi-copy-atomic. It's unclear whether this property is needed.

Best regards,
Martin


-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Wednesday, 18 May 2016 12:52
To: Doerr, Martin <martin.doerr at sap.com>; Kim Barrett <kim.barrett at oracle.com>
Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg

On 18/05/2016 8:12 PM, Doerr, Martin wrote:
> Hi Kim,
>
> thank you very much for the detailed review.
>
> I agree with your comments and I have made all your requested changes here:
> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>
> It's correct that the change alters the semantics of the conservative cmpxchg. In case of failure, we now also execute the sync instruction.
> The advantage is that the new implementation is maximally conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.

What further specification are you looking for:

   // All of the atomic operations that imply a read-modify-write action
   // guarantee a two-way memory barrier across that operation.

??

David


> For performance optimization, it would be better to use (or introduce additional) enum values.
>
> Thanks and best regards,
> Martin
>

From david.holmes at oracle.com  Wed May 18 11:52:41 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 18 May 2016 21:52:41 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <13d10493b45840b4912274f68fedc855@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<95a96fb1-f35b-2e04-9f57-af39e2da4c9c@oracle.com>
	<13d10493b45840b4912274f68fedc855@DEWDFE13DE14.global.corp.sap>
Message-ID: <7628ef5d-4e59-ef05-0550-07d5838f4570@oracle.com>

On 18/05/2016 9:08 PM, Doerr, Martin wrote:
> Hi David,
>
> In comparison to C++11 or JEP 193, the hotspot C++ semantics are rather imprecise for PPC64. That's why we use two sync instructions, which is the most conservative implementation.
>
> C++11 and JEP 193 specify which barriers are needed in case cmpxchg fails.

The hotspot semantics are quite simple - no difference between success 
or failure - just two-way barriers around the operations.
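To make the success/failure distinction concrete, here is a rough sketch that contrasts the two layouts discussed in the review. It uses std::atomic fences instead of PPC sync instructions, and every name in it is hypothetical. The counter argument exists only so the difference on the failure path can be observed.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

// Trailing barrier only on the success path, as in the layout where the
// second fence sits before the exit label.
inline bool cas_fence_on_success_only(std::atomic<int>& dest, int compare_value,
                                      int exchange_value, int* fences_executed) {
  std::atomic_thread_fence(std::memory_order_seq_cst);
  ++*fences_executed;
  bool ok = dest.compare_exchange_strong(compare_value, exchange_value,
                                         std::memory_order_relaxed);
  if (ok) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    ++*fences_executed;
  }
  return ok;
}

// Barriers on both sides regardless of the outcome: no difference between
// success and failure.
inline bool cas_fence_always(std::atomic<int>& dest, int compare_value,
                             int exchange_value, int* fences_executed) {
  std::atomic_thread_fence(std::memory_order_seq_cst);
  ++*fences_executed;
  bool ok = dest.compare_exchange_strong(compare_value, exchange_value,
                                         std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  ++*fences_executed;
  return ok;
}

}  // namespace sketch
```

On a failed compare the first variant executes one fence and the second executes two, which is the behavioural change under discussion.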

> In addition, I think one could also implement the "two-way memory barrier across that operation" as, for example, lwsync+cmpxchg+sync on PPC64. But this wouldn't be multi-copy-atomic. It's unclear whether this property is needed.

Yes, multi-copy-atomicity is implicit in the "two way barriers" - nothing 
can be reordered in relation to the operation, so implicitly all 
observers see the same thing at the same time.

This may well be stronger than required by actual algorithms using the 
operations but as the comment block continues:

   // these semantics reflect the strength of atomic operations that are
   // provided on SPARC/X86. We assume that strength is necessary unless
   // we can prove that a weaker form is sufficiently safe.

Cheers,
David
-----

> Best regards,
> Martin

From martin.doerr at sap.com  Wed May 18 12:32:15 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 18 May 2016 12:32:15 +0000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <7628ef5d-4e59-ef05-0550-07d5838f4570@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<95a96fb1-f35b-2e04-9f57-af39e2da4c9c@oracle.com>
	<13d10493b45840b4912274f68fedc855@DEWDFE13DE14.global.corp.sap>
	<7628ef5d-4e59-ef05-0550-07d5838f4570@oracle.com>
Message-ID: <fed26e95a8754b8b86376e4982068556@DEWDFE13DE14.global.corp.sap>

Hi David,

OK, this comment specifies it clearly enough:
   // these semantics reflect the strength of atomic operations that are
   // provided on SPARC/X86.
So my change does the right thing :-)

Thanks for your quick response and especially for pushing this change forward.

Best regards,
Martin



From david.holmes at oracle.com  Thu May 19 00:03:46 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 19 May 2016 10:03:46 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
Message-ID: <743059fb-1b8a-1105-493b-b0071e53cbf8@oracle.com>

On 18/05/2016 8:12 PM, Doerr, Martin wrote:
> Hi Kim,
>
> thank you very much for the detailed review.
>
> I agree with your comments and I have made all your requested changes here:
> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/

This looks fine to me. I make no comments on the PPC implementation.

Thanks,
David

> It's correct that the change alters the semantics of the conservative cmpxchg: in case of failure, we now execute the sync instruction as well.
> The advantage is that the new implementation is maximally conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>
> For performance optimization, it would be better to use (or introduce additional) enum values.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Kim Barrett [mailto:kim.barrett at oracle.com]
> Sent: Mittwoch, 18. Mai 2016 03:26
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; David Holmes <david.holmes at oracle.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>
>> On May 10, 2016, at 10:27 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>
>> Hello everybody,
>>
>> thanks for finding this issue. New webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.02/
>>
>
> ------------------------------------------------------------------------------
> src/share/vm/runtime/atomic.hpp
>   30 typedef enum cmpxchg_cmpxchg_memory_order {
>   31   memory_order_relaxed,
>   32   // Use value which doesn't interfere with C++2011. We need to be more conservative.
>   33   memory_order_conservative = 8
>   34 } cmpxchg_memory_order;
>
> This is C++, where enum tag names are types, so we don't need a
> typedef here. Just use "enum cmpxchg_memory_order { ... };".
>
> ------------------------------------------------------------------------------
> src/share/vm/runtime/atomic.cpp
> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>   60                            volatile unsigned int* dest, unsigned int compare_value,
>   61                            cmpxchg_memory_order order) {
>
> Misaligned parameters.
>
> I'm surprised this was ever out-of-line. But with this change it's
> quite bad to be out-of-line, as that's going to kill the constant
> propagation of the order value.
>
> ------------------------------------------------------------------------------
> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>
> In each use, the cmpxchg_post_membar's are after the exit label,
> whereas the acquire fences they are replacing were before the exit
> label.  This means we'll be fencing on failure exit, where we weren't
> doing so before.
>
> It's not clear whether this change is intentional.  Note that this
> change is consistent with the C++11 one-order forms of cmpxchg, where
> the single order argument is used as the success order and (with
> potentially some modification) as the failure order.
>
> [I was going to suggest the asm goto syntax might be used to obtain
> the original ordering, but "An asm goto statement cannot have
> outputs." So some non-trivial restructuring will probably be needed to
> get the original ordering.]
>
> Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp.
>
> ------------------------------------------------------------------------------
> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>
> These repeated comments need updating:
>  315   // Note that cmpxchg guarantees a two-way memory barrier across
>  316   // the cmpxchg, so it's really a a 'fence_cmpxchg_acquire'
>  317   // (see atomic.hpp).
>
> Similarly in src/os_cpu/aix_ppc/vm/atomic_aix_ppc.inline.hpp.
>
> ------------------------------------------------------------------------------
> src/os_cpu/linux_ppc/vm/atomic_linux_ppc.inline.hpp
>
> [pre-existing]
>
> The cmpxchg asm sequences all originally looked like
>
>   /* fence */
>   strasm_sync
>   ...
>   /* acquire */
>   strasm_sync
>   ...
>
> So they were using strasm_sync (the full fence) in both places, even
> though the comments suggest it could/should have been
>   strasm_fence ... strasm_acquire
> However, the description in runtime/atomic.hpp seems to indicate
> something stronger than "acquire" is required here, so the second
> comment seems wrong. Maybe it's a good thing the comments are being
> removed by the proposed change.
>
> Similarly in other corresponding places in other files.
>
> ------------------------------------------------------------------------------
> src/os_cpu/linux_aarch64/vm/atomic_linux_aarch64.inline.hpp
>  138   return (void *) cmpxchg_ptr((intptr_t) exchange_value,
>  139                               (volatile intptr_t*) dest,
>  140                               (intptr_t) compare_value, order);
>
> I'd prefer the order argument be placed on its own line, rather than
> added to an existing line where it's kind of hiding.
>
> Similarly in other corresponding places in other files.
>
> ------------------------------------------------------------------------------
> src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>  271 inline jint Atomic::cmpxchg(jint exchange_value,
>  272                             volatile jint* dest,
>  273                             jint compare_value, cmpxchg_memory_order order) {
>
> and similarly elsewhere, I'd prefer the order parameter be on its own
> line like the other parameters.
>
> Similarly in other corresponding places here and in other files.
>
> ------------------------------------------------------------------------------
> src/os_cpu/bsd_zero/vm/atomic_bsd_zero.inline.hpp
>  306 inline void* Atomic::cmpxchg_ptr(void* exchange_value,
>  307                                  volatile void* dest,
>  308                                  void* compare_value, cmpxchg_memory_order order) {
>  309
>  310   return (void *) cmpxchg_ptr((intptr_t) exchange_value,
>  311                               (volatile intptr_t*) dest,
>  312                               (intptr_t) compare_value);
>  313 }
>
> Inner cmpxchg_ptr is missing the order argument. This will discard an
> outer relaxed order.  (atomic_linux_zero is OK.)
>
> ------------------------------------------------------------------------------
> src/share/vm/runtime/atomic.inline.hpp
>
> The unspecialized Atomic::cmpxchg for jbyte isn't passing the order
> argument through to cmpxchg_general.  Of course, then we might want to
> figure out what cmpxchg_general should be doing with the order.
>
> ------------------------------------------------------------------------------
>

From david.holmes at oracle.com  Thu May 19 11:56:58 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 19 May 2016 21:56:58 +1000
Subject: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: <573D8D20.3080008@redhat.com>
References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com>
	<D32C2773-507F-4594-9B53-F2661E744C04@oracle.com>
	<5711ED18.7000706@oracle.com>
	<201604180215.u3I2FUZi001650@d19av07.sagamino.japan.ibm.com>
	<571464DF.3070706@oracle.com>
	<CAP_pwnXwkBmG+9Cy4_fY8kkVSrMsJ00NsCEs4wmRB3XwjGM+JA@mail.gmail.com>
	<5714E416.6030300@redhat.com>
	<CA+3eh11wZe-BQeYxz=ev02JLcKEbdrqVMn691AbBJHJp+-pybA@mail.gmail.com>
	<571577EB.1080907@oracle.com> <573D8D20.3080008@redhat.com>
Message-ID: <95abf1f5-0464-0df1-7de2-c8f7c493703e@oracle.com>

On 19/05/2016 7:53 PM, Andrew Haley wrote:
> The AArch64 code for this isn't ideal, of course.  I'll submit
> an AArch64 version as soon as this goes in.  Do I need a different
> bug ID?

There are now two bugs for this:

8155949: Support relaxed semantics in cmpxchg

is adding the new API, with a relaxed implementation for PPC.

8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

is for the actual GC code changes to use the relaxed cmpxchg API.

I will be shepherding 8155949 through the post FC process (once we know 
what it is). If you can provide the Aarch64 code before that completes 
then it can go in with the other changes under this bug. Otherwise it 
will need a separate bug.

Cheers,
David

> Andrew.
>

From david.holmes at oracle.com  Thu May 19 20:08:20 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 20 May 2016 06:08:20 +1000
Subject: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: <573DD60F.9030005@redhat.com>
References: <201604081054.u38As2K6014953@d19av07.sagamino.japan.ibm.com>
	<D32C2773-507F-4594-9B53-F2661E744C04@oracle.com>
	<5711ED18.7000706@oracle.com>
	<201604180215.u3I2FUZi001650@d19av07.sagamino.japan.ibm.com>
	<571464DF.3070706@oracle.com>
	<CAP_pwnXwkBmG+9Cy4_fY8kkVSrMsJ00NsCEs4wmRB3XwjGM+JA@mail.gmail.com>
	<5714E416.6030300@redhat.com>
	<CA+3eh11wZe-BQeYxz=ev02JLcKEbdrqVMn691AbBJHJp+-pybA@mail.gmail.com>
	<573DD60F.9030005@redhat.com>
Message-ID: <3fdffecd-75d0-fe77-0ed9-038b24bbf007@oracle.com>

Andrew,

Can you post this to the actual review thread for 8155949 please.

Thanks,
David

On 20/05/2016 1:04 AM, Andrew Haley wrote:
> There is one significant problem with this approach.
>
> Atomic::cmpxchg(jint) is defined like this in atomic.cpp:
>
>   unsigned Atomic::cmpxchg(unsigned int exchange_value,
>                            volatile unsigned int* dest, unsigned int compare_value,
>                            cmpxchg_memory_order order) {
>     assert(sizeof(unsigned int) == sizeof(jint), "more work to do");
>     return (unsigned int)Atomic::cmpxchg((jint)exchange_value, (volatile jint*)dest,
>                                          (jint)compare_value, order);
>   }
>
> Because this is in atomic.cpp, there is a *runtime* test on the memory
> order: the compiler can't constant propagate it.  If we're adding the
> cmpxchg_memory_order I think we should move Atomic::cmpxchg(jint) to
> atomic.inline.hpp.
>
> Andrew.
>

From kim.barrett at oracle.com  Thu May 19 20:17:53 2016
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 19 May 2016 16:17:53 -0400
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <7628ef5d-4e59-ef05-0550-07d5838f4570@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<95a96fb1-f35b-2e04-9f57-af39e2da4c9c@oracle.com>
	<13d10493b45840b4912274f68fedc855@DEWDFE13DE14.global.corp.sap>
	<7628ef5d-4e59-ef05-0550-07d5838f4570@oracle.com>
Message-ID: <227CE7D4-E731-4F1A-AED5-C29135208BC5@oracle.com>

> On May 18, 2016, at 7:52 AM, David Holmes <david.holmes at oracle.com> wrote:
> 
> On 18/05/2016 9:08 PM, Doerr, Martin wrote:
>> Hi David,
>> 
>> in comparison to C++11 or JEP 193, the hotspot C++ semantics are rather imprecise for PPC64. That's why we use 2 sync instructions, which is the most conservative implementation.
>> 
>> C++11 and JEP 193 specify which barriers are needed in case cmpxchg fails.
> 
> The hotspot semantics are quite simple - no difference between success or failure - just two-way barriers around the operations.

The current ppc ports have no post-barrier in the failure case.  That does seem
at variance with the documented hotspot semantics.

I'm not sure about aarch64, since it uses a compiler intrinsic (__sync_val_compare_and_swap) whose expansion I don't know where to find.

sparc and x86 look fine, not surprisingly.


From kim.barrett at oracle.com  Thu May 19 22:03:02 2016
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 19 May 2016 18:03:02 -0400
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
Message-ID: <EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>

> On May 18, 2016, at 6:12 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
> 
> Hi Kim,
> 
> thank you very much for the detailed review.
> 
> I agree with your comments and I have made all your requested changes here:
> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
> 
> It's correct that the change alters the semantics of the conservative cmpxchg: in case of failure, we now execute the sync instruction as well.
> The advantage is that the new implementation is maximally conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
> 
> For performance optimization, it would be better to use (or introduce additional) enum values.

------------------------------------------------------------------------------  
There doesn't seem to have been any change for this earlier comment.

src/share/vm/runtime/atomic.cpp
59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
 60                            volatile unsigned int* dest, unsigned int compare_value,
 61                            cmpxchg_memory_order order) {

I'm surprised this was ever out-of-line. But with this change it's
quite bad to be out-of-line, as that's going to kill the constant
propagation of the order value.

------------------------------------------------------------------------------ 

Other than that, looks good.
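
To make the inlining concern concrete, a minimal sketch follows. It
assumes a std::atomic-based stand-in rather than the real
Atomic::cmpxchg, lives in a hypothetical namespace "sketch", and
assumes the conservative order is the default argument. The enum
values are the ones from atomic.hpp quoted in the review.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {  // hypothetical namespace, for illustration only

// A plain C++ enum: the tag is already a type name, so no typedef is needed.
enum cmpxchg_memory_order {
  memory_order_relaxed,
  // Value chosen so that it does not interfere with C++2011.
  memory_order_conservative = 8
};

// Defined inline, so a call site that passes a constant order lets the
// compiler fold both 'order' tests away. If the body lived in a .cpp file,
// the tests would have to be evaluated at run time on every call.
inline unsigned cmpxchg(unsigned exchange_value,
                        std::atomic<unsigned>* dest,
                        unsigned compare_value,
                        cmpxchg_memory_order order = memory_order_conservative) {
  if (order != memory_order_relaxed) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  unsigned observed = compare_value;
  dest->compare_exchange_strong(observed, exchange_value,
                                std::memory_order_relaxed);
  if (order != memory_order_relaxed) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  return observed;  // value that was found in *dest
}

}  // namespace sketch
```

A call such as sketch::cmpxchg(x, &v, y, sketch::memory_order_relaxed)
can then compile down to the bare exchange with no fence code at all.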


From gromero at linux.vnet.ibm.com  Thu May 19 23:46:05 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Thu, 19 May 2016 20:46:05 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
	<CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
	<57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
Message-ID: <201605192346.u4JNhcxs015627@mx0a-001b2d01.pphosted.com>

Hi Martin

Thank you for reviewing the webrev.

> We could use a static variable for the default dscr value. It could be modified in VM_Version::config_dscr() and used by your restore code (load_const_optimized(tmp1, ...) instead of li(tmp1, 0)).

Absolutely, resetting DSCR to the default value (zero) is not right.

I did as you suggested and created a static variable modified and
initialized from VM_Version::config_dscr(). Then I used it to get the
current value of DSCR, set only the pre-fetch as deepest, and restore
its previous value.


> - The PPC-elf64abi-1.9 says: "Functions must ensure that the appropriate bits in the vrsave register are set for any vector registers they use. ...". I think not touching vrsave is the right thing for AIX and ppc64le, but I think we will either have to skip the optimization on ppc64 big endian or handle vrsave. Do you agree?

About the VRSAVE register, you are right, but there is some confusion
here, and it's my fault: I'm not using the VMX registers.

In my code I've used the VSX load/store instructions with a
VectorRegister type, i.e. VR0 and VR1. It's OK if we look at the
assembled instructions because, in the end, VR0 and VR1 will be
converted to target (or source) registers number 0 and 1. But it's VSX
registers 0 and 1 (VSR0 and VSR1) and not VMX (aka Altivec) registers
0 and 1 (VR0 and VR1).

There is indeed a relationship between VSR and VR registers, as
we can see in the following diagram adapted from [1]:

       .---------------------------------.
VSR( 0)|     FPR(0)     |                |
VSR( 1)|     FPR(1)     |                |
  ...  |      ...       |                |
  ...  |      ...       |                |
VSR(30)|     FPR(30)    |                |
VSR(31)|     FPR(31)    |                |
VSR(32)|              VR(0)              |
VSR(33)|              VR(1)              |
  ...  |               ...               |
  ...  |               ...               |
VSR(62)|              VR(30)             |
VSR(63)|              VR(31)             |
       '---------------------------------'
        0                             127

However, the VMX registers VR0-31 are mapped to the VSX registers
VSR32-63, so VSR0 and VSR1 do not overlap any VMX register (they
overlap FPR0 and FPR1, and FPR0-13 are volatile). In my code I was
actually using VSR0 and VSR1, not VR0 and VR1. Since VRSAVE only
covers the VMX/Altivec registers (VR0-VR31), there is no need to take
care of VRSAVE. I fixed the register names/types in this new webrev.
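
The mapping in the diagram can be written down as a few index checks.
This is only a sketch: the helper names are hypothetical and are not
the VectorSRegister code from the webrev.

```cpp
#include <cassert>

// Hypothetical helpers that encode the register overlap shown in the diagram.

// VSR0-VSR31 share their first 64 bits with FPR0-FPR31.
constexpr bool vsr_overlaps_fpr(int vsr) { return vsr >= 0 && vsr <= 31; }

// VSR32-VSR63 are the VMX/Altivec registers VR0-VR31.
constexpr bool vsr_overlaps_vr(int vsr) { return vsr >= 32 && vsr <= 63; }

// Index of the VMX register behind a VSR in the upper half; only meaningful
// when vsr_overlaps_vr(vsr) is true.
constexpr int vsr_to_vr(int vsr) { return vsr - 32; }

// VRSAVE tracks VMX registers only, so it matters only for VSR32-VSR63.
constexpr bool covered_by_vrsave(int vsr) { return vsr_overlaps_vr(vsr); }
```

With these definitions covered_by_vrsave(0) and covered_by_vrsave(1)
are both false, which is the reason given above for leaving VRSAVE
untouched when only VSR0 and VSR1 are used.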

I noticed that the VSR registers were not implemented, so I
implemented them. The VSX load/store instructions now use the
VectorSRegister type. I'm using the VSR0 and VSR1 registers in the
stub, respecting the ABI.

Webrev:
http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v2/

Best regards,
Gustavo

[1] Power Architecture 64-Bit ELF V2 ABI https://goo.gl/LLXRwN, p. 43-44

> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] 
> Sent: Mittwoch, 11. Mai 2016 23:07
> To: Volker Simonis <volker.simonis at gmail.com>
> Cc: Doerr, Martin <martin.doerr at sap.com>; Simonis, Volker <volker.simonis at sap.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; brenohl at br.ibm.com
> Subject: Re: PPC64 VSX load/store instructions in stubs
> Importance: High
> 
> Hi Volker, Hi Martin
> 
> Sincere apologies for the long delay.
> 
> My initial approach to test the VSX load/store was from an
> extracted snippet regarding just the mass copy loop "grafted" inside an inline
> asm, performing isolated tests with "perf" tool focused only on aligned source and
> destination (best case).
> 
> The extracted code, called "Original" in the plot below (black line), is here:
> https://github.com/gromero/arraycopy/blob/2pairs/arraycopy.c#L27-L36
> 
> That extracted, after some experiments, evolved into this one that employs VSX
> load/store, Data Stream deepest pre-fetch, d-cache touch, and backbranch aligned
> to 32-byte:
> https://github.com/gromero/arraycopy/blob/2pairs/arraycopy_vsx.c#L27-L41
> 
> All runs were "pinned" using `numactl --cpunodebind --membind` to avoid any
> scheduler decision that could add noise to the measure.
> 
> VSX, deepest data pre-fetch, d-cache touch, and 32-bytes align proved to be better
> in the isolated code (red line) in comparison to the original extracted code
> (black line):
> http://gromero.github.io/openjdk/original_vsx_non_pf_vsx_pf_deepest.pdf
> 
> So I proceeded to implement the VSX loop in OpenJDK based on the best case
> result (VSX, pre-fetch deepest, d-cache touch, and backbranch target align -
> goetz TODO note).
> 
> OpenJDK 8 webrev:
> http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/8/
> 
> OpenJDK 9 webrev:
> http://81.de.7a9f.ip4.static.sl-reverse.com/8154156/9/
> 
> I've tested the change on OpenJDK 8 using this script that calls
> System.arraycopy() on shorts:
> https://goo.gl/8UWtLm
> 
> The results for all data alignment cases:
> http://gromero.github.io/openjdk/src_0_dst_0.pdf
> http://gromero.github.io/openjdk/src_1_dst_0.pdf
> http://gromero.github.io/openjdk/src_0_dst_1.pdf
> http://gromero.github.io/openjdk/src_1_dst_1.pdf
> 
> Martin, I added the vsx test to the feature-string. Regarding the ABI, I'm just
> using two VSR: vsr0 and vsr1, both volatile.
> 
> Volker, as the loop unrolling was removed, the loop now copies 16 elements at a time,
> like the non-VSX loop, and not 32 elements. I just verified the change on Little
> endian. Sorry I didn't understand your question regarding "instructions for
> aligned load/stores". Did you mean instructions for unaligned load/stores? I think
> both fixed-point (ld/std) and VSX instructions will do load/store slower in
> unaligned scenario. However VMX load/store is different and expects aligned
> operands. Thank you very much for opening the bug
> https://bugs.openjdk.java.net/browse/JDK-8154156
> 
> I don't have the profiling per function for each SPEC{jbb,jvm} benchmark
> in order to determine which one would stress the proposed change better.
> Could I use a better benchmark?
> 
> Thank you!
> 
> Best regards,
> Gustavo
> 
> On 05-04-2016 14:23, Volker Simonis wrote:
>> Hi Gustavo,
>>
>> thanks a lot for your contribution.
>>
>> Can you please describe if you've run benchmarks and which performance
>> improvements you saw?
>>
>> With your change if we're running on Power 8, we will only use the
>> fast path for arrays with at least 32 elements. For smaller arrays, we
>> will fall-back to copying only 2 elements at a time which will be
>> slower than the initial version which copied 4 at a time in that case.
>>
>> Did you verify your changes on both little and big endian?
>>
>> And what about unaligned memory accesses? As far as I read,
>> lxvd2x/stxvd2x still work, but may be slower. I saw there also exist
>> instructions for aligned load/stores. Would it make sense
>> (performance-wise) to use them for the cases where we can be sure that
>> we have aligned memory accesses?
>>
>> Thank you and best regards,
>> Volker
>>
>>
>> On Fri, Apr 1, 2016 at 10:36 PM, Gustavo Romero
>> <gromero at linux.vnet.ibm.com> wrote:
>>> Hi Martin, Hi Volker
>>>
>>> Currently VSX load/store instructions are not being used in PPC64 stubs,
>>> particularly in arraycopy stubs inside generate_arraycopy_stubs() like,
>>> but not limited to, generate_disjoint_{byte,short,int,long}_copy.
>>>
>>> We can speed up mass copy using VSX (Vector-Scalar Extension) load/store
>>> instruction in processors >= POWER8, the same way it's already done for
>>> libc memcpy().
>>>
>>> This is an initial patch just for jshort_disjoint_arraycopy() VSX vector
>>> load/store:
>>>
>>> http://81.de.7a9f.ip4.static.sl-reverse.com/202539/webrev
>>>
>>> What are your thoughts on that? Is there any impediment to use VSX
>>> instructions in OpenJDK at the moment?
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Gustavo
>>>
>>
> 


From gromero at linux.vnet.ibm.com  Fri May 20 16:20:05 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Fri, 20 May 2016 13:20:05 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <201605192346.u4JNhdUm028414@mx0a-001b2d01.pphosted.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
	<CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
	<57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<201605192346.u4JNhdUm028414@mx0a-001b2d01.pphosted.com>
Message-ID: <201605201620.u4KGEXCM042745@mx0a-001b2d01.pphosted.com>

Hi Martin

The previous change was not restoring the DSCR value.

Here is the webrev with the fix included:
http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v3/

Thank you!

Best regards,
Gustavo


From david.holmes at oracle.com  Fri May 20 23:09:31 2016
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 21 May 2016 09:09:31 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
Message-ID: <267a624c-626f-4238-0166-baa14ff4b412@oracle.com>

Hi Martin,

Are you in a position to make the change now suggested by both Kim and 
Andrew? Can you also include the Aarch64 code that Andrew provided:

http://cr.openjdk.java.net/~aph/8154736

I'd like to get this finalized so it is ready to push as soon as the 
process allows it to.

Thanks,
David

On 20/05/2016 8:03 AM, Kim Barrett wrote:
>> On May 18, 2016, at 6:12 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>
>> Hi Kim,
>>
>> thank you very much for the detailed review.
>>
>> I agree with your comments and I have made all your requested changes here:
>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>>
>> It's correct that the change alters the semantics of the conservative cmpxchg: in case of failure, we now execute the sync instruction as well.
>> The advantage is that the new implementation is maximally conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>>
>> For performance optimization, it would be better to use (or introduce additional) enum values.
>
> ------------------------------------------------------------------------------
> There doesn't seem to have been any change for this earlier comment.
>
> src/share/vm/runtime/atomic.cpp
> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>  60                            volatile unsigned int* dest, unsigned int compare_value,
>  61                            cmpxchg_memory_order order) {
>
> I'm surprised this was ever out-of-line. But with this change it's
> quite bad to be out-of-line, as that's going to kill the constant
> propagation of the order value.
>
> ------------------------------------------------------------------------------
>
> Other than that, looks good.
>
>
>
>

From martin.doerr at sap.com  Mon May 23 09:29:42 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 23 May 2016 09:29:42 +0000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
Message-ID: <bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>

Hi David,

here's the new webrev:
http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/

Btw.: The jbyte version of cmpxchg can be implemented on aarch64 as on ppc, where we emulate the byte access with a 4-byte access (lwarx/stwcx). But that would be better done in a separate change.
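
The idea of emulating a byte cmpxchg with a 4-byte access can be
sketched roughly as follows. This is not the ppc lwarx/stwcx. code: it
uses a std::atomic 32-bit CAS loop instead, the function name is
hypothetical, and the byte is selected by a "lane" number within the
word value, so the endianness-dependent step of mapping a byte address
to a lane is left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: compare-and-exchange one byte lane of a 32-bit word
// using only a 32-bit CAS. Lane n covers bits [8*n, 8*n+7] of the word value.
inline uint8_t cmpxchg_byte_in_word(uint8_t exchange_value,
                                    std::atomic<uint32_t>* word,
                                    unsigned lane,
                                    uint8_t compare_value) {
  const unsigned shift = 8u * lane;
  const uint32_t mask = uint32_t(0xFF) << shift;
  uint32_t old_word = word->load(std::memory_order_relaxed);
  for (;;) {
    const uint8_t old_byte = uint8_t((old_word & mask) >> shift);
    if (old_byte != compare_value) {
      return old_byte;  // byte does not match: fail without storing
    }
    const uint32_t new_word =
        (old_word & ~mask) | (uint32_t(exchange_value) << shift);
    // On failure, compare_exchange_weak reloads old_word, so the loop
    // re-checks the target byte against the freshly observed word.
    if (word->compare_exchange_weak(old_word, new_word)) {
      return old_byte;  // success: the byte held compare_value
    }
  }
}
```

The loop retries when a neighbouring byte in the same word changes
concurrently, because the 32-bit CAS then fails even though the target
byte still matches.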

Thanks for your time and your support.

Best regards,
Martin

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Samstag, 21. Mai 2016 01:10
To: Doerr, Martin <martin.doerr at sap.com>
Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg

Hi Martin,

Are you in a position to make the change now suggested by both Kim and 
Andrew? Can you also include the Aarch64 code that Andrew provided:

http://cr.openjdk.java.net/~aph/8154736

I'd like to get this finalized so it is ready to push as soon as the 
process allows it to.

Thanks,
David

On 20/05/2016 8:03 AM, Kim Barrett wrote:
>> On May 18, 2016, at 6:12 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>
>> Hi Kim,
>>
>> thank you very much for the detailed review.
>>
>> I agree with your comments and I have made all your requested changes here:
>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>>
>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now.
>> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>>
>> For performance optimization, we should better use (or introduce additional) enum values.
>
> ------------------------------------------------------------------------------
> There doesn't seem to have been any change for this earlier comment.
>
> src/share/vm/runtime/atomic.cpp
> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>  60                            volatile unsigned int* dest, unsigned int compare_value,
>  61                            cmpxchg_memory_order order) {
>
> I'm surprised this was ever out-of-line. But with this change it's
> quite bad to be out-of-line, as that's going to kill the constant
> propagation of the order value.
>
> ------------------------------------------------------------------------------
>
> Other than that, looks good.
>
>
>
>

From gromero at linux.vnet.ibm.com  Mon May 23 14:22:16 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 23 May 2016 11:22:16 -0300
Subject: RFR(M): PPC64: improve array copy stubs by using vector instructions
Message-ID: <201605231422.u4NEIiBQ005583@mx0a-001b2d01.pphosted.com>

Hi Martin

Could you please host and review this webrev?

Summary:

* Add VSR registers to be used with VSX instruction set;
* Add VSX load/store instructions (lxvd2x/stxvd2x) to mass copy in
  the stub for disjoint short copy in order to improve it.
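
[Editorial sketch] The idea behind the second bullet, moving 16 bytes per load/store pair instead of one element at a time, can be pictured in portable C++. This is only an analogue under stated assumptions: the real change is generated assembler in the PPC64 stub, the function name disjoint_short_copy is hypothetical, and the two memcpy calls merely stand in for one lxvd2x load and one stxvd2x store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical illustration, not the HotSpot stub: copy `count` 16-bit
// elements between non-overlapping arrays, 16 bytes per step, then finish
// the remaining elements one by one.
inline void disjoint_short_copy(const int16_t* from, int16_t* to,
                                std::size_t count) {
  assert(from != nullptr && to != nullptr);
  const std::size_t kElemsPerChunk = 16 / sizeof(int16_t);  // 8 shorts
  std::size_t i = 0;
  for (; i + kElemsPerChunk <= count; i += kElemsPerChunk) {
    unsigned char chunk[16];
    std::memcpy(chunk, from + i, sizeof chunk);  // stands in for a 16-byte vector load
    std::memcpy(to + i, chunk, sizeof chunk);    // stands in for a 16-byte vector store
  }
  for (; i < count; ++i) {
    to[i] = from[i];  // scalar tail for the last count % 8 elements
  }
}
```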

http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v4/

Thank you!

Best regards,
Gustavo


From martin.doerr at sap.com  Mon May 23 15:51:41 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 23 May 2016 15:51:41 +0000
Subject: RFR(M): PPC64: improve array copy stubs by using vector
	instructions
In-Reply-To: <201605231422.u4NEIb1g013944@mx0a-001b2d01.pphosted.com>
References: <201605231422.u4NEIb1g013944@mx0a-001b2d01.pphosted.com>
Message-ID: <25cb11dfe7624a4a8848d049626413e7@DEWDFE13DE14.global.corp.sap>

Hi Gustavo,

thanks for implementing it and taking care of my concerns. Looks good now.
I will run tests and I can sponsor it once it has been reviewed.

Best regards,
Martin

-----Original Message-----
From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] 
Sent: Montag, 23. Mai 2016 16:22
To: Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net
Cc: Simonis, Volker <volker.simonis at sap.com>; brenohl at br.ibm.com
Subject: RFR(M): PPC64: improve array copy stubs by using vector instructions

Hi Martin

Could you please host and review this webrev?

Summary:

* Add VSR registers to be used with VSX instruction set;
* Add VSX load/store instructions (lxvd2x/stxvd2x) to mass copy in
  the stub for disjoint short copy in order to improve it.

http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v4/

Thank you!

Best regards,
Gustavo


From gromero at linux.vnet.ibm.com  Mon May 23 15:53:45 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 23 May 2016 12:53:45 -0300
Subject: RFR(M): PPC64: improve array copy stubs by using vector
	instructions
In-Reply-To: <25cb11dfe7624a4a8848d049626413e7@DEWDFE13DE14.global.corp.sap>
References: <201605231422.u4NEIb1g013944@mx0a-001b2d01.pphosted.com>
	<25cb11dfe7624a4a8848d049626413e7@DEWDFE13DE14.global.corp.sap>
Message-ID: <201605231553.u4NFn9EW016604@mx0a-001b2d01.pphosted.com>

Hi Martin

Thank you for reviewing the change.

Best regards,
Gustavo

On 23-05-2016 12:51, Doerr, Martin wrote:
> Hi Gustavo,
> 
> thanks for implementing it and taking care of my concerns. Looks good, now.
> I will run tests and I can sponsor it after it was reviewed.
> 
> Best regards,
> Martin
> 
> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] 
> Sent: Montag, 23. Mai 2016 16:22
> To: Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net
> Cc: Simonis, Volker <volker.simonis at sap.com>; brenohl at br.ibm.com
> Subject: RFR(M): PPC64: improve array copy stubs by using vector instructions
> 
> Hi Martin
> 
> Could you please host and review this webrev?
> 
> Summary:
> 
> * Add VSR registers to be used with VSX instruction set;
> * Add VSX load/store instructions (lxvd2x/stxvd2x) to mass copy in
>   the stub for disjoint short copy in order to improve it.
> 
> http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v4/
> 
> Thank you!
> 
> Best regards,
> Gustavo
> 


From david.holmes at oracle.com  Tue May 24 03:49:49 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 24 May 2016 13:49:49 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
Message-ID: <9cff0b75-e234-e789-910d-d86154bba834@oracle.com>

Hi Martin,

On 23/05/2016 7:29 PM, Doerr, Martin wrote:
> Hi David,
>
> here's the new webrev:
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/

There seems to be some confusion. You've moved the jbyte 
Atomic::cmpxchg_general from the .cpp file to the .inline.hpp file, but 
the comments from Andrew and Kim were about moving the unsigned 
Atomic::cmpxchg version. ??

Aside: In the changeset, contributors have to be specified by "email 
address" or "name <email address>", OpenJDK user names are not accepted. 
I think Andrew should also be listed there for the Aarch64 component.

Thanks,
David

> Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change.
>
> Thanks for your time and your support.
>
> Best regards,
> Martin
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Samstag, 21. Mai 2016 01:10
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>
> Hi Martin,
>
> Are you in a position to make the change now suggested by both Kim and
> Andrew? Can you also include the Aarch64 code that Andrew provided:
>
> http://cr.openjdk.java.net/~aph/8154736
>
> I'd like to get this finalized so it is ready to push as soon as the
> process allows it to.
>
> Thanks,
> David
>
> On 20/05/2016 8:03 AM, Kim Barrett wrote:
>>> On May 18, 2016, at 6:12 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>>
>>> Hi Kim,
>>>
>>> thank you very much for the detailed review.
>>>
>>> I agree with your comments and I have made all your requested changes here:
>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>>>
>>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now.
>>> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>>>
>>> For performance optimization, we should better use (or introduce additional) enum values.
>>
>> ------------------------------------------------------------------------------
>> There doesn't seem to have been any change for this earlier comment.
>>
>> src/share/vm/runtime/atomic.cpp
>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>>  60                            volatile unsigned int* dest, unsigned int compare_value,
>>  61                            cmpxchg_memory_order order) {
>>
>> I'm surprised this was ever out-of-line. But with this change it's
>> quite bad to be out-of-line, as that's going to kill the constant
>> propagation of the order value.
>>
>> ------------------------------------------------------------------------------
>>
>> Other than that, looks good.
>>
>>
>>
>>

From goetz.lindenmaier at sap.com  Tue May 24 08:29:41 2016
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 24 May 2016 08:29:41 +0000
Subject: RFR(M): PPC64: improve array copy stubs by using vector
	instructions
In-Reply-To: <201605231553.u4NFq1Rr022712@mx0a-001b2d01.pphosted.com>
References: <201605231422.u4NEIb1g013944@mx0a-001b2d01.pphosted.com>
	<25cb11dfe7624a4a8848d049626413e7@DEWDFE13DE14.global.corp.sap>
	<201605231553.u4NFq1Rr022712@mx0a-001b2d01.pphosted.com>
Message-ID: <8b32b8882e964fd0b2ac0f22c94e389a@DEWDFE13DE09.global.corp.sap>

Hi Gustavo, 

thanks for contributing this optimization to the ppc port!

The change looks good, nice work.

Next time, please use the correct subject in the RFR mail; the bug ID is missing.
Also, address the RFR to everybody.  You addressed this one to Martin only.
In general, you need several reviews.
Martin, thanks for reviewing though!

Martin, I think you can push this as it's ppc-only.

Best regards,
  Goetz.


> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> Sent: Montag, 23. Mai 2016 17:54
> To: Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-
> dev at openjdk.java.net; hotspot-dev at openjdk.java.net
> Cc: Simonis, Volker <volker.simonis at sap.com>; brenohl at br.ibm.com;
> Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
> Subject: Re: RFR(M): PPC64: improve array copy stubs by using vector
> instructions
> 
> Hi Martin
> 
> Thank you for reviewing the change.
> 
> Best regards,
> Gustavo
> 
> On 23-05-2016 12:51, Doerr, Martin wrote:
> > Hi Gustavo,
> >
> > thanks for implementing it and taking care of my concerns. Looks good,
> now.
> > I will run tests and I can sponsor it after it was reviewed.
> >
> > Best regards,
> > Martin
> >
> > -----Original Message-----
> > From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> > Sent: Montag, 23. Mai 2016 16:22
> > To: Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-
> dev at openjdk.java.net; hotspot-dev at openjdk.java.net
> > Cc: Simonis, Volker <volker.simonis at sap.com>; brenohl at br.ibm.com
> > Subject: RFR(M): PPC64: improve array copy stubs by using vector
> instructions
> >
> > Hi Martin
> >
> > Could you please host and review this webrev?
> >
> > Summary:
> >
> > * Add VSR registers to be used with VSX instruction set;
> > * Add VSX load/store instructions (lxvd2x/stxvd2x) to mass copy in
> >   the stub for disjoint short copy in order to improve it.
> >
> > http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v4/
> >
> > Thank you!
> >
> > Best regards,
> > Gustavo
> >


From martin.doerr at sap.com  Tue May 24 09:37:50 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 24 May 2016 09:37:50 +0000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
	<9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
Message-ID: <fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>

Hi David and Andrew,

sorry for missing this one. There were too many emails.

After moving the jint version as well, there was not much left of atomic.cpp.
I think it doesn't make any sense to keep a couple of trivial functions in the cpp file.
Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file.

Webrev is here:
http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/

Best regards,
Martin


-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Dienstag, 24. Mai 2016 05:50
To: Doerr, Martin <martin.doerr at sap.com>
Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg

Hi Martin,

On 23/05/2016 7:29 PM, Doerr, Martin wrote:
> Hi David,
>
> here's the new webrev:
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/

There seems to be some confusion. You've moved the jbyte 
Atomic::cmpxchg_general from the .cpp file to the .inline.hpp file, but 
the comments from Andrew and Kim were about moving the unsigned 
Atomic::cmpxchg version. ??

Aside: In the changeset, contributors have to be specified by "email 
address" or "name <email address>", OpenJDK user names are not accepted. 
I think Andrew should also be listed there for the Aarch64 component.

Thanks,
David

> Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change.
>
> Thanks for your time and your support.
>
> Best regards,
> Martin
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Samstag, 21. Mai 2016 01:10
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>
> Hi Martin,
>
> Are you in a position to make the change now suggested by both Kim and
> Andrew? Can you also include the Aarch64 code that Andrew provided:
>
> http://cr.openjdk.java.net/~aph/8154736
>
> I'd like to get this finalized so it is ready to push as soon as the
> process allows it to.
>
> Thanks,
> David
>
> On 20/05/2016 8:03 AM, Kim Barrett wrote:
>>> On May 18, 2016, at 6:12 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>>
>>> Hi Kim,
>>>
>>> thank you very much for the detailed review.
>>>
>>> I agree with your comments and I have made all your requested changes here:
>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>>>
>>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now.
>>> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>>>
>>> For performance optimization, we should better use (or introduce additional) enum values.
>>
>> ------------------------------------------------------------------------------
>> There doesn't seem to have been any change for this earlier comment.
>>
>> src/share/vm/runtime/atomic.cpp
>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>>  60                            volatile unsigned int* dest, unsigned int compare_value,
>>  61                            cmpxchg_memory_order order) {
>>
>> I'm surprised this was ever out-of-line. But with this change it's
>> quite bad to be out-of-line, as that's going to kill the constant
>> propagation of the order value.
>>
>> ------------------------------------------------------------------------------
>>
>> Other than that, looks good.
>>
>>
>>
>>

From david.holmes at oracle.com  Tue May 24 10:03:56 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 24 May 2016 20:03:56 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
	<9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
	<fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
Message-ID: <275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com>

On 24/05/2016 7:37 PM, Doerr, Martin wrote:
> Hi David and Andrew,
>
> sorry for missing this one. There were too many emails.
>
> After moving the jint version as well, there was not much left of atomic.cpp.
> I think it doesn't make any sense to keep a couple of trivial functions in the cpp file.
> Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file.

Sorry I don't understand why the jbyte cmpxchg_general was moved to the 
.inline.hpp file - it seems far too big to be inlined.

David

> Webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Dienstag, 24. Mai 2016 05:50
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>
> Hi Martin,
>
> On 23/05/2016 7:29 PM, Doerr, Martin wrote:
>> Hi David,
>>
>> here's the new webrev:
>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/
>
> There seems to be some confusion. You've moved the jbyte
> Atomic::cmpxchg_general from the .cpp file to the .inline.hpp file, but
> the comments from Andrew and Kim were about moving the unsigned
> Atomic::cmpxchg version. ??
>
> Aside: In the changeset, contributors have to be specified by "email
> address" or "name <email address>", OpenJDK user names are not accepted.
> I think Andrew should also be listed there for the Aarch64 component.
>
> Thanks,
> David
>
>> Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change.
>>
>> Thanks for your time and your support.
>>
>> Best regards,
>> Martin
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Samstag, 21. Mai 2016 01:10
>> To: Doerr, Martin <martin.doerr at sap.com>
>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>
>> Hi Martin,
>>
>> Are you in a position to make the change now suggested by both Kim and
>> Andrew? Can you also include the Aarch64 code that Andrew provided:
>>
>> http://cr.openjdk.java.net/~aph/8154736
>>
>> I'd like to get this finalized so it is ready to push as soon as the
>> process allows it to.
>>
>> Thanks,
>> David
>>
>> On 20/05/2016 8:03 AM, Kim Barrett wrote:
>>>> On May 18, 2016, at 6:12 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>>>
>>>> Hi Kim,
>>>>
>>>> thank you very much for the detailed review.
>>>>
>>>> I agree with your comments and I have made all your requested changes here:
>>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>>>>
>>>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now.
>>>> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>>>>
>>>> For performance optimization, we should better use (or introduce additional) enum values.
>>>
>>> ------------------------------------------------------------------------------
>>> There doesn't seem to have been any change for this earlier comment.
>>>
>>> src/share/vm/runtime/atomic.cpp
>>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>>>  60                            volatile unsigned int* dest, unsigned int compare_value,
>>>  61                            cmpxchg_memory_order order) {
>>>
>>> I'm surprised this was ever out-of-line. But with this change it's
>>> quite bad to be out-of-line, as that's going to kill the constant
>>> propagation of the order value.
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> Other than that, looks good.
>>>
>>>
>>>
>>>

From martin.doerr at sap.com  Tue May 24 10:21:59 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 24 May 2016 10:21:59 +0000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
	<9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
	<fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
	<275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com>
Message-ID: <9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap>

Hi David,

it was moved for the same reason as the jint version of cmpxchg: It passes the memory order to the jint version.
It may look large in terms of C++ code, but there's not much substantial content.
I can only see a loop which calls the jint version + a bunch of very simple operations.
Why shouldn't we give compilers a chance to inline and possibly optimize some of the simple operations and especially to eliminate the order check?
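
[Editorial sketch] The "order check" argument can be made concrete with a small example. This is a minimal sketch under stated assumptions, not the code from the webrev: the enum and function names carry a _sketch suffix because they are hypothetical, and std::memory_order_seq_cst is only a stand-in for HotSpot's conservative semantics. Because the function is defined inline in a header, a call site that passes a constant order lets the compiler fold the branch away; an out-of-line definition would have to test the order at run time.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical names modelled on the discussion; not HotSpot's declarations.
enum cmpxchg_memory_order_sketch {
  order_relaxed_sketch,
  order_conservative_sketch
};

// Returns the value found at *dest before the operation; the exchange
// happened if and only if the result equals compare_value.
inline int32_t cmpxchg_sketch(
    int32_t exchange_value, std::atomic<int32_t>* dest, int32_t compare_value,
    cmpxchg_memory_order_sketch order = order_conservative_sketch) {
  assert(dest != nullptr);
  int32_t expected = compare_value;
  // With inlining and a constant `order` argument, only one of these two
  // branches survives at the call site.
  if (order == order_relaxed_sketch) {
    dest->compare_exchange_strong(expected, exchange_value,
                                  std::memory_order_relaxed,
                                  std::memory_order_relaxed);
  } else {
    dest->compare_exchange_strong(expected, exchange_value,
                                  std::memory_order_seq_cst,
                                  std::memory_order_seq_cst);
  }
  return expected;  // unchanged on success, updated to the current value on failure
}
```

A caller would write, for example, cmpxchg_sketch(new_value, &field, old_value, order_relaxed_sketch) and compare the result with old_value to see whether the exchange took place.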

Best regards,
Martin

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Dienstag, 24. Mai 2016 12:04
To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley (aph at redhat.com) <aph at redhat.com>
Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg

On 24/05/2016 7:37 PM, Doerr, Martin wrote:
> Hi David and Andrew,
>
> sorry for missing this one. There were too many emails.
>
> After moving the jint version as well, there was not much left of atomic.cpp.
> I think it doesn't make any sense to keep a couple of trivial functions in the cpp file.
> Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file.

Sorry I don't understand why the jbyte cmpxchg_general was moved to the 
.inline.hpp file - it seems far too big to be inlined.

David

> Webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Dienstag, 24. Mai 2016 05:50
> To: Doerr, Martin <martin.doerr at sap.com>
> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>
> Hi Martin,
>
> On 23/05/2016 7:29 PM, Doerr, Martin wrote:
>> Hi David,
>>
>> here's the new webrev:
>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/
>
> There seems to be some confusion. You've moved the jbyte
> Atomic::cmpxchg_general from the .cpp file to the .inline.hpp file, but
> the comments from Andrew and Kim were about moving the unsigned
> Atomic::cmpxchg version. ??
>
> Aside: In the changeset, contributors have to be specified by "email
> address" or "name <email address>", OpenJDK user names are not accepted.
> I think Andrew should also be listed there for the Aarch64 component.
>
> Thanks,
> David
>
>> Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change.
>>
>> Thanks for your time and your support.
>>
>> Best regards,
>> Martin
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Samstag, 21. Mai 2016 01:10
>> To: Doerr, Martin <martin.doerr at sap.com>
>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>
>> Hi Martin,
>>
>> Are you in a position to make the change now suggested by both Kim and
>> Andrew? Can you also include the Aarch64 code that Andrew provided:
>>
>> http://cr.openjdk.java.net/~aph/8154736
>>
>> I'd like to get this finalized so it is ready to push as soon as the
>> process allows it to.
>>
>> Thanks,
>> David
>>
>> On 20/05/2016 8:03 AM, Kim Barrett wrote:
>>>> On May 18, 2016, at 6:12 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>>>
>>>> Hi Kim,
>>>>
>>>> thank you very much for the detailed review.
>>>>
>>>> I agree with your comments and I have made all your requested changes here:
>>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>>>>
>>>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now.
>>>> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>>>>
>>>> For performance optimization, we should better use (or introduce additional) enum values.
>>>
>>> ------------------------------------------------------------------------------
>>> There doesn't seem to have been any change for this earlier comment.
>>>
>>> src/share/vm/runtime/atomic.cpp
>>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>>>  60                            volatile unsigned int* dest, unsigned int compare_value,
>>>  61                            cmpxchg_memory_order order) {
>>>
>>> I'm surprised this was ever out-of-line. But with this change it's
>>> quite bad to be out-of-line, as that's going to kill the constant
>>> propagation of the order value.
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> Other than that, looks good.
>>>
>>>
>>>
>>>

From david.holmes at oracle.com  Tue May 24 12:26:31 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 24 May 2016 22:26:31 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
	<9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
	<fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
	<275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com>
	<9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap>
Message-ID: <fb6bd58f-8aab-81d9-6c67-72d852bd8d13@oracle.com>

Hi Martin,

On 24/05/2016 8:21 PM, Doerr, Martin wrote:
> Hi David,
>
> it was moved for the same reason as the jint version of cmpxchg: It passes the memory order to the jint version.
> It may look large in terms of C++ code, but there's not much substantial content.
> I can only see a loop which calls the jint version + a bunch of very simple operations.
> Why shouldn't we give compilers a chance to inline and possibly optimize some of the simple operations and especially to eliminate the order check?

I think this forces the compiler to inline it, not just "gives it a 
chance". But I'll leave it to those more knowledgeable about the 
compiler side of this to comment.

But if we're making these changes can you delete the Atomic::add(jlong) 
- it is unused and incorrect as discussed here:

http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-February/021620.html

Thanks,
David

> Best regards,
> Martin
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Dienstag, 24. Mai 2016 12:04
> To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley (aph at redhat.com) <aph at redhat.com>
> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>
> On 24/05/2016 7:37 PM, Doerr, Martin wrote:
>> Hi David and Andrew,
>>
>> sorry for missing this one. There were too many emails.
>>
>> After moving the jint version as well, there was not much left of atomic.cpp.
>> I think it doesn't make any sense to keep a couple of trivial functions in the cpp file.
>> Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file.
>
> Sorry I don't understand why the jbyte cmpxchg_general was moved to the
> .inline.hpp file - it seems far too big to be inlined.
>
> David
>
>> Webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/
>>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Dienstag, 24. Mai 2016 05:50
>> To: Doerr, Martin <martin.doerr at sap.com>
>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>
>> Hi Martin,
>>
>> On 23/05/2016 7:29 PM, Doerr, Martin wrote:
>>> Hi David,
>>>
>>> here's the new webrev:
>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/
>>
>> There seems to be some confusion. You've moved the jbyte
>> Atomic::cmpxchg_general from the .cpp file to the .inline/hpp file, but
>> the comments from Andrew and Kim were about moving the unsigned
>> Atomic::cmpxchg version. ??
>>
>> Aside: In the changeset contributors have to be specified by "email
>> address" or "name <email address>", OpenJDK user names are not accepted.
>> I think Andrew should also be listed there for the Aarch64 component.
>>
>> Thanks,
>> David
>>
>>> Btw.: The jbyte version of cmpxchg can be implemented on aarch64 as on ppc, where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that would be better done in a separate change.
>>>
>>> Thanks for your time and your support.
>>>
>>> Best regards,
>>> Martin
>>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Samstag, 21. Mai 2016 01:10
>>> To: Doerr, Martin <martin.doerr at sap.com>
>>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>
>>> Hi Martin,
>>>
>>> Are you in a position to make the change now suggested by both Kim and
>>> Andrew? Can you also include the Aarch64 code that Andrew provided:
>>>
>>> http://cr.openjdk.java.net/~aph/8154736
>>>
>>> I'd like to get this finalized so it is ready to push as soon as the
>>> process allows it to.
>>>
>>> Thanks,
>>> David
>>>
>>> On 20/05/2016 8:03 AM, Kim Barrett wrote:
>>>>> On May 18, 2016, at 6:12 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>>>>
>>>>> Hi Kim,
>>>>>
>>>>> thank you very much for the detailed review.
>>>>>
>>>>> I agree with your comments and I have made all your requested changes here:
>>>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>>>>>
>>>>> You are right that the change alters the semantics of the conservative cmpxchg: we now execute the sync instruction in the failure case as well.
>>>>> The advantage is that the new implementation is maximally conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>>>>>
>>>>> For performance optimization, we should rather use (or introduce additional) enum values.
>>>>
>>>> ------------------------------------------------------------------------------
>>>> There doesn't seem to have been any change for this earlier comment.
>>>>
>>>> src/share/vm/runtime/atomic.cpp
>>>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>>>>  60                            volatile unsigned int* dest, unsigned int compare_value,
>>>>  61                            cmpxchg_memory_order order) {
>>>>
>>>> I'm surprised this was ever out-of-line. But with this change it's
>>>> quite bad to be out-of-line, as that's going to kill the constant
>>>> propagation of the order value.
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> Other than that, looks good.
>>>>
>>>>
>>>>
>>>>
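
The remark quoted above, that a jbyte cmpxchg can be emulated with a 4 byte access, can be sketched in portable C++ as follows. This is only an illustration under assumptions: cas_byte_in_word is a hypothetical name, std::atomic stands in for the lwarx/stwcx sequence, and the byte is selected by its position within the word value rather than by address, which sidesteps endianness.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Compare-and-swap a single byte inside a 32-bit word using only a
// word-sized CAS. byte_index counts from the least significant byte.
inline std::uint8_t cas_byte_in_word(std::atomic<std::uint32_t>& word,
                                     unsigned byte_index,
                                     std::uint8_t compare_value,
                                     std::uint8_t exchange_value) {
  assert(byte_index < 4);
  const unsigned shift = byte_index * 8;
  const std::uint32_t mask = std::uint32_t{0xFF} << shift;
  std::uint32_t current = word.load();
  for (;;) {
    const std::uint8_t observed =
        static_cast<std::uint8_t>((current & mask) >> shift);
    if (observed != compare_value) {
      return observed;  // target byte differs: fail without storing
    }
    const std::uint32_t desired =
        (current & ~mask) | (std::uint32_t{exchange_value} << shift);
    // On failure `current` is refreshed with the latest word; retry,
    // because a neighbouring byte (or the target byte) may have changed.
    if (word.compare_exchange_weak(current, desired)) {
      return compare_value;
    }
  }
}
```

The loop is needed because the word-sized CAS can fail when only a neighbouring byte changed, a case in which a true byte-sized CAS would still succeed.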

From martin.doerr at sap.com  Tue May 24 13:06:45 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 24 May 2016 13:06:45 +0000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <fb6bd58f-8aab-81d9-6c67-72d852bd8d13@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
	<9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
	<fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
	<275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com>
	<9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap>
	<fb6bd58f-8aab-81d9-6c67-72d852bd8d13@oracle.com>
Message-ID: <ac03790fca854131a77465638cab6972@DEWDFE13DE14.global.corp.sap>

Hi David,

unfortunately, Atomic::add(jlong) is used by mallocTracker.hpp (e.g. line 56). Removing it breaks the build.

But I could change it as follows:
inline jlong Atomic::add(jlong add_value, volatile jlong* dest) {
#ifdef _LP64
  return (jlong) add_ptr((intptr_t) add_value, (volatile intptr_t*) dest);
#else
  jlong old = load(dest);
  jlong new_value = old + add_value;
  while (old != cmpxchg(new_value, dest, old)) {
    old = load(dest);
    new_value = old + add_value;
  }
  return new_value;
#endif
}
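
For readers without the HotSpot sources at hand, the non-LP64 branch above follows the usual compare-and-swap retry pattern. A minimal standalone sketch with std::atomic, assuming nothing about the real Atomic class (add_via_cas is a hypothetical name), might look like this:

```cpp
#include <atomic>
#include <cstdint>

// Atomic 64-bit add built only from compare-and-swap.
// Returns the new value, like the proposed Atomic::add above.
inline std::int64_t add_via_cas(std::atomic<std::int64_t>& dest,
                                std::int64_t add_value) {
  std::int64_t old_value = dest.load();
  // compare_exchange_weak writes the freshly observed value back into
  // old_value on failure, so the sum is recomputed on every retry.
  while (!dest.compare_exchange_weak(old_value, old_value + add_value)) {
  }
  return old_value + add_value;
}
```

Unlike the proposal above, the sketch does not reload dest separately after a failed attempt, since the failed compare-exchange already reports the current value.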

Best regards,
Martin



From gromero at linux.vnet.ibm.com  Tue May 24 13:59:58 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 24 May 2016 10:59:58 -0300
Subject: RFR(M): PPC64: improve array copy stubs by using vector
	instructions
In-Reply-To: <8b32b8882e964fd0b2ac0f22c94e389a@DEWDFE13DE09.global.corp.sap>
References: <201605231422.u4NEIb1g013944@mx0a-001b2d01.pphosted.com>
	<25cb11dfe7624a4a8848d049626413e7@DEWDFE13DE14.global.corp.sap>
	<201605231553.u4NFq1Rr022712@mx0a-001b2d01.pphosted.com>
	<8b32b8882e964fd0b2ac0f22c94e389a@DEWDFE13DE09.global.corp.sap>
Message-ID: <201605241400.u4ODtLeu022907@mx0a-001b2d01.pphosted.com>

Hi Goetz

I'm happy to be contributing to the ppc port!

Sorry, I didn't realize that the bugID was missing from the subject. Next time
I'll pay attention to that and also address the RFR to everybody.
Thanks for pointing that out.

Thanks a lot for reviewing the change.

Best regards,
Gustavo

On 24-05-2016 05:29, Lindenmaier, Goetz wrote:
> Hi Gustavo, 
> 
> thanks for contributing this optimization to the ppc port!
> 
> The change looks good, nice work.
> 
> Next time, please use the correct subject in the RFR mail; the bugID is missing.
> Also, address the RFR to everybody.  This one you addressed to Martin.
> In general, you need several reviews.
> Martin, thanks for reviewing though!
> 
> Martin, I think you can push this as it's ppc-only.
> 
> Best regards,
>   Goetz.
> 
> 
> 
>> -----Original Message-----
>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>> Sent: Montag, 23. Mai 2016 17:54
>> To: Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net;
>> hotspot-dev at openjdk.java.net
>> Cc: Simonis, Volker <volker.simonis at sap.com>; brenohl at br.ibm.com;
>> Lindenmaier, Goetz <goetz.lindenmaier at sap.com>
>> Subject: Re: RFR(M): PPC64: improve array copy stubs by using vector instructions
>>
>> Hi Martin
>>
>> Thank you for reviewing the change.
>>
>> Best regards,
>> Gustavo
>>
>> On 23-05-2016 12:51, Doerr, Martin wrote:
>>> Hi Gustavo,
>>>
>>> thanks for implementing it and taking care of my concerns. Looks good now.
>>> I will run tests and I can sponsor it after it has been reviewed.
>>>
>>> Best regards,
>>> Martin
>>>
>>> -----Original Message-----
>>> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
>>> Sent: Montag, 23. Mai 2016 16:22
>>> To: Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net
>>> Cc: Simonis, Volker <volker.simonis at sap.com>; brenohl at br.ibm.com
>>> Subject: RFR(M): PPC64: improve array copy stubs by using vector instructions
>>>
>>> Hi Martin
>>>
>>> Could you please host and review this webrev?
>>>
>>> Summary:
>>>
>>> * Add VSR registers to be used with VSX instruction set;
>>> * Add VSX load/store instructions (lxvd2x/stxvd2x) to mass copy in
>>>   the stub for disjoint short copy in order to improve it.
>>>
>>> http://81.de.7a9f.ip4.static.sl-reverse.com./8154156/9/v4/
>>>
>>> Thank you!
>>>
>>> Best regards,
>>> Gustavo
>>>
> 


From ENOMIKI at jp.ibm.com  Tue May 24 15:15:13 2016
From: ENOMIKI at jp.ibm.com (Miki M Enoki)
Date: Wed, 25 May 2016 00:15:13 +0900
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <573A034C.9060602@br.ibm.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com><CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com><57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
	<573A034C.9060602@br.ibm.com>
Message-ID: <201605241515.u4OFFOkQ016415@d19av06.sagamino.japan.ibm.com>


Hi Breno,

Thank you for your reply.

>The same mechanism could be used to copy arrays of short elements, as Gustavo was
>working on. Do you agree?

I think the mechanism differs by element type (byte, short, int, long...).
Gustavo will apply a patch with VSX for short array copy, so it would be
reasonable to use VSX instructions for long array copy, too.

My coworker is also creating byte and int arraycopy with VSX. He will post
an email to this mailing list.
I would appreciate it if our patch for byte, int and long copy were applied to
OpenJDK.


Best regards,
Miki


From:	Breno Leitao <brenohl at br.ibm.com>
To:	Miki M Enoki/Japan/IBM at IBMJP, "Doerr, Martin"
            <martin.doerr at sap.com>,
Cc:	Gustavo Romero <gromero at linux.vnet.ibm.com>, Volker Simonis
            <volker.simonis at gmail.com>, "Simonis, Volker"
            <volker.simonis at sap.com>, "ppc-aix-port-dev at openjdk.java.net"
            <ppc-aix-port-dev at openjdk.java.net>,
            "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
Date:	2016/05/17 02:29
Subject:	Re: PPC64 VSX load/store instructions in stubs


Hi Miki,

On 05/16/2016 02:53 AM, Miki M Enoki wrote:
> I also implemented VSX disjoint long arraycopy.
> I appreciate it if it is applied to OpenJDK, too.

Thanks for the summarized information, this is helpful. Based on your plot, I
understand we can split the whole scenario in two:

  * Array size smaller than 4k, and then use VSX instructions to perform the copy
  * Array size bigger than 4k, and then use VMX instructions to perform the copy

The same mechanism could be used to copy arrays of short elements, as Gustavo was
working on. Do you agree?

That said, I understand that a new patch should be generated that covers
both cases in a single patch, ready to be applied to the OpenJDK 9 source code.
Hence a webrev should be generated mapping to bug id
https://bugs.openjdk.java.net/browse/JDK-8154156

If you need any help with the webrev[1] creation and hosting, Gustavo might help,
since he did this process already.

[1] http://openjdk.java.net/guide/webrevHelp.html


From david.holmes at oracle.com  Tue May 24 20:18:49 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 25 May 2016 06:18:49 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <ac03790fca854131a77465638cab6972@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
	<9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
	<fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
	<275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com>
	<9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap>
	<fb6bd58f-8aab-81d9-6c67-72d852bd8d13@oracle.com>
	<ac03790fca854131a77465638cab6972@DEWDFE13DE14.global.corp.sap>
Message-ID: <b2af2ca4-f7b1-6a3a-005d-f2513957fe39@oracle.com>

On 24/05/2016 11:06 PM, Doerr, Martin wrote:
> Hi David,
>
> unfortunately, Atomic::add(jlong) is used by mallocTracker.hpp (e.g. line 56). Removing it breaks the build.

Yeah I only discovered that this morning when I checked my test build 
results. That in itself is a bug as Zhengyu has noted.

> But I could change it as follows:

No - thanks - let's just leave this part for another day.

Thanks,
David

> inline jlong Atomic::add(jlong add_value, volatile jlong* dest) {
> #ifdef _LP64
>   return (jlong) add_ptr((intptr_t) add_value, (volatile intptr_t*) dest);
> #else
>   jlong old = load(dest);
>   jlong new_value = old + add_value;
>   while (old != cmpxchg(new_value, dest, old)) {
>     old = load(dest);
>     new_value = old + add_value;
>   }
>   return new_value;
> #endif
> }
>
> Best regards,
> Martin

From david.holmes at oracle.com  Tue May 24 20:19:30 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 25 May 2016 06:19:30 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <57447B34.3080608@redhat.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
	<9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
	<fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
	<275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com>
	<9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap>
	<fb6bd58f-8aab-81d9-6c67-72d852bd8d13@oracle.com>
	<ac03790fca854131a77465638cab6972@DEWDFE13DE14.global.corp.sap>
	<5744758A.3080405@redhat.com> <57447B34.3080608@redhat.com>
Message-ID: <78cd5827-0b3d-da2b-21ea-8a01edde0405@oracle.com>

On 25/05/2016 2:03 AM, Zhengyu Gu wrote:
>
> On 05/24/2016 11:38 AM, Zhengyu Gu wrote:
>>
>>
>> On 05/24/2016 09:06 AM, Doerr, Martin wrote:
>>> Hi David,
>>>
>>> unfortunately, Atomic::add(jlong) is used by mallocTracker.hpp (e.g.
>>> line 56). Removing it breaks the build.
>> It should be replaced with size_t version in mallocTracker.hpp.
>>
> I created https://bugs.openjdk.java.net/browse/JDK-8157709 for this.

Thanks Zhengyu.

David

> -Zhengyu
>
>> -Zhengyu
>>
>>
>>
>>>
>>> But I could change it as follows:
>>> inline jlong Atomic::add(jlong add_value, volatile jlong* dest) {
>>> #ifdef _LP64
>>>    return (jlong) add_ptr((intptr_t) add_value, (volatile intptr_t*)
>>> dest);
>>> #else
>>>    jlong old = load(dest);
>>>    jlong new_value = old + add_value;
>>>    while (old != cmpxchg(new_value, dest, old)) {
>>>      old = load(dest);
>>>      new_value = old + add_value;
>>>    }
>>>    return new_value;
>>> #endif
>>> }
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Dienstag, 24. Mai 2016 14:27
>>> To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley
>>> (aph at redhat.com) <aph at redhat.com>
>>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison
>>> <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net;
>>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>
>>> Hi Martin,
>>>
>>> On 24/05/2016 8:21 PM, Doerr, Martin wrote:
>>>> Hi David,
>>>>
>>>> it was moved for the same reason as the jint version of cmpxchg: It
>>>> passes the memory order to the jint version.
>>>> It may look large in terms of C++ code, but there's not much
>>>> substantial content.
>>>> I can only see a loop which calls the jint version + a bunch of very
>>>> simple operations.
>>>> Why shouldn't we give compilers a chance to inline and possibly
>>>> optimize some of the simple operations and especially to eliminate
>>>> the order check?
>>> I think this forces the compiler to inline it, not just "gives it a
>>> chance". But I'll leave it to those more knowledgeable about the
>>> compiler side of this to comment.
>>>
>>> But if we're making these changes can you delete the Atomic::add(jlong)
>>> - it is unused and incorrect as discussed here:
>>>
>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-February/021620.html
>>>
>>>
>>> Thanks,
>>> David
>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Dienstag, 24. Mai 2016 12:04
>>>> To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley
>>>> (aph at redhat.com) <aph at redhat.com>
>>>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison
>>>> <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net;
>>>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>>
>>>> On 24/05/2016 7:37 PM, Doerr, Martin wrote:
>>>>> Hi David and Andrew,
>>>>>
>>>>> sorry for missing this one. There were too many emails.
>>>>>
>>>>> After moving the jint version as well, there was not much left of
>>>>> atomic.cpp.
>>>>> I think it doesn't make any sense to keep a couple of trivial
>>>>> functions in the cpp file.
>>>>> Therefore, I have removed atomic.cpp and moved the remaining small
>>>>> functions into the inline file.
>>>> Sorry I don't understand why the jbyte cmpxchg_general was moved to the
>>>> .inline.hpp file - it seems far too big to be inlined.
>>>>
>>>> David
>>>>
>>>>> Webrev is here:
>>>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/
>>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>> Sent: Dienstag, 24. Mai 2016 05:50
>>>>> To: Doerr, Martin <martin.doerr at sap.com>
>>>>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison
>>>>> <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net;
>>>>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>>>
>>>>> Hi Martin,
>>>>>
>>>>> On 23/05/2016 7:29 PM, Doerr, Martin wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> here's the new webrev:
>>>>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/
>>>>> There seems to be some confusion. You've moved the jbyte
>>>>> Atomic::cmpxchg_general from the .cpp file to the .inline/hpp file,
>>>>> but
>>>>> the comments from Andrew and Kim were about moving the unsigned
>>>>> Atomic::cmpxchg version. ??
>>>>>
>>>>> Aside: In the changeset contributors have to be specified by "email
>>>>> address" or "name <email address>", OpenJDK user names are not
>>>>> accepted.
>>>>> I think Andrew should also be listed there for the Aarch64 component.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Btw.: The jbyte version of cmpxchg can be implemented on aarch
>>>>>> like on ppc where we emulate the byte access by a 4 byte access
>>>>>> (lwarx/stwcx). But that should better be done in a separate change.
>>>>>>
>>>>>> Thanks for your time and your support.
>>>>>>
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>>> Sent: Samstag, 21. Mai 2016 01:10
>>>>>> To: Doerr, Martin <martin.doerr at sap.com>
>>>>>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison
>>>>>> <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net;
>>>>>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> Are you in a position to make the change now suggested by both Kim
>>>>>> and
>>>>>> Andrew? Can you also include the Aarch64 code that Andrew provided:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~aph/8154736
>>>>>>
>>>>>> I'd like to get this finalized so it is ready to push as soon as the
>>>>>> process allows it to.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On 20/05/2016 8:03 AM, Kim Barrett wrote:
>>>>>>>> On May 18, 2016, at 6:12 AM, Doerr, Martin
>>>>>>>> <martin.doerr at sap.com> wrote:
>>>>>>>>
>>>>>>>> Hi Kim,
>>>>>>>>
>>>>>>>> thank you very much for the detailed review.
>>>>>>>>
>>>>>>>> I agree with your comments and I have made all your requested
>>>>>>>> changes here:
>>>>>>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>>>>>>>>
>>>>>>>>
>>>>>>>> It's correct that the change changes the semantics of the
>>>>>>>> conservative cmpxchg. In case of failure, we also execute the
>>>>>>>> sync instruction, now.
>>>>>>>> Advantage is that the new implementation is maximum conservative
>>>>>>>> by default. I think this makes sense as long as the semantics of
>>>>>>>> the hotspot C++ cmpxchg are not clearly specified.
>>>>>>>>
>>>>>>>> For performance optimization, we should better use (or introduce
>>>>>>>> additional) enum values.
>>>>>>> ------------------------------------------------------------------------------
>>>>>>>
>>>>>>> There doesn't seem to have been any change for this earlier comment.
>>>>>>>
>>>>>>> src/share/vm/runtime/atomic.cpp
>>>>>>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>>>>>>>   60                            volatile unsigned int* dest,
>>>>>>> unsigned int compare_value,
>>>>>>>   61                            cmpxchg_memory_order order) {
>>>>>>>
>>>>>>> I'm surprised this was ever out-of-line. But with this change it's
>>>>>>> quite bad to be out-of-line, as that's going to kill the constant
>>>>>>> propagation of the order value.
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>> Other than that, looks good.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>
>

From zgu at redhat.com  Tue May 24 15:38:50 2016
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 24 May 2016 11:38:50 -0400
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <ac03790fca854131a77465638cab6972@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
	<9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
	<fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
	<275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com>
	<9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap>
	<fb6bd58f-8aab-81d9-6c67-72d852bd8d13@oracle.com>
	<ac03790fca854131a77465638cab6972@DEWDFE13DE14.global.corp.sap>
Message-ID: <5744758A.3080405@redhat.com>


On 05/24/2016 09:06 AM, Doerr, Martin wrote:
> Hi David,
>
> unfortunately, Atomic::add(jlong) is used by mallocTracker.hpp (e.g. line 56). Removing it breaks the build.
It should be replaced with size_t version in mallocTracker.hpp.

-Zhengyu


>
> But I could change it as follows:
> inline jlong Atomic::add(jlong add_value, volatile jlong* dest) {
> #ifdef _LP64
>    return (jlong) add_ptr((intptr_t) add_value, (volatile intptr_t*) dest);
> #else
>    jlong old = load(dest);
>    jlong new_value = old + add_value;
>    while (old != cmpxchg(new_value, dest, old)) {
>      old = load(dest);
>      new_value = old + add_value;
>    }
>    return new_value;
> #endif
> }
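
The 32-bit fallback proposed above is a standard compare-and-swap retry
loop. A minimal standalone sketch of that pattern follows, assuming C++11
std::atomic in place of HotSpot's Atomic class; the name add_and_fetch is
hypothetical and not part of any webrev in this thread.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical illustration, not HotSpot code: add a 64-bit value using
// only compare-and-swap, and return the NEW value (old + add_value).
inline int64_t add_and_fetch(std::atomic<int64_t>& dest, int64_t add_value) {
  int64_t old_value = dest.load();
  // On failure, compare_exchange_weak stores the currently observed value
  // into old_value, so the next attempt recomputes the sum from fresh data.
  while (!dest.compare_exchange_weak(old_value, old_value + add_value)) {
    // retry
  }
  return old_value + add_value;
}
```

Returning the updated value rather than the previous one matches what the
proposed code above does with new_value.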
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Dienstag, 24. Mai 2016 14:27
> To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley (aph at redhat.com) <aph at redhat.com>
> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>
> Hi Martin,
>
> On 24/05/2016 8:21 PM, Doerr, Martin wrote:
>> Hi David,
>>
>> it was moved for the same reason as the jint version of cmpxchg: It passes the memory order to the jint version.
>> It may look large in terms of C++ code, but there's not much substantial content.
>> I can only see a loop which calls the jint version + a bunch of very simple operations.
>> Why shouldn't we give compilers a chance to inline and possibly optimize some of the simple operations and especially to eliminate the order check?
> I think this forces the compiler to inline it, not just "gives it a
> chance". But I'll leave it to those more knowledgeable about the
> compiler side of this to comment.
>
> But if we're making these changes can you delete the Atomic::add(jlong)
> - it is unused and incorrect as discussed here:
>
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-February/021620.html
>
> Thanks,
> David
>
>> Best regards,
>> Martin
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Dienstag, 24. Mai 2016 12:04
>> To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley (aph at redhat.com) <aph at redhat.com>
>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>
>> On 24/05/2016 7:37 PM, Doerr, Martin wrote:
>>> Hi David and Andrew,
>>>
>>> sorry for missing this one. There were too many emails.
>>>
>>> After moving the jint version as well, there was not much left of atomic.cpp.
>>> I think it doesn't make any sense to keep a couple of trivial functions in the cpp file.
>>> Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file.
>> Sorry I don't understand why the jbyte cmpxchg_general was moved to the
>> .inline.hpp file - it seems far too big to be inlined.
>>
>> David
>>
>>> Webrev is here:
>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Dienstag, 24. Mai 2016 05:50
>>> To: Doerr, Martin <martin.doerr at sap.com>
>>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>
>>> Hi Martin,
>>>
>>> On 23/05/2016 7:29 PM, Doerr, Martin wrote:
>>>> Hi David,
>>>>
>>>> here's the new webrev:
>>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/
>>> There seems to be some confusion. You've moved the jbyte
>>> Atomic::cmpxchg_general from the .cpp file to the .inline/hpp file, but
>>> the comments from Andrew and Kim were about moving the unsigned
>>> Atomic::cmpxchg version. ??
>>>
>>> Aside: In the changeset contributors have to be specified by "email
>>> address" or "name <email address>", OpenJDK user names are not accepted.
>>> I think Andrew should also be listed there for the Aarch64 component.
>>>
>>> Thanks,
>>> David
>>>
>>>> Btw.: The jbyte version of cmpxchg can be implemented on aarch like on ppc where we emulate the byte access by a 4 byte access (lwarx/stwcx). But that should better be done in a separate change.
>>>>
>>>> Thanks for your time and your support.
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Samstag, 21. Mai 2016 01:10
>>>> To: Doerr, Martin <martin.doerr at sap.com>
>>>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>>
>>>> Hi Martin,
>>>>
>>>> Are you in a position to make the change now suggested by both Kim and
>>>> Andrew? Can you also include the Aarch64 code that Andrew provided:
>>>>
>>>> http://cr.openjdk.java.net/~aph/8154736
>>>>
>>>> I'd like to get this finalized so it is ready to push as soon as the
>>>> process allows it to.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 20/05/2016 8:03 AM, Kim Barrett wrote:
>>>>>> On May 18, 2016, at 6:12 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>>>>>
>>>>>> Hi Kim,
>>>>>>
>>>>>> thank you very much for the detailed review.
>>>>>>
>>>>>> I agree with your comments and I have made all your requested changes here:
>>>>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/
>>>>>>
>>>>>> It's correct that the change changes the semantics of the conservative cmpxchg. In case of failure, we also execute the sync instruction, now.
>>>>>> Advantage is that the new implementation is maximum conservative by default. I think this makes sense as long as the semantics of the hotspot C++ cmpxchg are not clearly specified.
>>>>>>
>>>>>> For performance optimization, we should better use (or introduce additional) enum values.
>>>>> ------------------------------------------------------------------------------
>>>>> There doesn't seem to have been any change for this earlier comment.
>>>>>
>>>>> src/share/vm/runtime/atomic.cpp
>>>>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>>>>>   60                            volatile unsigned int* dest, unsigned int compare_value,
>>>>>   61                            cmpxchg_memory_order order) {
>>>>>
>>>>> I'm surprised this was ever out-of-line. But with this change it's
>>>>> quite bad to be out-of-line, as that's going to kill the constant
>>>>> propagation of the order value.
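
To illustrate the point about constant propagation: when the function is
inline and the caller passes a compile-time constant order, the compiler
can fold the branch on order away. The sketch below assumes C++11
std::atomic and uses hypothetical names (cmpxchg_order, order_relaxed,
order_conservative, cmpxchg_sketch); it is not the code under review.

```cpp
#include <atomic>

// Hypothetical stand-in for the order enum discussed in this review.
enum cmpxchg_order { order_relaxed, order_conservative };

// Returns the value observed at *dest before the operation, like a
// classic cmpxchg. Being inline, a constant 'order' argument lets the
// compiler drop the branch that is not taken.
inline unsigned cmpxchg_sketch(unsigned exchange_value,
                               std::atomic<unsigned>* dest,
                               unsigned compare_value,
                               cmpxchg_order order = order_conservative) {
  unsigned expected = compare_value;
  if (order == order_relaxed) {
    dest->compare_exchange_strong(expected, exchange_value,
                                  std::memory_order_relaxed);
  } else {
    // Conservative: full fences around the operation, on success and failure.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    dest->compare_exchange_strong(expected, exchange_value,
                                  std::memory_order_seq_cst);
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  // On success 'expected' still equals compare_value (the old value);
  // on failure it was overwritten with the value actually observed.
  return expected;
}
```

If the same function lived out-of-line in a .cpp file, the order argument
would typically be checked at run time on every call.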
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>>
>>>>> Other than that, looks good.
>>>>>
>>>>>
>>>>>
>>>>>


From zgu at redhat.com  Tue May 24 16:03:00 2016
From: zgu at redhat.com (Zhengyu Gu)
Date: Tue, 24 May 2016 12:03:00 -0400
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <5744758A.3080405@redhat.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
	<9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
	<fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
	<275140a8-2e3f-fda9-6697-f320a7b25027@oracle.com>
	<9dac5b3e08584f8f8447749175acf964@DEWDFE13DE14.global.corp.sap>
	<fb6bd58f-8aab-81d9-6c67-72d852bd8d13@oracle.com>
	<ac03790fca854131a77465638cab6972@DEWDFE13DE14.global.corp.sap>
	<5744758A.3080405@redhat.com>
Message-ID: <57447B34.3080608@redhat.com>


On 05/24/2016 11:38 AM, Zhengyu Gu wrote:
>
>
> On 05/24/2016 09:06 AM, Doerr, Martin wrote:
>> Hi David,
>>
>> unfortunately, Atomic::add(jlong) is used by mallocTracker.hpp (e.g. 
>> line 56). Removing it breaks the build.
> It should be replaced with size_t version in mallocTracker.hpp.
>
I created https://bugs.openjdk.java.net/browse/JDK-8157709 for this.

-Zhengyu

> -Zhengyu
>
>
>
>>
>> But I could change it as follows:
>> inline jlong Atomic::add(jlong add_value, volatile jlong* dest) {
>> #ifdef _LP64
>>    return (jlong) add_ptr((intptr_t) add_value, (volatile intptr_t*) 
>> dest);
>> #else
>>    jlong old = load(dest);
>>    jlong new_value = old + add_value;
>>    while (old != cmpxchg(new_value, dest, old)) {
>>      old = load(dest);
>>      new_value = old + add_value;
>>    }
>>    return new_value;
>> #endif
>> }
>>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Dienstag, 24. Mai 2016 14:27
>> To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley 
>> (aph at redhat.com) <aph at redhat.com>
>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison 
>> <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; 
>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>
>> Hi Martin,
>>
>> On 24/05/2016 8:21 PM, Doerr, Martin wrote:
>>> Hi David,
>>>
>>> it was moved for the same reason as the jint version of cmpxchg: It 
>>> passes the memory order to the jint version.
>>> It may look large in terms of C++ code, but there's not much 
>>> substantial content.
>>> I can only see a loop which calls the jint version + a bunch of very 
>>> simple operations.
>>> Why shouldn't we give compilers a chance to inline and possibly 
>>> optimize some of the simple operations and especially to eliminate 
>>> the order check?
>> I think this forces the compiler to inline it, not just "gives it a
>> chance". But I'll leave it to those more knowledgeable about the
>> compiler side of this to comment.
>>
>> But if we're making these changes can you delete the Atomic::add(jlong)
>> - it is unused and incorrect as discussed here:
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-February/021620.html 
>>
>>
>> Thanks,
>> David
>>
>>> Best regards,
>>> Martin
>>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Dienstag, 24. Mai 2016 12:04
>>> To: Doerr, Martin <martin.doerr at sap.com>; Andrew Haley 
>>> (aph at redhat.com) <aph at redhat.com>
>>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison 
>>> <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; 
>>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>
>>> On 24/05/2016 7:37 PM, Doerr, Martin wrote:
>>>> Hi David and Andrew,
>>>>
>>>> sorry for missing this one. There were too many emails.
>>>>
>>>> After moving the jint version as well, there was not much left of 
>>>> atomic.cpp.
>>>> I think it doesn't make any sense to keep a couple of trivial 
>>>> functions in the cpp file.
>>>> Therefore, I have removed atomic.cpp and moved the remaining small 
>>>> functions into the inline file.
>>> Sorry I don't understand why the jbyte cmpxchg_general was moved to the
>>> .inline.hpp file - it seems far too big to be inlined.
>>>
>>> David
>>>
>>>> Webrev is here:
>>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>> Sent: Dienstag, 24. Mai 2016 05:50
>>>> To: Doerr, Martin <martin.doerr at sap.com>
>>>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison 
>>>> <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; 
>>>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>>
>>>> Hi Martin,
>>>>
>>>> On 23/05/2016 7:29 PM, Doerr, Martin wrote:
>>>>> Hi David,
>>>>>
>>>>> here's the new webrev:
>>>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.04/
>>>> There seems to be some confusion. You've moved the jbyte
>>>> Atomic::cmpxchg_general from the .cpp file to the .inline/hpp file, 
>>>> but
>>>> the comments from Andrew and Kim were about moving the unsigned
>>>> Atomic::cmpxchg version. ??
>>>>
>>>> Aside: In the changeset contributors have to be specified by "email
>>>> address" or "name <email address>", OpenJDK user names are not 
>>>> accepted.
>>>> I think Andrew should also be listed there for the Aarch64 component.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>> Btw.: The jbyte version of cmpxchg can be implemented on aarch 
>>>>> like on ppc where we emulate the byte access by a 4 byte access 
>>>>> (lwarx/stwcx). But that should better be done in a separate change.
>>>>>
>>>>> Thanks for your time and your support.
>>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>> -----Original Message-----
>>>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>>>> Sent: Samstag, 21. Mai 2016 01:10
>>>>> To: Doerr, Martin <martin.doerr at sap.com>
>>>>> Cc: Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison 
>>>>> <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; 
>>>>> hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: RFR(M): 8155949: Support relaxed semantics in cmpxchg
>>>>>
>>>>> Hi Martin,
>>>>>
>>>>> Are you in a position to make the change now suggested by both Kim 
>>>>> and
>>>>> Andrew? Can you also include the Aarch64 code that Andrew provided:
>>>>>
>>>>> http://cr.openjdk.java.net/~aph/8154736
>>>>>
>>>>> I'd like to get this finalized so it is ready to push as soon as the
>>>>> process allows it to.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 20/05/2016 8:03 AM, Kim Barrett wrote:
>>>>>>> On May 18, 2016, at 6:12 AM, Doerr, Martin 
>>>>>>> <martin.doerr at sap.com> wrote:
>>>>>>>
>>>>>>> Hi Kim,
>>>>>>>
>>>>>>> thank you very much for the detailed review.
>>>>>>>
>>>>>>> I agree with your comments and I have made all your requested 
>>>>>>> changes here:
>>>>>>> http://cr.openjdk.java.net/~goetz/wr16/8155949-relaxed_cas/webrev.03/ 
>>>>>>>
>>>>>>>
>>>>>>> It's correct that the change changes the semantics of the 
>>>>>>> conservative cmpxchg. In case of failure, we also execute the 
>>>>>>> sync instruction, now.
>>>>>>> Advantage is that the new implementation is maximum conservative 
>>>>>>> by default. I think this makes sense as long as the semantics of 
>>>>>>> the hotspot C++ cmpxchg are not clearly specified.
>>>>>>>
>>>>>>> For performance optimization, we should better use (or introduce 
>>>>>>> additional) enum values.
>>>>>> ------------------------------------------------------------------------------ 
>>>>>>
>>>>>> There doesn't seem to have been any change for this earlier comment.
>>>>>>
>>>>>> src/share/vm/runtime/atomic.cpp
>>>>>> 59 unsigned Atomic::cmpxchg(unsigned int exchange_value,
>>>>>>   60                            volatile unsigned int* dest, 
>>>>>> unsigned int compare_value,
>>>>>>   61                            cmpxchg_memory_order order) {
>>>>>>
>>>>>> I'm surprised this was ever out-of-line. But with this change it's
>>>>>> quite bad to be out-of-line, as that's going to kill the constant
>>>>>> propagation of the order value.
>>>>>>
>>>>>> ------------------------------------------------------------------------------ 
>>>>>>
>>>>>>
>>>>>> Other than that, looks good.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>


From kim.barrett at oracle.com  Thu May 26 00:04:53 2016
From: kim.barrett at oracle.com (Kim Barrett)
Date: Wed, 25 May 2016 20:04:53 -0400
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
	<9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
	<fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
Message-ID: <AA657CCB-A729-4558-92C9-3B22FE0753A7@oracle.com>

> On May 24, 2016, at 5:37 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
> 
> Hi David and Andrew,
> 
> sorry for missing this one. There were too many emails.
> 
> After moving the jint version as well, there was not much left of atomic.cpp.
> I think it doesn't make any sense to keep a couple of trivial functions in the cpp file.
> Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file.
> 
> Webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/

------------------------------------------------------------------------------
 100 inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte *dest, jbyte comparand, cmpxchg_memory_order order)

The addition of the order option makes it a bit more obvious that this
does not, and never has, executed any fences in the immediate failure
case, e.g. when

 111   while (cur_as_bytes[offset] == comparand) {

is false on the first iteration.  This seems like a bug.  Assuming it
is, I'm not sure whether this should be dealt with as part of this
changeset, or moved to a separate bug for this (pre-existing) issue.
I think only ARM targets (and zero?) are lacking specialized cmpxchg
on bytes and so use this version?

Sorry I didn't notice this previously.
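
As a rough illustration of the concern, here is a minimal sketch of a byte
compare-and-swap emulated with a 4-byte CAS, with a fence added on the
failure path. It assumes C++11 std::atomic rather than HotSpot's Atomic
class, and the name cmpxchg_byte and its parameters are hypothetical; it
is not the code in the webrev.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical illustration: compare-and-swap one byte at in-memory
// position 'offset' (0..3) of a 4-byte word. Returns the byte observed.
inline int8_t cmpxchg_byte(std::atomic<uint32_t>& word, unsigned offset,
                           int8_t exchange_value, int8_t comparand) {
  uint32_t cur = word.load(std::memory_order_relaxed);
  for (;;) {
    uint8_t bytes[4];
    std::memcpy(bytes, &cur, sizeof cur);
    int8_t seen = static_cast<int8_t>(bytes[offset]);
    if (seen != comparand) {
      // Failure, possibly on the very first iteration: still issue a full
      // fence so this path is no weaker than the successful one.
      std::atomic_thread_fence(std::memory_order_seq_cst);
      return seen;
    }
    bytes[offset] = static_cast<uint8_t>(exchange_value);
    uint32_t desired;
    std::memcpy(&desired, bytes, sizeof desired);
    if (word.compare_exchange_weak(cur, desired,
                                   std::memory_order_seq_cst,
                                   std::memory_order_relaxed)) {
      return comparand;  // swap succeeded; the old byte equaled comparand
    }
    // 'cur' now holds the freshly observed word; loop and re-check.
  }
}
```

Without the fence in the mismatch branch, a call that fails immediately
would return after only a relaxed load, which is the gap described above.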

------------------------------------------------------------------------------

Other than that and the already mentioned (pre-existing) Atomic::add
for jlong return value problem, this looks good.


From david.holmes at oracle.com  Thu May 26 01:09:17 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 26 May 2016 11:09:17 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <AA657CCB-A729-4558-92C9-3B22FE0753A7@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
	<9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
	<fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
	<AA657CCB-A729-4558-92C9-3B22FE0753A7@oracle.com>
Message-ID: <eaac651c-dc0c-6346-e158-b3a8e0549c79@oracle.com>

Hi Kim,

On 26/05/2016 10:04 AM, Kim Barrett wrote:
>> On May 24, 2016, at 5:37 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>
>> Hi David and Andrew,
>>
>> sorry for missing this one. There were too many emails.
>>
>> After moving the jint version as well, there was not much left of atomic.cpp.
>> I think it doesn't make any sense to keep a couple of trivial functions in the cpp file.
>> Therefore, I have removed atomic.cpp and moved the remaining small functions into the inline file.
>>
>> Webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/
>
> ------------------------------------------------------------------------------
>  100 inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte *dest, jbyte comparand, cmpxchg_memory_order order)
>
> The addition of the order option makes it a bit more obvious that this
> does not, and never has, executed any fences in the immediate failure
> case, e.g. when
>
>  111   while (cur_as_bytes[offset] == comparand) {
>
> is false on the first iteration.  This seems like a bug.  Assuming it
> is, I'm not sure whether this should be dealt with as part of this
> changeset, or moved to a separate bug for this (pre-existing) issue.
> I think only ARM targets (and zero?) are lacking specialized cmpxchg
> on bytes and so use this version?

I'll file a separate bug for that.

Thanks,
David
------


> Sorry I didn't notice this previously.
>
> ------------------------------------------------------------------------------
>
> Other than that and the already mentioned (pre-existing) Atomic::add
> for jlong return value problem, this looks good.
>

From david.holmes at oracle.com  Thu May 26 01:29:58 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 26 May 2016 11:29:58 +1000
Subject: RFR(M): 8155949: Support relaxed semantics in cmpxchg
In-Reply-To: <eaac651c-dc0c-6346-e158-b3a8e0549c79@oracle.com>
References: <201605101044.u4AAingb016922@d19av07.sagamino.japan.ibm.com>
	<8bc82016-81fd-b420-7d4e-1c31e615c218@oracle.com>
	<1f78facd-2872-2eda-23e2-fd5de0fd8c42@oracle.com>
	<201605101318.u4ADI0NW004604@d19av05.sagamino.japan.ibm.com>
	<7a0f80de7788484da6d93f03c3eddd19@DEWDFE13DE14.global.corp.sap>
	<C40ED48B-FDB2-493E-A93A-F6EC58DFE36F@oracle.com>
	<83338d53a921449fada28b7bf31e3665@DEWDFE13DE14.global.corp.sap>
	<EF619BDA-A9D1-46A6-ADD9-B5A0141A8E15@oracle.com>
	<267a624c-626f-4238-0166-baa14ff4b412@oracle.com>
	<bd436243ef584df3bb12ea2a0bf6a7a6@DEWDFE13DE14.global.corp.sap>
	<9cff0b75-e234-e789-910d-d86154bba834@oracle.com>
	<fd5eed458a8f4dedb8ea1bd42f822c4e@DEWDFE13DE14.global.corp.sap>
	<AA657CCB-A729-4558-92C9-3B22FE0753A7@oracle.com>
	<eaac651c-dc0c-6346-e158-b3a8e0549c79@oracle.com>
Message-ID: <17598d6f-6729-65c1-0af0-60ba93b4c003@oracle.com>

Filed: https://bugs.openjdk.java.net/browse/JDK-8157904

Atomic::cmpxchg_general for jbyte is missing a fence on initial failure

David

On 26/05/2016 11:09 AM, David Holmes wrote:
> Hi Kim,
>
> On 26/05/2016 10:04 AM, Kim Barrett wrote:
>>> On May 24, 2016, at 5:37 AM, Doerr, Martin <martin.doerr at sap.com> wrote:
>>>
>>> Hi David and Andrew,
>>>
>>> sorry for missing this one. There were too many emails.
>>>
>>> After moving the jint version as well, there was not much left of
>>> atomic.cpp.
>>> I think it doesn't make any sense to keep a couple of trivial
>>> functions in the cpp file.
>>> Therefore, I have removed atomic.cpp and moved the remaining small
>>> functions into the inline file.
>>>
>>> Webrev is here:
>>> http://cr.openjdk.java.net/~mdoerr/8155949_relaxed_cas/webrev.05/
>>
>> ------------------------------------------------------------------------------
>>
>>  100 inline jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte
>> *dest, jbyte comparand, cmpxchg_memory_order order)
>>
>> The addition of the order option makes it a bit more obvious that this
>> does not, and never has, executed any fences in the immediate failure
>> case, e.g. when
>>
>>  111   while (cur_as_bytes[offset] == comparand) {
>>
>> is false on the first iteration.  This seems like a bug.  Assuming it
>> is, I'm not sure whether this should be dealt with as part of this
>> changeset, or moved to a separate bug for this (pre-existing) issue.
>> I think only ARM targets (and zero?) are lacking specialized cmpxchg
>> on bytes and so use this version?
>
> I'll file a separate bug for that.
>
> Thanks,
> David
> ------
>
>
>> Sorry I didn't notice this previously.
>>
>> ------------------------------------------------------------------------------
>>
>>
>> Other than that and the already mentioned (pre-existing) Atomic::add
>> for jlong return value problem, this looks good.
>>

From HORIE at jp.ibm.com  Mon May 30 01:42:31 2016
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Mon, 30 May 2016 10:42:31 +0900
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <OF0DB404FA.2F5617F3-ON49257FBD.0052E126-49257FBD.0053CA88@LocalDomain>
References: <56FEDBB3.5030106@linux.vnet.ibm.com><CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com><57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
	<573A034C.9060602@br.ibm.com>
	<OF0DB404FA.2F5617F3-ON49257FBD.0052E126-49257FBD.0053CA88@LocalDomain>
Message-ID: <201605300143.u4U1cXVI000652@mx0a-001b2d01.pphosted.com>


Dear Breno, Gustavo, Volker, and Martin,
I am a coworker of Miki.

I implemented VSX disjoint arraycopy functions for byte, int, and long.
Although Miki had implemented VSX disjoint long arraycopy, we found a
couple of bugs, so I fixed them. Would you please review them?

Micro benchmarks for byte and int are as follows. (The one for long is the
same as Miki's, which Miki attached earlier.)
(See attached file: ArrayCopyTest_byte.java)
(See attached file: ArrayCopyTest_int.java)

Results are as follows. (For the short result, I used Gustavo's code.)
(See attached file: result_disjoint-arraycopy_vsx-max.jpg)

Patch for Java8:
(See attached file: hotspot_jdk8.diff)

Patch for Java9:
(See attached file: hotspot_jdk9.diff)

Best regards,
--
Michihiro Horie,
IBM Research - Tokyo


From:	Miki M Enoki/Japan/IBM
To:	Breno Leitao <brenohl at br.ibm.com>
Cc:	Gustavo Romero <gromero at linux.vnet.ibm.com>,
            "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>,
            "Doerr, Martin" <martin.doerr at sap.com>,
            "ppc-aix-port-dev at openjdk.java.net"
            <ppc-aix-port-dev at openjdk.java.net>, "Simonis, Volker"
            <volker.simonis at sap.com>, Volker Simonis
            <volker.simonis at gmail.com>
Date:	2016/05/25 00:15
Subject:	Re: PPC64 VSX load/store instructions in stubs


Hi Breno,

Thank you for your reply.

>The same mechanism could be used to copy arrays of short elements, as Gustavo was
>working on. Do you agree?

I think the mechanism differs by element type (byte, short, int, long...).
Gustavo will apply a patch with VSX for short array copy, so it would be
reasonable to use VSX instructions for long array copy, too.

My coworker is also creating byte and int arraycopy with VSX. He will post
an email to this mailing list.
I would appreciate it if our patch for byte, int and long copy were applied to
OpenJDK.


Best regards,
Miki


From:	Breno Leitao <brenohl at br.ibm.com>
To:	Miki M Enoki/Japan/IBM at IBMJP, "Doerr, Martin"
            <martin.doerr at sap.com>,
Cc:	Gustavo Romero <gromero at linux.vnet.ibm.com>, Volker Simonis
            <volker.simonis at gmail.com>, "Simonis, Volker"
            <volker.simonis at sap.com>, "ppc-aix-port-dev at openjdk.java.net"
            <ppc-aix-port-dev at openjdk.java.net>,
            "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
Date:	2016/05/17 02:29
Subject:	Re: PPC64 VSX load/store instructions in stubs


Hi Miki,

On 05/16/2016 02:53 AM, Miki M Enoki wrote:
> I also implemented VSX disjoint long arraycopy.
> I appreciate it if it is applied to OpenJDK, too.

Thanks for the summarized information, this is helpful. Based on your plot, I
understand we can split the whole scenario in two:

  * Array size smaller than 4k, and then use VSX instructions to perform copy
  * Array size bigger than 4k, and then use VMX instructions to perform copy

The same mechanism could be used to copy arrays of short elements, as Gustavo was
working on. Do you agree?

That said, I understand that a new patch should be generated that contemplates
both cases on a single patch, ready to be applied on OpenJDK 9 source code. Hence
a webrev should be generated mapping to bug id
https://bugs.openjdk.java.net/browse/JDK-8154156

If you need any help on the webrev[1] creation and hosting, Gustavo might help,
since he did this process already.

[1] http://openjdk.java.net/guide/webrevHelp.html

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ArrayCopyTest_byte.java
Type: application/octet-stream
Size: 219239 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160530/040c892a/ArrayCopyTest_byte-0001.java>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ArrayCopyTest_int.java
Type: application/octet-stream
Size: 14652 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160530/040c892a/ArrayCopyTest_int-0001.java>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: result_disjoint-arraycopy_vsx-max.jpg
Type: image/jpeg
Size: 30481 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160530/040c892a/result_disjoint-arraycopy_vsx-max-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hotspot_jdk8.diff
Type: application/octet-stream
Size: 10689 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160530/040c892a/hotspot_jdk8-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hotspot_jdk9.diff
Type: application/octet-stream
Size: 9729 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160530/040c892a/hotspot_jdk9-0001.diff>

From sgehwolf at redhat.com  Mon May 30 08:55:50 2016
From: sgehwolf at redhat.com (Severin Gehwolf)
Date: Mon, 30 May 2016 10:55:50 +0200
Subject: RFR: JDK-8157336: Generation of classlists at build time should
	be configurable
In-Reply-To: <57456E94.9030200@oracle.com>
References: <574456F8.7000506@oracle.com>
	<097cad5f-c4a9-6f79-3ac3-717d512e81ba@oracle.com>
	<57456E94.9030200@oracle.com>
Message-ID: <1464598550.3804.14.camel@redhat.com>

cc'ing PPC folks for input.

On Wed, 2016-05-25 at 11:21 +0200, Erik Joelsson wrote:
> Thanks!
> 
> When building zero, the JVM_VARIANT is "zero" so this addresses that
> problem automatically too. I have verified that.
> 
> There are some other peculiarities with zero in that it ends up in the
> "server" directory so I understand that it's confusing.
> 
> /Erik
> 
> On 2016-05-24 22:35, David Holmes wrote:
> > 
> > Hi Erik,
> > 
> > On 24/05/2016 11:28 PM, Erik Joelsson wrote:
> > > 
> > > Generating a classlist at build time is not supported on all JVM
> > > configurations. This patch adds a configure flag to control this build
> > > step: --disable-generate-classlist. The default is to be enabled if
> > > either a client or server JVM Variant is being built.
> > > 
> > > Bug: https://bugs.openjdk.java.net/browse/JDK-8157336
> > > Webrev:
> > > http://cr.openjdk.java.net/~erikj/8157336/webrev.top.01/index.html
> > This looks okay to me. It addresses the "minimal VM only" problem
> > automatically which is good. I'm unclear if the Zero case is
> > automatically handled as I'm not sure how the VM variants are
> > expressed - but having the option is good enough I think.

The Zero case is handled as noted in another branch of this thread, but
I wonder if this works for "server" JVMs on PPC64? Is -Xshare:dump
working on JDK 9 and PPC64? AFAIR, in JDK 8 on PPC64 it was not
supported at some point.

Cheers,
Severin

From martin.doerr at sap.com  Mon May 30 09:56:25 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 30 May 2016 09:56:25 +0000
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com><CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com><57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
	<573A034C.9060602@br.ibm.com>
	<OF0DB404FA.2F5617F3-ON49257FBD.0052E126-49257FBD.0053CA88@LocalDomain>
	<201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
Message-ID: <4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>

Hi Michihiro,

thanks for implementing the VSX versions.

Gustavo's change "8154156: PPC64: improve array copy stubs by using vector instructions" is pushed into hs-comp.
Your change needs to be adapted:

- The vm_version and assembler parts are already there.
- Vector-scalar load/store instructions now use VectorSRegisters.

The byte and int versions look good to me. I think the long version should be implemented in a similar way: a check for has_vsx() is necessary, and the length comparison should be done inside of that block.
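
The structure suggested above can be modelled roughly as follows. This is only an illustrative Java sketch of one reading of the advice, not the stub code from the patch: HAS_VSX stands in for the has_vsx() check, and MIN_VECTOR_ELEMENTS and ELEMENTS_PER_CHUNK are assumed values chosen for illustration.

```java
import java.util.Arrays;

public class GuardedCopySketch {
    // Hypothetical stand-ins: the real feature test is has_vsx(), and the
    // real threshold would be chosen by the stub author.
    static final boolean HAS_VSX = true;
    static final int MIN_VECTOR_ELEMENTS = 32;  // assumed value, illustration only
    static final int ELEMENTS_PER_CHUNK = 2;    // two 8-byte longs per 16-byte vector

    static void disjointCopy(long[] src, long[] dst, int count) {
        int i = 0;
        if (HAS_VSX) {                           // feature check guards the whole block
            if (count >= MIN_VECTOR_ELEMENTS) {  // length comparison lives inside it
                for (; i + ELEMENTS_PER_CHUNK <= count; i += ELEMENTS_PER_CHUNK) {
                    dst[i] = src[i];             // models one 16-byte vector load/store
                    dst[i + 1] = src[i + 1];
                }
            }
        }
        for (; i < count; i++) {                 // scalar tail and fallback path
            dst[i] = src[i];
        }
    }

    public static void main(String[] args) {
        long[] src = new long[101];
        for (int k = 0; k < src.length; k++) {
            src[k] = k * 3L;
        }
        long[] dst = new long[src.length];
        disjointCopy(src, dst, src.length);
        System.out.println(Arrays.equals(src, dst));
    }
}
```

With this shape, a CPU without VSX or a short array simply falls through to the scalar loop, so no separate fallback stub is needed.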

Best regards,
Martin


From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Monday, 30 May 2016 03:43
To: Miki M Enoki <ENOMIKI at jp.ibm.com>
Cc: Breno Leitao <brenohl at br.ibm.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>; Volker Simonis <volker.simonis at gmail.com>
Subject: Re: PPC64 VSX load/store instructions in stubs


Dear Breno, Gustavo, Volker, and Martin,
I am a coworker of Miki.

I implemented VSX disjoint arraycopy functions for byte, int, and long. Although Miki had already implemented VSX disjoint long arraycopy, we found a couple of bugs in it, which I have fixed. Would you please review the patches?

Micro benchmarks for byte and int are as follows. (The one for long is the same as Miki's, which was attached before by Miki)
(See attached file: ArrayCopyTest_byte.java)(See attached file: ArrayCopyTest_int.java)

Results are as follows. (For the short result, I used Gustavo's code.)
(See attached file: result_disjoint-arraycopy_vsx-max.jpg)

Patch for Java8:
(See attached file: hotspot_jdk8.diff)

Patch for Java9:
(See attached file: hotspot_jdk9.diff)

Best regards,
--
Michihiro Horie,
IBM Research - Tokyo


From gromero at linux.vnet.ibm.com  Tue May 31 01:31:10 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 30 May 2016 22:31:10 -0300
Subject: SIGILL crashes JVM on PPC64 LE
In-Reply-To: <CA+3eh1364X=REW=SbPTQWB-cmZUhnxGmE8mWvVTN4S099NcSxA@mail.gmail.com>
References: <5733B30D.6010201@linux.vnet.ibm.com>
	<CA+3eh1235gSyijXZ-o+9D0Q+xjCKdk1OBm+2tKbPkCipH8P9EA@mail.gmail.com>
	<CA+3eh1364X=REW=SbPTQWB-cmZUhnxGmE8mWvVTN4S099NcSxA@mail.gmail.com>
Message-ID: <201605310131.u4V1T5Gm040249@mx0a-001b2d01.pphosted.com>

Hi Volker

The following test case has been isolated by Hiroshi Horii and generates
the illegal instruction, crashing the JVM on PPC64 LE:

UnalignedUnsafeAccess.java:
http://hastebin.com/raw/uqegukific

$ javac UnalignedUnsafeAccess.java
$ java -Xcomp -Xbatch UnalignedUnsafeAccess

The issue can be reproduced on OpenJDK 8 downstream, OpenJDK 8, and
OpenJDK 9 - hs_err logs:

OpenJDK 9, tag 0be6f4f5d186 jdk-9+120:
http://hastebin.com/raw/ecuhukutur

OpenJDK 8, tag 5aaa43d91c73 tip:
http://hastebin.com/raw/ipohoyafos

OpenJDK 8 downstream:

Ubuntu 16.04 LTS
build 1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14
http://hastebin.com/raw/yetizebofo

RHEL 7.2:
build 1.8.0_91-b14
http://hastebin.com/raw/irequfawaw

The crash happens when an illegal instruction - 0xea2f0013 - is executed.

The backtrace shows:

Stack: [0x00003fff56030000,0x00003fff56430000],  sp=0x00003fff5642b8d0,  free space=4078k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x162104]  loadI2LNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x194
V  [libjvm.so+0x8ece28]  Compile::fill_buffer(CodeBuffer*, unsigned int*)+0x4e8
V  [libjvm.so+0x368e08]  Compile::Code_Gen()+0x3c8
V  [libjvm.so+0x369e04]  Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool)+0xf64
V  [libjvm.so+0x271380]  C2Compiler::compile_method(ciEnv*, ciMethod*, int)+0x1f0
V  [libjvm.so+0x3785a4]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xd54
V  [libjvm.so+0x379dc8]  CompileBroker::compiler_thread_loop()+0x488
V  [libjvm.so+0xa5de90]  compiler_thread_entry(JavaThread*, Thread*)+0x20
V  [libjvm.so+0xa690c8]  JavaThread::thread_main_inner()+0x178
V  [libjvm.so+0x8c8c10]  java_start(Thread*)+0x170
C  [libpthread.so.0+0x833c]  start_thread+0xfc
C  [libc.so.6+0x12b014]  clone+0xe4

loadI2LNode class is generated according to the following ADL code in
ppc.ad file:

instruct loadI2L(iRegLdst dst, memory mem) %{
  match(Set dst (ConvI2L (LoadI mem)));
  predicate(_kids[0]->_leaf->as_Load()->is_unordered());
  ins_cost(MEMORY_REF_COST);

  format %{ "LWA     $dst, $mem \t// loadI2L" %}
  size(4);
  ins_encode %{
    // TODO: PPC port $archOpcode(ppc64Opcode_lwa);
    int Idisp = $mem$$disp + frame_slots_bias($mem$$base, ra_);
    __ lwa($dst$$Register, Idisp, $mem$$base$$Register);
  %}
  ins_pipe(pipe_class_memory);
%}

So the generated illegal instruction comes from:
lwa 17,17,15  (DS-form: lwa RT, DS, RA)

The displacement of a DS-form instruction must always be a multiple of 4
(the DS field is always concatenated with 0b00), so 17 as the displacement
(the middle 17) cannot be encoded. Its low two bits fall into the
extended-opcode bits, generating the illegal instruction in question:

11101010000000000000000000000010: LWA
00000010001000000000000000000000: 17
00000000000000000000000000010001: 17
00000000000011110000000000000000: 15
--------------------------------
11101010001011110000000000010011: 0xEA2F0013 => Illegal instruction
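
The bit arithmetic above can be reproduced with a small stand-alone Java sketch. It is not HotSpot's assembler: the class and method names are made up for illustration, and encodeUnchecked deliberately omits the alignment check in order to show how the misaligned displacement corrupts the low two bits.

```java
public class LwaEncodingSketch {
    // Primary opcode 58 in the top six bits plus extended opcode 2 in the low two bits.
    static final int LWA_BASE = 0xEA000002;

    // ORs the displacement into the low 16 bits with no alignment check,
    // mirroring the rows of the table above.
    static int encodeUnchecked(int rt, int disp, int ra) {
        return LWA_BASE | (rt << 21) | (ra << 16) | (disp & 0xFFFF);
    }

    // A DS-form displacement is encodable only if it is a multiple of 4.
    static boolean isEncodableDsDisp(int disp) {
        return (disp & 0x3) == 0;
    }

    public static void main(String[] args) {
        int bad = encodeUnchecked(17, 17, 15);
        System.out.printf("0x%08X encodable=%b xo=%d%n",
                bad, isEncodableDsDisp(17), bad & 0x3);
        int good = encodeUnchecked(17, 16, 15);
        System.out.printf("0x%08X encodable=%b xo=%d%n",
                good, isEncodableDsDisp(16), good & 0x3);
    }
}
```

For the displacement 17 this prints 0xEA2F0013 with the low two bits equal to 3 instead of 2. With the aligned displacement 16 the low two bits stay at 2 and the word is 0xEA2F0012.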

The following change is proposed to fix the issue and deals with the
unaligned displacements:

OpenJDK 9 webrev:
81.de.7a9f.ip4.static.sl-reverse.com./illegal/9

OpenJDK 8 webrev:
81.de.7a9f.ip4.static.sl-reverse.com./illegal/8

Could we open a JIRA ticket regarding this issue in order to include it
in the webrev?

Thank you!

Best regards,
Gustavo

On 12-05-2016 09:39, Volker Simonis wrote:
> And I forgot to mention: I've checked and we don't emit vsel
> instructions in jdk8 on ppc. So it must be a coincidence that changing
> the endianness of the offending instruction yields a valid 'vsel'
> instruction.
> 
> 
> 
> On Thu, May 12, 2016 at 2:26 PM, Volker Simonis
> <volker.simonis at gmail.com> wrote:
>> Hi Gustavo,
>>
>> thanks for the bug report. The hs_err file you provided indicates that
>> this crash happened with Ubuntu's openjdk 8 version. Can you still
>> reproduce this with the newest jdk9 builds?
>>
>> Also, I can see from the hs_err file that the crash happened in the C2
>> compiled method java.util.TimSort.countRunAndMakeAscending which
>> doesn't seem to be related to nio and unsafe.
>>
>> Ideally, you could post an easy test case to reproduce the problem. If
>> that's not possible, it would be helpful if you could post the output
>> of a failing run with
>> "-XX:CompileCommand=print,java.util.TimSort::countRunAndMakeAscending
>> -XX:CompileCommand=option,java.util.TimSort::countRunAndMakeAscending,PrintOptoAssembly".
>> In order to get the disassembly output for compiled methods you have
>> to build the hsdis library from hotspot/src/share/tools/hsdis (it has
>> a README with build instructions).
>>
>> Regards,
>> Volker
>>
>>
>> On Thu, May 12, 2016 at 12:32 AM, Gustavo Romero
>> <gromero at linux.vnet.ibm.com> wrote:
>>> Hi
>>>
>>> I'm getting a nasty SIGILL that crashes the JVM on PPC64 LE.
>>>
>>> hs_err log:
>>> http://hastebin.com/raw/fovagunaci
>>>
>>> The application employs methods from both java.nio.ByteBuffer and
>>> sun.misc.Unsafe classes in order to write and read from an allocated buffer.
>>>
>>> An interesting thing is that after debugging the instruction that caused the
>>> said SIGILL:
>>>
>>>    0x3fff902839a4:      cmpwi   cr6,r17,0
>>>    0x3fff902839a8:      beq     cr6,0x3fff90283ae4
>>>    0x3fff902839ac:      .long 0xea2f0013 <============ illegal instruction
>>>    0x3fff902839b0:      add     r15,r15,r17
>>>    0x3fff902839b4:      add     r14,r17,r14
>>>
>>> I found that when its endianness is changed it turns out to be a valid
>>> instruction: vsel v24,v0,v5,v31
>>>
>>> However, I'm still unable to determine if it's an application issue, something
>>> with JVM unsafe interface code, or something else.
>>>
>>> Any clue on how to narrow down this SIGILL?
>>>
>>> Thank you!
>>>
>>> Regards,
>>> Gustavo
>>>
> 
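
The endianness observation in the quoted messages above can be checked mechanically. The following Java sketch (class and helper names are illustrative only) byte-swaps the offending word and splits the result into VA-form fields. It assumes the standard VA-form layout with primary opcode 4 and extended opcode 42 for vsel.

```java
public class ByteSwapDecodeSketch {
    // Extracts `width` bits whose least significant bit sits at position `shift`.
    static int field(int insn, int shift, int width) {
        return (insn >>> shift) & ((1 << width) - 1);
    }

    public static void main(String[] args) {
        int illegal = 0xEA2F0013;
        int swapped = Integer.reverseBytes(illegal);
        System.out.printf("swapped = 0x%08X%n", swapped);
        // VA-form fields: primary opcode, VRT, VRA, VRB, VRC, extended opcode.
        System.out.printf("opcode=%d vrt=%d vra=%d vrb=%d vrc=%d xo=%d%n",
                field(swapped, 26, 6), field(swapped, 21, 5), field(swapped, 16, 5),
                field(swapped, 11, 5), field(swapped, 6, 5), field(swapped, 0, 6));
    }
}
```

The swapped word is 0x13002FEA, and the fields come out as opcode=4 vrt=24 vra=0 vrb=5 vrc=31 xo=42, which matches the reported vsel v24,v0,v5,v31 and is consistent with the match being a coincidence of the byte order.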


From gromero at linux.vnet.ibm.com  Tue May 31 01:49:34 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Mon, 30 May 2016 22:49:34 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
	<CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
	<57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
	<573A034C.9060602@br.ibm.com>
	<OF0DB404FA.2F5617F3-ON49257FBD.0052E126-49257FBD.0053CA88@LocalDomain>
	<201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
	<4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
Message-ID: <201605310149.u4V1mmv5012147@mx0a-001b2d01.pphosted.com>

Hi Michihiro

Thanks a lot for providing a result summary for byte, short, int, and
long.

Using VSR0, 1, 2, and 3 (instead of the VR registers) will not violate
the ABI, so you can use them as Martin suggested.

Martin, should we use the same BugID (8154156: https://goo.gl/z2eGLi)
for byte, short, int, and long webrevs or open a new one?

Thank you.

Best regards,
Gustavo

On 30-05-2016 06:56, Doerr, Martin wrote:
> Hi Michihiro,
> 
> thanks for implementing the VSX versions.
> 
> Gustavo's change "8154156: PPC64: improve array copy stubs by using vector instructions" is pushed into hs-comp.
> Your change needs to be adapted:
> 
> - The vm_version and assembler parts are already there.
> - Vector-scalar load/store instructions now use VectorSRegisters.
> 
> The byte and int versions look good to me. I think the long version should be implemented in a similar way: a check for has_vsx() is necessary, and the length comparison should be done inside of that block.
> 
> Best regards,
> Martin
> 


From goetz.lindenmaier at sap.com  Tue May 31 07:24:58 2016
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 31 May 2016 07:24:58 +0000
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <201605310149.u4V1mmv5012147@mx0a-001b2d01.pphosted.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
	<CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
	<57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
	<573A034C.9060602@br.ibm.com>
	<OF0DB404FA.2F5617F3-ON49257FBD.0052E126-49257FBD.0053CA88@LocalDomain>
	<201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
	<4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
	<201605310149.u4V1mmv5012147@mx0a-001b2d01.pphosted.com>
Message-ID: <da9cd044c3f04af2a03068a1f3a3d791@DEWDFE13DE09.global.corp.sap>

Hi Gustavo, 

you need a new bugId, as the change with the other one has been 
pushed by Martin.  You can't have the same bugId on two different
changes.
http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/f8f067457966

Best regards,
  Goetz.


> -----Original Message-----
> From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces at openjdk.java.net]
> On Behalf Of Gustavo Romero
> Sent: Tuesday, 31 May 2016 03:50
> To: Doerr, Martin <martin.doerr at sap.com>; Michihiro Horie
> <HORIE at jp.ibm.com>; Miki M Enoki <ENOMIKI at jp.ibm.com>
> Cc: Simonis, Volker <volker.simonis at sap.com>;
> ppc-aix-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net;
> Breno Leitao <brenohl at br.ibm.com>
> Subject: Re: PPC64 VSX load/store instructions in stubs
> 
> Hi Michihiro
> 
> Thanks a lot for providing a result summary for byte, short, int, and
> long.
> 
> Using VSR0, 1, 2, and 3 (instead of the VR registers) will not violate
> the ABI, so you can use them as Martin suggested.
> 
> Martin, should we use the same BugID (8154156: https://goo.gl/z2eGLi)
> for byte, short, int, and long webrevs or open a new one?
> 
> Thank you.
> 
> Best regards,
> Gustavo
> 
> On 30-05-2016 06:56, Doerr, Martin wrote:
> > Hi Michihiro,
> >
> > thanks for implementing the VSX versions.
> >
> > Gustavo's change "8154156: PPC64: improve array copy stubs by using
> vector instructions" is pushed into hs-comp.
> > Your change needs to get adapted:
> >
> > -          The vm_version and assembler parts are already there.
> >
> > -          Vector-scalar load/store instructions use VectorSRegisters, now.
> >
> > The byte and int version look good to me. I think the long version should be
> implemented in a similar way: check for has_vsx() is necessary, the length
> comparison should be done inside of the block.
> >
> > Best regards,
> > Martin
> >
> >
> > From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
> > Sent: Montag, 30. Mai 2016 03:43
> > To: Miki M Enoki <ENOMIKI at jp.ibm.com>
> > Cc: Breno Leitao <brenohl at br.ibm.com>; Gustavo Romero
> <gromero at linux.vnet.ibm.com>; hotspot-dev at openjdk.java.net; Doerr,
> Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net;
> Simonis, Volker <volker.simonis at sap.com>; Volker Simonis
> <volker.simonis at gmail.com>
> > Subject: Re: PPC64 VSX load/store instructions in stubs
> >
> >
> > Dear Breno, Gustavo, Voker, and Martin,
> > I am a cowoker of Miki.
> >
> > I implemented VSX disjoint arraycopy functions for byte, int, and long.
> Although Miki had implemented VSX disjoint long arraycopy, we found a
> couple of bugs so I fixed it. Would you please review them?
> >
> > Micro benchmarks for byte and int are as follows. (The one for long is the
> same as Miki's, which was attached before by Miki)
> > (See attached file: ArrayCopyTest_byte.java)(See attached file:
> ArrayCopyTest_int.java)
> >
> > Results are as follows. (For the short result, I used Gustavo's code.)
> > (See attached file: result_disjoint-arraycopy_vsx-max.jpg)
> >
> > Patch for Java8:
> > (See attached file: hotspot_jdk8.diff)
> >
> > Patch for Java9:
> > (See attached file: hotspot_jdk9.diff)
> >
> > Best regards,
> > --
> > Michihiro Horie,
> > IBM Research - Tokyo
> >
> > [Inactive hide details for Miki M Enoki---2016/05/25 00:15:19---Hi Breno,
> Thank you for your reply.]Miki M Enoki---2016/05/25 00:15:19---Hi Breno,
> Thank you for your reply.
> >
> > From: Miki M Enoki/Japan/IBM
> > To: Breno Leitao <brenohl at br.ibm.com<mailto:brenohl at br.ibm.com>>
> > Cc: Gustavo Romero
> <gromero at linux.vnet.ibm.com<mailto:gromero at linux.vnet.ibm.com>>,
> "hotspot-dev at openjdk.java.net<mailto:hotspot-dev at openjdk.java.net>"
> <hotspot-dev at openjdk.java.net<mailto:hotspot-dev at openjdk.java.net>>,
> "Doerr, Martin" <martin.doerr at sap.com<mailto:martin.doerr at sap.com>>,
> "ppc-aix-port-dev at openjdk.java.net<mailto:ppc-aix-port-
> dev at openjdk.java.net>" <ppc-aix-port-dev at openjdk.java.net<mailto:ppc-
> aix-port-dev at openjdk.java.net>>, "Simonis, Volker"
> <volker.simonis at sap.com<mailto:volker.simonis at sap.com>>, Volker
> Simonis <volker.simonis at gmail.com<mailto:volker.simonis at gmail.com>>
> > Date: 2016/05/25 00:15
> > Subject: Re: PPC64 VSX load/store instructions in stubs
> >
> > ________________________________
> >
> >
> > Hi Breno,
> >
> > Thank you for your reply.
> >
> >> The same mechanism could be used to copy arrays of short elements, as
> Gustavo was
> >> working on. Do you agree?
> >
> > I think the mechanism is different with type (byte, short, int, long...).
> > Gustavo will apply a pach with VSX for short array copy, so it would be
> reasonable to use VSX instruction for long array copy, too.
> >
> > My coworker is also creating byte and int arraycopy with VSX. He will post
> an email to this mailing list.
> > I appreciate it if our patch for byte, int and long copy is applied to OpenJDK.
> >
> >
> > Best regards,
> > Miki
> >
> >
> >
> >
> > [Inactive hide details for Breno Leitao ---2016/05/17 02:29:32---Hi Miki, On
> 05/16/2016 02:53 AM, Miki M Enoki wrote:]Breno Leitao ---2016/05/17
> 02:29:32---Hi Miki, On 05/16/2016 02:53 AM, Miki M Enoki wrote:
> >
> > From: Breno Leitao <brenohl at br.ibm.com>
> > To: Miki M Enoki/Japan/IBM at IBMJP, "Doerr, Martin" <martin.doerr at sap.com>
> > Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, Volker Simonis <volker.simonis at gmail.com>, "Simonis, Volker" <volker.simonis at sap.com>, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
> > Date: 2016/05/17 02:29
> > Subject: Re: PPC64 VSX load/store instructions in stubs
> > ________________________________
> >
> >
> >
> > Hi Miki,
> >
> > On 05/16/2016 02:53 AM, Miki M Enoki wrote:
> >> I also implemented VSX disjoint long arraycopy.
> >> I appreciate it if it is applied to OpenJDK, too.
> >
> > Thanks for the summarized information, this is helpful. Based on your plot,
> I
> > understand we can split the whole scenario in two:
> >
> >  * Array size smaller than 4k, and then use VSX instructions to perform copy
> >  * Array size bigger than 4k, and then use VMX instructions to perform copy
> >
> > The same mechanism could be used to copy arrays of short elements, as
> Gustavo was
> > working on. Do you agree?
> >
> > That said, I understand that a new patch should be generated that
> contemplates
> > both cases on a single patch, ready to be applied on OpenJDK 9 source
> code. Hence
> > a webrev should be generated mapping to bug id
> > https://bugs.openjdk.java.net/browse/JDK-8154156
> >
> > If you need any help on the webrev[1] creation and hosting, Gustavo might
> help,
> > since he did this process already.
> >
> > [1] http://openjdk.java.net/guide/webrevHelp.html
> >
> >


From martin.doerr at sap.com  Tue May 31 10:17:28 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 31 May 2016 10:17:28 +0000
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <201605310149.u4V1mlAA012138@mx0a-001b2d01.pphosted.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
	<CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
	<57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
	<573A034C.9060602@br.ibm.com>
	<OF0DB404FA.2F5617F3-ON49257FBD.0052E126-49257FBD.0053CA88@LocalDomain>
	<201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
	<4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
	<201605310149.u4V1mlAA012138@mx0a-001b2d01.pphosted.com>
Message-ID: <e9873a9c690c48bcbde6058db84aa88d@DEWDFE13DE14.global.corp.sap>

Hello everybody,

I have created a new bug: JDK-8158232

We will need a webrev and a request for review mail to hotspot-dev:
"RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions"

Thanks and best regards,
Martin

-----Original Message-----
From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] 
Sent: Dienstag, 31. Mai 2016 03:50
To: Doerr, Martin <martin.doerr at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>; Miki M Enoki <ENOMIKI at jp.ibm.com>
Cc: Breno Leitao <brenohl at br.ibm.com>; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>; Volker Simonis <volker.simonis at gmail.com>; Breno Leitao <brenohl at br.ibm.com>
Subject: Re: PPC64 VSX load/store instructions in stubs

Hi Michihiro

Thanks a lot for providing a result summary for byte, short, int, and
long.

Using VSR0, 1, 2, and 3 (instead of the VR registers) will not violate
the ABI, so you can use them as Martin suggested.
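
For context, a minimal sketch of the register numbering behind this point, not code from the patch. It relies on the Power ISA layout in which VSR0-VSR31 overlay the floating-point registers and VSR32-VSR63 overlay the VMX vector registers (VR0-VR31). The volatility limits (f0-f13 and v0-v19 treated as caller-saved) are an assumption taken from the 64-bit PowerPC ELF ABI conventions, and the names vsr_overlay and vsr_is_volatile are hypothetical:

```cpp
#include <cassert>

// Which architected register file a VSX register number overlays.
struct Overlay {
    bool is_fpr;  // true: floating-point register, false: VMX vector register
    int  index;   // register number within that file
};

// VSR0..VSR31 share storage with FPR0..FPR31; VSR32..VSR63 with VR0..VR31.
Overlay vsr_overlay(int vsr) {
    assert(vsr >= 0 && vsr < 64);
    if (vsr < 32) {
        return Overlay{true, vsr};
    }
    return Overlay{false, vsr - 32};
}

// Assumed ABI convention (not taken from the thread): f0..f13 and v0..v19
// are volatile, so a stub may clobber them without saving them first.
bool vsr_is_volatile(int vsr) {
    const Overlay o = vsr_overlay(vsr);
    return o.is_fpr ? (o.index <= 13) : (o.index <= 19);
}
```

Under those assumptions vsr_is_volatile returns true for VSR0 through VSR3, since they map onto f0 through f3, while VSR52 maps onto the non-volatile v20.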

Martin, should we use the same BugID (8154156: https://goo.gl/z2eGLi)
for byte, short, int, and long webrevs or open a new one?

Thank you.

Best regards,
Gustavo

On 30-05-2016 06:56, Doerr, Martin wrote:
> Hi Michihiro,
> 
> thanks for implementing the VSX versions.
> 
> Gustavo's change "8154156: PPC64: improve array copy stubs by using vector instructions" has been pushed into hs-comp.
> Your change needs to be adapted:
> 
> - The vm_version and assembler parts are already there.
> - Vector-scalar load/store instructions now use VectorSRegisters.
> 
> The byte and int versions look good to me. I think the long version should be implemented in a similar way: a check for has_vsx() is necessary, and the length comparison should be done inside that block.
> 
> Best regards,
> Martin
> 
> 
> From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
> Sent: Montag, 30. Mai 2016 03:43
> To: Miki M Enoki <ENOMIKI at jp.ibm.com>
> Cc: Breno Leitao <brenohl at br.ibm.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>; Volker Simonis <volker.simonis at gmail.com>
> Subject: Re: PPC64 VSX load/store instructions in stubs
> 
> 
> Dear Breno, Gustavo, Volker, and Martin,
> I am a coworker of Miki.
> 
> I implemented VSX disjoint arraycopy functions for byte, int, and long. Although Miki had implemented VSX disjoint long arraycopy, we found a couple of bugs, which I fixed. Would you please review them?
> 
> Micro benchmarks for byte and int are as follows. (The one for long is the same as Miki's, which was attached before by Miki)
> (See attached file: ArrayCopyTest_byte.java)(See attached file: ArrayCopyTest_int.java)
> 
> Results are as follows. (For the short result, I used Gustavo's code.)
> (See attached file: result_disjoint-arraycopy_vsx-max.jpg)
> 
> Patch for Java8:
> (See attached file: hotspot_jdk8.diff)
> 
> Patch for Java9:
> (See attached file: hotspot_jdk9.diff)
> 
> Best regards,
> --
> Michihiro Horie,
> IBM Research - Tokyo
> 
> 
> From: Miki M Enoki/Japan/IBM
> To: Breno Leitao <brenohl at br.ibm.com>
> Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, hotspot-dev at openjdk.java.net, "Doerr, Martin" <martin.doerr at sap.com>, ppc-aix-port-dev at openjdk.java.net, "Simonis, Volker" <volker.simonis at sap.com>, Volker Simonis <volker.simonis at gmail.com>
> Date: 2016/05/25 00:15
> Subject: Re: PPC64 VSX load/store instructions in stubs
> 
> ________________________________
> 
> 
> Hi Breno,
> 
> Thank you for your reply.
> 
>> The same mechanism could be used to copy arrays of short elements, as Gustavo was
>> working on. Do you agree?
> 
> I think the mechanism differs by element type (byte, short, int, long...).
> Gustavo will apply a patch with VSX for short array copy, so it would be reasonable to use VSX instructions for long array copy, too.
> 
> My coworker is also creating byte and int arraycopy with VSX. He will post an email to this mailing list.
> I would appreciate it if our patch for byte, int and long copy were applied to OpenJDK.
> 
> 
> Best regards,
> Miki
> 
> 
> 
> 
> 
> From: Breno Leitao <brenohl at br.ibm.com>
> To: Miki M Enoki/Japan/IBM at IBMJP, "Doerr, Martin" <martin.doerr at sap.com>
> Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, Volker Simonis <volker.simonis at gmail.com>, "Simonis, Volker" <volker.simonis at sap.com>, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
> Date: 2016/05/17 02:29
> Subject: Re: PPC64 VSX load/store instructions in stubs
> ________________________________
> 
> 
> 
> Hi Miki,
> 
> On 05/16/2016 02:53 AM, Miki M Enoki wrote:
>> I also implemented VSX disjoint long arraycopy.
>> I appreciate it if it is applied to OpenJDK, too.
> 
> Thanks for the summarized information, this is helpful. Based on your plot, I
> understand we can split the whole scenario in two:
> 
>  * Array size smaller than 4k, and then use VSX instructions to perform copy
>  * Array size bigger than 4k, and then use VMX instructions to perform copy
> 
> The same mechanism could be used to copy arrays of short elements, as Gustavo was
> working on. Do you agree?
> 
> That said, I understand that a new patch should be generated that contemplates
> both cases on a single patch, ready to be applied on OpenJDK 9 source code. Hence
> a webrev should be generated mapping to bug id
> https://bugs.openjdk.java.net/browse/JDK-8154156
> 
> If you need any help on the webrev[1] creation and hosting, Gustavo might help,
> since he did this process already.
> 
> [1] http://openjdk.java.net/guide/webrevHelp.html
> 
> 


From HORIE at jp.ibm.com  Tue May 31 11:50:37 2016
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Tue, 31 May 2016 20:50:37 +0900
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <e9873a9c690c48bcbde6058db84aa88d@DEWDFE13DE14.global.corp.sap>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
	<CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
	<57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
	<573A034C.9060602@br.ibm.com>
	<OF0DB404FA.2F5617F3-ON49257FBD.0052E126-49257FBD.0053CA88@LocalDomain>
	<201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
	<4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
	<201605310149.u4V1mlAA012138@mx0a-001b2d01.pphosted.com>
	<e9873a9c690c48bcbde6058db84aa88d@DEWDFE13DE14.global.corp.sap>
Message-ID: <201605311151.u4VBnUY9039336@mx0a-001b2d01.pphosted.com>


Hi Martin, Gustavo,

Thank you very much for your comments. I used VectorSRegisters, inserted an
if-statement with has_vsx() in long arraycopy, and moved the length
comparison inside the if-statement.
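
The resulting control flow can be sketched roughly as follows. This is an illustration only, not the contents of hotspot_jdk9_hscomp.diff: the threshold kMinVsxElements, the enum CopyPath, and the function choose_long_copy_path are hypothetical, and the feature flag is passed in as a plain bool standing in for has_vsx(). For readability the sketch folds both decisions into ordinary runtime branches, whereas a stub generator would typically evaluate the feature check once, when the stub is generated:

```cpp
#include <cstddef>

enum class CopyPath { Scalar, Vsx };

// Hypothetical cut-off below which the vector loop is not worth entering.
constexpr std::size_t kMinVsxElements = 16;

CopyPath choose_long_copy_path(bool has_vsx, std::size_t element_count) {
    if (has_vsx) {
        // The length comparison sits inside the feature check, so a CPU
        // without VSX never evaluates it.
        if (element_count >= kMinVsxElements) {
            return CopyPath::Vsx;
        }
    }
    return CopyPath::Scalar;
}
```

With this shape, choose_long_copy_path(false, n) selects the scalar path for every n, and only a VSX-capable CPU pays for the length test.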

Diff from jdk9 hs-comp hotspot:
(See attached file: hotspot_jdk9_hscomp.diff)

Best regards,
--
Michihiro Horie,
IBM Research - Tokyo


From:	"Doerr, Martin" <martin.doerr at sap.com>
To:	Gustavo Romero <gromero at linux.vnet.ibm.com>, Michihiro
            Horie/Japan/IBM at IBMJP, Miki M Enoki/Japan/IBM at IBMJP
Cc:	Breno Leitao <brenohl at br.ibm.com>,
            "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>,
            "ppc-aix-port-dev at openjdk.java.net"
            <ppc-aix-port-dev at openjdk.java.net>, "Simonis, Volker"
            <volker.simonis at sap.com>, Volker Simonis
            <volker.simonis at gmail.com>, "Breno Leitao" <brenohl at br.ibm.com>
Date:	2016/05/31 19:18
Subject:	RE: PPC64 VSX load/store instructions in stubs


Hello everybody,

I have created a new bug: JDK-8158232

We will need a webrev and a request for review mail to hotspot-dev:
"RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by
using VSX instructions"

Thanks and best regards,
Martin

-----Original Message-----
From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
Sent: Dienstag, 31. Mai 2016 03:50
To: Doerr, Martin <martin.doerr at sap.com>; Michihiro Horie
<HORIE at jp.ibm.com>; Miki M Enoki <ENOMIKI at jp.ibm.com>
Cc: Breno Leitao <brenohl at br.ibm.com>; hotspot-dev at openjdk.java.net;
ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
<volker.simonis at sap.com>; Volker Simonis <volker.simonis at gmail.com>; Breno
Leitao <brenohl at br.ibm.com>
Subject: Re: PPC64 VSX load/store instructions in stubs

Hi Michihiro

Thanks a lot for providing a result summary for byte, short, int, and
long.

Using VSR0, 1, 2, and 3 (instead of the VR registers) will not violate
the ABI, so you can use them as Martin suggested.

Martin, should we use the same BugID (8154156: https://goo.gl/z2eGLi)
for byte, short, int, and long webrevs or open a new one?

Thank you.

Best regards,
Gustavo

On 30-05-2016 06:56, Doerr, Martin wrote:
> Hi Michihiro,
>
> thanks for implementing the VSX versions.
>
> Gustavo's change "8154156: PPC64: improve array copy stubs by using vector instructions" has been pushed into hs-comp.
> Your change needs to be adapted:
>
> - The vm_version and assembler parts are already there.
> - Vector-scalar load/store instructions now use VectorSRegisters.
>
> The byte and int versions look good to me. I think the long version should be implemented in a similar way: a check for has_vsx() is necessary, and the length comparison should be done inside that block.
>
> Best regards,
> Martin
>
>
> From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
> Sent: Montag, 30. Mai 2016 03:43
> To: Miki M Enoki <ENOMIKI at jp.ibm.com>
> Cc: Breno Leitao <brenohl at br.ibm.com>; Gustavo Romero
<gromero at linux.vnet.ibm.com>; hotspot-dev at openjdk.java.net; Doerr, Martin
<martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker
<volker.simonis at sap.com>; Volker Simonis <volker.simonis at gmail.com>
> Subject: Re: PPC64 VSX load/store instructions in stubs
>
>
> Dear Breno, Gustavo, Volker, and Martin,
> I am a coworker of Miki.
>
> I implemented VSX disjoint arraycopy functions for byte, int, and long. Although Miki had implemented VSX disjoint long arraycopy, we found a couple of bugs, which I fixed. Would you please review them?
>
> Micro benchmarks for byte and int are as follows. (The one for long is
the same as Miki's, which was attached before by Miki)
> (See attached file: ArrayCopyTest_byte.java)(See attached file:
ArrayCopyTest_int.java)
>
> Results are as follows. (For the short result, I used Gustavo's code.)
> (See attached file: result_disjoint-arraycopy_vsx-max.jpg)
>
> Patch for Java8:
> (See attached file: hotspot_jdk8.diff)
>
> Patch for Java9:
> (See attached file: hotspot_jdk9.diff)
>
> Best regards,
> --
> Michihiro Horie,
> IBM Research - Tokyo
>
>
> From: Miki M Enoki/Japan/IBM
> To: Breno Leitao <brenohl at br.ibm.com>
> Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, hotspot-dev at openjdk.java.net, "Doerr, Martin" <martin.doerr at sap.com>, ppc-aix-port-dev at openjdk.java.net, "Simonis, Volker" <volker.simonis at sap.com>, Volker Simonis <volker.simonis at gmail.com>
> Date: 2016/05/25 00:15
> Subject: Re: PPC64 VSX load/store instructions in stubs
>
> ________________________________
>
>
> Hi Breno,
>
> Thank you for your reply.
>
>> The same mechanism could be used to copy arrays of short elements, as
Gustavo was
>> working on. Do you agree?
>
> I think the mechanism differs by element type (byte, short, int, long...).
> Gustavo will apply a patch with VSX for short array copy, so it would be reasonable to use VSX instructions for long array copy, too.
>
> My coworker is also creating byte and int arraycopy with VSX. He will post an email to this mailing list.
> I would appreciate it if our patch for byte, int and long copy were applied to OpenJDK.
>
>
> Best regards,
> Miki
>
>
>
>
>
> From: Breno Leitao <brenohl at br.ibm.com>
> To: Miki M Enoki/Japan/IBM at IBMJP, "Doerr, Martin" <martin.doerr at sap.com>
> Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, Volker Simonis <volker.simonis at gmail.com>, "Simonis, Volker" <volker.simonis at sap.com>, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
> Date: 2016/05/17 02:29
> Subject: Re: PPC64 VSX load/store instructions in stubs
> ________________________________
>
>
>
> Hi Miki,
>
> On 05/16/2016 02:53 AM, Miki M Enoki wrote:
>> I also implemented VSX disjoint long arraycopy.
>> I appreciate it if it is applied to OpenJDK, too.
>
> Thanks for the summarized information, this is helpful. Based on your
plot, I
> understand we can split the whole scenario in two:
>
>  * Array size smaller than 4k, and then use VSX instructions to perform
copy
>  * Array size bigger than 4k, and then use VMX instructions to perform
copy
>
> The same mechanism could be used to copy arrays of short elements, as
Gustavo was
> working on. Do you agree?
>
> That said, I understand that a new patch should be generated that
contemplates
> both cases on a single patch, ready to be applied on OpenJDK 9 source
code. Hence
> a webrev should be generated mapping to bug id
> https://bugs.openjdk.java.net/browse/JDK-8154156
>
> If you need any help on the webrev[1] creation and hosting, Gustavo might
help,
> since he did this process already.
>
> [1] http://openjdk.java.net/guide/webrevHelp.html
>
>


-------------- next part --------------
A non-text attachment was scrubbed...
Name: hotspot_jdk9_hscomp.diff
Type: application/octet-stream
Size: 5729 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160531/798e66f1/hotspot_jdk9_hscomp-0001.diff>

From gromero at linux.vnet.ibm.com  Tue May 31 12:53:44 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 31 May 2016 09:53:44 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <da9cd044c3f04af2a03068a1f3a3d791@DEWDFE13DE09.global.corp.sap>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
	<CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
	<57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
	<573A034C.9060602@br.ibm.com>
	<OF0DB404FA.2F5617F3-ON49257FBD.0052E126-49257FBD.0053CA88@LocalDomain>
	<201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
	<4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
	<201605310149.u4V1mmv5012147@mx0a-001b2d01.pphosted.com>
	<da9cd044c3f04af2a03068a1f3a3d791@DEWDFE13DE09.global.corp.sap>
Message-ID: <201605311253.u4VCnhtE023173@mx0a-001b2d01.pphosted.com>

Hi Goetz

Got it. Thanks for clarifying it.

Best regards,
Gustavo

On 31-05-2016 04:24, Lindenmaier, Goetz wrote:
> Hi Gustavo, 
> 
> you need a new bugId, as the change with the other one has been 
> pushed by Martin.  You can't have the same bugId on two different
> changes.
> http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/f8f067457966
> 
> Best regards,
>   Goetz.
> 
> 
> 
>> -----Original Message-----
>> From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-
>> bounces at openjdk.java.net] On Behalf Of Gustavo Romero
>> Sent: Dienstag, 31. Mai 2016 03:50
>> To: Doerr, Martin <martin.doerr at sap.com>; Michihiro Horie
>> <HORIE at jp.ibm.com>; Miki M Enoki <ENOMIKI at jp.ibm.com>
>> Cc: Simonis, Volker <volker.simonis at sap.com>; ppc-aix-port-
>> dev at openjdk.java.net; hotspot-dev at openjdk.java.net; Breno Leitao
>> <brenohl at br.ibm.com>
>> Subject: Re: PPC64 VSX load/store instructions in stubs
>>
>> Hi Michihiro
>>
>> Thanks a lot for providing a result summary for byte, short, int, and
>> long.
>>
>> Using VSR0, 1, 2, and 3 (instead of the VR registers) will not violate
>> the ABI, so you can use them as Martin suggested.
>>
>> Martin, should we use the same BugID (8154156: https://goo.gl/z2eGLi)
>> for byte, short, int, and long webrevs or open a new one?
>>
>> Thank you.
>>
>> Best regards,
>> Gustavo
>>
>> On 30-05-2016 06:56, Doerr, Martin wrote:
>>> Hi Michihiro,
>>>
>>> thanks for implementing the VSX versions.
>>>
>>> Gustavo's change "8154156: PPC64: improve array copy stubs by using vector instructions" has been pushed into hs-comp.
>>> Your change needs to be adapted:
>>>
>>> - The vm_version and assembler parts are already there.
>>> - Vector-scalar load/store instructions now use VectorSRegisters.
>>>
>>> The byte and int versions look good to me. I think the long version should be implemented in a similar way: a check for has_vsx() is necessary, and the length comparison should be done inside that block.
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
>>> Sent: Montag, 30. Mai 2016 03:43
>>> To: Miki M Enoki <ENOMIKI at jp.ibm.com>
>>> Cc: Breno Leitao <brenohl at br.ibm.com>; Gustavo Romero
>> <gromero at linux.vnet.ibm.com>; hotspot-dev at openjdk.java.net; Doerr,
>> Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net;
>> Simonis, Volker <volker.simonis at sap.com>; Volker Simonis
>> <volker.simonis at gmail.com>
>>> Subject: Re: PPC64 VSX load/store instructions in stubs
>>>
>>>
>>> Dear Breno, Gustavo, Volker, and Martin,
>>> I am a coworker of Miki.
>>>
>>> I implemented VSX disjoint arraycopy functions for byte, int, and long. Although Miki had implemented VSX disjoint long arraycopy, we found a couple of bugs, which I fixed. Would you please review them?
>>>
>>> Micro benchmarks for byte and int are as follows. (The one for long is the
>> same as Miki's, which was attached before by Miki)
>>> (See attached file: ArrayCopyTest_byte.java)(See attached file:
>> ArrayCopyTest_int.java)
>>>
>>> Results are as follows. (For the short result, I used Gustavo's code.)
>>> (See attached file: result_disjoint-arraycopy_vsx-max.jpg)
>>>
>>> Patch for Java8:
>>> (See attached file: hotspot_jdk8.diff)
>>>
>>> Patch for Java9:
>>> (See attached file: hotspot_jdk9.diff)
>>>
>>> Best regards,
>>> --
>>> Michihiro Horie,
>>> IBM Research - Tokyo
>>>
>>>
>>> From: Miki M Enoki/Japan/IBM
>>> To: Breno Leitao <brenohl at br.ibm.com>
>>> Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, hotspot-dev at openjdk.java.net, "Doerr, Martin" <martin.doerr at sap.com>, ppc-aix-port-dev at openjdk.java.net, "Simonis, Volker" <volker.simonis at sap.com>, Volker Simonis <volker.simonis at gmail.com>
>>> Date: 2016/05/25 00:15
>>> Subject: Re: PPC64 VSX load/store instructions in stubs
>>>
>>> ________________________________
>>>
>>>
>>> Hi Breno,
>>>
>>> Thank you for your reply.
>>>
>>>> The same mechanism could be used to copy arrays of short elements, as
>> Gustavo was
>>>> working on. Do you agree?
>>>
>>> I think the mechanism differs by element type (byte, short, int, long...).
>>> Gustavo will apply a patch with VSX for short array copy, so it would be reasonable to use VSX instructions for long array copy, too.
>>>
>>> My coworker is also creating byte and int arraycopy with VSX. He will post an email to this mailing list.
>>> I would appreciate it if our patch for byte, int and long copy were applied to OpenJDK.
>>>
>>>
>>> Best regards,
>>> Miki
>>>
>>>
>>>
>>>
>>>
>>> From: Breno Leitao <brenohl at br.ibm.com>
>>> To: Miki M Enoki/Japan/IBM at IBMJP, "Doerr, Martin" <martin.doerr at sap.com>
>>> Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, Volker Simonis <volker.simonis at gmail.com>, "Simonis, Volker" <volker.simonis at sap.com>, ppc-aix-port-dev at openjdk.java.net, hotspot-dev at openjdk.java.net
>>> Date: 2016/05/17 02:29
>>> Subject: Re: PPC64 VSX load/store instructions in stubs
>>> ________________________________
>>>
>>>
>>>
>>> Hi Miki,
>>>
>>> On 05/16/2016 02:53 AM, Miki M Enoki wrote:
>>>> I also implemented VSX disjoint long arraycopy.
>>>> I appreciate it if it is applied to OpenJDK, too.
>>>
>>> Thanks for the summarized information, this is helpful. Based on your plot,
>> I
>>> understand we can split the whole scenario in two:
>>>
>>>  * Array size smaller than 4k, and then use VSX instructions to perform copy
>>>  * Array size bigger than 4k, and then use VMX instructions to perform copy
>>>
>>> The same mechanism could be used to copy arrays of short elements, as
>> Gustavo was
>>> working on. Do you agree?
>>>
>>> That said, I understand that a new patch should be generated that
>> contemplates
>>> both cases on a single patch, ready to be applied on OpenJDK 9 source
>> code. Hence
>>> a webrev should be generated mapping to bug id
>>> https://bugs.openjdk.java.net/browse/JDK-8154156
>>>
>>> If you need any help on the webrev[1] creation and hosting, Gustavo might
>> help,
>>> since he did this process already.
>>>
>>> [1] http://openjdk.java.net/guide/webrevHelp.html
>>>
>>>
> 


From gromero at linux.vnet.ibm.com  Tue May 31 12:57:14 2016
From: gromero at linux.vnet.ibm.com (Gustavo Romero)
Date: Tue, 31 May 2016 09:57:14 -0300
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <e9873a9c690c48bcbde6058db84aa88d@DEWDFE13DE14.global.corp.sap>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
	<CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
	<57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
	<573A034C.9060602@br.ibm.com>
	<OF0DB404FA.2F5617F3-ON49257FBD.0052E126-49257FBD.0053CA88@LocalDomain>
	<201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
	<4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
	<201605310149.u4V1mlAA012138@mx0a-001b2d01.pphosted.com>
	<e9873a9c690c48bcbde6058db84aa88d@DEWDFE13DE14.global.corp.sap>
Message-ID: <201605311257.u4VCsQHD036253@mx0a-001b2d01.pphosted.com>

Hi Martin!

Thanks for creating a new BugID.

Regards,
Gustavo

On 31-05-2016 07:17, Doerr, Martin wrote:
> Hello everybody,
> 
> I have created a new bug: JDK-8158232
> 
> We will need a webrev and a request for review mail to hotspot-dev:
> "RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions"
> 
> Thanks and best regards,
> Martin
> 
> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] 
> Sent: Dienstag, 31. Mai 2016 03:50
> To: Doerr, Martin <martin.doerr at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>; Miki M Enoki <ENOMIKI at jp.ibm.com>
> Cc: Breno Leitao <brenohl at br.ibm.com>; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>; Volker Simonis <volker.simonis at gmail.com>; Breno Leitao <brenohl at br.ibm.com>
> Subject: Re: PPC64 VSX load/store instructions in stubs
> 
> Hi Michihiro
> 
> Thanks a lot for providing a result summary for byte, short, int, and
> long.
> 
> Using VSR0, 1, 2, and 3 (instead of the VR registers) will not violate
> the ABI, so you can use them as Martin suggested.
> 
> Martin, should we use the same BugID (8154156: https://goo.gl/z2eGLi)
> for byte, short, int, and long webrevs or open a new one?
> 
> Thank you.
> 
> Best regards,
> Gustavo
> 
> On 30-05-2016 06:56, Doerr, Martin wrote:
>> Hi Michihiro,
>>
>> thanks for implementing the VSX versions.
>>
>> Gustavo's change "8154156: PPC64: improve array copy stubs by using vector instructions" has been pushed into hs-comp.
>> Your change needs to be adapted:
>>
>> - The vm_version and assembler parts are already there.
>> - Vector-scalar load/store instructions now use VectorSRegisters.
>>
>> The byte and int versions look good to me. I think the long version should be implemented in a similar way: a check for has_vsx() is necessary, and the length comparison should be done inside that block.
>>
>> Best regards,
>> Martin
>>
>>
>> From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
>> Sent: Montag, 30. Mai 2016 03:43
>> To: Miki M Enoki <ENOMIKI at jp.ibm.com>
>> Cc: Breno Leitao <brenohl at br.ibm.com>; Gustavo Romero <gromero at linux.vnet.ibm.com>; hotspot-dev at openjdk.java.net; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>; Volker Simonis <volker.simonis at gmail.com>
>> Subject: Re: PPC64 VSX load/store instructions in stubs
>>
>>
>> Dear Breno, Gustavo, Volker, and Martin,
>> I am a coworker of Miki.
>>
>> I implemented VSX disjoint arraycopy functions for byte, int, and long. Although Miki had implemented VSX disjoint long arraycopy, we found a couple of bugs, which I fixed. Would you please review them?
>>
>> Micro benchmarks for byte and int are as follows. (The one for long is the same as Miki's, which was attached before by Miki)
>> (See attached file: ArrayCopyTest_byte.java)(See attached file: ArrayCopyTest_int.java)
>>
>> Results are as follows. (For the short result, I used Gustavo's code.)
>> (See attached file: result_disjoint-arraycopy_vsx-max.jpg)
>>
>> Patch for Java8:
>> (See attached file: hotspot_jdk8.diff)
>>
>> Patch for Java9:
>> (See attached file: hotspot_jdk9.diff)
>>
>> Best regards,
>> --
>> Michihiro Horie,
>> IBM Research - Tokyo
>>
>> From: Miki M Enoki/Japan/IBM
>> To: Breno Leitao <brenohl at br.ibm.com>
>> Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>, "Doerr, Martin" <martin.doerr at sap.com>, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>, "Simonis, Volker" <volker.simonis at sap.com>, Volker Simonis <volker.simonis at gmail.com>
>> Date: 2016/05/25 00:15
>> Subject: Re: PPC64 VSX load/store instructions in stubs
>>
>> ________________________________
>>
>>
>> Hi Breno,
>>
>> Thank you for your reply.
>>
>>> The same mechanism could be used to copy arrays of short elements, as Gustavo was
>>> working on. Do you agree?
>>
>> I think the mechanism differs by element type (byte, short, int, long...).
>> Gustavo will apply a patch with VSX for short array copy, so it would be reasonable to use VSX instructions for long array copy, too.
>>
>> My coworker is also creating byte and int arraycopy with VSX. He will post an email to this mailing list.
>> I would appreciate it if our patch for byte, int and long copy were applied to OpenJDK.
>>
>>
>> Best regards,
>> Miki
>>
>>
>>
>>
>> From: Breno Leitao <brenohl at br.ibm.com>
>> To: Miki M Enoki/Japan/IBM at IBMJP, "Doerr, Martin" <martin.doerr at sap.com>
>> Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>, Volker Simonis <volker.simonis at gmail.com>, "Simonis, Volker" <volker.simonis at sap.com>, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>, "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>
>> Date: 2016/05/17 02:29
>> Subject: Re: PPC64 VSX load/store instructions in stubs
>> ________________________________
>>
>>
>>
>> Hi Miki,
>>
>> On 05/16/2016 02:53 AM, Miki M Enoki wrote:
>>> I also implemented VSX disjoint long arraycopy.
>>> I appreciate it if it is applied to OpenJDK, too.
>>
>> Thanks for the summarized information, this is helpful. Based on your plot, I
>> understand we can split the whole scenario in two:
>>
>>  * Array size smaller than 4k: use VSX instructions to perform the copy
>>  * Array size bigger than 4k: use VMX instructions to perform the copy
>>
>> The same mechanism could be used to copy arrays of short elements, as Gustavo was
>> working on. Do you agree?
>>
>> That said, I understand that a new patch should be generated that covers
>> both cases in a single patch, ready to be applied to the OpenJDK 9 source code. Hence
>> a webrev should be generated for bug id
>> https://bugs.openjdk.java.net/browse/JDK-8154156
>>
>> If you need any help on the webrev[1] creation and hosting, Gustavo might help,
>> since he did this process already.
>>
>> [1] http://openjdk.java.net/guide/webrevHelp.html
>>
>>
> 


From martin.doerr at sap.com  Tue May 31 14:27:04 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 31 May 2016 14:27:04 +0000
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <201605311151.u4VBnUpB001747@mx0a-001b2d01.pphosted.com>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
	<CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
	<57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
	<573A034C.9060602@br.ibm.com>
	<OF0DB404FA.2F5617F3-ON49257FBD.0052E126-49257FBD.0053CA88@LocalDomain>
	<201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
	<4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
	<201605310149.u4V1mlAA012138@mx0a-001b2d01.pphosted.com>
	<e9873a9c690c48bcbde6058db84aa88d@DEWDFE13DE14.global.corp.sap>
	<201605311151.u4VBnUpB001747@mx0a-001b2d01.pphosted.com>
Message-ID: <0a783f90ba244080b20723c4982dc37e@DEWDFE13DE14.global.corp.sap>

Hi Michihiro,

I have uploaded a webrev here:
http://cr.openjdk.java.net/~mdoerr/8158232_PPC_vsx_copy/webrev.00/

I had to change the formatting a little bit.
There was a
"src/cpu/ppc/vm/stubGenerator_ppc.cpp:1896: Trailing whitespace"
which is not allowed.

Please send out a request for review with the following subject and point to the webrev:
"RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions"

Best regards,
Martin

From: Michihiro Horie [mailto:HORIE at jp.ibm.com]
Sent: Tuesday, 31 May 2016 13:51
To: Doerr, Martin <martin.doerr at sap.com>
Cc: Gustavo Romero <gromero at linux.vnet.ibm.com>; Miki M Enoki <ENOMIKI at jp.ibm.com>; Breno Leitao <brenohl at br.ibm.com>; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>; Volker Simonis <volker.simonis at gmail.com>; Breno Leitao <brenohl at br.ibm.com>
Subject: RE: PPC64 VSX load/store instructions in stubs


Hi Martin, Gustavo,

Thank you very much for your comments. I used VectorSRegisters, inserted an if-statement with has_vsx() in long arraycopy, and moved the length comparison inside the if-statement.
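
The structure described above can be modelled roughly as follows. This is only a sketch in Java, not the C++ stub generator code and not taken from hotspot_jdk9_hscomp.diff: the class StubGateSketch, the method generate, the parameter minVsxElements and the returned path names are all hypothetical. It assumes one reading of the review comment, namely that the CPU feature check happens once when the stub is generated, and that the length comparison exists only inside the VSX block.

```java
import java.util.function.IntFunction;

// Hypothetical model of the gating discussed in the thread; none of these
// names come from the HotSpot sources.
final class StubGateSketch {

    // Models stub generation. hasVsx stands in for a has_vsx() style
    // feature check and is evaluated once, at generation time.
    // minVsxElements is an assumed threshold, not a value from the patch.
    static IntFunction<String> generate(boolean hasVsx, int minVsxElements) {
        if (hasVsx) {
            // The length comparison lives inside the VSX block, so a
            // stub generated without VSX never performs it.
            return length -> length >= minVsxElements ? "vsx" : "scalar";
        }
        // Without VSX only the scalar path is generated.
        return length -> "scalar";
    }

    public static void main(String[] args) {
        IntFunction<String> stub = generate(true, 32);
        System.out.println("64 elements -> " + stub.apply(64));
        System.out.println("8 elements -> " + stub.apply(8));
    }
}
```

Keeping the comparison inside the block means a stub generated on hardware without VSX carries no extra compare and branch for a path it can never take.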

Diff from jdk9 hs-comp hotspot:
(See attached file: hotspot_jdk9_hscomp.diff)

Best regards,
--
Michihiro Horie,
IBM Research - Tokyo

From: "Doerr, Martin" <martin.doerr at sap.com>
To: Gustavo Romero <gromero at linux.vnet.ibm.com>, Michihiro Horie/Japan/IBM at IBMJP, Miki M Enoki/Japan/IBM at IBMJP
Cc: Breno Leitao <brenohl at br.ibm.com>, "hotspot-dev at openjdk.java.net" <hotspot-dev at openjdk.java.net>, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>, "Simonis, Volker" <volker.simonis at sap.com>, Volker Simonis <volker.simonis at gmail.com>, "Breno Leitao" <brenohl at br.ibm.com>
Date: 2016/05/31 19:18
Subject: RE: PPC64 VSX load/store instructions in stubs

________________________________


Hello everybody,

I have created a new bug: JDK-8158232

We will need a webrev and a request for review mail to hotspot-dev:
"RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by using VSX instructions"

Thanks and best regards,
Martin

-----Original Message-----
From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
Sent: Tuesday, 31 May 2016 03:50
To: Doerr, Martin <martin.doerr at sap.com>; Michihiro Horie <HORIE at jp.ibm.com>; Miki M Enoki <ENOMIKI at jp.ibm.com>
Cc: Breno Leitao <brenohl at br.ibm.com>; hotspot-dev at openjdk.java.net; ppc-aix-port-dev at openjdk.java.net; Simonis, Volker <volker.simonis at sap.com>; Volker Simonis <volker.simonis at gmail.com>; Breno Leitao <brenohl at br.ibm.com>
Subject: Re: PPC64 VSX load/store instructions in stubs

Hi Michihiro

Thanks a lot for providing a result summary for byte, short, int, and
long.

Using VSR0, 1, 2, and 3 (instead of the VR registers) will not violate
the ABI, so you can use them as Martin suggested.

Martin, should we use the same BugID (8154156: https://goo.gl/z2eGLi)
for byte, short, int, and long webrevs or open a new one?

Thank you.

Best regards,
Gustavo




From HORIE at jp.ibm.com  Tue May 31 15:36:38 2016
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Wed, 1 Jun 2016 00:36:38 +0900
Subject: RFR(M): 8158232: PPC64: improve byte, int and long array copy stubs by
	using VSX instructions
Message-ID: <201605311537.u4VFYZWQ029737@mx0a-001b2d01.pphosted.com>


Dear all,

Could you please review the following webrev?

http://cr.openjdk.java.net/~mdoerr/8158232_PPC_vsx_copy/webrev.00/

This change improves performance of disjoint arraycopy of byte, int, and
long by using VSX load/store instructions.
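
For readers without the attachments, a micro benchmark of this kind could look roughly like the sketch below. It is not the attached ArrayCopyTest_byte.java or ArrayCopyTest_int.java; the class name DisjointCopySketch, the method timeCopy and the chosen lengths and iteration counts are assumptions for illustration only. Source and destination are separate arrays, so the copied ranges cannot overlap.

```java
import java.util.Arrays;

// Hypothetical micro benchmark sketch; not the benchmark attached to the thread.
final class DisjointCopySketch {

    // Copies `length` longs from one array into a different array
    // `iterations` times and returns the elapsed time in nanoseconds.
    static long timeCopy(int length, int iterations) {
        long[] src = new long[length];
        long[] dst = new long[length];
        for (int i = 0; i < length; i++) {
            src[i] = i;
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            System.arraycopy(src, 0, dst, 0, length);
        }
        long elapsed = System.nanoTime() - start;
        // Sanity check so that a broken copy does not go unnoticed.
        if (!Arrays.equals(src, dst)) {
            throw new AssertionError("copy mismatch");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Array lengths picked arbitrarily for this sketch.
        for (int length : new int[] {64, 1024, 16384}) {
            System.out.println(length + " longs: " + timeCopy(length, 10_000) + " ns");
        }
    }
}
```

A loop like this gives only a rough picture; the first iterations include interpreter and JIT warm-up, so a real measurement would discard them or use a harness such as JMH.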

Discussion started from:
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002483.html

Performance improvement with micro benchmarks is shown in:
http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/2016-May/002531.html

Thank you very much,

Best regards,
--
Michihiro Horie,
IBM Research - Tokyo

From HORIE at jp.ibm.com  Tue May 31 15:46:35 2016
From: HORIE at jp.ibm.com (Michihiro Horie)
Date: Wed, 1 Jun 2016 00:46:35 +0900
Subject: PPC64 VSX load/store instructions in stubs
In-Reply-To: <0a783f90ba244080b20723c4982dc37e@DEWDFE13DE14.global.corp.sap>
References: <56FEDBB3.5030106@linux.vnet.ibm.com>
	<CA+3eh13AWXQ3cd6g3awUXrJK162SOsSJcLrEvsY6MtrOTcQubQ@mail.gmail.com>
	<57339EE1.2040500@linux.vnet.ibm.com>
	<da14acb523644849ab8aecbad821991c@DEWDFE13DE14.global.corp.sap>
	<OFF20F9685.DD164547-ON49257FB5.001F4757-49257FB5.00206401@notes.na.collabserv.com>
	<573A034C.9060602@br.ibm.com>
	<OF0DB404FA.2F5617F3-ON49257FBD.0052E126-49257FBD.0053CA88@LocalDomain>
	<201605300143.u4U1cXX8003600@mx0a-001b2d01.pphosted.com>
	<4a58b7d611db4b3c944f47eb03f5df24@DEWDFE13DE14.global.corp.sap>
	<201605310149.u4V1mlAA012138@mx0a-001b2d01.pphosted.com>
	<e9873a9c690c48bcbde6058db84aa88d@DEWDFE13DE14.global.corp.sap>
	<201605311151.u4VBnUpB001747@mx0a-001b2d01.pphosted.com>
	<0a783f90ba244080b20723c4982dc37e@DEWDFE13DE14.global.corp.sap>
Message-ID: <201605311547.u4VFdbTm041151@mx0a-001b2d01.pphosted.com>


Hi Martin,

Thank you for fixing my code and uploading the webrev. I sent a request for
review with the given title.

Best regards,
--
Michihiro Horie,
IBM Research - Tokyo


