RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub

Thu Apr 28 12:25:53 UTC 2016

Hi Mikael,

On Wed, Apr 27, 2016 at 5:54 PM, Mikael Vidstedt <mikael.vidstedt at oracle.com
> wrote:

>
>
> On 4/27/2016 12:24 AM, Thomas Stüfe wrote:
>
> Hi Mikael,
>
> On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt <
> <mikael.vidstedt at oracle.com>mikael.vidstedt at oracle.com> wrote:
>
>>
>>
>> On 4/12/2016 2:15 AM, Thomas Stüfe wrote:
>>
>> Hi Mikael, David,
>>
>> On Tue, Apr 12, 2016 at 2:29 AM, David Holmes < <david.holmes at oracle.com>
>> david.holmes at oracle.com> wrote:
>>
>>> On 11/04/2016 10:57 AM, David Holmes wrote:
>>>
>>>> Hi Mikael,
>>>>
>>>> I think we need to be able to answer the question as to why the stubbed
>>>> and stubless forms of this code exist to ensure that converting all
>>>> platforms to the same form is appropriate.
>>>>
>>>
>>> The more I look at this the more the stubs make no sense :) AIII a stub
>>> is generated when we need runtime code that may be different to that which
>>> we could write directly for compiling at build time - ie to use CPU
>>> specific features of the actual CPU. But I see nothing here that suggests
>>> any such usage.
>>>
>>> So I agree with removing the stubs.
>>>
>>> I'm still going through this but my initial reaction is to wonder why we
>>>> don't use the same form of handle_unsafe_access on all platforms and
>>>> always pass in npc? (That seems to be the only difference in code that
>>>> otherwise seems platform independent.)
>>>>
>>>
>>> Futher to this and Thomas's comments I think
>>> handle_unsafe_access(thread, pc, npc) can be defined in shared code (where?
>>> not sure). Further, if we always pass in npc then we don't need to pass in
>>> pc as it is unused (seems unused in original code too for sparc).
>>>
>>>
>> I agree. We commonized ucontext_set_pc for all Posix platforms, so we can
>> make a common function "handle_unsafe_access(thread, npc)" and inside use
>> os::Posix::ucontext_set_pc to modify the context. Then we can get rid of
>> the special handling in the signal handlers inside os_aix_ppc.cpp and
>> os_linux_ppc.cpp (for both the compiled and the interpreted case).
>>
>>
>> There is definitely room for unification and simplification here. Right
>> now the signal handling code is, sadly, different on all the different
>> platforms, despite the fact that in many cases it should be similar or the
>> exact same. That said, as much as a refactoring/rewrite of the signal
>> handler code is needed, it will very quickly turn into a much larger
>> effort...
>>
>> In this specific case, it would probably make more sense to pass in the
>> full context to the handle_unsafe_access method and have it do whatever it
>> feels is necessary to update it. However, a lot of the signal handler code
>> assumes that a "stub" variable gets set up and only at the end of the main
>> signal handler function does the actual context get updated. Changing how
>> that works only for this specific case is obviously not a good idea, which
>> means it's back to the full scale refactoring and out of scope for the bug
>> fix.
>>
>> So to me the fact that the method prototypes differ depending on the
>> exact platform is just a reflection of how the contexts differ. In lack of
>> the full context the handler method needs to take whatever parts of the
>> context is needed to do it's job. I could of course change the handler
>> method to only take a single "next_pc" argument, but to me that feels like
>> putting a key part of the logic which handles the unsafe access
>> (specifically, the part which calculates the next pc) in the wrong place -
>> IMHO that should really be tightly coupled with the rest of the logic
>> needed to handle an unsafe access (updating the thread state etc.), and
>> hence I feel that it really belongs in the handle_unsafe_access method
>> itself. Happy to hear your thoughts, but I hope we can agree that the
>> suggested fix, even in its current state, is still significantly better
>> than what is there now.
>>
>>
>> Unless somebody has a better suggestion, I'm going to be moving the
>> implementations of the handle_unsafe_access methods to sharedRuntime
>> (instead of stubRoutines) and will send out a new webrev shortly.
>>
>>
> I am unhappy with the fact that we factor unsafe handling out for x86 and
> sparc but do it inline for ppc. I know that was done before your change as
> well but would be happy if that could be improved. I would prefer either
> one of:
>
>
> Fully agree - this is an example of the more general problem of logic
> which is /almost/ the same across different platforms, but which has been
> effectively copy/pasted and drifted apart over time.
>
>
> 1) flatten out the coding into the signal handlers like it is done in
> os_linux_ppc.cpp and os_aix_ppc.cpp or
> 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case
>
> I would actually prefer (1) even though this would multiply the code out
> for all os cases into <os_cpu>; we are only talking about 1-2 lines of
> additional coding, and it would improve the readability of the signal
> handlers.
>
> But this is only my personal opinion, and I do not have strong emotions. I
> agree with you that a full cleanup of the signal coding is out of scope for
> this issue.
>
>
> I spent yesterday going back and forth on the various alternatives and the
> only thing I can say with certainty now is that apart from refactoring the
> whole thing everything else is ugly... For example, I agree that
> consistency is an important goal here, but since there's little to no
> consistency there today it's really hard to make a relevant dent in it. :(
>
> Flattening it out is an alternative (and a good one), but that is not
> something I'm willing to do as part of this change because only flattening
> this specific case/return will actually add to the inconstency... So
> ultimately yesterday I chose to do something closer to your alternative 2).
> Is it still ugly? Yes; lipstick on pig and all of that. Have a look at it
> and see how you feel about it. I try to keep in mind that what is there
> today is (more) broken. :)
>
> Webrev:
>
>
> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/
>
> Incremental from webrev.01:
>
>
> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/
>
> Cheers,
> Mikael
>
>
I am fine with the changes as they are in webrev.02. Any further cleanup
can be deferred to a later change.

I did build your change on AIX, it did build ok. Thanks for taking my input!

Kind Regards, Thomas

>
>
>
>
>> Cheers,
>> Mikael
>>
>>
>>
>> BTW I found this comment somewhat unfathomable (both now and in original
>>> code):
>>>
>>> +   // pc is the instruction which we must emulate
>>> +   // doing a no-op is fine:  return garbage from the load
>>>
>>> but finally realized that it means that after the load that raised the
>>> signal the native code proceeds normally but the value apparently loaded is
>>> just garbage/arbitrary, and the only sign something went wrong is the
>>> setting of the pending unsafe-access-error bit. This would be a potential
>>> source of bugs I think, except that when we hit the Java level, we throw
>>> the exception and so never actually "return" the garbage value. But it does
>>> mean we would have to be careful if calling the unsafe routines from native
>>> code.
>>>
>>>
>> I admit I do not understand fully how the _special_runtime_exit_condition
>> flag is processed later, at least not for all cases: If I have a java
>> method A using sun.misc.unsafe, which gets compiled, the sun.misc.unsafe
>> intrinsic gets inlined into that method. So, the whole method A gets marked
>> as "has unsafe access"? So, any SIGBUS happening inside this method - which
>> may be larger than the inlined sun.misc.unsafe call - will yield an
>> InternalError? And when is the flag checked if that method A is called from
>> another java method B?
>>
>> Sorry if the questions are stupid, I am not a JIT expert, but I try to
>> understand how much can happen between the SIGBUS and the InternalError
>> getting thrown.
>>
>>
>> No questions are stupid here. As you may have seen in the other thread, I
>> filed JDK-8154592[1] to cover making the handling of the faults
>> synchronous. Hope that helps.
>>
>>
> Thank you!
>
> Kind Regards, Thomas
>
>
>
>> Cheers,
>> Mikael
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8154592
>>
>>
>>
>> Thanks, Thomas
>>
>>
>>> Thanks,
>>> David
>>>
>>>
>>> Thanks,
>>>> David
>>>>
>>>> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote:
>>>>
>>>>>
>>>>> Please review:
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892
>>>>> Webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/
>>>>>
>>>>>
>>>>>
>>>>> * Note: this is patch 2 in a set of 3 all aiming to clean up and unify
>>>>> the unsafe memory getters/setters, along with the handling of unsafe
>>>>> access errors. The other two issues are:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle unsafe
>>>>> access
>>>>> error as an asynchronous exception
>>>>> https://bugs.openjdk.java.net/browse/JDK-8150921 - Update Unsafe
>>>>> getters/setters to use double-register variants
>>>>>
>>>>>
>>>>> * Summary (copied from the bug description)
>>>>>
>>>>>
>>>>> In certain cases, such as accessing a region of a memory mapped file
>>>>> which has been truncated on unix-style operating systems, a SIGBUS
>>>>> signal will be raised and the VM will process it in the signal handler.
>>>>>
>>>>> How the signal is processed differs depending on the operating system
>>>>> and/or CPU architecture, with two major alternatives:
>>>>>
>>>>> * "stubless"
>>>>>
>>>>> Do the necessary thread state updates directly in the signal handler,
>>>>> and modify the context so that the signal handler returns to the place
>>>>> where the execution should continue
>>>>>
>>>>> * Using a stub
>>>>>
>>>>> Update the context so that when the signal handler returns the thread
>>>>> will continue execution in a generated stub, which in turn will call
>>>>> some native code in the VM to update the thread state and figure out
>>>>> where execution should continue. The stub will then jump to that new
>>>>> place.
>>>>>
>>>>>
>>>>> It should be noted that the work of updating the thread state is very
>>>>> small - it's setting a flag or two in the thread structure, and figures
>>>>> out where the next instruction starts. It should also be noted that the
>>>>> generated stubs today are broken, because they do not preserve all the
>>>>> live registers over the call into the VM. There are two ways to address
>>>>> this:
>>>>>
>>>>> * Preserve all the necessary registers
>>>>>
>>>>> This would mean implementing, in macro assembly, the necessary logic
>>>>> for
>>>>> preserving all the live registers, including, but not limited to,
>>>>> floating point registers, flag registers, etc. It quickly becomes
>>>>> obvious that this platform specific and error prone.
>>>>>
>>>>> * Leverage the fact that the operating system already does this as part
>>>>> of the signal handling
>>>>>
>>>>> Do the necessary work in the signal handler instead, removing the need
>>>>> for the stub alltogether
>>>>>
>>>>> As mentioned, on some platforms the latter model is already in use. It
>>>>> is dramatically easier and all platforms should be updated to do it the
>>>>> same way.
>>>>>
>>>>>
>>>>> * Testing
>>>>>
>>>>> Just as mentioned in the RFR for JDK-8153890, a new test was developed
>>>>> to test this code path:
>>>>>
>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java
>>>>>
>>>>> In fact, it was when running this test I found the register
>>>>> preservation
>>>>> issue. JPRT also passes. Much like JDK-8153890 I wanted to get some
>>>>> feedback on this before running additional tests.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Mikael
>>>>>
>>>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160428/3a90bc9b/attachment-0001.html>