RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub

Wed Apr 27 07:24:32 UTC 2016

Hi Mikael,

On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt <mikael.vidstedt at oracle.com
> wrote:

>
>
> On 4/12/2016 2:15 AM, Thomas Stüfe wrote:
>
> Hi Mikael, David,
>
> On Tue, Apr 12, 2016 at 2:29 AM, David Holmes <david.holmes at oracle.com>
> wrote:
>
>> On 11/04/2016 10:57 AM, David Holmes wrote:
>>
>>> Hi Mikael,
>>>
>>> I think we need to be able to answer the question as to why the stubbed
>>> and stubless forms of this code exist to ensure that converting all
>>> platforms to the same form is appropriate.
>>>
>>
>> The more I look at this the more the stubs make no sense :) AIII a stub
>> is generated when we need runtime code that may be different to that which
>> we could write directly for compiling at build time - ie to use CPU
>> specific features of the actual CPU. But I see nothing here that suggests
>> any such usage.
>>
>> So I agree with removing the stubs.
>>
>> I'm still going through this but my initial reaction is to wonder why we
>>> don't use the same form of handle_unsafe_access on all platforms and
>>> always pass in npc? (That seems to be the only difference in code that
>>> otherwise seems platform independent.)
>>>
>>
>> Futher to this and Thomas's comments I think handle_unsafe_access(thread,
>> pc, npc) can be defined in shared code (where? not sure). Further, if we
>> always pass in npc then we don't need to pass in pc as it is unused (seems
>> unused in original code too for sparc).
>>
>>
> I agree. We commonized ucontext_set_pc for all Posix platforms, so we can
> make a common function "handle_unsafe_access(thread, npc)" and inside use
> os::Posix::ucontext_set_pc to modify the context. Then we can get rid of
> the special handling in the signal handlers inside os_aix_ppc.cpp and
> os_linux_ppc.cpp (for both the compiled and the interpreted case).
>
>
> There is definitely room for unification and simplification here. Right
> now the signal handling code is, sadly, different on all the different
> platforms, despite the fact that in many cases it should be similar or the
> exact same. That said, as much as a refactoring/rewrite of the signal
> handler code is needed, it will very quickly turn into a much larger
> effort...
>
> In this specific case, it would probably make more sense to pass in the
> full context to the handle_unsafe_access method and have it do whatever it
> feels is necessary to update it. However, a lot of the signal handler code
> assumes that a "stub" variable gets set up and only at the end of the main
> signal handler function does the actual context get updated. Changing how
> that works only for this specific case is obviously not a good idea, which
> means it's back to the full scale refactoring and out of scope for the bug
> fix.
>
> So to me the fact that the method prototypes differ depending on the exact
> platform is just a reflection of how the contexts differ. In lack of the
> full context the handler method needs to take whatever parts of the context
> is needed to do it's job. I could of course change the handler method to
> only take a single "next_pc" argument, but to me that feels like putting a
> key part of the logic which handles the unsafe access (specifically, the
> part which calculates the next pc) in the wrong place - IMHO that should
> really be tightly coupled with the rest of the logic needed to handle an
> unsafe access (updating the thread state etc.), and hence I feel that it
> really belongs in the handle_unsafe_access method itself. Happy to hear
> your thoughts, but I hope we can agree that the suggested fix, even in its
> current state, is still significantly better than what is there now.
>
>
> Unless somebody has a better suggestion, I'm going to be moving the
> implementations of the handle_unsafe_access methods to sharedRuntime
> (instead of stubRoutines) and will send out a new webrev shortly.
>
>
I am unhappy with the fact that we factor unsafe handling out for x86 and
sparc but do it inline for ppc. I know that was done before your change as
well but would be happy if that could be improved. I would prefer either
one of:

1) flatten out the coding into the signal handlers like it is done in
os_linux_ppc.cpp and os_aix_ppc.cpp or
2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case

I would actually prefer (1) even though this would multiply the code out
for all os cases into <os_cpu>; we are only talking about 1-2 lines of
additional coding, and it would improve the readability of the signal
handlers.

But this is only my personal opinion, and I do not have strong emotions. I
agree with you that a full cleanup of the signal coding is out of scope for
this issue.

> Cheers,
> Mikael
>
>
>
> BTW I found this comment somewhat unfathomable (both now and in original
>> code):
>>
>> +   // pc is the instruction which we must emulate
>> +   // doing a no-op is fine:  return garbage from the load
>>
>> but finally realized that it means that after the load that raised the
>> signal the native code proceeds normally but the value apparently loaded is
>> just garbage/arbitrary, and the only sign something went wrong is the
>> setting of the pending unsafe-access-error bit. This would be a potential
>> source of bugs I think, except that when we hit the Java level, we throw
>> the exception and so never actually "return" the garbage value. But it does
>> mean we would have to be careful if calling the unsafe routines from native
>> code.
>>
>>
> I admit I do not understand fully how the _special_runtime_exit_condition
> flag is processed later, at least not for all cases: If I have a java
> method A using sun.misc.unsafe, which gets compiled, the sun.misc.unsafe
> intrinsic gets inlined into that method. So, the whole method A gets marked
> as "has unsafe access"? So, any SIGBUS happening inside this method - which
> may be larger than the inlined sun.misc.unsafe call - will yield an
> InternalError? And when is the flag checked if that method A is called from
> another java method B?
>
> Sorry if the questions are stupid, I am not a JIT expert, but I try to
> understand how much can happen between the SIGBUS and the InternalError
> getting thrown.
>
>
> No questions are stupid here. As you may have seen in the other thread, I
> filed JDK-8154592[1] to cover making the handling of the faults
> synchronous. Hope that helps.
>
>
Thank you!

Kind Regards, Thomas

> Cheers,
> Mikael
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8154592
>
>
>
> Thanks, Thomas
>
>
>> Thanks,
>> David
>>
>>
>> Thanks,
>>> David
>>>
>>> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote:
>>>
>>>>
>>>> Please review:
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892
>>>> Webrev:
>>>>
>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/
>>>>
>>>>
>>>>
>>>> * Note: this is patch 2 in a set of 3 all aiming to clean up and unify
>>>> the unsafe memory getters/setters, along with the handling of unsafe
>>>> access errors. The other two issues are:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8153890 - Handle unsafe access
>>>> error as an asynchronous exception
>>>> https://bugs.openjdk.java.net/browse/JDK-8150921 - Update Unsafe
>>>> getters/setters to use double-register variants
>>>>
>>>>
>>>> * Summary (copied from the bug description)
>>>>
>>>>
>>>> In certain cases, such as accessing a region of a memory mapped file
>>>> which has been truncated on unix-style operating systems, a SIGBUS
>>>> signal will be raised and the VM will process it in the signal handler.
>>>>
>>>> How the signal is processed differs depending on the operating system
>>>> and/or CPU architecture, with two major alternatives:
>>>>
>>>> * "stubless"
>>>>
>>>> Do the necessary thread state updates directly in the signal handler,
>>>> and modify the context so that the signal handler returns to the place
>>>> where the execution should continue
>>>>
>>>> * Using a stub
>>>>
>>>> Update the context so that when the signal handler returns the thread
>>>> will continue execution in a generated stub, which in turn will call
>>>> some native code in the VM to update the thread state and figure out
>>>> where execution should continue. The stub will then jump to that new
>>>> place.
>>>>
>>>>
>>>> It should be noted that the work of updating the thread state is very
>>>> small - it's setting a flag or two in the thread structure, and figures
>>>> out where the next instruction starts. It should also be noted that the
>>>> generated stubs today are broken, because they do not preserve all the
>>>> live registers over the call into the VM. There are two ways to address
>>>> this:
>>>>
>>>> * Preserve all the necessary registers
>>>>
>>>> This would mean implementing, in macro assembly, the necessary logic for
>>>> preserving all the live registers, including, but not limited to,
>>>> floating point registers, flag registers, etc. It quickly becomes
>>>> obvious that this platform specific and error prone.
>>>>
>>>> * Leverage the fact that the operating system already does this as part
>>>> of the signal handling
>>>>
>>>> Do the necessary work in the signal handler instead, removing the need
>>>> for the stub alltogether
>>>>
>>>> As mentioned, on some platforms the latter model is already in use. It
>>>> is dramatically easier and all platforms should be updated to do it the
>>>> same way.
>>>>
>>>>
>>>> * Testing
>>>>
>>>> Just as mentioned in the RFR for JDK-8153890, a new test was developed
>>>> to test this code path:
>>>>
>>>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java
>>>>
>>>> In fact, it was when running this test I found the register preservation
>>>> issue. JPRT also passes. Much like JDK-8153890 I wanted to get some
>>>> feedback on this before running additional tests.
>>>>
>>>>
>>>> Cheers,
>>>> Mikael
>>>>
>>>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160427/1091c84e/attachment-0001.html>