RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub

Wed Apr 27 15:54:48 UTC 2016

On 4/27/2016 12:24 AM, Thomas Stüfe wrote:
> Hi Mikael,
>
> On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt 
> <mikael.vidstedt at oracle.com <mailto:mikael.vidstedt at oracle.com>> wrote:
>
>
>
>     On 4/12/2016 2:15 AM, Thomas Stüfe wrote:
>>     Hi Mikael, David,
>>
>>     On Tue, Apr 12, 2016 at 2:29 AM, David Holmes
>>     <david.holmes at oracle.com <mailto:david.holmes at oracle.com>> wrote:
>>
>>         On 11/04/2016 10:57 AM, David Holmes wrote:
>>
>>             Hi Mikael,
>>
>>             I think we need to be able to answer the question as to
>>             why the stubbed
>>             and stubless forms of this code exist to ensure that
>>             converting all
>>             platforms to the same form is appropriate.
>>
>>
>>         The more I look at this the more the stubs make no sense :)
>>         AIII a stub is generated when we need runtime code that may
>>         be different to that which we could write directly for
>>         compiling at build time - ie to use CPU specific features of
>>         the actual CPU. But I see nothing here that suggests any such
>>         usage.
>>
>>         So I agree with removing the stubs.
>>
>>             I'm still going through this but my initial reaction is
>>             to wonder why we
>>             don't use the same form of handle_unsafe_access on all
>>             platforms and
>>             always pass in npc? (That seems to be the only difference
>>             in code that
>>             otherwise seems platform independent.)
>>
>>
>>         Futher to this and Thomas's comments I think
>>         handle_unsafe_access(thread, pc, npc) can be defined in
>>         shared code (where? not sure). Further, if we always pass in
>>         npc then we don't need to pass in pc as it is unused (seems
>>         unused in original code too for sparc).
>>
>>
>>     I agree. We commonized ucontext_set_pc for all Posix platforms,
>>     so we can make a common function "handle_unsafe_access(thread,
>>     npc)" and inside use os::Posix::ucontext_set_pc to modify the
>>     context. Then we can get rid of the special handling in the
>>     signal handlers inside os_aix_ppc.cpp and os_linux_ppc.cpp (for
>>     both the compiled and the interpreted case).
>
>     There is definitely room for unification and simplification here.
>     Right now the signal handling code is, sadly, different on all the
>     different platforms, despite the fact that in many cases it should
>     be similar or the exact same. That said, as much as a
>     refactoring/rewrite of the signal handler code is needed, it will
>     very quickly turn into a much larger effort...
>
>     In this specific case, it would probably make more sense to pass
>     in the full context to the handle_unsafe_access method and have it
>     do whatever it feels is necessary to update it. However, a lot of
>     the signal handler code assumes that a "stub" variable gets set up
>     and only at the end of the main signal handler function does the
>     actual context get updated. Changing how that works only for this
>     specific case is obviously not a good idea, which means it's back
>     to the full scale refactoring and out of scope for the bug fix.
>
>     So to me the fact that the method prototypes differ depending on
>     the exact platform is just a reflection of how the contexts
>     differ. In lack of the full context the handler method needs to
>     take whatever parts of the context is needed to do it's job. I
>     could of course change the handler method to only take a single
>     "next_pc" argument, but to me that feels like putting a key part
>     of the logic which handles the unsafe access (specifically, the
>     part which calculates the next pc) in the wrong place - IMHO that
>     should really be tightly coupled with the rest of the logic needed
>     to handle an unsafe access (updating the thread state etc.), and
>     hence I feel that it really belongs in the handle_unsafe_access
>     method itself. Happy to hear your thoughts, but I hope we can
>     agree that the suggested fix, even in its current state, is still
>     significantly better than what is there now.
>
>
>     Unless somebody has a better suggestion, I'm going to be moving
>     the implementations of the handle_unsafe_access methods to
>     sharedRuntime (instead of stubRoutines) and will send out a new
>     webrev shortly.
>
>
> I am unhappy with the fact that we factor unsafe handling out for x86 
> and sparc but do it inline for ppc. I know that was done before your 
> change as well but would be happy if that could be improved. I would 
> prefer either one of:

Fully agree - this is an example of the more general problem of logic 
which is /almost/ the same across different platforms, but which has 
been effectively copy/pasted and drifted apart over time.

>
> 1) flatten out the coding into the signal handlers like it is done in 
> os_linux_ppc.cpp and os_aix_ppc.cpp or
> 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case
>
> I would actually prefer (1) even though this would multiply the code 
> out for all os cases into <os_cpu>; we are only talking about 1-2 
> lines of additional coding, and it would improve the readability of 
> the signal handlers.
>
> But this is only my personal opinion, and I do not have strong 
> emotions. I agree with you that a full cleanup of the signal coding is 
> out of scope for this issue.

I spent yesterday going back and forth on the various alternatives and 
the only thing I can say with certainty now is that apart from 
refactoring the whole thing everything else is ugly... For example, I 
agree that consistency is an important goal here, but since there's 
little to no consistency there today it's really hard to make a relevant 
dent in it. :(

Flattening it out is an alternative (and a good one), but that is not 
something I'm willing to do as part of this change because only 
flattening this specific case/return will actually add to the 
inconstency... So ultimately yesterday I chose to do something closer to 
your alternative 2). Is it still ugly? Yes; lipstick on pig and all of 
that. Have a look at it and see how you feel about it. I try to keep in 
mind that what is there today is (more) broken. :)

Webrev:

http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/

Incremental from webrev.01:

http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/

Cheers,
Mikael

>
>
>     Cheers,
>     Mikael
>
>>
>>
>>         BTW I found this comment somewhat unfathomable (both now and
>>         in original code):
>>
>>         +   // pc is the instruction which we must emulate
>>         +   // doing a no-op is fine:  return garbage from the load
>>
>>         but finally realized that it means that after the load that
>>         raised the signal the native code proceeds normally but the
>>         value apparently loaded is just garbage/arbitrary, and the
>>         only sign something went wrong is the setting of the pending
>>         unsafe-access-error bit. This would be a potential source of
>>         bugs I think, except that when we hit the Java level, we
>>         throw the exception and so never actually "return" the
>>         garbage value. But it does mean we would have to be careful
>>         if calling the unsafe routines from native code.
>>
>>
>>     I admit I do not understand fully how the
>>     _special_runtime_exit_condition flag is processed later, at least
>>     not for all cases: If I have a java method A using
>>     sun.misc.unsafe, which gets compiled, the sun.misc.unsafe
>>     intrinsic gets inlined into that method. So, the whole method A
>>     gets marked as "has unsafe access"? So, any SIGBUS happening
>>     inside this method - which may be larger than the inlined
>>     sun.misc.unsafe call - will yield an InternalError? And when is
>>     the flag checked if that method A is called from another java
>>     method B?
>>
>>     Sorry if the questions are stupid, I am not a JIT expert, but I
>>     try to understand how much can happen between the SIGBUS and the
>>     InternalError getting thrown.
>
>     No questions are stupid here. As you may have seen in the other
>     thread, I filed JDK-8154592[1] to cover making the handling of the
>     faults synchronous. Hope that helps.
>
>
> Thank you!
>
> Kind Regards, Thomas
>
>     Cheers,
>     Mikael
>
>     [1] https://bugs.openjdk.java.net/browse/JDK-8154592
>
>
>>
>>     Thanks, Thomas
>>
>>         Thanks,
>>         David
>>
>>
>>             Thanks,
>>             David
>>
>>             On 9/04/2016 8:33 AM, Mikael Vidstedt wrote:
>>
>>
>>                 Please review:
>>
>>                 Bug: https://bugs.openjdk.java.net/browse/JDK-8153892
>>                 Webrev:
>>                 http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/
>>                 <http://cr.openjdk.java.net/%7Emikael/webrevs/8153892/webrev.01/hotspot/webrev/>
>>
>>
>>
>>                 * Note: this is patch 2 in a set of 3 all aiming to
>>                 clean up and unify
>>                 the unsafe memory getters/setters, along with the
>>                 handling of unsafe
>>                 access errors. The other two issues are:
>>
>>                 https://bugs.openjdk.java.net/browse/JDK-8153890 -
>>                 Handle unsafe access
>>                 error as an asynchronous exception
>>                 https://bugs.openjdk.java.net/browse/JDK-8150921 -
>>                 Update Unsafe
>>                 getters/setters to use double-register variants
>>
>>
>>                 * Summary (copied from the bug description)
>>
>>
>>                 In certain cases, such as accessing a region of a
>>                 memory mapped file
>>                 which has been truncated on unix-style operating
>>                 systems, a SIGBUS
>>                 signal will be raised and the VM will process it in
>>                 the signal handler.
>>
>>                 How the signal is processed differs depending on the
>>                 operating system
>>                 and/or CPU architecture, with two major alternatives:
>>
>>                 * "stubless"
>>
>>                 Do the necessary thread state updates directly in the
>>                 signal handler,
>>                 and modify the context so that the signal handler
>>                 returns to the place
>>                 where the execution should continue
>>
>>                 * Using a stub
>>
>>                 Update the context so that when the signal handler
>>                 returns the thread
>>                 will continue execution in a generated stub, which in
>>                 turn will call
>>                 some native code in the VM to update the thread state
>>                 and figure out
>>                 where execution should continue. The stub will then
>>                 jump to that new
>>                 place.
>>
>>
>>                 It should be noted that the work of updating the
>>                 thread state is very
>>                 small - it's setting a flag or two in the thread
>>                 structure, and figures
>>                 out where the next instruction starts. It should also
>>                 be noted that the
>>                 generated stubs today are broken, because they do not
>>                 preserve all the
>>                 live registers over the call into the VM. There are
>>                 two ways to address
>>                 this:
>>
>>                 * Preserve all the necessary registers
>>
>>                 This would mean implementing, in macro assembly, the
>>                 necessary logic for
>>                 preserving all the live registers, including, but not
>>                 limited to,
>>                 floating point registers, flag registers, etc. It
>>                 quickly becomes
>>                 obvious that this platform specific and error prone.
>>
>>                 * Leverage the fact that the operating system already
>>                 does this as part
>>                 of the signal handling
>>
>>                 Do the necessary work in the signal handler instead,
>>                 removing the need
>>                 for the stub alltogether
>>
>>                 As mentioned, on some platforms the latter model is
>>                 already in use. It
>>                 is dramatically easier and all platforms should be
>>                 updated to do it the
>>                 same way.
>>
>>
>>                 * Testing
>>
>>                 Just as mentioned in the RFR for JDK-8153890, a new
>>                 test was developed
>>                 to test this code path:
>>
>>                 http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java
>>                 <http://cr.openjdk.java.net/%7Emikael/webrevs/8150921/MappedTruncated.java>
>>
>>                 In fact, it was when running this test I found the
>>                 register preservation
>>                 issue. JPRT also passes. Much like JDK-8153890 I
>>                 wanted to get some
>>                 feedback on this before running additional tests.
>>
>>
>>                 Cheers,
>>                 Mikael
>>
>>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160427/b9c41745/attachment-0001.html>