RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub
Mikael Vidstedt
mikael.vidstedt at oracle.com
Wed Apr 27 15:54:48 UTC 2016
On 4/27/2016 12:24 AM, Thomas Stüfe wrote:
> Hi Mikael,
>
> On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt
> <mikael.vidstedt at oracle.com <mailto:mikael.vidstedt at oracle.com>> wrote:
>
>
>
> On 4/12/2016 2:15 AM, Thomas Stüfe wrote:
>> Hi Mikael, David,
>>
>> On Tue, Apr 12, 2016 at 2:29 AM, David Holmes
>> <david.holmes at oracle.com <mailto:david.holmes at oracle.com>> wrote:
>>
>> On 11/04/2016 10:57 AM, David Holmes wrote:
>>
>> Hi Mikael,
>>
>> I think we need to be able to answer the question as to
>> why the stubbed
>> and stubless forms of this code exist to ensure that
>> converting all
>> platforms to the same form is appropriate.
>>
>>
>> The more I look at this the more the stubs make no sense :)
>> AIII a stub is generated when we need runtime code that may
>> be different to that which we could write directly for
>> compiling at build time - ie to use CPU specific features of
>> the actual CPU. But I see nothing here that suggests any such
>> usage.
>>
>> So I agree with removing the stubs.
>>
>> I'm still going through this but my initial reaction is
>> to wonder why we
>> don't use the same form of handle_unsafe_access on all
>> platforms and
>> always pass in npc? (That seems to be the only difference
>> in code that
>> otherwise seems platform independent.)
>>
>>
>> Futher to this and Thomas's comments I think
>> handle_unsafe_access(thread, pc, npc) can be defined in
>> shared code (where? not sure). Further, if we always pass in
>> npc then we don't need to pass in pc as it is unused (seems
>> unused in original code too for sparc).
>>
>>
>> I agree. We commonized ucontext_set_pc for all Posix platforms,
>> so we can make a common function "handle_unsafe_access(thread,
>> npc)" and inside use os::Posix::ucontext_set_pc to modify the
>> context. Then we can get rid of the special handling in the
>> signal handlers inside os_aix_ppc.cpp and os_linux_ppc.cpp (for
>> both the compiled and the interpreted case).
>
> There is definitely room for unification and simplification here.
> Right now the signal handling code is, sadly, different on all the
> different platforms, despite the fact that in many cases it should
> be similar or the exact same. That said, as much as a
> refactoring/rewrite of the signal handler code is needed, it will
> very quickly turn into a much larger effort...
>
> In this specific case, it would probably make more sense to pass
> in the full context to the handle_unsafe_access method and have it
> do whatever it feels is necessary to update it. However, a lot of
> the signal handler code assumes that a "stub" variable gets set up
> and only at the end of the main signal handler function does the
> actual context get updated. Changing how that works only for this
> specific case is obviously not a good idea, which means it's back
> to the full scale refactoring and out of scope for the bug fix.
>
> So to me the fact that the method prototypes differ depending on
> the exact platform is just a reflection of how the contexts
> differ. In lack of the full context the handler method needs to
> take whatever parts of the context is needed to do it's job. I
> could of course change the handler method to only take a single
> "next_pc" argument, but to me that feels like putting a key part
> of the logic which handles the unsafe access (specifically, the
> part which calculates the next pc) in the wrong place - IMHO that
> should really be tightly coupled with the rest of the logic needed
> to handle an unsafe access (updating the thread state etc.), and
> hence I feel that it really belongs in the handle_unsafe_access
> method itself. Happy to hear your thoughts, but I hope we can
> agree that the suggested fix, even in its current state, is still
> significantly better than what is there now.
>
>
> Unless somebody has a better suggestion, I'm going to be moving
> the implementations of the handle_unsafe_access methods to
> sharedRuntime (instead of stubRoutines) and will send out a new
> webrev shortly.
>
>
> I am unhappy with the fact that we factor unsafe handling out for x86
> and sparc but do it inline for ppc. I know that was done before your
> change as well but would be happy if that could be improved. I would
> prefer either one of:
Fully agree - this is an example of the more general problem of logic
which is /almost/ the same across different platforms, but which has
been effectively copy/pasted and drifted apart over time.
>
> 1) flatten out the coding into the signal handlers like it is done in
> os_linux_ppc.cpp and os_aix_ppc.cpp or
> 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case
>
> I would actually prefer (1) even though this would multiply the code
> out for all os cases into <os_cpu>; we are only talking about 1-2
> lines of additional coding, and it would improve the readability of
> the signal handlers.
>
> But this is only my personal opinion, and I do not have strong
> emotions. I agree with you that a full cleanup of the signal coding is
> out of scope for this issue.
I spent yesterday going back and forth on the various alternatives and
the only thing I can say with certainty now is that apart from
refactoring the whole thing everything else is ugly... For example, I
agree that consistency is an important goal here, but since there's
little to no consistency there today it's really hard to make a relevant
dent in it. :(
Flattening it out is an alternative (and a good one), but that is not
something I'm willing to do as part of this change because only
flattening this specific case/return will actually add to the
inconstency... So ultimately yesterday I chose to do something closer to
your alternative 2). Is it still ugly? Yes; lipstick on pig and all of
that. Have a look at it and see how you feel about it. I try to keep in
mind that what is there today is (more) broken. :)
Webrev:
http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/
Incremental from webrev.01:
http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/
Cheers,
Mikael
>
>
> Cheers,
> Mikael
>
>>
>>
>> BTW I found this comment somewhat unfathomable (both now and
>> in original code):
>>
>> + // pc is the instruction which we must emulate
>> + // doing a no-op is fine: return garbage from the load
>>
>> but finally realized that it means that after the load that
>> raised the signal the native code proceeds normally but the
>> value apparently loaded is just garbage/arbitrary, and the
>> only sign something went wrong is the setting of the pending
>> unsafe-access-error bit. This would be a potential source of
>> bugs I think, except that when we hit the Java level, we
>> throw the exception and so never actually "return" the
>> garbage value. But it does mean we would have to be careful
>> if calling the unsafe routines from native code.
>>
>>
>> I admit I do not understand fully how the
>> _special_runtime_exit_condition flag is processed later, at least
>> not for all cases: If I have a java method A using
>> sun.misc.unsafe, which gets compiled, the sun.misc.unsafe
>> intrinsic gets inlined into that method. So, the whole method A
>> gets marked as "has unsafe access"? So, any SIGBUS happening
>> inside this method - which may be larger than the inlined
>> sun.misc.unsafe call - will yield an InternalError? And when is
>> the flag checked if that method A is called from another java
>> method B?
>>
>> Sorry if the questions are stupid, I am not a JIT expert, but I
>> try to understand how much can happen between the SIGBUS and the
>> InternalError getting thrown.
>
> No questions are stupid here. As you may have seen in the other
> thread, I filed JDK-8154592[1] to cover making the handling of the
> faults synchronous. Hope that helps.
>
>
> Thank you!
>
> Kind Regards, Thomas
>
> Cheers,
> Mikael
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8154592
>
>
>>
>> Thanks, Thomas
>>
>> Thanks,
>> David
>>
>>
>> Thanks,
>> David
>>
>> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote:
>>
>>
>> Please review:
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892
>> Webrev:
>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/
>> <http://cr.openjdk.java.net/%7Emikael/webrevs/8153892/webrev.01/hotspot/webrev/>
>>
>>
>>
>> * Note: this is patch 2 in a set of 3 all aiming to
>> clean up and unify
>> the unsafe memory getters/setters, along with the
>> handling of unsafe
>> access errors. The other two issues are:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8153890 -
>> Handle unsafe access
>> error as an asynchronous exception
>> https://bugs.openjdk.java.net/browse/JDK-8150921 -
>> Update Unsafe
>> getters/setters to use double-register variants
>>
>>
>> * Summary (copied from the bug description)
>>
>>
>> In certain cases, such as accessing a region of a
>> memory mapped file
>> which has been truncated on unix-style operating
>> systems, a SIGBUS
>> signal will be raised and the VM will process it in
>> the signal handler.
>>
>> How the signal is processed differs depending on the
>> operating system
>> and/or CPU architecture, with two major alternatives:
>>
>> * "stubless"
>>
>> Do the necessary thread state updates directly in the
>> signal handler,
>> and modify the context so that the signal handler
>> returns to the place
>> where the execution should continue
>>
>> * Using a stub
>>
>> Update the context so that when the signal handler
>> returns the thread
>> will continue execution in a generated stub, which in
>> turn will call
>> some native code in the VM to update the thread state
>> and figure out
>> where execution should continue. The stub will then
>> jump to that new
>> place.
>>
>>
>> It should be noted that the work of updating the
>> thread state is very
>> small - it's setting a flag or two in the thread
>> structure, and figures
>> out where the next instruction starts. It should also
>> be noted that the
>> generated stubs today are broken, because they do not
>> preserve all the
>> live registers over the call into the VM. There are
>> two ways to address
>> this:
>>
>> * Preserve all the necessary registers
>>
>> This would mean implementing, in macro assembly, the
>> necessary logic for
>> preserving all the live registers, including, but not
>> limited to,
>> floating point registers, flag registers, etc. It
>> quickly becomes
>> obvious that this platform specific and error prone.
>>
>> * Leverage the fact that the operating system already
>> does this as part
>> of the signal handling
>>
>> Do the necessary work in the signal handler instead,
>> removing the need
>> for the stub alltogether
>>
>> As mentioned, on some platforms the latter model is
>> already in use. It
>> is dramatically easier and all platforms should be
>> updated to do it the
>> same way.
>>
>>
>> * Testing
>>
>> Just as mentioned in the RFR for JDK-8153890, a new
>> test was developed
>> to test this code path:
>>
>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java
>> <http://cr.openjdk.java.net/%7Emikael/webrevs/8150921/MappedTruncated.java>
>>
>> In fact, it was when running this test I found the
>> register preservation
>> issue. JPRT also passes. Much like JDK-8153890 I
>> wanted to get some
>> feedback on this before running additional tests.
>>
>>
>> Cheers,
>> Mikael
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/ppc-aix-port-dev/attachments/20160427/b9c41745/attachment-0001.html>
More information about the ppc-aix-port-dev
mailing list