RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub
David Holmes
david.holmes at oracle.com
Mon May 2 23:04:24 UTC 2016
On 29/04/2016 1:44 AM, Mikael Vidstedt wrote:
>
>
> On 4/28/2016 5:41 AM, David Holmes wrote:
>> Hi Mikael,
>>
>> On 28/04/2016 1:54 AM, Mikael Vidstedt wrote:
>>>
>>>
>>> On 4/27/2016 12:24 AM, Thomas Stüfe wrote:
>>>> Hi Mikael,
>>>>
>>>> On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt
>>>> <<mailto:mikael.vidstedt at oracle.com>mikael.vidstedt at oracle.com> wrote:
>>>>
>>>>
>>>>
>>>> On 4/12/2016 2:15 AM, Thomas Stüfe wrote:
>>>>> Hi Mikael, David,
>>>>>
>>>>> On Tue, Apr 12, 2016 at 2:29 AM, David Holmes
>>>>> <<mailto:david.holmes at oracle.com>david.holmes at oracle.com> wrote:
>>>>>
>>>>> On 11/04/2016 10:57 AM, David Holmes wrote:
>>>>>
>>>>> Hi Mikael,
>>>>>
>>>>> I think we need to be able to answer the question as to
>>>>> why the stubbed
>>>>> and stubless forms of this code exist to ensure that
>>>>> converting all
>>>>> platforms to the same form is appropriate.
>>>>>
>>>>>
>>>>> The more I look at this the more the stubs make no sense :)
>>>>> AIII a stub is generated when we need runtime code that may
>>>>> be different to that which we could write directly for
>>>>> compiling at build time - ie to use CPU specific features of
>>>>> the actual CPU. But I see nothing here that suggests any such
>>>>> usage.
>>>>>
>>>>> So I agree with removing the stubs.
>>>>>
>>>>> I'm still going through this but my initial reaction is
>>>>> to wonder why we
>>>>> don't use the same form of handle_unsafe_access on all
>>>>> platforms and
>>>>> always pass in npc? (That seems to be the only difference
>>>>> in code that
>>>>> otherwise seems platform independent.)
>>>>>
>>>>>
>>>>> Futher to this and Thomas's comments I think
>>>>> handle_unsafe_access(thread, pc, npc) can be defined in
>>>>> shared code (where? not sure). Further, if we always pass in
>>>>> npc then we don't need to pass in pc as it is unused (seems
>>>>> unused in original code too for sparc).
>>>>>
>>>>>
>>>>> I agree. We commonized ucontext_set_pc for all Posix platforms,
>>>>> so we can make a common function "handle_unsafe_access(thread,
>>>>> npc)" and inside use os::Posix::ucontext_set_pc to modify the
>>>>> context. Then we can get rid of the special handling in the
>>>>> signal handlers inside os_aix_ppc.cpp and os_linux_ppc.cpp (for
>>>>> both the compiled and the interpreted case).
>>>>
>>>> There is definitely room for unification and simplification here.
>>>> Right now the signal handling code is, sadly, different on all the
>>>> different platforms, despite the fact that in many cases it should
>>>> be similar or the exact same. That said, as much as a
>>>> refactoring/rewrite of the signal handler code is needed, it will
>>>> very quickly turn into a much larger effort...
>>>>
>>>> In this specific case, it would probably make more sense to pass
>>>> in the full context to the handle_unsafe_access method and have it
>>>> do whatever it feels is necessary to update it. However, a lot of
>>>> the signal handler code assumes that a "stub" variable gets set up
>>>> and only at the end of the main signal handler function does the
>>>> actual context get updated. Changing how that works only for this
>>>> specific case is obviously not a good idea, which means it's back
>>>> to the full scale refactoring and out of scope for the bug fix.
>>>>
>>>> So to me the fact that the method prototypes differ depending on
>>>> the exact platform is just a reflection of how the contexts
>>>> differ. In lack of the full context the handler method needs to
>>>> take whatever parts of the context is needed to do it's job. I
>>>> could of course change the handler method to only take a single
>>>> "next_pc" argument, but to me that feels like putting a key part
>>>> of the logic which handles the unsafe access (specifically, the
>>>> part which calculates the next pc) in the wrong place - IMHO that
>>>> should really be tightly coupled with the rest of the logic needed
>>>> to handle an unsafe access (updating the thread state etc.), and
>>>> hence I feel that it really belongs in the handle_unsafe_access
>>>> method itself. Happy to hear your thoughts, but I hope we can
>>>> agree that the suggested fix, even in its current state, is still
>>>> significantly better than what is there now.
>>>>
>>>>
>>>> Unless somebody has a better suggestion, I'm going to be moving
>>>> the implementations of the handle_unsafe_access methods to
>>>> sharedRuntime (instead of stubRoutines) and will send out a new
>>>> webrev shortly.
>>>>
>>>>
>>>> I am unhappy with the fact that we factor unsafe handling out for x86
>>>> and sparc but do it inline for ppc. I know that was done before your
>>>> change as well but would be happy if that could be improved. I would
>>>> prefer either one of:
>>>
>>> Fully agree - this is an example of the more general problem of logic
>>> which is /almost/ the same across different platforms, but which has
>>> been effectively copy/pasted and drifted apart over time.
>>>
>>>>
>>>> 1) flatten out the coding into the signal handlers like it is done in
>>>> os_linux_ppc.cpp and os_aix_ppc.cpp or
>>>> 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case
>>>>
>>>> I would actually prefer (1) even though this would multiply the code
>>>> out for all os cases into <os_cpu>; we are only talking about 1-2
>>>> lines of additional coding, and it would improve the readability of
>>>> the signal handlers.
>>>>
>>>> But this is only my personal opinion, and I do not have strong
>>>> emotions. I agree with you that a full cleanup of the signal coding is
>>>> out of scope for this issue.
>>>
>>> I spent yesterday going back and forth on the various alternatives and
>>> the only thing I can say with certainty now is that apart from
>>> refactoring the whole thing everything else is ugly... For example, I
>>> agree that consistency is an important goal here, but since there's
>>> little to no consistency there today it's really hard to make a relevant
>>> dent in it. :(
>>>
>>> Flattening it out is an alternative (and a good one), but that is not
>>> something I'm willing to do as part of this change because only
>>> flattening this specific case/return will actually add to the
>>> inconstency... So ultimately yesterday I chose to do something closer to
>>> your alternative 2). Is it still ugly? Yes; lipstick on pig and all of
>>> that. Have a look at it and see how you feel about it. I try to keep in
>>> mind that what is there today is (more) broken. :)
>>>
>>> Webrev:
>>>
>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/
>>>
>>
>> Now I see this in code form I really don't understand why next_pc is
>> passed in, unused and then returned ??
>
> Given that the full context can't easily be passed in and updated (which
> I still think is the right, long term way of doing this), I chose to do
> it this way instead. It is a signal to a caller that *only* calling
> handle_unsafe_access is not enough, there's more to it in that the
> context also needs to be updated. I see it as a way to make sure the
> actual next_pc calculation and update is not forgotten, and make it
> clear that it goes hand in hand with updating the thread state. It's
> obviously not perfect, but I do feel like it ever so slightly helps
> clarify how the access fault needs to be handled.
Okay.
Thanks,
David
-----
>>
>> Otherwise in src/share/vm/runtime/sharedRuntime.cpp in the comment
>> block - capitals after periods please :)
>
> Fixed, I'll not post a new webrev for it though :)
>
> Cheers,
> Mikael
>
>>
>> Stub removal seems fine.
>>
>> Thanks,
>> David
>>
>>> Incremental from webrev.01:
>>>
>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/
>>>
>>>
>>> Cheers,
>>> Mikael
>>>
>>>>
>>>>
>>>> Cheers,
>>>> Mikael
>>>>
>>>>>
>>>>>
>>>>> BTW I found this comment somewhat unfathomable (both now and
>>>>> in original code):
>>>>>
>>>>> + // pc is the instruction which we must emulate
>>>>> + // doing a no-op is fine: return garbage from the load
>>>>>
>>>>> but finally realized that it means that after the load that
>>>>> raised the signal the native code proceeds normally but the
>>>>> value apparently loaded is just garbage/arbitrary, and the
>>>>> only sign something went wrong is the setting of the pending
>>>>> unsafe-access-error bit. This would be a potential source of
>>>>> bugs I think, except that when we hit the Java level, we
>>>>> throw the exception and so never actually "return" the
>>>>> garbage value. But it does mean we would have to be careful
>>>>> if calling the unsafe routines from native code.
>>>>>
>>>>>
>>>>> I admit I do not understand fully how the
>>>>> _special_runtime_exit_condition flag is processed later, at least
>>>>> not for all cases: If I have a java method A using
>>>>> sun.misc.unsafe, which gets compiled, the sun.misc.unsafe
>>>>> intrinsic gets inlined into that method. So, the whole method A
>>>>> gets marked as "has unsafe access"? So, any SIGBUS happening
>>>>> inside this method - which may be larger than the inlined
>>>>> sun.misc.unsafe call - will yield an InternalError? And when is
>>>>> the flag checked if that method A is called from another java
>>>>> method B?
>>>>>
>>>>> Sorry if the questions are stupid, I am not a JIT expert, but I
>>>>> try to understand how much can happen between the SIGBUS and the
>>>>> InternalError getting thrown.
>>>>
>>>> No questions are stupid here. As you may have seen in the other
>>>> thread, I filed JDK-8154592[1] to cover making the handling of the
>>>> faults synchronous. Hope that helps.
>>>>
>>>>
>>>> Thank you!
>>>>
>>>> Kind Regards, Thomas
>>>>
>>>> Cheers,
>>>> Mikael
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8154592
>>>>
>>>>
>>>>>
>>>>> Thanks, Thomas
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote:
>>>>>
>>>>>
>>>>> Please review:
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/
>>>>>
>>>>> <http://cr.openjdk.java.net/%7Emikael/webrevs/8153892/webrev.01/hotspot/webrev/>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> * Note: this is patch 2 in a set of 3 all aiming to
>>>>> clean up and unify
>>>>> the unsafe memory getters/setters, along with the
>>>>> handling of unsafe
>>>>> access errors. The other two issues are:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8153890 -
>>>>> Handle unsafe access
>>>>> error as an asynchronous exception
>>>>> https://bugs.openjdk.java.net/browse/JDK-8150921 -
>>>>> Update Unsafe
>>>>> getters/setters to use double-register variants
>>>>>
>>>>>
>>>>> * Summary (copied from the bug description)
>>>>>
>>>>>
>>>>> In certain cases, such as accessing a region of a
>>>>> memory mapped file
>>>>> which has been truncated on unix-style operating
>>>>> systems, a SIGBUS
>>>>> signal will be raised and the VM will process it in
>>>>> the signal handler.
>>>>>
>>>>> How the signal is processed differs depending on the
>>>>> operating system
>>>>> and/or CPU architecture, with two major alternatives:
>>>>>
>>>>> * "stubless"
>>>>>
>>>>> Do the necessary thread state updates directly in the
>>>>> signal handler,
>>>>> and modify the context so that the signal handler
>>>>> returns to the place
>>>>> where the execution should continue
>>>>>
>>>>> * Using a stub
>>>>>
>>>>> Update the context so that when the signal handler
>>>>> returns the thread
>>>>> will continue execution in a generated stub, which in
>>>>> turn will call
>>>>> some native code in the VM to update the thread state
>>>>> and figure out
>>>>> where execution should continue. The stub will then
>>>>> jump to that new
>>>>> place.
>>>>>
>>>>>
>>>>> It should be noted that the work of updating the
>>>>> thread state is very
>>>>> small - it's setting a flag or two in the thread
>>>>> structure, and figures
>>>>> out where the next instruction starts. It should also
>>>>> be noted that the
>>>>> generated stubs today are broken, because they do not
>>>>> preserve all the
>>>>> live registers over the call into the VM. There are
>>>>> two ways to address
>>>>> this:
>>>>>
>>>>> * Preserve all the necessary registers
>>>>>
>>>>> This would mean implementing, in macro assembly, the
>>>>> necessary logic for
>>>>> preserving all the live registers, including, but not
>>>>> limited to,
>>>>> floating point registers, flag registers, etc. It
>>>>> quickly becomes
>>>>> obvious that this platform specific and error prone.
>>>>>
>>>>> * Leverage the fact that the operating system already
>>>>> does this as part
>>>>> of the signal handling
>>>>>
>>>>> Do the necessary work in the signal handler instead,
>>>>> removing the need
>>>>> for the stub alltogether
>>>>>
>>>>> As mentioned, on some platforms the latter model is
>>>>> already in use. It
>>>>> is dramatically easier and all platforms should be
>>>>> updated to do it the
>>>>> same way.
>>>>>
>>>>>
>>>>> * Testing
>>>>>
>>>>> Just as mentioned in the RFR for JDK-8153890, a new
>>>>> test was developed
>>>>> to test this code path:
>>>>>
>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java
>>>>>
>>>>> <http://cr.openjdk.java.net/%7Emikael/webrevs/8150921/MappedTruncated.java>
>>>>>
>>>>>
>>>>> In fact, it was when running this test I found the
>>>>> register preservation
>>>>> issue. JPRT also passes. Much like JDK-8153890 I
>>>>> wanted to get some
>>>>> feedback on this before running additional tests.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Mikael
>>>>>
>>>>>
>>>>
>>>>
>>>
>
More information about the ppc-aix-port-dev
mailing list