RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub
Mikael Vidstedt
mikael.vidstedt at oracle.com
Tue May 3 04:12:09 UTC 2016
On 5/2/2016 4:04 PM, David Holmes wrote:
> On 29/04/2016 1:44 AM, Mikael Vidstedt wrote:
>>
>>
>> On 4/28/2016 5:41 AM, David Holmes wrote:
>>> Hi Mikael,
>>>
>>> On 28/04/2016 1:54 AM, Mikael Vidstedt wrote:
>>>>
>>>>
>>>> On 4/27/2016 12:24 AM, Thomas Stüfe wrote:
>>>>> Hi Mikael,
>>>>>
>>>>> On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt
>>>>> <<mailto:mikael.vidstedt at oracle.com>mikael.vidstedt at oracle.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 4/12/2016 2:15 AM, Thomas Stüfe wrote:
>>>>>> Hi Mikael, David,
>>>>>>
>>>>>> On Tue, Apr 12, 2016 at 2:29 AM, David Holmes
>>>>>> <<mailto:david.holmes at oracle.com>david.holmes at oracle.com> wrote:
>>>>>>
>>>>>> On 11/04/2016 10:57 AM, David Holmes wrote:
>>>>>>
>>>>>> Hi Mikael,
>>>>>>
>>>>>> I think we need to be able to answer the question as to
>>>>>> why the stubbed
>>>>>> and stubless forms of this code exist to ensure that
>>>>>> converting all
>>>>>> platforms to the same form is appropriate.
>>>>>>
>>>>>>
>>>>>> The more I look at this the more the stubs make no sense :)
>>>>>> AIII a stub is generated when we need runtime code that may
>>>>>> be different to that which we could write directly for
>>>>>> compiling at build time - ie to use CPU specific features of
>>>>>> the actual CPU. But I see nothing here that suggests any
>>>>>> such
>>>>>> usage.
>>>>>>
>>>>>> So I agree with removing the stubs.
>>>>>>
>>>>>> I'm still going through this but my initial reaction is
>>>>>> to wonder why we
>>>>>> don't use the same form of handle_unsafe_access on all
>>>>>> platforms and
>>>>>> always pass in npc? (That seems to be the only
>>>>>> difference
>>>>>> in code that
>>>>>> otherwise seems platform independent.)
>>>>>>
>>>>>>
>>>>>> Futher to this and Thomas's comments I think
>>>>>> handle_unsafe_access(thread, pc, npc) can be defined in
>>>>>> shared code (where? not sure). Further, if we always pass in
>>>>>> npc then we don't need to pass in pc as it is unused (seems
>>>>>> unused in original code too for sparc).
>>>>>>
>>>>>>
>>>>>> I agree. We commonized ucontext_set_pc for all Posix platforms,
>>>>>> so we can make a common function "handle_unsafe_access(thread,
>>>>>> npc)" and inside use os::Posix::ucontext_set_pc to modify the
>>>>>> context. Then we can get rid of the special handling in the
>>>>>> signal handlers inside os_aix_ppc.cpp and os_linux_ppc.cpp (for
>>>>>> both the compiled and the interpreted case).
>>>>>
>>>>> There is definitely room for unification and simplification here.
>>>>> Right now the signal handling code is, sadly, different on all
>>>>> the
>>>>> different platforms, despite the fact that in many cases it
>>>>> should
>>>>> be similar or the exact same. That said, as much as a
>>>>> refactoring/rewrite of the signal handler code is needed, it will
>>>>> very quickly turn into a much larger effort...
>>>>>
>>>>> In this specific case, it would probably make more sense to pass
>>>>> in the full context to the handle_unsafe_access method and
>>>>> have it
>>>>> do whatever it feels is necessary to update it. However, a lot of
>>>>> the signal handler code assumes that a "stub" variable gets
>>>>> set up
>>>>> and only at the end of the main signal handler function does the
>>>>> actual context get updated. Changing how that works only for this
>>>>> specific case is obviously not a good idea, which means it's back
>>>>> to the full scale refactoring and out of scope for the bug fix.
>>>>>
>>>>> So to me the fact that the method prototypes differ depending on
>>>>> the exact platform is just a reflection of how the contexts
>>>>> differ. In lack of the full context the handler method needs to
>>>>> take whatever parts of the context is needed to do it's job. I
>>>>> could of course change the handler method to only take a single
>>>>> "next_pc" argument, but to me that feels like putting a key part
>>>>> of the logic which handles the unsafe access (specifically, the
>>>>> part which calculates the next pc) in the wrong place - IMHO that
>>>>> should really be tightly coupled with the rest of the logic
>>>>> needed
>>>>> to handle an unsafe access (updating the thread state etc.), and
>>>>> hence I feel that it really belongs in the handle_unsafe_access
>>>>> method itself. Happy to hear your thoughts, but I hope we can
>>>>> agree that the suggested fix, even in its current state, is still
>>>>> significantly better than what is there now.
>>>>>
>>>>>
>>>>> Unless somebody has a better suggestion, I'm going to be moving
>>>>> the implementations of the handle_unsafe_access methods to
>>>>> sharedRuntime (instead of stubRoutines) and will send out a new
>>>>> webrev shortly.
>>>>>
>>>>>
>>>>> I am unhappy with the fact that we factor unsafe handling out for x86
>>>>> and sparc but do it inline for ppc. I know that was done before your
>>>>> change as well but would be happy if that could be improved. I would
>>>>> prefer either one of:
>>>>
>>>> Fully agree - this is an example of the more general problem of logic
>>>> which is /almost/ the same across different platforms, but which has
>>>> been effectively copy/pasted and drifted apart over time.
>>>>
>>>>>
>>>>> 1) flatten out the coding into the signal handlers like it is done in
>>>>> os_linux_ppc.cpp and os_aix_ppc.cpp or
>>>>> 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case
>>>>>
>>>>> I would actually prefer (1) even though this would multiply the code
>>>>> out for all os cases into <os_cpu>; we are only talking about 1-2
>>>>> lines of additional coding, and it would improve the readability of
>>>>> the signal handlers.
>>>>>
>>>>> But this is only my personal opinion, and I do not have strong
>>>>> emotions. I agree with you that a full cleanup of the signal
>>>>> coding is
>>>>> out of scope for this issue.
>>>>
>>>> I spent yesterday going back and forth on the various alternatives and
>>>> the only thing I can say with certainty now is that apart from
>>>> refactoring the whole thing everything else is ugly... For example, I
>>>> agree that consistency is an important goal here, but since there's
>>>> little to no consistency there today it's really hard to make a
>>>> relevant
>>>> dent in it. :(
>>>>
>>>> Flattening it out is an alternative (and a good one), but that is not
>>>> something I'm willing to do as part of this change because only
>>>> flattening this specific case/return will actually add to the
>>>> inconstency... So ultimately yesterday I chose to do something
>>>> closer to
>>>> your alternative 2). Is it still ugly? Yes; lipstick on pig and all of
>>>> that. Have a look at it and see how you feel about it. I try to
>>>> keep in
>>>> mind that what is there today is (more) broken. :)
>>>>
>>>> Webrev:
>>>>
>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/
>>>>
>>>>
>>>
>>> Now I see this in code form I really don't understand why next_pc is
>>> passed in, unused and then returned ??
>>
>> Given that the full context can't easily be passed in and updated (which
>> I still think is the right, long term way of doing this), I chose to do
>> it this way instead. It is a signal to a caller that *only* calling
>> handle_unsafe_access is not enough, there's more to it in that the
>> context also needs to be updated. I see it as a way to make sure the
>> actual next_pc calculation and update is not forgotten, and make it
>> clear that it goes hand in hand with updating the thread state. It's
>> obviously not perfect, but I do feel like it ever so slightly helps
>> clarify how the access fault needs to be handled.
>
> Okay.
>
> Thanks,
> David
> -----
Thomas/David - thank you very much for the feedback and reviews!
Cheers,
Mikael
>
>
>>>
>>> Otherwise in src/share/vm/runtime/sharedRuntime.cpp in the comment
>>> block - capitals after periods please :)
>>
>> Fixed, I'll not post a new webrev for it though :)
>>
>> Cheers,
>> Mikael
>>
>>>
>>> Stub removal seems fine.
>>>
>>> Thanks,
>>> David
>>>
>>>> Incremental from webrev.01:
>>>>
>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/
>>>>
>>>>
>>>>
>>>> Cheers,
>>>> Mikael
>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Mikael
>>>>>
>>>>>>
>>>>>>
>>>>>> BTW I found this comment somewhat unfathomable (both now and
>>>>>> in original code):
>>>>>>
>>>>>> + // pc is the instruction which we must emulate
>>>>>> + // doing a no-op is fine: return garbage from the load
>>>>>>
>>>>>> but finally realized that it means that after the load that
>>>>>> raised the signal the native code proceeds normally but the
>>>>>> value apparently loaded is just garbage/arbitrary, and the
>>>>>> only sign something went wrong is the setting of the pending
>>>>>> unsafe-access-error bit. This would be a potential source of
>>>>>> bugs I think, except that when we hit the Java level, we
>>>>>> throw the exception and so never actually "return" the
>>>>>> garbage value. But it does mean we would have to be careful
>>>>>> if calling the unsafe routines from native code.
>>>>>>
>>>>>>
>>>>>> I admit I do not understand fully how the
>>>>>> _special_runtime_exit_condition flag is processed later, at
>>>>>> least
>>>>>> not for all cases: If I have a java method A using
>>>>>> sun.misc.unsafe, which gets compiled, the sun.misc.unsafe
>>>>>> intrinsic gets inlined into that method. So, the whole method A
>>>>>> gets marked as "has unsafe access"? So, any SIGBUS happening
>>>>>> inside this method - which may be larger than the inlined
>>>>>> sun.misc.unsafe call - will yield an InternalError? And when is
>>>>>> the flag checked if that method A is called from another java
>>>>>> method B?
>>>>>>
>>>>>> Sorry if the questions are stupid, I am not a JIT expert, but I
>>>>>> try to understand how much can happen between the SIGBUS and the
>>>>>> InternalError getting thrown.
>>>>>
>>>>> No questions are stupid here. As you may have seen in the other
>>>>> thread, I filed JDK-8154592[1] to cover making the handling of
>>>>> the
>>>>> faults synchronous. Hope that helps.
>>>>>
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Kind Regards, Thomas
>>>>>
>>>>> Cheers,
>>>>> Mikael
>>>>>
>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8154592
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks, Thomas
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote:
>>>>>>
>>>>>>
>>>>>> Please review:
>>>>>>
>>>>>> Bug:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153892
>>>>>> Webrev:
>>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/
>>>>>>
>>>>>>
>>>>>> <http://cr.openjdk.java.net/%7Emikael/webrevs/8153892/webrev.01/hotspot/webrev/>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> * Note: this is patch 2 in a set of 3 all aiming to
>>>>>> clean up and unify
>>>>>> the unsafe memory getters/setters, along with the
>>>>>> handling of unsafe
>>>>>> access errors. The other two issues are:
>>>>>>
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8153890 -
>>>>>> Handle unsafe access
>>>>>> error as an asynchronous exception
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8150921 -
>>>>>> Update Unsafe
>>>>>> getters/setters to use double-register variants
>>>>>>
>>>>>>
>>>>>> * Summary (copied from the bug description)
>>>>>>
>>>>>>
>>>>>> In certain cases, such as accessing a region of a
>>>>>> memory mapped file
>>>>>> which has been truncated on unix-style operating
>>>>>> systems, a SIGBUS
>>>>>> signal will be raised and the VM will process it in
>>>>>> the signal handler.
>>>>>>
>>>>>> How the signal is processed differs depending on the
>>>>>> operating system
>>>>>> and/or CPU architecture, with two major
>>>>>> alternatives:
>>>>>>
>>>>>> * "stubless"
>>>>>>
>>>>>> Do the necessary thread state updates directly in
>>>>>> the
>>>>>> signal handler,
>>>>>> and modify the context so that the signal handler
>>>>>> returns to the place
>>>>>> where the execution should continue
>>>>>>
>>>>>> * Using a stub
>>>>>>
>>>>>> Update the context so that when the signal handler
>>>>>> returns the thread
>>>>>> will continue execution in a generated stub,
>>>>>> which in
>>>>>> turn will call
>>>>>> some native code in the VM to update the thread
>>>>>> state
>>>>>> and figure out
>>>>>> where execution should continue. The stub will then
>>>>>> jump to that new
>>>>>> place.
>>>>>>
>>>>>>
>>>>>> It should be noted that the work of updating the
>>>>>> thread state is very
>>>>>> small - it's setting a flag or two in the thread
>>>>>> structure, and figures
>>>>>> out where the next instruction starts. It should
>>>>>> also
>>>>>> be noted that the
>>>>>> generated stubs today are broken, because they do
>>>>>> not
>>>>>> preserve all the
>>>>>> live registers over the call into the VM. There are
>>>>>> two ways to address
>>>>>> this:
>>>>>>
>>>>>> * Preserve all the necessary registers
>>>>>>
>>>>>> This would mean implementing, in macro assembly, the
>>>>>> necessary logic for
>>>>>> preserving all the live registers, including, but
>>>>>> not
>>>>>> limited to,
>>>>>> floating point registers, flag registers, etc. It
>>>>>> quickly becomes
>>>>>> obvious that this platform specific and error prone.
>>>>>>
>>>>>> * Leverage the fact that the operating system
>>>>>> already
>>>>>> does this as part
>>>>>> of the signal handling
>>>>>>
>>>>>> Do the necessary work in the signal handler instead,
>>>>>> removing the need
>>>>>> for the stub alltogether
>>>>>>
>>>>>> As mentioned, on some platforms the latter model is
>>>>>> already in use. It
>>>>>> is dramatically easier and all platforms should be
>>>>>> updated to do it the
>>>>>> same way.
>>>>>>
>>>>>>
>>>>>> * Testing
>>>>>>
>>>>>> Just as mentioned in the RFR for JDK-8153890, a new
>>>>>> test was developed
>>>>>> to test this code path:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java
>>>>>>
>>>>>>
>>>>>> <http://cr.openjdk.java.net/%7Emikael/webrevs/8150921/MappedTruncated.java>
>>>>>>
>>>>>>
>>>>>>
>>>>>> In fact, it was when running this test I found the
>>>>>> register preservation
>>>>>> issue. JPRT also passes. Much like JDK-8153890 I
>>>>>> wanted to get some
>>>>>> feedback on this before running additional tests.
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Mikael
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>
More information about the ppc-aix-port-dev
mailing list