RFR(S): 8153892: Handle unsafe access error directly in signal handler instead of going through a stub
David Holmes
david.holmes at oracle.com
Thu Apr 28 12:41:57 UTC 2016
Hi Mikael,
On 28/04/2016 1:54 AM, Mikael Vidstedt wrote:
>
>
> On 4/27/2016 12:24 AM, Thomas Stüfe wrote:
>> Hi Mikael,
>>
>> On Tue, Apr 26, 2016 at 8:35 PM, Mikael Vidstedt
>> <<mailto:mikael.vidstedt at oracle.com>mikael.vidstedt at oracle.com> wrote:
>>
>>
>>
>> On 4/12/2016 2:15 AM, Thomas Stüfe wrote:
>>> Hi Mikael, David,
>>>
>>> On Tue, Apr 12, 2016 at 2:29 AM, David Holmes
>>> <<mailto:david.holmes at oracle.com>david.holmes at oracle.com> wrote:
>>>
>>> On 11/04/2016 10:57 AM, David Holmes wrote:
>>>
>>> Hi Mikael,
>>>
>>> I think we need to be able to answer the question as to
>>> why the stubbed
>>> and stubless forms of this code exist to ensure that
>>> converting all
>>> platforms to the same form is appropriate.
>>>
>>>
>>> The more I look at this the more the stubs make no sense :)
>>> AIII a stub is generated when we need runtime code that may
>>> be different to that which we could write directly for
>>> compiling at build time - ie to use CPU specific features of
>>> the actual CPU. But I see nothing here that suggests any such
>>> usage.
>>>
>>> So I agree with removing the stubs.
>>>
>>> I'm still going through this but my initial reaction is
>>> to wonder why we
>>> don't use the same form of handle_unsafe_access on all
>>> platforms and
>>> always pass in npc? (That seems to be the only difference
>>> in code that
>>> otherwise seems platform independent.)
>>>
>>>
>>> Futher to this and Thomas's comments I think
>>> handle_unsafe_access(thread, pc, npc) can be defined in
>>> shared code (where? not sure). Further, if we always pass in
>>> npc then we don't need to pass in pc as it is unused (seems
>>> unused in original code too for sparc).
>>>
>>>
>>> I agree. We commonized ucontext_set_pc for all Posix platforms,
>>> so we can make a common function "handle_unsafe_access(thread,
>>> npc)" and inside use os::Posix::ucontext_set_pc to modify the
>>> context. Then we can get rid of the special handling in the
>>> signal handlers inside os_aix_ppc.cpp and os_linux_ppc.cpp (for
>>> both the compiled and the interpreted case).
>>
>> There is definitely room for unification and simplification here.
>> Right now the signal handling code is, sadly, different on all the
>> different platforms, despite the fact that in many cases it should
>> be similar or the exact same. That said, as much as a
>> refactoring/rewrite of the signal handler code is needed, it will
>> very quickly turn into a much larger effort...
>>
>> In this specific case, it would probably make more sense to pass
>> in the full context to the handle_unsafe_access method and have it
>> do whatever it feels is necessary to update it. However, a lot of
>> the signal handler code assumes that a "stub" variable gets set up
>> and only at the end of the main signal handler function does the
>> actual context get updated. Changing how that works only for this
>> specific case is obviously not a good idea, which means it's back
>> to the full scale refactoring and out of scope for the bug fix.
>>
>> So to me the fact that the method prototypes differ depending on
>> the exact platform is just a reflection of how the contexts
>> differ. In lack of the full context the handler method needs to
>> take whatever parts of the context is needed to do it's job. I
>> could of course change the handler method to only take a single
>> "next_pc" argument, but to me that feels like putting a key part
>> of the logic which handles the unsafe access (specifically, the
>> part which calculates the next pc) in the wrong place - IMHO that
>> should really be tightly coupled with the rest of the logic needed
>> to handle an unsafe access (updating the thread state etc.), and
>> hence I feel that it really belongs in the handle_unsafe_access
>> method itself. Happy to hear your thoughts, but I hope we can
>> agree that the suggested fix, even in its current state, is still
>> significantly better than what is there now.
>>
>>
>> Unless somebody has a better suggestion, I'm going to be moving
>> the implementations of the handle_unsafe_access methods to
>> sharedRuntime (instead of stubRoutines) and will send out a new
>> webrev shortly.
>>
>>
>> I am unhappy with the fact that we factor unsafe handling out for x86
>> and sparc but do it inline for ppc. I know that was done before your
>> change as well but would be happy if that could be improved. I would
>> prefer either one of:
>
> Fully agree - this is an example of the more general problem of logic
> which is /almost/ the same across different platforms, but which has
> been effectively copy/pasted and drifted apart over time.
>
>>
>> 1) flatten out the coding into the signal handlers like it is done in
>> os_linux_ppc.cpp and os_aix_ppc.cpp or
>> 2) add a StubRoutines::ppc64::handle_unsafe_access() for the ppc case
>>
>> I would actually prefer (1) even though this would multiply the code
>> out for all os cases into <os_cpu>; we are only talking about 1-2
>> lines of additional coding, and it would improve the readability of
>> the signal handlers.
>>
>> But this is only my personal opinion, and I do not have strong
>> emotions. I agree with you that a full cleanup of the signal coding is
>> out of scope for this issue.
>
> I spent yesterday going back and forth on the various alternatives and
> the only thing I can say with certainty now is that apart from
> refactoring the whole thing everything else is ugly... For example, I
> agree that consistency is an important goal here, but since there's
> little to no consistency there today it's really hard to make a relevant
> dent in it. :(
>
> Flattening it out is an alternative (and a good one), but that is not
> something I'm willing to do as part of this change because only
> flattening this specific case/return will actually add to the
> inconstency... So ultimately yesterday I chose to do something closer to
> your alternative 2). Is it still ugly? Yes; lipstick on pig and all of
> that. Have a look at it and see how you feel about it. I try to keep in
> mind that what is there today is (more) broken. :)
>
> Webrev:
>
> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02/hotspot/webrev/
Now I see this in code form I really don't understand why next_pc is
passed in, unused and then returned ??
Otherwise in src/share/vm/runtime/sharedRuntime.cpp in the comment block
- capitals after periods please :)
Stub removal seems fine.
Thanks,
David
> Incremental from webrev.01:
>
> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.02.incr/hotspot/webrev/
>
> Cheers,
> Mikael
>
>>
>>
>> Cheers,
>> Mikael
>>
>>>
>>>
>>> BTW I found this comment somewhat unfathomable (both now and
>>> in original code):
>>>
>>> + // pc is the instruction which we must emulate
>>> + // doing a no-op is fine: return garbage from the load
>>>
>>> but finally realized that it means that after the load that
>>> raised the signal the native code proceeds normally but the
>>> value apparently loaded is just garbage/arbitrary, and the
>>> only sign something went wrong is the setting of the pending
>>> unsafe-access-error bit. This would be a potential source of
>>> bugs I think, except that when we hit the Java level, we
>>> throw the exception and so never actually "return" the
>>> garbage value. But it does mean we would have to be careful
>>> if calling the unsafe routines from native code.
>>>
>>>
>>> I admit I do not understand fully how the
>>> _special_runtime_exit_condition flag is processed later, at least
>>> not for all cases: If I have a java method A using
>>> sun.misc.unsafe, which gets compiled, the sun.misc.unsafe
>>> intrinsic gets inlined into that method. So, the whole method A
>>> gets marked as "has unsafe access"? So, any SIGBUS happening
>>> inside this method - which may be larger than the inlined
>>> sun.misc.unsafe call - will yield an InternalError? And when is
>>> the flag checked if that method A is called from another java
>>> method B?
>>>
>>> Sorry if the questions are stupid, I am not a JIT expert, but I
>>> try to understand how much can happen between the SIGBUS and the
>>> InternalError getting thrown.
>>
>> No questions are stupid here. As you may have seen in the other
>> thread, I filed JDK-8154592[1] to cover making the handling of the
>> faults synchronous. Hope that helps.
>>
>>
>> Thank you!
>>
>> Kind Regards, Thomas
>>
>> Cheers,
>> Mikael
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8154592
>>
>>
>>>
>>> Thanks, Thomas
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 9/04/2016 8:33 AM, Mikael Vidstedt wrote:
>>>
>>>
>>> Please review:
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8153892
>>> Webrev:
>>> http://cr.openjdk.java.net/~mikael/webrevs/8153892/webrev.01/hotspot/webrev/
>>> <http://cr.openjdk.java.net/%7Emikael/webrevs/8153892/webrev.01/hotspot/webrev/>
>>>
>>>
>>>
>>> * Note: this is patch 2 in a set of 3 all aiming to
>>> clean up and unify
>>> the unsafe memory getters/setters, along with the
>>> handling of unsafe
>>> access errors. The other two issues are:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8153890 -
>>> Handle unsafe access
>>> error as an asynchronous exception
>>> https://bugs.openjdk.java.net/browse/JDK-8150921 -
>>> Update Unsafe
>>> getters/setters to use double-register variants
>>>
>>>
>>> * Summary (copied from the bug description)
>>>
>>>
>>> In certain cases, such as accessing a region of a
>>> memory mapped file
>>> which has been truncated on unix-style operating
>>> systems, a SIGBUS
>>> signal will be raised and the VM will process it in
>>> the signal handler.
>>>
>>> How the signal is processed differs depending on the
>>> operating system
>>> and/or CPU architecture, with two major alternatives:
>>>
>>> * "stubless"
>>>
>>> Do the necessary thread state updates directly in the
>>> signal handler,
>>> and modify the context so that the signal handler
>>> returns to the place
>>> where the execution should continue
>>>
>>> * Using a stub
>>>
>>> Update the context so that when the signal handler
>>> returns the thread
>>> will continue execution in a generated stub, which in
>>> turn will call
>>> some native code in the VM to update the thread state
>>> and figure out
>>> where execution should continue. The stub will then
>>> jump to that new
>>> place.
>>>
>>>
>>> It should be noted that the work of updating the
>>> thread state is very
>>> small - it's setting a flag or two in the thread
>>> structure, and figures
>>> out where the next instruction starts. It should also
>>> be noted that the
>>> generated stubs today are broken, because they do not
>>> preserve all the
>>> live registers over the call into the VM. There are
>>> two ways to address
>>> this:
>>>
>>> * Preserve all the necessary registers
>>>
>>> This would mean implementing, in macro assembly, the
>>> necessary logic for
>>> preserving all the live registers, including, but not
>>> limited to,
>>> floating point registers, flag registers, etc. It
>>> quickly becomes
>>> obvious that this platform specific and error prone.
>>>
>>> * Leverage the fact that the operating system already
>>> does this as part
>>> of the signal handling
>>>
>>> Do the necessary work in the signal handler instead,
>>> removing the need
>>> for the stub alltogether
>>>
>>> As mentioned, on some platforms the latter model is
>>> already in use. It
>>> is dramatically easier and all platforms should be
>>> updated to do it the
>>> same way.
>>>
>>>
>>> * Testing
>>>
>>> Just as mentioned in the RFR for JDK-8153890, a new
>>> test was developed
>>> to test this code path:
>>>
>>> http://cr.openjdk.java.net/~mikael/webrevs/8150921/MappedTruncated.java
>>> <http://cr.openjdk.java.net/%7Emikael/webrevs/8150921/MappedTruncated.java>
>>>
>>> In fact, it was when running this test I found the
>>> register preservation
>>> issue. JPRT also passes. Much like JDK-8153890 I
>>> wanted to get some
>>> feedback on this before running additional tests.
>>>
>>>
>>> Cheers,
>>> Mikael
>>>
>>>
>>
>>
>
More information about the ppc-aix-port-dev
mailing list