RFR: JDK-8255711: Fix and unify hotspot signal handlers
David Holmes
david.holmes at oracle.com
Thu Nov 5 23:57:30 UTC 2020
Hi Thomas,
On 5/11/2020 5:32 pm, Thomas Stüfe wrote:
> Hi David,
>
> <trimming and combining your mails>
>
>
> > > 10) When invoking the fatal error handler, we extract the
> pc from
> > the context and hand it over as "faulting pc". For SIGILL and
> > SIGFPE, this is not totally correct. According to POSIX [3], for
> > those signals the address of the faulting instruction is
> handed over
> > in `si_info.si_addr`.
> > >
> > > On most platforms this does not matter, they are the same.
> But on
> > some architectures the pc in the signal context actually points
> > somewhere else, e.g. beyond the faulting instruction. Therefore
> > `si_info.si_addr` is the better choice.
> >
> > The Posix spec also states "For some implementations, the
> value of
> > si_addr may be inaccurate." - so I'm not at all sure which
> "pc" we
> > should be trusting here? I thought the ucontext was the detailed
> > platform specific "context" object that we should extract
> information
> > from. Which architectures give different values in the two
> and is there
> > some documentation stating what happens for any given os/cpu?
> >
> >
> > I overlooked this. Well, this is a helpful standard :)
> >
> > I saw this happen on s390 and on pa-risc. Before this patch, I did
> > correct this in the platform handler. Out of caution I could
> #ifdef this
> > section to s390.
>
> I prefer not see any change in behaviour for platforms where no problem
> has been observed. Also see what seems to be related comment in
> ./cpu/arm/vm_version_arm_32.cpp
>
>
> Okay, made this section s390 only.
>
> Arm may or may not have the same problem. Their signal handler is in
> parts using si_addr, partly the context pc. But since you want to keep
> old behavior, let's keep it for now.
>
> I still think si_addr would be more correct and simpler, but we can
> leave this for a follow up fix.
Okay. Thanks.
> // JVM_handle_linux_signal moves PC here if SIGILL happens
>
> > > ----
> > >
> > > The changes in this patch:
> > >
> > > a) hotspot signal handling is now done by the following
> functions:
> > >
> > > <OS> <foreign code, probably signal
> handler too>
> > > | |
> > > v v
> > > javaSignalHandler JVM_handle_linux_signal()
> > > | /
> > > v v
> > > javaSignalHandler_inner
> >
> > Not clear why we need the _inner version. Why can't we just have
> > javaSignalHandler which is installed as the handler and which
> is called
> > by JVM_handle_XXX_signal?
> >
> >
> > Because JVM_handle_XXX_signal has one more argument than the
> standard
> > signal handler (abort_if_unrecognized).
>
> Okay so why introduce this shape instead of keeping the existing form:
>
> javaSignalHandler -> JVM_handle_xxx_signal(..., true) ->
> javaSignalHandler_inner
>
> ? With the new arrangement the equivalence between javaSignalHandler
> and
> JVM_handle_xxx_signal can only be seen by inspecting the code of both.
>
>
> I see now what you mean.
>
> To my mind, JVM_handle_linux_signal() is an external API with a contract
> which does not necessarily correspond with
> what javaSignalHandler_inner() is about to do. Its existence
> demonstrates that (as an explicit side door into the signal handler
> hierarchy), and that door can be guarded and adorned with pre- and
> post-processing if needed.
In an abstract sense yes, but today javaSignalHandler does call
JVM_handle_xxx_signal and it is that implementation that we have
factored out into javaSignalHandler_inner.
> For example, the contract says that the user shall not pass anything
> other than the given list of signals. Well, we can assert it here more
> clearly. So users could use a debug VM to test their application. It
> also means we can, for the release case, safely shortcut signals it
> should ignore. That isolates this interface a bit from any future
> changes to the hotspot signal handlers (because no-one ever thinks about
> this stuff when working with the handlers).
I think you may be making this "contract" a bit more concrete than it
actually is. The large comments block states:
// The user-defined signal handler must pass unrecognized signals to this
// routine,
which doesn't suggest only a specific sub-set. It also states:
// This routine may recognize any of the following kinds of signals:
// SIGBUS, SIGSEGV, SIGILL, SIGFPE, SIGQUIT, SIGPIPE, SIGXFSZ, SIGUSR1.
but not that only says "may" - it is not IMO intended to be definitive,
only indicative.
So to me the application handler would be free to pass all signals
through (with abort_if_unrecognized==false) and use the return value to
determine if this was a VM related signal.
> I know these are not hard arguments, it is a matter of taste. If you
> insist, I change it in the way you prefer.
>
>> > So I think JVM_handle_xx_signal() should test for the list of allowed
>> > signals, and just return false right away in case sig is none of the
>> > hotspot signals.
>
> > I don't think we should change existing behaviour here.
>
> But is that not an error worth fixing? Before, someone could have passed
> in 177 as a signal number and this would have crashed the VM if he also
> passed in abort_if_unrecognized=true.
And this is exactly how it is supposed to work. There is a
responsibility on the application code to either pass known good signals
with abort_if_unrecognized, or else not set abort_if_unrecognized.
> This is a bit like my argument from above. I believe we have a clear
> contract with this API. If that contract is broken, what should we do? I
> prefer to assert, but in release case to be tolerant and ignore it.
> Otherwise, what good is this contract if we cannot rely on it and don't
> check it?
The "contract" includes the abort_if_unrecognized flag. If it is true
then we enforce things, otherwise we don't.
> > > // This routine may recognize any of the following kinds of signals:
> > > // SIGBUS, SIGSEGV, SIGILL, SIGFPE, SIGQUIT, SIGPIPE, SIGXFSZ,
> SIGUSR1.
> > > // It should be consulted by handlers for any of those signals.
> > >
> > > Note that this list is not really correct, as it includes SIGUSR1
> > > and SIGQUIT. None of which are handled by the hotspot signal
> handler. As
> > > you wrote before, this mechanism is only to handle signals the hotspot
> > > commandeers.
> >
> > SIGUSR1 is probably a leftover from when we did use SIGUSR1 for
> something.
> >
> > SIGQUIT is interesting because if the app is in charge of signals then
> > it will install a SIGQUIT handler and we would not want the "VM" to do
> > it's normal SIGQUIT handler.
>
> Hm, I can see it either way. SIGQUIT is for thread dumping. Whether or
> not the application would want us to honor it we cannot say.
I'm always forgetting which of SIGQUIT and SIGINT do which (to me QUIT
== exit the VM). But in both cases these are signals for which the JDK
installs a handler.
> The current behavior would be to treat SIGQUIT as unknown signal (crash
> or ignore). I would keep that behavior for now.
>
> > I have a big question mark over how the use
> > of the signal handler thread can/should interact with
> > AllowUserSignalHandlers.
>
> What is the signal handler thread?
The signal handler thread is a Java thread that synchronously processes
the signals that are managed by the os::user_handler(). But I realize
now these signals can be controlled by -Xrs so that the application can
manage them instead. (Though with no programmatic way to pass them on to
the JVM AFAICS).
Thanks,
David
-----
> Thanks, Thomas
More information about the hotspot-dev
mailing list