RFR: JDK-8255711: Fix and unify hotspot signal handlers
Thomas Stüfe
thomas.stuefe at gmail.com
Thu Nov 5 07:32:15 UTC 2020
Hi David,
<trimming and combining your mails>
>
> > > 10) When invoking the fatal error handler, we extract the pc from
> > the context and hand it over as "faulting pc". For SIGILL and
> > SIGFPE, this is not totally correct. According to POSIX [3], for
> > those signals the address of the faulting instruction is handed over
> > in `si_info.si_addr`.
> > >
> > > On most platforms this does not matter, they are the same. But on
> > some architectures the pc in the signal context actually points
> > somewhere else, e.g. beyond the faulting instruction. Therefore
> > `si_info.si_addr` is the better choice.
> >
> > The Posix spec also states "For some implementations, the value of
> > si_addr may be inaccurate." - so I'm not at all sure which "pc" we
> > should be trusting here? I thought the ucontext was the detailed
> > platform specific "context" object that we should extract information
> > from. Which architectures give different values in the two and is
> there
> > some documentation stating what happens for any given os/cpu?
> >
> >
> > I overlooked this. Well, this is a helpful standard :)
> >
> > I saw this happen on s390 and on pa-risc. Before this patch, I did
> > correct this in the platform handler. Out of caution I could #ifdef this
> > section to s390.
>
> I prefer not see any change in behaviour for platforms where no problem
> has been observed. Also see what seems to be related comment in
> ./cpu/arm/vm_version_arm_32.cpp
>
>
Okay, made this section s390 only.
Arm may or may not have the same problem. Their signal handler is in parts
using si_addr, partly the context pc. But since you want to keep old
behavior, let's keep it for now.
I still think si_addr would be more correct and simpler, but we can leave
this for a follow up fix.
> // JVM_handle_linux_signal moves PC here if SIGILL happens
>
> > > ----
> > >
> > > The changes in this patch:
> > >
> > > a) hotspot signal handling is now done by the following functions:
> > >
> > > <OS> <foreign code, probably signal handler too>
> > > | |
> > > v v
> > > javaSignalHandler JVM_handle_linux_signal()
> > > | /
> > > v v
> > > javaSignalHandler_inner
> >
> > Not clear why we need the _inner version. Why can't we just have
> > javaSignalHandler which is installed as the handler and which is
> called
> > by JVM_handle_XXX_signal?
> >
> >
> > Because JVM_handle_XXX_signal has one more argument than the standard
> > signal handler (abort_if_unrecognized).
>
> Okay so why introduce this shape instead of keeping the existing form:
>
> javaSignalHandler -> JVM_handle_xxx_signal(..., true) ->
> javaSignalHandler_inner
>
> ? With the new arrangement the equivalence between javaSignalHandler and
> JVM_handle_xxx_signal can only be seen by inspecting the code of both.
>
>
I see now what you mean.
To my mind, JVM_handle_linux_signal() is an external API with a contract
which does not necessarily correspond with what javaSignalHandler_inner()
is about to do. Its existence demonstrates that (as an explicit side door
into the signal handler hierarchy), and that door can be guarded and
adorned with pre- and post-processing if needed.
For example, the contract says that the user shall not pass anything other
than the given list of signals. Well, we can assert it here more clearly.
So users could use a debug VM to test their application. It also means we
can, for the release case, safely shortcut signals it should ignore. That
isolates this interface a bit from any future changes to the hotspot signal
handlers (because no-one ever thinks about this stuff when working with the
handlers).
I know these are not hard arguments, it is a matter of taste. If you
insist, I change it in the way you prefer.
> > So I think JVM_handle_xx_signal() should test for the list of allowed
> > signals, and just return false right away in case sig is none of the
> > hotspot signals.
> I don't think we should change existing behaviour here.
But is that not an error worth fixing? Before, someone could have passed in
177 as a signal number and this would have crashed the VM if he also passed
in abort_if_unrecognized=true.
This is a bit like my argument from above. I believe we have a clear
contract with this API. If that contract is broken, what should we do? I
prefer to assert, but in release case to be tolerant and ignore it.
Otherwise, what good is this contract if we cannot rely on it and don't
check it?
> > // This routine may recognize any of the following kinds of signals:
> > // SIGBUS, SIGSEGV, SIGILL, SIGFPE, SIGQUIT, SIGPIPE, SIGXFSZ,
SIGUSR1.
> > // It should be consulted by handlers for any of those signals.
> >
> > Note that this list is not really correct, as it includes SIGUSR1
> > and SIGQUIT. None of which are handled by the hotspot signal handler. As
> > you wrote before, this mechanism is only to handle signals the hotspot
> > commandeers.
>
> SIGUSR1 is probably a leftover from when we did use SIGUSR1 for something.
>
> SIGQUIT is interesting because if the app is in charge of signals then
> it will install a SIGQUIT handler and we would not want the "VM" to do
> it's normal SIGQUIT handler.
Hm, I can see it either way. SIGQUIT is for thread dumping. Whether or not
the application would want us to honor it we cannot say.
The current behavior would be to treat SIGQUIT as unknown signal (crash or
ignore). I would keep that behavior for now.
> I have a big question mark over how the use
> of the signal handler thread can/should interact with
> AllowUserSignalHandlers.
What is the signal handler thread?
Thanks, Thomas
More information about the hotspot-dev
mailing list