RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM

David Holmes david.holmes at oracle.com
Mon Nov 16 04:03:54 UTC 2015


On 13/11/2015 11:38 PM, David Holmes wrote:
> On 13/11/2015 7:53 PM, Thomas Stüfe wrote:
>> Hi Goetz,
>>
>> sorry for not looking at this earlier. This is a nice cleanup. Some
>> remarks:
>>
>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/aix/vm/os_aix.cpp.udiff.html
>>
>>
>> +    if (sig > MAX2(SIGSEGV, SIGBUS) &&  // See 4355769.
>> +        sig < NSIG) {                   // Must be legal signal and fit
>> into sigflags[].
>>
>> I do not like much the MAX2() construct. I would like it better to
>> explicitly check whether the SR signal is one of the "forbidden" ones
>> the VM uses.
>
> I must confess I had not looked into 4355769 but this check seems rather
> spurious. It is not at all clear to me what signals could be used here -

Okay should have looked into 4355769. The problem is how multiple 
pending signals are handled. It seems that in the past (no idea if still 
true) pending signals were handled in signal-number order (lowest 
first), not FIFO. The problem scenario is this:
- thread accesses a null pointer in compiled Java code and the SEGV 
handler will cause NPE to be thrown
- at the same time as the SEGV is being raised the thread is also hit 
with the SR signal to suspend it.
- the SR signal will be delivered first and the SR handler starts to run 
- with signals unblocked.
- the SEGV then gets delivered to the thread in the SR handler, and the 
regular signal handler is run
- the regular signal handler tries to detect if we're running in Java 
code so it can post the NPE, but the presence of the SR handler causes 
that check to fail - so we abort thinking it is a real SEGV.

I don't know how much of that is still true today. It seems strange to 
me that a kill based directed signal can usurp a synchronous signal.

Anyway the fix, rather workaround, for that problem, was to ensure that 
the SR_signum is greater than any potential synchronous signal the VM 
cares about. Why SIGBUS was included there I don't know give that:

a) it is already a lower signal number than SIGUSR1, SIGSEGV and SIGUSR2
b) we don't deliberately generate and use SIGBUS ... though perhaps 
unsafe-fetch needs to be considered.

A better fix in my opinion, and as mentioned in the bug, would have been 
to disable delivery of SEGV whilst the SR handler is executing. But we 
start to touch on some grey areas of the POSIX spec there, and likely 
the implementation too.

So I suggest that for this cleanup we simply leave this logic exactly as is.

Thanks,
David

> other than SIGUSR1 or SIGUSR2 (if -Xrs is specified), or else a
> real-time signal (modulo discussion below). Hijacking anything else
> seems rather suspect.
>
>> Maybe keep a mask defined centrally for each platform which contains
>> signals the VM needs for itself ?
>
> Such masks already exist.
>
>> +sigset_t os::Aix::sigs = { 0 };
>>
>> I would not initialize the signal set this way. sigset_t is an opaque
>> type; the only way to initialize it is with one of sigemptyset() or
>> sigfillset().
>
> Good catch - I overlooked that.
>
>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/aix/vm/os_aix.hpp.udiff.html
>>
>>
>> +  static struct sigaction sigact[NSIG]; // saved preinstalled sigactions
>> +  static sigset_t sigs;                 // mask of signals that have
>>
>> +  static int sigflags[NSIG];
>>
>> I know this is not in the scope of your change, but I would like to see
>> those removed from os::Aix and put into os_aix.cpp at static filescope.
>> There is no need at all to export those, and you would get rid of the
>> signal.h dependency you know have when including os_aix.hpp.
>>
>>
>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/bsd/vm/jsig.c.udiff.html
>>
>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/bsd/vm/os_bsd.cpp.udiff.html
>>
>>
>> On BSD, we have realtime signals.
>>
>> http://fxr.watson.org/fxr/source/sys/signal.h
>> #define SIGRTMAX 126
>> and NSIG does not contain them:
>> #define NSIG 32
>>
>> The max. possible signal number would be 126, which unfortunately does
>> not even fit into a 64bit mask.
>
> So this simply limits the signal choice to not be a real-time signal -
> same as today.
>
>> So the code in jsig.c is broken for the case that someone wants to
>> register realtime signals, if the VM were to ever use realtime signals
>> itself, which now is not the case.
>>
>> The same is true for os_bsd.cpp, where signal chaining will not work if
>> the application did have handler for real time signals pre-installed
>> before jvm is loaded.
>
> Chaining is only used when the JVM will catch signals. Aren't all the
> real-time signals going to be blocked by the VM by default and so
> chaining is not needed as no handler will exist in the VM ?? (Unless a
> real-time signal is supplied for SR_signum)
>
> I must admit I don't know if any of this code actually works for
> real-time signals.
>
>> Solaris:
>>
>> The only platform where NSIG is missing?
>>
>> Here, we calculate the max. signal number dynamically in os_solaris.cpp,
>> presumably because SIGRTMAX is not a constant and can be changed using
>> system configuration. But then, on Linux we have the same situation
>> (SIGRTMAX is dynamic) and there we do not go through the trouble of
>> calculating the max. signal number dynamically. Instead we just use
>> NSIG=64 and rely on the fact that NSIG is larger than the largest
>> possible dynamic value for SIGRTMAX.
>
> Linux ensures that _NSIG (and thus NSIG) includes all the real-time
> signals. But libc can expose a subset and steal some for its own use.
>
>> Solaris does not seem to have NSIG defined, but I am sure there is also
>> a max. possible value for SIGRTMAX (the default seems to be 48). So, one
>> could probably safely define NSIG for Solaris too, so that we have NSIG
>> defined on all Posix platforms.
>
> Solaris doesn't have any of this SR_signum related code. A more general
> cleanup of signal related code would potentially involve a lot of cleanup.
>
> David
> -----
>
>>
>> On Thu, Nov 12, 2015 at 8:24 PM, Lindenmaier, Goetz
>> <goetz.lindenmaier at sap.com <mailto:goetz.lindenmaier at sap.com>> wrote:
>>
>>     Hi David, Dmitry,
>>
>>     I've come up with a new webrev:
>>     http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/
>>
>>     I hit on some more issues:
>>       - As proposed, I replaced MAXSIGNUM by NSIG
>>       - On AIX, NSIG=255.  Therefore storing bits in a word does not
>> work.
>>          I'm now using bitset functionality from signal.h as it's done
>>     in other places.
>>         sigset_t is >> NSIG on linux, so it's no good idea to use it
>> there.
>>
>>
>> Why do we not do this on all platforms, provided sigset_t contains all
>> signals (incl. realtime signals) ?
>>
>>       - In the os files I found another bit vector that now is too
>>     small: sigs.
>>         I adapted that, too.  Removed the dead declaration of this on
>>     solaris.
>>
>>     Best regards,
>>        Goetz.
>>
>>
>>
>> Kind Regards, Thomas
>>
>>     > -----Original Message-----
>>     > From: Dmitry Samersoff [mailto:dmitry.samersoff at oracle.com
>> <mailto:dmitry.samersoff at oracle.com>]
>>      > Sent: Donnerstag, 12. November 2015 10:05
>>      > To: Lindenmaier, Goetz; David Holmes; hotspot-runtime-
>>      > dev at openjdk.java.net <mailto:dev at openjdk.java.net>;
>>     serviceability-dev
>>      > Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM
>>      >
>>      > Goetz,
>>      >
>>      > *BSD including OS X also defines NSIG (just checked) and if my
>>     memory is
>>      > not bogus, AIX defines it too.
>>      >
>>      > So you may consider to use NSIG on all platform.
>>      >
>>      > -Dmitry
>>      >
>>      > On 2015-11-12 11:36, Lindenmaier, Goetz wrote:
>>      > > OK I'll change it to NSIG.  That's used in other places in
>>     os_linux, too.
>>      > > So it's really more consistent.
>>      > >
>>      > > Best regards,
>>      > >   Goetz
>>      > >
>>      > >> -----Original Message-----
>>      > >> From: Dmitry Samersoff [mailto:dmitry.samersoff at oracle.com
>>     <mailto:dmitry.samersoff at oracle.com>]
>>      > >> Sent: Donnerstag, 12. November 2015 09:22
>>      > >> To: David Holmes; Lindenmaier, Goetz; hotspot-runtime-
>>      > >> dev at openjdk.java.net <mailto:dev at openjdk.java.net>;
>>     serviceability-dev
>>      > >> Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM
>>      > >>
>>      > >> David,
>>      > >>
>>      > >> I think it's better to use NSIG (without underscore) defined
>>     in signal.h
>>      > >>
>>      > >> -Dmitry
>>      > >>
>>      > >>
>>      > >> On 2015-11-12 10:35, David Holmes wrote:
>>      > >>> Hi Goetz,
>>      > >>>
>>      > >>> Adding in serviceability-dev
>>      > >>>
>>      > >>> On 9/11/2015 6:22 PM, Lindenmaier, Goetz wrote:
>>      > >>>> Hi,
>>      > >>>>
>>      > >>>> The environment variable _JAVA_SR_SIGNUM can be set to a
>> signal
>>      > >> number
>>      > >>>> do be used by the JVM's suspend/resume mechanism.
>>      > >>>>
>>      > >>>> If set, a signal handler is installed and the current signal
>>     handler
>>      > >>>> is saved to an array.
>>      > >>>> On linux, this array had size MAXSIGNUM=32, and
>> _JAVA_SR_SIGNUM
>>      > >> was
>>      > >>>> allowed
>>      > >>>> to range up to _NSIG=65. This could cause memory corruption.
>>      > >>>>
>>      > >>>> Further, in jsig.c, an unsinged int is used to set a bit for
>>     signals.
>>      > >>>> This also
>>      > >>>> is too small, as only 32 signals can be supported.
>> Further, the
>>      > >>>> signals are mapped
>>      > >>>> wrong to these bits.  '0' is not a valid signal, but '32'
>>     was.  1<<32
>>      > >>>> happens to map to
>>      > >>>> zero, so the signal could be stored, but this probably was
>> not
>>      > >>>> intended that way.
>>      > >>>>
>>      > >>>> This change increases MAXSIGNUM to 65 on linux, and to 64 on
>>     aix. It
>>      > >>>> introduces
>>      > >>>> proper checking of the signal read from the env var, and
>>     issues a
>>      > >>>> warning if it
>>      > >>>> does not use the signal set.  It adapts the data types in
>>     jisig.c
>>      > >>>> properly.
>>      > >>>>
>>      > >>>> Please review this change.  I please need a sponsor.
>>      > >>>>
>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.00
>>      > >>>
>>      > >>> This all sounds very good to me. (I must find out why Solaris
>>     is not
>>      > >>> involved here :) ).
>>      > >>>
>>      > >>> On Linux you didn't add the bounds check to
>>     os::Linux::set_our_sigflags.
>>      > >>>
>>      > >>> I'm also wondering about documenting where we are
>> determining the
>>      > >>> maximum from? Is it simply _NSIG on some/all distributions?
>>     And I see
>>      > >>> _NSIG is supposed to be the biggest signal number + one. Also
>>     linux
>>      > >>> defines NSIG = _NSIG so which should we be using?
>>      > >>>
>>      > >>> Thanks,
>>      > >>> David
>>      > >>>
>>      > >>>> Best regards,
>>      > >>>>    Goetz.
>>      > >>>>
>>      > >>
>>      > >>
>>      > >> --
>>      > >> Dmitry Samersoff
>>      > >> Oracle Java development team, Saint Petersburg, Russia
>>      > >> * I would love to change the world, but they won't give me the
>>     sources.
>>      >
>>      >
>>      > --
>>      > Dmitry Samersoff
>>      > Oracle Java development team, Saint Petersburg, Russia
>>      > * I would love to change the world, but they won't give me the
>>     sources.
>>
>>


More information about the hotspot-runtime-dev mailing list