RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM

Dmitry Samersoff dmitry.samersoff at oracle.com
Mon Nov 16 10:54:19 UTC 2015


Goetz,

On 2015-11-16 13:32, Lindenmaier, Goetz wrote:
> Hi Dmitry, 
> 
> I think this is a rather complex issue which is better documented in the 
> bug than in the code.  The code already references the bugID.
> Also, I would have to copy the explanation into three files ...
> I you don't object I'll leave the comment as is.

I'm OK with it. Leave it as is.

-Dmitry


> 
> Best regards,
>   Goetz.
> 
> -----Original Message-----
> From: Dmitry Samersoff [mailto:dmitry.samersoff at oracle.com] 
> Sent: Montag, 16. November 2015 09:23
> To: David Holmes; Thomas Stüfe; Lindenmaier, Goetz
> Cc: serviceability-dev; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM
> 
> David,
> 
> Thank you for detailed explanation. Could we put it as a comment right
> into the code?
> 
> -Dmitry
> 
> On 2015-11-16 07:03, David Holmes wrote:
>> On 13/11/2015 11:38 PM, David Holmes wrote:
>>> On 13/11/2015 7:53 PM, Thomas Stüfe wrote:
>>>> Hi Goetz,
>>>>
>>>> sorry for not looking at this earlier. This is a nice cleanup. Some
>>>> remarks:
>>>>
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/aix/vm/os_aix.cpp.udiff.html
>>>>
>>>>
>>>>
>>>> +    if (sig > MAX2(SIGSEGV, SIGBUS) &&  // See 4355769.
>>>> +        sig < NSIG) {                   // Must be legal signal and fit
>>>> into sigflags[].
>>>>
>>>> I do not like much the MAX2() construct. I would like it better to
>>>> explicitly check whether the SR signal is one of the "forbidden" ones
>>>> the VM uses.
>>>
>>> I must confess I had not looked into 4355769 but this check seems rather
>>> spurious. It is not at all clear to me what signals could be used here -
>>
>> Okay should have looked into 4355769. The problem is how multiple
>> pending signals are handled. It seems that in the past (no idea if still
>> true) pending signals were handled in signal-number order (lowest
>> first), not FIFO. The problem scenario is this:
>> - thread accesses a null pointer in compiled Java code and the SEGV
>> handler will cause NPE to be thrown
>> - at the same time as the SEGV is being raised the thread is also hit
>> with the SR signal to suspend it.
>> - the SR signal will be delivered first and the SR handler starts to run
>> - with signals unblocked.
>> - the SEGV then gets delivered to the thread in the SR handler, and the
>> regular signal handler is run
>> - the regular signal handler tries to detect if we're running in Java
>> code so it can post the NPE, but the presence of the SR handler causes
>> that check to fail - so we abort thinking it is a real SEGV.
>>
>> I don't know how much of that is still true today. It seems strange to
>> me that a kill based directed signal can usurp a synchronous signal.
>>
>> Anyway the fix, rather workaround, for that problem, was to ensure that
>> the SR_signum is greater than any potential synchronous signal the VM
>> cares about. Why SIGBUS was included there I don't know give that:
>>
>> a) it is already a lower signal number than SIGUSR1, SIGSEGV and SIGUSR2
>> b) we don't deliberately generate and use SIGBUS ... though perhaps
>> unsafe-fetch needs to be considered.
>>
>> A better fix in my opinion, and as mentioned in the bug, would have been
>> to disable delivery of SEGV whilst the SR handler is executing. But we
>> start to touch on some grey areas of the POSIX spec there, and likely
>> the implementation too.
>>
>> So I suggest that for this cleanup we simply leave this logic exactly as
>> is.
>>
>> Thanks,
>> David
>>
>>> other than SIGUSR1 or SIGUSR2 (if -Xrs is specified), or else a
>>> real-time signal (modulo discussion below). Hijacking anything else
>>> seems rather suspect.
>>>
>>>> Maybe keep a mask defined centrally for each platform which contains
>>>> signals the VM needs for itself ?
>>>
>>> Such masks already exist.
>>>
>>>> +sigset_t os::Aix::sigs = { 0 };
>>>>
>>>> I would not initialize the signal set this way. sigset_t is an opaque
>>>> type; the only way to initialize it is with one of sigemptyset() or
>>>> sigfillset().
>>>
>>> Good catch - I overlooked that.
>>>
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/aix/vm/os_aix.hpp.udiff.html
>>>>
>>>>
>>>>
>>>> +  static struct sigaction sigact[NSIG]; // saved preinstalled
>>>> sigactions
>>>> +  static sigset_t sigs;                 // mask of signals that have
>>>>
>>>> +  static int sigflags[NSIG];
>>>>
>>>> I know this is not in the scope of your change, but I would like to see
>>>> those removed from os::Aix and put into os_aix.cpp at static filescope.
>>>> There is no need at all to export those, and you would get rid of the
>>>> signal.h dependency you know have when including os_aix.hpp.
>>>>
>>>>
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/bsd/vm/jsig.c.udiff.html
>>>>
>>>>
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/bsd/vm/os_bsd.cpp.udiff.html
>>>>
>>>>
>>>>
>>>> On BSD, we have realtime signals.
>>>>
>>>> http://fxr.watson.org/fxr/source/sys/signal.h
>>>> #define SIGRTMAX 126
>>>> and NSIG does not contain them:
>>>> #define NSIG 32
>>>>
>>>> The max. possible signal number would be 126, which unfortunately does
>>>> not even fit into a 64bit mask.
>>>
>>> So this simply limits the signal choice to not be a real-time signal -
>>> same as today.
>>>
>>>> So the code in jsig.c is broken for the case that someone wants to
>>>> register realtime signals, if the VM were to ever use realtime signals
>>>> itself, which now is not the case.
>>>>
>>>> The same is true for os_bsd.cpp, where signal chaining will not work if
>>>> the application did have handler for real time signals pre-installed
>>>> before jvm is loaded.
>>>
>>> Chaining is only used when the JVM will catch signals. Aren't all the
>>> real-time signals going to be blocked by the VM by default and so
>>> chaining is not needed as no handler will exist in the VM ?? (Unless a
>>> real-time signal is supplied for SR_signum)
>>>
>>> I must admit I don't know if any of this code actually works for
>>> real-time signals.
>>>
>>>> Solaris:
>>>>
>>>> The only platform where NSIG is missing?
>>>>
>>>> Here, we calculate the max. signal number dynamically in os_solaris.cpp,
>>>> presumably because SIGRTMAX is not a constant and can be changed using
>>>> system configuration. But then, on Linux we have the same situation
>>>> (SIGRTMAX is dynamic) and there we do not go through the trouble of
>>>> calculating the max. signal number dynamically. Instead we just use
>>>> NSIG=64 and rely on the fact that NSIG is larger than the largest
>>>> possible dynamic value for SIGRTMAX.
>>>
>>> Linux ensures that _NSIG (and thus NSIG) includes all the real-time
>>> signals. But libc can expose a subset and steal some for its own use.
>>>
>>>> Solaris does not seem to have NSIG defined, but I am sure there is also
>>>> a max. possible value for SIGRTMAX (the default seems to be 48). So, one
>>>> could probably safely define NSIG for Solaris too, so that we have NSIG
>>>> defined on all Posix platforms.
>>>
>>> Solaris doesn't have any of this SR_signum related code. A more general
>>> cleanup of signal related code would potentially involve a lot of
>>> cleanup.
>>>
>>> David
>>> -----
>>>
>>>>
>>>> On Thu, Nov 12, 2015 at 8:24 PM, Lindenmaier, Goetz
>>>> <goetz.lindenmaier at sap.com <mailto:goetz.lindenmaier at sap.com>> wrote:
>>>>
>>>>     Hi David, Dmitry,
>>>>
>>>>     I've come up with a new webrev:
>>>>     http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/
>>>>
>>>>     I hit on some more issues:
>>>>       - As proposed, I replaced MAXSIGNUM by NSIG
>>>>       - On AIX, NSIG=255.  Therefore storing bits in a word does not
>>>> work.
>>>>          I'm now using bitset functionality from signal.h as it's done
>>>>     in other places.
>>>>         sigset_t is >> NSIG on linux, so it's no good idea to use it
>>>> there.
>>>>
>>>>
>>>> Why do we not do this on all platforms, provided sigset_t contains all
>>>> signals (incl. realtime signals) ?
>>>>
>>>>       - In the os files I found another bit vector that now is too
>>>>     small: sigs.
>>>>         I adapted that, too.  Removed the dead declaration of this on
>>>>     solaris.
>>>>
>>>>     Best regards,
>>>>        Goetz.
>>>>
>>>>
>>>>
>>>> Kind Regards, Thomas
>>>>
>>>>     > -----Original Message-----
>>>>     > From: Dmitry Samersoff [mailto:dmitry.samersoff at oracle.com
>>>> <mailto:dmitry.samersoff at oracle.com>]
>>>>      > Sent: Donnerstag, 12. November 2015 10:05
>>>>      > To: Lindenmaier, Goetz; David Holmes; hotspot-runtime-
>>>>      > dev at openjdk.java.net <mailto:dev at openjdk.java.net>;
>>>>     serviceability-dev
>>>>      > Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM
>>>>      >
>>>>      > Goetz,
>>>>      >
>>>>      > *BSD including OS X also defines NSIG (just checked) and if my
>>>>     memory is
>>>>      > not bogus, AIX defines it too.
>>>>      >
>>>>      > So you may consider to use NSIG on all platform.
>>>>      >
>>>>      > -Dmitry
>>>>      >
>>>>      > On 2015-11-12 11:36, Lindenmaier, Goetz wrote:
>>>>      > > OK I'll change it to NSIG.  That's used in other places in
>>>>     os_linux, too.
>>>>      > > So it's really more consistent.
>>>>      > >
>>>>      > > Best regards,
>>>>      > >   Goetz
>>>>      > >
>>>>      > >> -----Original Message-----
>>>>      > >> From: Dmitry Samersoff [mailto:dmitry.samersoff at oracle.com
>>>>     <mailto:dmitry.samersoff at oracle.com>]
>>>>      > >> Sent: Donnerstag, 12. November 2015 09:22
>>>>      > >> To: David Holmes; Lindenmaier, Goetz; hotspot-runtime-
>>>>      > >> dev at openjdk.java.net <mailto:dev at openjdk.java.net>;
>>>>     serviceability-dev
>>>>      > >> Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM
>>>>      > >>
>>>>      > >> David,
>>>>      > >>
>>>>      > >> I think it's better to use NSIG (without underscore) defined
>>>>     in signal.h
>>>>      > >>
>>>>      > >> -Dmitry
>>>>      > >>
>>>>      > >>
>>>>      > >> On 2015-11-12 10:35, David Holmes wrote:
>>>>      > >>> Hi Goetz,
>>>>      > >>>
>>>>      > >>> Adding in serviceability-dev
>>>>      > >>>
>>>>      > >>> On 9/11/2015 6:22 PM, Lindenmaier, Goetz wrote:
>>>>      > >>>> Hi,
>>>>      > >>>>
>>>>      > >>>> The environment variable _JAVA_SR_SIGNUM can be set to a
>>>> signal
>>>>      > >> number
>>>>      > >>>> do be used by the JVM's suspend/resume mechanism.
>>>>      > >>>>
>>>>      > >>>> If set, a signal handler is installed and the current signal
>>>>     handler
>>>>      > >>>> is saved to an array.
>>>>      > >>>> On linux, this array had size MAXSIGNUM=32, and
>>>> _JAVA_SR_SIGNUM
>>>>      > >> was
>>>>      > >>>> allowed
>>>>      > >>>> to range up to _NSIG=65. This could cause memory corruption.
>>>>      > >>>>
>>>>      > >>>> Further, in jsig.c, an unsinged int is used to set a bit for
>>>>     signals.
>>>>      > >>>> This also
>>>>      > >>>> is too small, as only 32 signals can be supported.
>>>> Further, the
>>>>      > >>>> signals are mapped
>>>>      > >>>> wrong to these bits.  '0' is not a valid signal, but '32'
>>>>     was.  1<<32
>>>>      > >>>> happens to map to
>>>>      > >>>> zero, so the signal could be stored, but this probably was
>>>> not
>>>>      > >>>> intended that way.
>>>>      > >>>>
>>>>      > >>>> This change increases MAXSIGNUM to 65 on linux, and to 64 on
>>>>     aix. It
>>>>      > >>>> introduces
>>>>      > >>>> proper checking of the signal read from the env var, and
>>>>     issues a
>>>>      > >>>> warning if it
>>>>      > >>>> does not use the signal set.  It adapts the data types in
>>>>     jisig.c
>>>>      > >>>> properly.
>>>>      > >>>>
>>>>      > >>>> Please review this change.  I please need a sponsor.
>>>>      > >>>>
>>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.00
>>>>      > >>>
>>>>      > >>> This all sounds very good to me. (I must find out why Solaris
>>>>     is not
>>>>      > >>> involved here :) ).
>>>>      > >>>
>>>>      > >>> On Linux you didn't add the bounds check to
>>>>     os::Linux::set_our_sigflags.
>>>>      > >>>
>>>>      > >>> I'm also wondering about documenting where we are
>>>> determining the
>>>>      > >>> maximum from? Is it simply _NSIG on some/all distributions?
>>>>     And I see
>>>>      > >>> _NSIG is supposed to be the biggest signal number + one. Also
>>>>     linux
>>>>      > >>> defines NSIG = _NSIG so which should we be using?
>>>>      > >>>
>>>>      > >>> Thanks,
>>>>      > >>> David
>>>>      > >>>
>>>>      > >>>> Best regards,
>>>>      > >>>>    Goetz.
>>>>      > >>>>
>>>>      > >>
>>>>      > >>
>>>>      > >> --
>>>>      > >> Dmitry Samersoff
>>>>      > >> Oracle Java development team, Saint Petersburg, Russia
>>>>      > >> * I would love to change the world, but they won't give me the
>>>>     sources.
>>>>      >
>>>>      >
>>>>      > --
>>>>      > Dmitry Samersoff
>>>>      > Oracle Java development team, Saint Petersburg, Russia
>>>>      > * I would love to change the world, but they won't give me the
>>>>     sources.
>>>>
>>>>
> 
> 


-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.


More information about the hotspot-runtime-dev mailing list