RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM
Lindenmaier, Goetz
goetz.lindenmaier at sap.com
Mon Nov 16 10:32:44 UTC 2015
Hi Dmitry,
I think this is a rather complex issue which is better documented in the
bug than in the code. The code already references the bugID.
Also, I would have to copy the explanation into three files ...
I you don't object I'll leave the comment as is.
Best regards,
Goetz.
-----Original Message-----
From: Dmitry Samersoff [mailto:dmitry.samersoff at oracle.com]
Sent: Montag, 16. November 2015 09:23
To: David Holmes; Thomas Stüfe; Lindenmaier, Goetz
Cc: serviceability-dev; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM
David,
Thank you for detailed explanation. Could we put it as a comment right
into the code?
-Dmitry
On 2015-11-16 07:03, David Holmes wrote:
> On 13/11/2015 11:38 PM, David Holmes wrote:
>> On 13/11/2015 7:53 PM, Thomas Stüfe wrote:
>>> Hi Goetz,
>>>
>>> sorry for not looking at this earlier. This is a nice cleanup. Some
>>> remarks:
>>>
>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/aix/vm/os_aix.cpp.udiff.html
>>>
>>>
>>>
>>> + if (sig > MAX2(SIGSEGV, SIGBUS) && // See 4355769.
>>> + sig < NSIG) { // Must be legal signal and fit
>>> into sigflags[].
>>>
>>> I do not like much the MAX2() construct. I would like it better to
>>> explicitly check whether the SR signal is one of the "forbidden" ones
>>> the VM uses.
>>
>> I must confess I had not looked into 4355769 but this check seems rather
>> spurious. It is not at all clear to me what signals could be used here -
>
> Okay should have looked into 4355769. The problem is how multiple
> pending signals are handled. It seems that in the past (no idea if still
> true) pending signals were handled in signal-number order (lowest
> first), not FIFO. The problem scenario is this:
> - thread accesses a null pointer in compiled Java code and the SEGV
> handler will cause NPE to be thrown
> - at the same time as the SEGV is being raised the thread is also hit
> with the SR signal to suspend it.
> - the SR signal will be delivered first and the SR handler starts to run
> - with signals unblocked.
> - the SEGV then gets delivered to the thread in the SR handler, and the
> regular signal handler is run
> - the regular signal handler tries to detect if we're running in Java
> code so it can post the NPE, but the presence of the SR handler causes
> that check to fail - so we abort thinking it is a real SEGV.
>
> I don't know how much of that is still true today. It seems strange to
> me that a kill based directed signal can usurp a synchronous signal.
>
> Anyway the fix, rather workaround, for that problem, was to ensure that
> the SR_signum is greater than any potential synchronous signal the VM
> cares about. Why SIGBUS was included there I don't know give that:
>
> a) it is already a lower signal number than SIGUSR1, SIGSEGV and SIGUSR2
> b) we don't deliberately generate and use SIGBUS ... though perhaps
> unsafe-fetch needs to be considered.
>
> A better fix in my opinion, and as mentioned in the bug, would have been
> to disable delivery of SEGV whilst the SR handler is executing. But we
> start to touch on some grey areas of the POSIX spec there, and likely
> the implementation too.
>
> So I suggest that for this cleanup we simply leave this logic exactly as
> is.
>
> Thanks,
> David
>
>> other than SIGUSR1 or SIGUSR2 (if -Xrs is specified), or else a
>> real-time signal (modulo discussion below). Hijacking anything else
>> seems rather suspect.
>>
>>> Maybe keep a mask defined centrally for each platform which contains
>>> signals the VM needs for itself ?
>>
>> Such masks already exist.
>>
>>> +sigset_t os::Aix::sigs = { 0 };
>>>
>>> I would not initialize the signal set this way. sigset_t is an opaque
>>> type; the only way to initialize it is with one of sigemptyset() or
>>> sigfillset().
>>
>> Good catch - I overlooked that.
>>
>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/aix/vm/os_aix.hpp.udiff.html
>>>
>>>
>>>
>>> + static struct sigaction sigact[NSIG]; // saved preinstalled
>>> sigactions
>>> + static sigset_t sigs; // mask of signals that have
>>>
>>> + static int sigflags[NSIG];
>>>
>>> I know this is not in the scope of your change, but I would like to see
>>> those removed from os::Aix and put into os_aix.cpp at static filescope.
>>> There is no need at all to export those, and you would get rid of the
>>> signal.h dependency you know have when including os_aix.hpp.
>>>
>>>
>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/bsd/vm/jsig.c.udiff.html
>>>
>>>
>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/src/os/bsd/vm/os_bsd.cpp.udiff.html
>>>
>>>
>>>
>>> On BSD, we have realtime signals.
>>>
>>> http://fxr.watson.org/fxr/source/sys/signal.h
>>> #define SIGRTMAX 126
>>> and NSIG does not contain them:
>>> #define NSIG 32
>>>
>>> The max. possible signal number would be 126, which unfortunately does
>>> not even fit into a 64bit mask.
>>
>> So this simply limits the signal choice to not be a real-time signal -
>> same as today.
>>
>>> So the code in jsig.c is broken for the case that someone wants to
>>> register realtime signals, if the VM were to ever use realtime signals
>>> itself, which now is not the case.
>>>
>>> The same is true for os_bsd.cpp, where signal chaining will not work if
>>> the application did have handler for real time signals pre-installed
>>> before jvm is loaded.
>>
>> Chaining is only used when the JVM will catch signals. Aren't all the
>> real-time signals going to be blocked by the VM by default and so
>> chaining is not needed as no handler will exist in the VM ?? (Unless a
>> real-time signal is supplied for SR_signum)
>>
>> I must admit I don't know if any of this code actually works for
>> real-time signals.
>>
>>> Solaris:
>>>
>>> The only platform where NSIG is missing?
>>>
>>> Here, we calculate the max. signal number dynamically in os_solaris.cpp,
>>> presumably because SIGRTMAX is not a constant and can be changed using
>>> system configuration. But then, on Linux we have the same situation
>>> (SIGRTMAX is dynamic) and there we do not go through the trouble of
>>> calculating the max. signal number dynamically. Instead we just use
>>> NSIG=64 and rely on the fact that NSIG is larger than the largest
>>> possible dynamic value for SIGRTMAX.
>>
>> Linux ensures that _NSIG (and thus NSIG) includes all the real-time
>> signals. But libc can expose a subset and steal some for its own use.
>>
>>> Solaris does not seem to have NSIG defined, but I am sure there is also
>>> a max. possible value for SIGRTMAX (the default seems to be 48). So, one
>>> could probably safely define NSIG for Solaris too, so that we have NSIG
>>> defined on all Posix platforms.
>>
>> Solaris doesn't have any of this SR_signum related code. A more general
>> cleanup of signal related code would potentially involve a lot of
>> cleanup.
>>
>> David
>> -----
>>
>>>
>>> On Thu, Nov 12, 2015 at 8:24 PM, Lindenmaier, Goetz
>>> <goetz.lindenmaier at sap.com <mailto:goetz.lindenmaier at sap.com>> wrote:
>>>
>>> Hi David, Dmitry,
>>>
>>> I've come up with a new webrev:
>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.01/
>>>
>>> I hit on some more issues:
>>> - As proposed, I replaced MAXSIGNUM by NSIG
>>> - On AIX, NSIG=255. Therefore storing bits in a word does not
>>> work.
>>> I'm now using bitset functionality from signal.h as it's done
>>> in other places.
>>> sigset_t is >> NSIG on linux, so it's no good idea to use it
>>> there.
>>>
>>>
>>> Why do we not do this on all platforms, provided sigset_t contains all
>>> signals (incl. realtime signals) ?
>>>
>>> - In the os files I found another bit vector that now is too
>>> small: sigs.
>>> I adapted that, too. Removed the dead declaration of this on
>>> solaris.
>>>
>>> Best regards,
>>> Goetz.
>>>
>>>
>>>
>>> Kind Regards, Thomas
>>>
>>> > -----Original Message-----
>>> > From: Dmitry Samersoff [mailto:dmitry.samersoff at oracle.com
>>> <mailto:dmitry.samersoff at oracle.com>]
>>> > Sent: Donnerstag, 12. November 2015 10:05
>>> > To: Lindenmaier, Goetz; David Holmes; hotspot-runtime-
>>> > dev at openjdk.java.net <mailto:dev at openjdk.java.net>;
>>> serviceability-dev
>>> > Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM
>>> >
>>> > Goetz,
>>> >
>>> > *BSD including OS X also defines NSIG (just checked) and if my
>>> memory is
>>> > not bogus, AIX defines it too.
>>> >
>>> > So you may consider to use NSIG on all platform.
>>> >
>>> > -Dmitry
>>> >
>>> > On 2015-11-12 11:36, Lindenmaier, Goetz wrote:
>>> > > OK I'll change it to NSIG. That's used in other places in
>>> os_linux, too.
>>> > > So it's really more consistent.
>>> > >
>>> > > Best regards,
>>> > > Goetz
>>> > >
>>> > >> -----Original Message-----
>>> > >> From: Dmitry Samersoff [mailto:dmitry.samersoff at oracle.com
>>> <mailto:dmitry.samersoff at oracle.com>]
>>> > >> Sent: Donnerstag, 12. November 2015 09:22
>>> > >> To: David Holmes; Lindenmaier, Goetz; hotspot-runtime-
>>> > >> dev at openjdk.java.net <mailto:dev at openjdk.java.net>;
>>> serviceability-dev
>>> > >> Subject: Re: RFR(M): 8141529: Fix handling of _JAVA_SR_SIGNUM
>>> > >>
>>> > >> David,
>>> > >>
>>> > >> I think it's better to use NSIG (without underscore) defined
>>> in signal.h
>>> > >>
>>> > >> -Dmitry
>>> > >>
>>> > >>
>>> > >> On 2015-11-12 10:35, David Holmes wrote:
>>> > >>> Hi Goetz,
>>> > >>>
>>> > >>> Adding in serviceability-dev
>>> > >>>
>>> > >>> On 9/11/2015 6:22 PM, Lindenmaier, Goetz wrote:
>>> > >>>> Hi,
>>> > >>>>
>>> > >>>> The environment variable _JAVA_SR_SIGNUM can be set to a
>>> signal
>>> > >> number
>>> > >>>> do be used by the JVM's suspend/resume mechanism.
>>> > >>>>
>>> > >>>> If set, a signal handler is installed and the current signal
>>> handler
>>> > >>>> is saved to an array.
>>> > >>>> On linux, this array had size MAXSIGNUM=32, and
>>> _JAVA_SR_SIGNUM
>>> > >> was
>>> > >>>> allowed
>>> > >>>> to range up to _NSIG=65. This could cause memory corruption.
>>> > >>>>
>>> > >>>> Further, in jsig.c, an unsinged int is used to set a bit for
>>> signals.
>>> > >>>> This also
>>> > >>>> is too small, as only 32 signals can be supported.
>>> Further, the
>>> > >>>> signals are mapped
>>> > >>>> wrong to these bits. '0' is not a valid signal, but '32'
>>> was. 1<<32
>>> > >>>> happens to map to
>>> > >>>> zero, so the signal could be stored, but this probably was
>>> not
>>> > >>>> intended that way.
>>> > >>>>
>>> > >>>> This change increases MAXSIGNUM to 65 on linux, and to 64 on
>>> aix. It
>>> > >>>> introduces
>>> > >>>> proper checking of the signal read from the env var, and
>>> issues a
>>> > >>>> warning if it
>>> > >>>> does not use the signal set. It adapts the data types in
>>> jisig.c
>>> > >>>> properly.
>>> > >>>>
>>> > >>>> Please review this change. I please need a sponsor.
>>> > >>>>
>>> http://cr.openjdk.java.net/~goetz/webrevs/8141529-NSIG/webrev.00
>>> > >>>
>>> > >>> This all sounds very good to me. (I must find out why Solaris
>>> is not
>>> > >>> involved here :) ).
>>> > >>>
>>> > >>> On Linux you didn't add the bounds check to
>>> os::Linux::set_our_sigflags.
>>> > >>>
>>> > >>> I'm also wondering about documenting where we are
>>> determining the
>>> > >>> maximum from? Is it simply _NSIG on some/all distributions?
>>> And I see
>>> > >>> _NSIG is supposed to be the biggest signal number + one. Also
>>> linux
>>> > >>> defines NSIG = _NSIG so which should we be using?
>>> > >>>
>>> > >>> Thanks,
>>> > >>> David
>>> > >>>
>>> > >>>> Best regards,
>>> > >>>> Goetz.
>>> > >>>>
>>> > >>
>>> > >>
>>> > >> --
>>> > >> Dmitry Samersoff
>>> > >> Oracle Java development team, Saint Petersburg, Russia
>>> > >> * I would love to change the world, but they won't give me the
>>> sources.
>>> >
>>> >
>>> > --
>>> > Dmitry Samersoff
>>> > Oracle Java development team, Saint Petersburg, Russia
>>> > * I would love to change the world, but they won't give me the
>>> sources.
>>>
>>>
--
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.
More information about the serviceability-dev
mailing list