Misbehaving exit status from Hotspot

David Holmes david.holmes at oracle.com
Thu Jun 28 05:24:02 UTC 2018


On 28/06/2018 1:33 AM, Charles Oliver Nutter wrote:
> Oops, in my editing of the post I lost the link to sources. Perhaps this 
> will illustrate the problem I'm talking about a bit better!
> 
> https://gist.github.com/headius/b87bc50b488fd73e753cbc518550ae5f

Okay so the test outputs:

$ ./sigtest `which java` Loop
pid: 22136
status: 36608
exited: 1, stop signal: 143, term signal: 0, exit status: 143

which comes from the following code:

   waitpid(pid, &status, 0);
   printf("pid: %d\n", pid);
   printf("status: %d\n", status);
   printf("exited: %d, stop signal: %d, term signal: %d, exit status: %d\n",
          WIFEXITED(status), WSTOPSIG(status), WTERMSIG(status),
          WEXITSTATUS(status));

WIFEXITED is defined as:

"Evaluates to a non-zero value if status was returned for a child 
process that terminated normally."

So this value is one because the JVM process did exit "normally" - it 
called exit(143).

WSTOPSIG is defined as:

"If the value of WIFSTOPPED(stat_val) is non-zero, this macro evaluates 
to the number of the signal that caused the child process to stop."

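But we haven't checked WIFSTOPPED (and the process has terminated, not 
stopped) so this value is "garbage". On Linux with glibc, WSTOPSIG reads 
the same bits as WEXITSTATUS, which is why it also shows 143.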

WTERMSIG is defined as:

"If the value of WIFSIGNALED(stat_val) is non-zero, this macro evaluates 
to the number of the signal that caused the termination of the child 
process."

You haven't checked WIFSIGNALED, but it will be zero as the process was 
not terminated by an _uncaught signal_. So the zero reported for WTERMSIG 
is fine, but because WIFSIGNALED is zero that value carries no meaning 
and could have been anything.

WEXITSTATUS is defined as:

"If the value of WIFEXITED(stat_val) is non-zero, this macro evaluates 
to the low-order 8 bits of the status argument that the child process 
passed to _exit() or exit(), or the value the child process returned 
from main()."

The JVM called exit(143) so we expect to get 143 and that's exactly what 
we do get.

That leaves the "status: 36608" but that's fine too - it's an encoding 
of information for the other macros to parse.

36608 = 0x8F00
       = 0x8F and 0x00
       = 143 and 0

so we have encoded the termination mode in the lower 8 bits - which are 
zero as the process terminated normally (actually those bits encode the 
signal that caused termination - hence zero means no signal, i.e. normal 
termination). And the upper 8 bits are the exit code of 143.

Everything working exactly as it should.

The flaw in your thinking here is the assumption that sending a signal to 
tell the VM to terminate should behave as-if the VM received (and 
terminated due to) an uncaught signal. It doesn't - nor should it.

David
-----

> - Charlie
> 
> On Tue, Jun 26, 2018, 21:45 David Holmes <david.holmes at oracle.com>
> wrote:
> 
>     Hi Charlie,
> 
>     I don't know if you tried to attach your test programs but attachments
>     get stripped. So just based on the descriptions ...
> 
>     On 27/06/2018 5:48 AM, Charles Oliver Nutter wrote:
>      > A bit more background and info...
>      >
>      > This investigation was spawned by a JRuby bug:
>      > https://github.com/jruby/jruby/issues/5224
>      >
>      > Zhengyu Gu pointed out that -XX:+ReduceSignalUsage allows my gisted
>      > example to work as expected.
>      >
>      > $ ./sigtest `which java`  -XX:+ReduceSignalUsage Loop
>      > pid: 28705
>      > status: 15
>      > exited: 0, stop signal: 0, term signal: 15, exit status: 0
>      >
>      > That isn't too surprising to me, but it's also undesirable to have
>      > to pass this flag. Shouldn't Hotspot's shutdown handler be
>      > propagating SIGTERM to the system-default handler as its final
>      > step? It seems like that's the missing piece here, if I'm reading
>      > the tea leaves correctly.
> 
>     We (the VM) only chain user-handlers for signals. We never call the
>     default handler to effect an abort - we can't as signal handling is
>     asynchronous: we just notify the signal thread that a signal was raised
>     and it then dispatches it to the Java level. If the signal is
>     unexpected/unhandled and leading to a crash then we generate the hs_err
>     file and explicitly call either abort() or exit(1) depending on the
>     desire for a core file.
> 
>     SIGTERM is a termination signal for the JVM (SHUTDOWN2_SIGNAL - unless
>     using -Xrs). It performs an orderly shutdown of the VM and exits. This
>     is set up in:
> 
>     src/java.base/windows/classes/java/lang/Terminator.java
> 
>     It will cause execution of:
> 
>        Shutdown.exit(sig.getNumber() + 0200);
> 
>     which performs the orderly shutdown (i.e. it causes shutdown hooks to
>     run to completion) and should then exit with that exit code. And the
>     exit code is the expected SIG+128.
> 
>      > Interestingly, just setting this flag and running JRuby is not
>      > enough to "fix" our bug...I also need to add a downcall to
>      > raise(3) as part of termination.
>      >
>      > Bottom line for me: Hotspot is not being a good actor wrt exit
>      > statuses and signal handling, and it should at *least* produce
>      > valid exit conditions for the process when terminated prematurely
>      > by a signal.
> 
>     Well it's not hotspot at fault if there is a "fault" here - unless
>     there's some bug with the eventual process exit logic regarding the
>     exit code. The signal has a handler installed and we invoke that
>     handler which delegates to the Java shutdown logic as described.
>     Hotspot doesn't try to second-guess what should happen after that.
> 
>     From your Ruby bug the issue seems to be that child processes are not
>     being informed about the parent termination correctly. I can't really
>     speak to that. Your quote from the libc manual is interesting:
> 
>         /* Now reraise the signal.  We reactivate the signal’s
>            default handling, which is to terminate the process.
>            We could just call exit or abort,
>            but reraising the signal sets the return status
>            from the process correctly. */
> 
>     as it implies that exit/abort are broken if they don't set the process
>     return status correctly! But it's not relevant to the regular SIGTERM
>     case as we are not exiting from within a signal handler.
> 
>     AFAICS things work as expected:
> 
>        > java Sleep &
>     [1] 18175
>        > kill -TERM 18175
>     [1]+  Exit 143                java Sleep
> 
>     Cheers,
>     David
>     -----
> 
>      > - Charlie
>      >
>      > On Tue, Jun 26, 2018 at 2:05 PM, Charles Oliver Nutter
>      > <headius at headius.com> wrote:
>      >
>      >> (mods: previous version of this was sent without subscription
>      >> complete; disregard)
>      >>
>      >> Hello all!
>      >>
>      >> I've been struggling to fix some signal-handling issues in JRuby
>      >> and I've come to the determination that Hotspot is not being a
>      >> good actor as far as signals and exit statuses go.
>      >>
>      >> I've put together some C and Java code to demonstrate the
>      >> problem. I could have flaws in my understanding of signal handling
>      >> and exit statuses.
>      >>
>      >> The test program just spawns a given command, waits for
>      >> termination, and uses the wait(2) W macros to parse out the
>      >> process exit states.
>      >>
>      >> "loop.c" just loops.
>      >> "loop2.c" has the loop but also installs a TERM signal handler,
>      >> closer in behavior to shutdown hooks in Ruby and JVM.
>      >> "Loop.java" just loops.
>      >>
>      >> The first two produce the expected results...
>      >>
>      >> $ ./sigtest `pwd`/loop
>      >> pid: 22130
>      >> status: 15
>      >> exited: 0, stop signal: 0, term signal: 15, exit status: 0
>      >>
>      >> $ ./sigtest `pwd`/loop2
>      >> term received
>      >> pid: 22173
>      >> status: 15
>      >> exited: 0, stop signal: 0, term signal: 15, exit status: 0
>      >>
>      >> Java produces nonsense results...
>      >>
>      >> $ ./sigtest `which java` Loop
>      >> pid: 22136
>      >> status: 36608
>      >> exited: 1, stop signal: 143, term signal: 0, exit status: 143
>      >>
>      >> I have tried various combinations of using the sun.misc.Signal
>      >> stuff, doing a native downcall to raise(3), and so on...but
>      >> nothing helps, probably because Hotspot's own TERM handler is
>      >> swallowing or otherwise mutilating the exit status.
>      >>
>      >> We do get bug reports about JRuby's exit statuses not making any
>      >> sense. I assumed it was our fault until this week.
>      >>
>      >> Help?
>      >>
>      >> - Charlie
>      >>
>      >>
> 


More information about the hotspot-runtime-dev mailing list