RFR(XXS): 8224793: os::die() does not honor CreateCoredumpOnCrash option

Daniel D. Daugherty daniel.daugherty at oracle.com
Wed May 29 14:27:52 UTC 2019


Kim,

Thanks for the quick review! I see you sent two messages. I'm going to
reply to them separately to maintain context...


On 5/28/19 9:03 PM, Kim Barrett wrote:
>
>> On May 28, 2019, at 8:27 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>> Calling os::exit here seems wrong.  We may have already tried calling
>> os::exit and gotten here because of problems therein.  And the doc
>> comment for os::die says "no exit hook, no abort hook, no cleanup."
>>
>> SIGABRT (signaled by ::abort) has a default action of Core, which is
>> the reason for the current behavior.  (And ::abort will invoke that
>> action if the installed action returns rather than terminating the
>> process.)  If no core is wanted, SIGKILL has a default action of
>> Terminate, and that behavior can't be replaced.  So issuing a SIGKILL
>> when -CreateCoredumpOnCrash seems like it should get the desired
>> behavior.
>>
>> These various implementations of os::die() look like they could be
>> merged into os_posix.cpp.
>>
> And looking at this some more, I think I agree with David's comment in
> the CR, that os::die *should* always dump core.  It gets called when
> things have gone horribly wrong, and a core dump might be the only way
> to understand what went wrong.

I understand where you and David are coming from. I was there myself
months ago when I first started investigating JDK-8188872 (the bug that
motivated me to file this bug). From my JVM engineer POV, I thought:

     When I call os::die(), I want a core file.

And I kept thinking about it every so often when I returned to poke at
JDK-8188872 a little more... (That's the advantage/disadvantage of
having a bug that only repros on Solaris machines... and you're one of
the few engineers with a Solaris machine...)

I eventually realized two things:

1) os::die() doesn't say _anything_ about generating a core file:

    src/hotspot/share/runtime/os.hpp:

      // Die immediately, no exit hook, no abort hook, no cleanup.
      static void die();

2) I was looking at this from the JVM engineer's POV and not from
    the test author's POV.

The '-XX:-CreateCoredumpOnCrash' option is used by a test author to
tell the JVM that the test does not care about core files. Typically
that means that the test is interested in some other artifact like
the hs_err_pid file or something on stdout or stderr.

To run a test with '-XX:-CreateCoredumpOnCrash' and then see a core
file in the execution directory violates the principle of least
astonishment. From the test author's POV, they did what they are
supposed to do (passing the '-XX:-CreateCoredumpOnCrash' option)
and the result is unexpected.
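
To make that expectation concrete, here is a minimal standalone sketch.
It is not the webrev and not HotSpot code: die_sketch() and its
'create_core' parameter are hypothetical stand-ins for os::die() and the
CreateCoredumpOnCrash flag, and the no-core branch uses the SIGKILL idea
quoted above purely as one possible choice. The helper forks a child so
the terminating signal can be observed from the parent.

```cpp
#include <csignal>          // SIGKILL, raise
#include <cstdlib>          // abort
#include <sys/resource.h>   // setrlimit, RLIMIT_CORE
#include <sys/types.h>      // pid_t
#include <sys/wait.h>       // waitpid, WIFSIGNALED, WTERMSIG
#include <unistd.h>         // fork, _exit

// Hypothetical stand-in for os::die(). 'create_core' plays the role of
// the CreateCoredumpOnCrash flag. Assumption: not the actual HotSpot code.
[[noreturn]] void die_sketch(bool create_core) {
  if (create_core) {
    ::abort();              // SIGABRT: default action is to dump core
  }
  ::raise(SIGKILL);         // cannot be caught or ignored; no core file
  ::_exit(1);               // not expected to be reached
}

// Run die_sketch() in a forked child and return the signal that
// terminated it, or -1 if the child was not killed by a signal.
int signal_that_killed_child(bool create_core) {
  pid_t pid = ::fork();
  if (pid < 0) {
    return -1;
  }
  if (pid == 0) {
    // Keep the demo itself from leaving core files behind.
    struct rlimit no_core = {0, 0};
    ::setrlimit(RLIMIT_CORE, &no_core);
    die_sketch(create_core);
  }
  int status = 0;
  if (::waitpid(pid, &status, 0) != pid) {
    return -1;
  }
  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With the flag off, the child dies from SIGKILL and nothing is left in
the execution directory; with the flag on, it dies from SIGABRT, which
is the case where a core file would normally be produced.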


> Maybe there are paths to os::die that ought to be calling os::abort
> instead?

Maybe, but that's not really the problem that I'm trying to solve today. :-)


> I think os::abort probably should not be calling ::exit either, but
> should be raising SIGKILL if no core dump is requested.  os::exit is
> the path to exit functions and the like.

Agreed (or maybe not). If I'm in os::abort() and I call '::exit()',
am I calling 'os::exit()' or am I calling libc's exit()? I thought
I was calling libc's exit(), but I could be wrong...
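
The C++ lookup rule behind that question can be checked with a small
standalone sketch. The names below are hypothetical and this is not
HotSpot code: which_exit() at global scope stands in for libc's exit(),
and os_sketch::which_exit() stands in for os::exit(). A leading '::'
forces lookup in the global namespace, so it skips the class member.

```cpp
#include <string>

// Global function, standing in for libc's exit().
std::string which_exit() { return "global"; }

// Hypothetical class standing in for HotSpot's 'os'.
struct os_sketch {
  // Member with the same name, standing in for os::exit().
  static std::string which_exit() { return "member"; }

  // Unqualified call: class scope is searched first, so the member wins.
  static std::string unqualified() { return which_exit(); }

  // Leading '::' restricts lookup to the global namespace.
  static std::string global_qualified() { return ::which_exit(); }
};
```

Here os_sketch::unqualified() returns "member" and
os_sketch::global_qualified() returns "global", which is consistent
with '::exit()' inside os::abort() resolving to the global exit()
rather than to a member of the os class.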


> It's unfortunate that the Linux version of os::abort needs to do
> something special for DumpPrivateMappingsInCore; without that, we
> could have a merged posix version of os::abort.

I think that's a fairly recent addition to the Linux version of
os::abort(). It could probably be handled by a posix version of
os::abort() that calls a PD version of os::abort() as needed or
something like that.

Again, not the problem that I'm trying to solve today.

Thanks again for the fast review(s)!

Dan

