RFR(XXS): 8224793: os::die() does not honor CreateCoredumpOnCrash option

Daniel D. Daugherty daniel.daugherty at oracle.com
Wed May 29 17:50:31 UTC 2019


On 5/29/19 1:19 PM, Kim Barrett wrote:
>> On May 29, 2019, at 10:27 AM, Daniel D. Daugherty <daniel.daugherty at oracle.com> wrote:
>> On 5/28/19 9:03 PM, Kim Barrett wrote:
>>> And looking at this some more, I think I agree with David's comment in
>>> the CR, that os::die *should* always dump core.  It gets called when
>>> things have gone horribly wrong, and a core dump might be the only way
>>> to understand what went wrong.
>> I understand where you and David are coming from. […]
>>
>> To run a test with '-XX:-CreateCoredumpOnCrash' and then see a core
>> file in the execution directory violates the principle of least
>> astonishment. From the test author's POV, they did what they are
>> supposed to do (passing the '-XX:-CreateCoredumpOnCrash' option)
>> and the result is unexpected.
> The position that David and I are taking is that the unexpected core
> dump is not a problem, and is indeed a good thing, because the test
> apparently ran off into the weeds.

The test in JDK-8188872 didn't run off into the weeds. It is testing
exactly what it is intending to test:

   - If an error handling step times out, we get a message to tell us
     that the step timed out. We make sure we get at least two of those.
   - If error handling reaches the global timeout, then the VM should
     have been aborted by the WatcherThread. That's the intentional
     os::die() call...

and I'm asserting that the test author knows what they are doing when
they specify the '-XX:-CreateCoredumpOnCrash' option. Please explain
why you think it is a good thing to ignore what the test writer asked
for and dump core when the '-XX:-CreateCoredumpOnCrash' option is
specified.


>>> Maybe there are paths to os::die that ought to be calling os::abort
>>> instead?
>> Maybe, but not really the problem that I'm trying to solve today. :-)
> That presumes that the problem is the behavior of os::die rather than
> who is calling it.  David and I are suggesting the behavior of os::die
> is fine, and the reason for the unexpected core dump from the test is
> that os::die is being called when (perhaps) it shouldn't be.  Or maybe
> the test really is going off into the weeds and a call to os::die is
> appropriate and the core dump might contain useful information.

I believe that in this case the author of the step timeout code,
Thomas Stüfe, is calling os::die() intentionally. As noted above, the
test is not off in the weeds.

You edited out part of my reply where I said:

> 1) os::die() doesn't say _anything_ about generating a core file:
>
>    src/hotspot/share/runtime/os.hpp:
>
>      // Die immediately, no exit hook, no abort hook, no cleanup.
>      static void die(); 

and you didn't address my assertion that there is no contract for
os::die() to produce a core file. Remember that I'm also a user of
os::die() when I want a core file... So it surprised me that there's
no contract...
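
To make the request concrete, here is a minimal sketch of what a die()
that honors the option might look like on a POSIX platform. This is not
the actual HotSpot code or the proposed patch: die_sketch(),
choose_die_action() and the bool parameter are hypothetical stand-ins
for os::die() and the CreateCoredumpOnCrash flag.

```cpp
#include <assert.h>   // only needed if you add sanity checks around this
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

// Hypothetical: the two ways this sketch can terminate the process.
enum class DieAction { AbortForCore, KillNoCore };

// Pure decision function, kept separate so it can be checked without
// actually killing the process. The parameter stands in for the value
// of -XX:+/-CreateCoredumpOnCrash (assumption, not HotSpot plumbing).
DieAction choose_die_action(bool create_coredump_on_crash) {
  return create_coredump_on_crash ? DieAction::AbortForCore
                                  : DieAction::KillNoCore;
}

// Hypothetical die(): no exit hooks, no abort hooks, no cleanup.
[[noreturn]] void die_sketch(bool create_coredump_on_crash) {
  if (choose_die_action(create_coredump_on_crash) == DieAction::AbortForCore) {
    // abort() raises SIGABRT; whether a core file is actually written
    // still depends on the system configuration (e.g. ulimit -c).
    ::abort();
  }
  // SIGKILL cannot be caught and does not produce a core file.
  ::raise(SIGKILL);
  // Defensive fallback in case the signal is somehow not delivered;
  // _exit() skips atexit handlers.
  ::_exit(1);
}
```

Splitting the decision from the termination is only a convenience for
this sketch; it says nothing about how os::die() should be structured.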

>>> I think os::abort probably should not be calling ::exit either, but
>>> should be raising SIGKILL if no core dump is requested.  os::exit is
>>> the path to exit functions and the like.
>> Agreed (or maybe not). If I'm in os::abort() and I call '::exit()',
>> am I calling 'os::exit()' or am I calling libc's exit()? I thought
>> I was calling libc's exit(), but I could be wrong…
> ::exit() is libc's exit(), which calls atexit handlers.
>
> Also, ::_exit() or _Exit() (the latter is in <stdlib.h>) might be
> alternatives to using SIGKILL.

I'm testing SIGKILL right now... :-)
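
For anyone who wants to see the difference between the alternatives Kim
lists, the following is a small POSIX-only sketch that forks a child,
registers an atexit handler in it, and terminates the child in one of
three ways. All names here (Termination, ChildResult, run_child) are
made up for illustration and have nothing to do with the HotSpot
sources.

```cpp
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class Termination { LibcExit, UnderscoreExit, SigKill };

struct ChildResult {
  bool atexit_handler_ran;  // did the child's atexit handler run?
  bool killed_by_signal;    // did the child die from a signal?
  int  code_or_signal;      // exit code, or signal number if killed
};

static int g_report_fd = -1;  // write end of the pipe, set in the child

// atexit handler: tells the parent it ran by writing one byte.
static void report_handler_ran() {
  assert(g_report_fd >= 0);
  const char mark = 'x';
  ssize_t ignored = ::write(g_report_fd, &mark, 1);
  (void)ignored;
}

ChildResult run_child(Termination how) {
  int fds[2];
  if (::pipe(fds) != 0) ::abort();
  pid_t pid = ::fork();
  if (pid < 0) ::abort();

  if (pid == 0) {  // child
    ::close(fds[0]);
    g_report_fd = fds[1];
    ::atexit(report_handler_ran);
    switch (how) {
      case Termination::LibcExit:       ::exit(1);         // runs atexit handlers
      case Termination::UnderscoreExit: ::_exit(1);        // skips atexit handlers
      case Termination::SigKill:        ::raise(SIGKILL);  // cannot be caught
                                        break;
    }
    ::_exit(2);  // not expected to be reached
  }

  // parent: one byte means the handler ran, EOF means it did not
  ::close(fds[1]);
  char buf = 0;
  bool ran = (::read(fds[0], &buf, 1) == 1);
  ::close(fds[0]);

  int status = 0;
  if (::waitpid(pid, &status, 0) != pid) ::abort();

  ChildResult r;
  r.atexit_handler_ran = ran;
  r.killed_by_signal = WIFSIGNALED(status);
  r.code_or_signal = r.killed_by_signal ? WTERMSIG(status)
                                        : WEXITSTATUS(status);
  return r;
}
```

Calling run_child() with each of the three values should show the
handler running only for the ::exit() case, and only the SIGKILL case
reporting death by signal.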


>
>>> It's unfortunate that the Linux version of os::abort needs to do
>>> something special for DumpPrivateMappingsInCore; without that, we
>>> could have a merged posix version of os::abort.
>> I think that's a fairly recent addition to the Linux version of
>> os::abort(). It could probably be handled by a posix version of
>> os::abort() that calls a PD version of os::abort() as needed or
>> something like that.
>>
>> Again, not the problem that I'm trying to solve today.
> Agreed.
>

Thanks for continuing the discussion...

Dan


