RFR(XXS): 8224793: os::die() does not honor CreateCoredumpOnCrash option

Fri May 31 15:12:44 UTC 2019

Hi Kim (and David)!

I am really glad that I spun this XXS change off into its own bug
away from JDK-8188872. Can you imagine trying to tease out this
part of the discussion from the complexity that is JDK-8188872?

Onward... I've read all three replies (2 from Kim and 1 from David)
and I think I finally see what you guys are talking about (and why).
Thanks for taking the time to continue the conversation...

Only some scattered comments below...

On 5/30/19 9:18 PM, Kim Barrett wrote:
>> On May 30, 2019, at 11:31 AM, Daniel D. Daugherty <daniel.daugherty at oracle.com> wrote:
>>
>> Hi David,
>>
>> On 5/29/19 11:05 PM, David Holmes wrote:
>>> Because we should not be changing the behaviour of os::die() because of what one very specific test that intends to end up calling die() desires.
>> Hmmmm... I'm proposing to change the behavior of os::die() because
>> we have an option that says this:
>>
>> src/hotspot/share/runtime/globals.hpp:
>>
>>>    product(bool, CreateCoredumpOnCrash, true,                                \
>>>            "Create core/mini dump on VM fatal error")                        \
>> so when we turn that option off, I'm asserting that we should not
>> create a core dump on a VM fatal error. I happened to run into the
>> issue with my work on JDK-8188872 and its specific test. The
>> messenger in this case is JDK-8188872. Don't shoot the messenger.
> [An overall, rather than point-by-point, response, so I'm not quoting
> most of the message I'm replying to.]
>
> I think David and I are viewing CreateCoredumpOnCrash as a flag for
> controlling the behavior of os::abort. That's why David mentioned its
> present usage by callers to that function. Although I see a number of
> callers that aren't using it and either defaulting or explicitly
> requesting a core dump, irrespective of that flag. There are also
> places that os::abort unconditionally without a core dump that one
> might wish could be asked for one (e.g. most, but not all, callers of
> vm_abort).

It's that hit-and-miss use of CreateCoredumpOnCrash in os::abort()
calls that made it seem okay to use it in os::die(). Something else
that should be clean up.

> But maybe those cases aren't "VM fatal errors"; maybe only code paths
> that go through VMError count as such. That's the only way I can make
> sense of the non-use of the flag by many callers of os::abort. I'm not
> sure I like that viewpoint, but that seems to be what's going on.

Hmmm...it's a plausible guess. Could be made more clear...

> I think TimeoutInErrorHandlingTest.java (the failing test from
> JDK-8188872) is unusual, in that it is carefully directing the VM down
> an ordinarily very weedy path that shouldn't be trodden even by some
> problems in "normal" crashes.

Yes, TimeoutInErrorHandlingTest.java is unusual.

> I suspect most or all (though I only looked at a few) of the other
> tests that specify -CreateCoredumpOnCrash are indeed attempting to
> control the behavior of os::abort when called from VMError.  That
> makes them different from TimeoutInErrorHandlingTest.

Good point (and I think I'm grok'ing it now).

> TimeoutInErrorHandlingTest makes use of a bespoke option to trigger
> the desired behavior, -XX:+TestUnresponsiveErrorHandler. A completely
> different way to achieve the effect you are trying for (having this
> test not generate a core dump) would be to have the handling of that
> option (either at argument processing time, or when VMError hits it)
> use setrlimit to disable core dumps for the current process. Though we
> don't appear to currently have os:: versions of the rlimit functions,
> so there's a bit of work to be done there.

Doing something special for TimeoutInErrorHandlingTest via the
-XX:+TestUnresponsiveErrorHandler is the right way forward.

> (Another test that might be similar is SecondaryErrorTest.java, but I
> think it expects to just get a recursive error that ultimately still
> ends in os::abort rather than os::die.)

Could be. I don't think I've see that test fail in my lab (thank god).

Thanks for continuing the conversation.

Dan