RFR(XXS): 8224793: os::die() does not honor CreateCoredumpOnCrash option

Thomas Stüfe thomas.stuefe at gmail.com
Mon Jun 3 19:57:02 UTC 2019


Hi guys,

thanks for the herculean discussion work on this one. I was in vacation and
missed this issue completely.

I am vaguely embarrassed on a personal level by the workarounds in os::die
:)

I like the clarified comment. For a long time it was not clear to me what
role exactly had os::die, os::abort and os::exit. It is good to have this
cleared up somewhat.

The result leaves me a bit unsatisfied though. Because we now seem to have
two level of fatal self-induced process terminations. Some which are
"normal fatal" so we honor -CreateCoreDumpOnCrash, and some which are "very
very fatal", where we disregard the users wish to suppress core file
generation. This seems overly complicated and not very intuitive. In this,
I share Dan's original arguments.

However, I do not want to restart the argument - I understand your side of
it. Just wanted to register my voice.

Cheers, Thomas




On Sat, Jun 1, 2019 at 4:48 AM Daniel D. Daugherty <
daniel.daugherty at oracle.com> wrote:

> On 5/31/19 10:28 PM, David Holmes wrote:
> > Hi Dan, Kim,
> >
> > Glad this was resolved. Code changes look good.
>
> Thanks!
>
>
> > I agree that consolidation into os_posix.cpp should be a separate RFE.
> > I also suspect that BSD comment has not been true for a long time.
>
> I'll file a new RFE (probably next week)...
>
>
> > Windows is as always a special case. None of Window's termination
> > mechanisms "dump core". If you want a minidump on windows then you
> > have to explicitly ask the debugging subsystem to produce one for you.
> > The logic we have for deciding when to do that on Windows is already
> > complex and I would not suggest trying to incorporate that into
> > os::die() - seems a lot of pain for little gain and just more things
> > that could go wrong as we try to terminate.
>
> Yup. We'll leave the Win* version of os::die() alone...
>
> Thanks again to you and Kim for sticking with this review.
>
> Dan
>
>
> >
> > Thanks,
> > David
> > -----
> >
> > On 1/06/2019 6:25 am, Daniel D. Daugherty wrote:
> >> Greetings,
> >>
> >> I've updated the fix based on the code review discussions with Kim and
> >> David H. I think it's easier to review using the full webrev, but if
> >> you're curious exactly how the code evolved, then there is also an
> >> incremental webrev.
> >>
> >> Full webrev:
> >> http://cr.openjdk.java.net/~dcubed/8224793-webrev/2-for-jdk-jdk13.full/
> >>
> >> Incremental webrev:
> >> http://cr.openjdk.java.net/~dcubed/8224793-webrev/2-for-jdk-jdk13.inc/
> >>
> >> I have Mach5 Tier[1-3] testing running right now and it is looking good.
> >>
> >> Dan
> >>
> >>
> >> On 5/29/19 3:29 PM, Daniel D. Daugherty wrote:
> >>> Greetings,
> >>>
> >>> I've updated the fix from calling os::exit(1) to calling
> >>> os::signal_raise(SIGKILL). Here's the updated webrevs:
> >>>
> >>> Full webrev:
> >>>
> http://cr.openjdk.java.net/~dcubed/8224793-webrev/1-for-jdk-jdk13.full/
> >>>
> >>> Incremental webrev:
> >>> http://cr.openjdk.java.net/~dcubed/8224793-webrev/1-for-jdk-jdk13.inc/
> >>>
> >>> I have Mach5 Tier[1-3] testing running right now and it is looking
> >>> good.
> >>> I'll follow with Mach5 Tier[456] testing. Manually running 'java' with
> >>> the options from the test in JDK-8188872 shows this snippet:
> >>>
> >>>> # Problematic frame:
> >>>> # V  [libjvm.dylib+0xc06347] VMError::controlled_crash(int)+0xd5
> >>>> #
> >>>> # CreateCoredumpOnCrash turned off, no core file dumped
> >>>> #
> >>>> # An error report file with more information is saved as:
> >>>> # /work/shared/bug_hunt/jdk_jdk_exp/hs_err_pid79552.log
> >>>> Recording reporting_start_time for TestUnresponsiveErrorHandler.
> >>>> # [ timer expired, abort... ]
> >>>> Killed: 9
> >>>>
> >>>> $ echo $?
> >>>> 137
> >>>
> >>> The "Killed: 9" is a _nice_ addition that shows we had an abnormal
> >>> termination. The exit_code == 137 is also way better than my
> >>> original exit_code == 1.
> >>>
> >>> David, Kim and I are still hashing out whether
> >>> -XX:-CreateCoredumpOnCrash
> >>> takes precedence or not.
> >>>
> >>> Dan
> >>>
> >>>
> >>>
> >>> On 5/28/19 7:37 PM, Daniel D. Daugherty wrote:
> >>>> Resending with the bug's JBS link included...
> >>>>
> >>>> Greetings,
> >>>>
> >>>> While working on another bug (JDK-8188872), I happened to notice that
> >>>> the test was generating core files even though the test is run with
> >>>> the '-XX:+CreateCoredumpOnCrash' option.
> >>>>
> >>>> After investigating, I discovered that the os::die() function does not
> >>>> honor the CreateCoredumpOnCrash option which was a surprise to me.
> >>>>
> >>>> Webrev URL:
> >>>> http://cr.openjdk.java.net/~dcubed/8224793-webrev/0-for-jdk-jdk13/
> >>>>
> >>>> Bug: JDK-8224793 os::die() does not honor CreateCoredumpOnCrash option
> >>>>      https://bugs.openjdk.java.net/browse/JDK-8224793
> >>>>
> >>>> Testing: Mach5 Tier[1-5]
> >>>>          Included the fix in my latest round of 8153224 testing
> >>>>          on Solaris-X64 where this bug reproduces quite a bit.
> >>>>
> >>>> Thanks, in advance, for any comments, suggestions, or questions.
> >>>>
> >>>> Dan
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
>
>


More information about the hotspot-runtime-dev mailing list