RFR(XXS): 8224793: os::die() does not honor CreateCoredumpOnCrash option

Daniel D. Daugherty daniel.daugherty at oracle.com
Wed Jun 5 17:55:09 UTC 2019


Thomas,

Thanks for chiming in on this review thread.

Dan


On 6/3/19 3:57 PM, Thomas Stüfe wrote:
> Hi guys,
>
> thanks for the herculean discussion work on this one. I was in 
> vacation and missed this issue completely.
>
> I am vaguely embarrassed on a personal level by the workarounds in 
> os::die :)
>
> I like the clarified comment. For a long time it was not clear to me 
> what role exactly had os::die, os::abort and os::exit. It is good to 
> have this cleared up somewhat.
>
> The result leaves me a bit unsatisfied though. Because we now seem to 
> have two level of fatal self-induced process terminations. Some which 
> are "normal fatal" so we honor -CreateCoreDumpOnCrash, and some which 
> are "very very fatal", where we disregard the users wish to suppress 
> core file generation. This seems overly complicated and not very 
> intuitive. In this, I share Dan's original arguments.
>
> However, I do not want to restart the argument - I understand your 
> side of it. Just wanted to register my voice.
>
> Cheers, Thomas
>
>
>
>
> On Sat, Jun 1, 2019 at 4:48 AM Daniel D. Daugherty 
> <daniel.daugherty at oracle.com <mailto:daniel.daugherty at oracle.com>> wrote:
>
>     On 5/31/19 10:28 PM, David Holmes wrote:
>     > Hi Dan, Kim,
>     >
>     > Glad this was resolved. Code changes look good.
>
>     Thanks!
>
>
>     > I agree that consolidation into os_posix.cpp should be a
>     separate RFE.
>     > I also suspect that BSD comment has not been true for a long time.
>
>     I'll file a new RFE (probably next week)...
>
>
>     > Windows is as always a special case. None of Window's termination
>     > mechanisms "dump core". If you want a minidump on windows then you
>     > have to explicitly ask the debugging subsystem to produce one
>     for you.
>     > The logic we have for deciding when to do that on Windows is
>     already
>     > complex and I would not suggest trying to incorporate that into
>     > os::die() - seems a lot of pain for little gain and just more
>     things
>     > that could go wrong as we try to terminate.
>
>     Yup. We'll leave the Win* version of os::die() alone...
>
>     Thanks again to you and Kim for sticking with this review.
>
>     Dan
>
>
>     >
>     > Thanks,
>     > David
>     > -----
>     >
>     > On 1/06/2019 6:25 am, Daniel D. Daugherty wrote:
>     >> Greetings,
>     >>
>     >> I've updated the fix based on the code review discussions with
>     Kim and
>     >> David H. I think it's easier to review using the full webrev,
>     but if
>     >> you're curious exactly how the code evolved, then there is also an
>     >> incremental webrev.
>     >>
>     >> Full webrev:
>     >>
>     http://cr.openjdk.java.net/~dcubed/8224793-webrev/2-for-jdk-jdk13.full/
>     >>
>     >> Incremental webrev:
>     >>
>     http://cr.openjdk.java.net/~dcubed/8224793-webrev/2-for-jdk-jdk13.inc/
>     >>
>     >> I have Mach5 Tier[1-3] testing running right now and it is
>     looking good.
>     >>
>     >> Dan
>     >>
>     >>
>     >> On 5/29/19 3:29 PM, Daniel D. Daugherty wrote:
>     >>> Greetings,
>     >>>
>     >>> I've updated the fix from calling os::exit(1) to calling
>     >>> os::signal_raise(SIGKILL). Here's the updated webrevs:
>     >>>
>     >>> Full webrev:
>     >>>
>     http://cr.openjdk.java.net/~dcubed/8224793-webrev/1-for-jdk-jdk13.full/
>     >>>
>     >>> Incremental webrev:
>     >>>
>     http://cr.openjdk.java.net/~dcubed/8224793-webrev/1-for-jdk-jdk13.inc/
>     >>>
>     >>> I have Mach5 Tier[1-3] testing running right now and it is
>     looking
>     >>> good.
>     >>> I'll follow with Mach5 Tier[456] testing. Manually running
>     'java' with
>     >>> the options from the test in JDK-8188872 shows this snippet:
>     >>>
>     >>>> # Problematic frame:
>     >>>> # V  [libjvm.dylib+0xc06347] VMError::controlled_crash(int)+0xd5
>     >>>> #
>     >>>> # CreateCoredumpOnCrash turned off, no core file dumped
>     >>>> #
>     >>>> # An error report file with more information is saved as:
>     >>>> # /work/shared/bug_hunt/jdk_jdk_exp/hs_err_pid79552.log
>     >>>> Recording reporting_start_time for TestUnresponsiveErrorHandler.
>     >>>> # [ timer expired, abort... ]
>     >>>> Killed: 9
>     >>>>
>     >>>> $ echo $?
>     >>>> 137
>     >>>
>     >>> The "Killed: 9" is a _nice_ addition that shows we had an abnormal
>     >>> termination. The exit_code == 137 is also way better than my
>     >>> original exit_code == 1.
>     >>>
>     >>> David, Kim and I are still hashing out whether
>     >>> -XX:-CreateCoredumpOnCrash
>     >>> takes precedence or not.
>     >>>
>     >>> Dan
>     >>>
>     >>>
>     >>>
>     >>> On 5/28/19 7:37 PM, Daniel D. Daugherty wrote:
>     >>>> Resending with the bug's JBS link included...
>     >>>>
>     >>>> Greetings,
>     >>>>
>     >>>> While working on another bug (JDK-8188872), I happened to
>     notice that
>     >>>> the test was generating core files even though the test is
>     run with
>     >>>> the '-XX:+CreateCoredumpOnCrash' option.
>     >>>>
>     >>>> After investigating, I discovered that the os::die() function
>     does not
>     >>>> honor the CreateCoredumpOnCrash option which was a surprise
>     to me.
>     >>>>
>     >>>> Webrev URL:
>     >>>>
>     http://cr.openjdk.java.net/~dcubed/8224793-webrev/0-for-jdk-jdk13/
>     >>>>
>     >>>> Bug: JDK-8224793 os::die() does not honor
>     CreateCoredumpOnCrash option
>     >>>> https://bugs.openjdk.java.net/browse/JDK-8224793
>     >>>>
>     >>>> Testing: Mach5 Tier[1-5]
>     >>>>          Included the fix in my latest round of 8153224 testing
>     >>>>          on Solaris-X64 where this bug reproduces quite a bit.
>     >>>>
>     >>>> Thanks, in advance, for any comments, suggestions, or questions.
>     >>>>
>     >>>> Dan
>     >>>>
>     >>>>
>     >>>>
>     >>>
>     >>>
>     >>
>



More information about the hotspot-runtime-dev mailing list