RFR (S): 8237777 "Dumping core ..." is shown despite claiming that "# No core dump will be written."
gerard ziemski
gerard.ziemski at oracle.com
Thu Apr 23 15:31:06 UTC 2020
Thank you David for more feedback.
On 4/22/20 6:21 PM, David Holmes wrote:
> On 23/04/2020 3:10 am, gerard ziemski wrote:
>> hi David,
>>
>> Thank you very much for your review!
>>
>>> Hi Gerard,
>>>
>>> On 21/04/2020 4:33 am, gerard ziemski wrote:
>>>> hi all,
>>>>
>>>> Please review this fix, where we use the knowledge available to us,
>>>> like whether the ulimit is set, before deciding whether to tell the
>>>> user in hs_err crash log that we dumped the core file, to match
>>>> what actually happens.
>>>>
>>>> Before this fix, we often assumed that the core file would be
>>>> created, when in actuality that might not be the case.
>>> General approach seems fine but some comments below ...
>>>
>>>> I also extend an existing test to go with this fix.
>>>>
>>>> bug:https://bugs.openjdk.java.net/browse/JDK-8237777
>>>> webrev:http://cr.openjdk.java.net/~gziemski/8237777_rev1
>>>> tests: tested with Mach5 tier-hs1,2,3,4,5
>>> I'm not quite following all of the changes in relation to what gets
>>> printed:
>>>
>>> src/hotspot/os/linux/os_linux.cpp
>>>
>>> - "\"%s\" (or dumping to %s/core.%d)",
>>> - &core_pattern[1], p, current_process_id());
>>> + "\"%s\", or dumping to %s/core",
>>> + &core_pattern[1], p);
>>>
>>> why did you change the pattern? I get core files of the form
>>> core.<pid> on my linux box.
>>
>> I removed it because I thought it was better to be conservative and
>> print the core file name at its minimum - on my Ubuntu OS, for
>> example, the PID is not added to the filename and the OS only uses
>> "core".
>>
>> I added a comment explaining the assumption here.
>
> The existing code shows the user there are two possible locations for
> the core file - addressing both your case and mine. There is no reason
> for that to be changed.
It's not directly related to my change, so even though I don't like the
way it is and I do like Yasumasa suggestion, even more than how I tried
to fix it, I will leave it alone.
>>
>>> src/hotspot/os/posix/os_posix.cpp
>>>
>>> Why did you move the ifdef linux case? What if I don't have infinite
>>> stack limit?
>>
>> That was one of the main issues here. We would print the message
>> about the core file before checking the getrlimit(RLIMIT_CORE), so it
>> would get printed even for the "case 0 stack limit", when no core
>> file gets created.
>
> It stated "may". If core dumps are disabled you should see:
>
> Core dumps may be processed with xxx
> Core dumps have been disabled. To enable core dumping, try \"ulimit -c
> unlimited\" before starting Java again
>
> It's not perfect but I don't see any real problem with that - and it
> avoided duplicating the message line. What you have now only reports
> the first message in the RLIM_INFINITY case and skips the case where
> its non-infinite and non-zero. So potentially useful information has
> been lost.
>
>> Having said that, I do not like that message to begin with for 2
>> reasons:
>>
>> #1 it's awkwardly phrased, i.e. "Core dump will be written. Default
>> location: Core dumps may be processed with "/usr/share/apport/apport
>> %p %s %c %d %P %E", or dumping to /home/gerard/Desktop/core.13857"
>>
>> #2 on Ubuntu "apport" (which is the program that processes the core
>> dump) is not enabled by default, so the message is noisy.
>
> It's awkward because this core processing mechanism is awkward. And
> yes broken Linux distributions do actually set the core pattern to a
> program that may not even be installed! That's not our problem. This
> was all covered in depth when this code was put in.
>
>> Not having that message in the "case default" is therefore not a big
>> deal, however, for consistency I can either add it to both or remove
>> it from both cases. Or leave it as in my fix.
>
> It needs to be kept as-is in existing code or reported in both places.
> To avoid code duplication I'd keep as-is.
>
> These message changes are not directly addressing the core (excuse the
> pun) issue of this bug, which is simply that the dump_core argument to
> os::abort() was not taking into account whether we could actually dump
> core. Your fix addresses that nicely.
The current algorithm here is:
#1 check core path
#2 if no core path, then print "no core might exist", returns TRUE
#3 if LINUX && special core path, print "core my be processed",
returns TRUE (!!!!)
#4 else check rlimit
#5 if rlimit returned error, print "core might not exist", returns TRUE
#6 if rlimit==infinite, print "core file location", returns TRUE
#7 if rlimit==0, print "no core file", returns FALSE
#8 if rlimit<infinite, warn about "ensuring full core dump", returns
TRUE
We can either get #2, #3, #5, #6, #7 or #8, but no combinations of any
of them.
The issue here is that as long as #3 succeeds we don't even get to #7,
which is the key here - the fix is in fact incomplete unless we test for
rlimit.
I think the algorithm we need should be something like:
#1 check rlimit
#2 if rlimit returned error, print "core might not exist", returns TRUE
#3 if rlimit==0, print "no core & set rlimit" warning, returns FALSE
#4 if rlimit<infinite add warning about "ensuring full core dump"
(does NOT return yet)
#5 check core path
#6 if no core path, print "no core might exist", returns TRUE
#7 if NOT LINUX, print "core file location", returns TRUE
#8 if LINUX, print print "core my be processed", returns TRUE
The main point here is that rlimit==0 check must be handled early
enough, so that we don't end up saying we will dump the core, when we
might be unable to do so, regardless of what other extra mechanism say
about the pattern of the core file name.
Before I go off and code this, however, I'd like a consensus.
cheers
More information about the hotspot-runtime-dev
mailing list