RFR: 8250637: UseOSErrorReporting times out (on Mac and Linux) [v3]

Thomas Stuefe stuefe at openjdk.java.net
Wed Oct 28 17:16:47 UTC 2020


On Wed, 28 Oct 2020 16:52:01 GMT, Gerard Ziemski <gziemski at openjdk.org> wrote:

>> hi all,
>> 
>> Please review this simple fix for POSIX platforms, which addresses a time out that occurs while handling a crash with UseOSErrorReporting turned ON.
>> 
>> It appears that "UseOSErrorReporting" flag was only ever meant to be used on Windows platform and was mistakenly left available for other platforms. In this fix we make sure to only use the flag on Windows platform and make it a NOP for other platforms.
>> 
>> Note #1: A similar hang issue occurs today even on Windows, with the only difference being that before a process times out (takes 2 minutes) it runs out of stack space in about 250 loops, so that's the only reason it doesn't linger for that long. Windows issue is tracked separately by https://bugs.openjdk.java.net/browse/JDK-8250782
>> 
>> Note #2: Creating native crash log (on macOS) is a non-trivial, research wise effort, that is tracked by https://bugs.openjdk.java.net/browse/JDK-8237727
>> 
>> Note #3 Removal of the "UseOSErrorReporting" flag will be depended on whether we can do #2 and at that time we can decide whether to keep it and implement it for other platforms or whether to remove it, provided that #2 can not be done reliably.
>
> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision:
> 
>   make UseOSErrorReporting flag Windows only

HI Gerard,

The patch is fine in its current form to me (including your last push). Whether or not to do a CSR I leave up to you and David.

As my final remark to our "return from signal handler" discussion: I'd probably be more chill if this were a simple application. Like vi :) But we do so many unusual things (including generating, then running our own code) and the VM is the base for such a large software stack that I rather be careful.

All my remaining remarks are nits. Take what you like, ignore the rest.

Thank you,

Thomas

src/hotspot/share/utilities/vmError.cpp line 1437:

> 1435:   } else {
> 1436: #if defined(_WINDOWS)
> 1437:     // If UseOsErrorReporting we call this for each level of the call stack

Could you please change this comment to refer to UseOSErrorReporting? (Note the capital s). Makes it easier to grep for it. Same goes for os_windows.cpp:2357 .

src/hotspot/share/utilities/vmError.cpp line 1631:

> 1629:   }
> 1630: 
> 1631: #if defined(_WINDOWS)

If you like you could abbreviate this Hunk with something like
if (WINDOWS_ONLY(!UseOsErrorReporting) NOT_WINDOWS(true)) {
but this is fine too, I leave it up to you.

src/hotspot/share/utilities/vmError.cpp line 1440:

> 1438:     // while searching for the exception handler.  Only the first level needs
> 1439:     // to be reported.
> 1440:     if (UseOSErrorReporting && log_done) return;

This has nothing to do with you patch, which is fine:

But the more I look at this line the more confused I get. I am not sure what the point is. log_done means we have written the hs-err file successfully and got a signal after the call to VMError::report() but before returning from this function resp. before calling abort.

That covers a whole section between lines 1545 and 1629, I am surprised how much we do there. I am almost certain some things will not behave well when secondary crashes happen and we re-enter this function, e.g.:
1556  JFR_ONLY(Jfr::on_vm_shutdown(true);)
1557
1558  if (PrintNMTStatistics) {
1559    fdStream fds(fd_out);
1560    MemTracker::final_report(&fds);
1561  }
both should be guarded against re-entering when this function is called repeatedly and they have had their song-and-dance already. Otherwise e.g. we may see the NMT output twice if a signal occurs after line 1560.

All idle musings, potentially a future cleanup.

-------------

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/813


More information about the hotspot-dev mailing list