RFR(s) (JDK10): 8166944: Hanging Error Reporting steps may lead to torn error logs.
Chris Plummer
chris.plummer at oracle.com
Thu Feb 2 23:11:49 UTC 2017
Hi Thomas,
Mostly looks good. A few minor comments:
Most files need 2017 copyright updates.
set_to_now() and get_timestamp() should have names that tie them
together better. Maybe set_step_start_time() and get_step_start_time().
Also, they should probably be methods of VMError and the statics should
all be fields.
In the first review I had mentioned that you only are allowing for one
step to hang, and still be able to continue with subsequent steps:
1509 // A step times out after half of the total timeout. Steps are
mostly fast unless they
1510 // hang for some reason, so this simple rule allows for one
hanging step and still
1511 // leaves time enough for the rest of the steps to finish.
1512 const jlong end = step_start_time_l + ErrorLogTimeout * 1000 / 2;
You mentioned that for your SAP version you divide by 4, allowing for up
to 3 steps to hang. Is there a reason not to do this for the open
version? In fact it looks like TestUnresponsiveErrorHandler is relying
on that being the case, since it will hang 3 times. Related to this,
TimeoutInErrorHandlingTest.java is only checking for one instance of
step timeout pattern. Probably it should check for 3 instances, and also
one instance of the overall timeout pattern:
1272 st->print_cr("------ Timout during error reporting after "
INT64_FORMAT "ms. ------",
thanks,
Chris
On 2/2/17 3:59 AM, Thomas Stüfe wrote:
> Dear all,
>
> please review (or re-review) this enhancement. It was already reviewed
> extensively for JDK9 in Oct 16, but pushed back to JDK10. Now that
> JDK10 repo is open, I would like to close this issue.
>
> In short, this enhancement lets the hotspot cancel hanging error
> reporting steps (when writing the hs-err file), in order to allow
> subsequent error reporting steps to run. In our JVM variant, this has
> been active for a long time and has proven to make error reporting
> more stable in the face of deadlocks or file system stalls.
>
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8166944
>
> Webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/jdk10-webrev.02/webrev/
> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8166944-Hanging-Error-Reporting/jdk10-webrev.02/webrev/>
>
> Old Diskussion for JDK9:
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-October/021582.html
>
> The webrev is unchanged from version 2 discussed in october, just got
> rebased atop of JDK10/hs.
>
> Thank you,
>
> Kind Regards, Thomas
More information about the hotspot-runtime-dev
mailing list