RFR(s) (JDK10): 8166944: Hanging Error Reporting steps may lead to torn error logs.

Chris Plummer chris.plummer at oracle.com
Thu Feb 2 23:11:49 UTC 2017


Hi Thomas,

Mostly looks good. A few minor comments:

Most files need 2017 copyright updates.

set_to_now() and get_timestamp() should have names that tie them 
together better. Maybe set_step_start_time() and get_step_start_time(). 
Also, they should probably be methods of VMError and the statics should 
all be fields.

In the first review I had mentioned that you only are allowing for one 
step to hang, and still be able to continue with subsequent steps:

1509     // A step times out after half of the total timeout. Steps are 
mostly fast unless they
1510     // hang for some reason, so this simple rule allows for one 
hanging step and still
1511     // leaves time enough for the rest of the steps to finish.
1512     const jlong end = step_start_time_l + ErrorLogTimeout * 1000 / 2;

You mentioned that for your SAP version you divide by 4, allowing for up 
to 3 steps to hang. Is there a reason not to do this for the open 
version? In fact it looks like TestUnresponsiveErrorHandler is relying 
on that being the case, since it will hang 3 times. Related to this, 
TimeoutInErrorHandlingTest.java is only checking for one instance of 
step timeout pattern. Probably it should check for 3 instances, and also 
one instance of the overall timeout pattern:

1272         st->print_cr("------ Timout during error reporting after " 
INT64_FORMAT "ms. ------",

thanks,

Chris

On 2/2/17 3:59 AM, Thomas Stüfe wrote:
> Dear all,
>
> please review (or re-review) this enhancement. It was already reviewed 
> extensively for JDK9 in Oct 16, but pushed back to JDK10. Now that 
> JDK10 repo is open, I would like to close this issue.
>
> In short, this enhancement lets the hotspot cancel hanging error 
> reporting steps (when writing the hs-err file), in order to allow 
> subsequent error reporting steps to run. In our JVM variant, this has 
> been active for a long time and has proven to make error reporting 
> more stable in the face of deadlocks or file system stalls.
>
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8166944
>
> Webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/jdk10-webrev.02/webrev/ 
> <http://cr.openjdk.java.net/%7Estuefe/webrevs/8166944-Hanging-Error-Reporting/jdk10-webrev.02/webrev/>
>
> Old Diskussion for JDK9:
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-October/021582.html
>
> The webrev is unchanged from version 2 discussed in october, just got 
> rebased atop of JDK10/hs.
>
> Thank you,
>
> Kind Regards, Thomas





More information about the hotspot-runtime-dev mailing list