RFR(s) (JDK10): 8166944: Hanging Error Reporting steps may lead to torn error logs.

Thomas Stüfe thomas.stuefe at gmail.com
Fri Feb 3 12:46:41 UTC 2017


Hi Chris,

Thanks again for your review, and sorry for forgetting parts of your input.
Will prepare a new webrev and incorporate yours and Davids input.

Kind Regards, Thomas

On Fri, Feb 3, 2017 at 12:11 AM, Chris Plummer <chris.plummer at oracle.com>
wrote:

> Hi Thomas,
>
> Mostly looks good. A few minor comments:
>
> Most files need 2017 copyright updates.
>
> set_to_now() and get_timestamp() should have names that tie them together
> better. Maybe set_step_start_time() and get_step_start_time(). Also, they
> should probably be methods of VMError and the statics should all be fields.
>
> In the first review I had mentioned that you only are allowing for one
> step to hang, and still be able to continue with subsequent steps:
>
> 1509     // A step times out after half of the total timeout. Steps are
> mostly fast unless they
> 1510     // hang for some reason, so this simple rule allows for one
> hanging step and still
> 1511     // leaves time enough for the rest of the steps to finish.
> 1512     const jlong end = step_start_time_l + ErrorLogTimeout * 1000 / 2;
>
> You mentioned that for your SAP version you divide by 4, allowing for up
> to 3 steps to hang. Is there a reason not to do this for the open version?
> In fact it looks like TestUnresponsiveErrorHandler is relying on that being
> the case, since it will hang 3 times. Related to this,
> TimeoutInErrorHandlingTest.java is only checking for one instance of step
> timeout pattern. Probably it should check for 3 instances, and also one
> instance of the overall timeout pattern:
>
> 1272         st->print_cr("------ Timout during error reporting after "
> INT64_FORMAT "ms. ------",
>
> thanks,
>
> Chris
>
> On 2/2/17 3:59 AM, Thomas Stüfe wrote:
>
>> Dear all,
>>
>> please review (or re-review) this enhancement. It was already reviewed
>> extensively for JDK9 in Oct 16, but pushed back to JDK10. Now that JDK10
>> repo is open, I would like to close this issue.
>>
>> In short, this enhancement lets the hotspot cancel hanging error
>> reporting steps (when writing the hs-err file), in order to allow
>> subsequent error reporting steps to run. In our JVM variant, this has been
>> active for a long time and has proven to make error reporting more stable
>> in the face of deadlocks or file system stalls.
>>
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8166944
>>
>> Webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-
>> Error-Reporting/jdk10-webrev.02/webrev/ <http://cr.openjdk.java.net/%7
>> Estuefe/webrevs/8166944-Hanging-Error-Reporting/jdk10-webrev.02/webrev/>
>>
>> Old Diskussion for JDK9:
>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2
>> 016-October/021582.html
>>
>> The webrev is unchanged from version 2 discussed in october, just got
>> rebased atop of JDK10/hs.
>>
>> Thank you,
>>
>> Kind Regards, Thomas
>>
>
>
>
>


More information about the hotspot-runtime-dev mailing list