RFR(XS): 8188872: runtime/ErrorHandling/TimeoutInErrorHandlingTest.java fails intermittently
Daniel D. Daugherty
daniel.daugherty at oracle.com
Fri May 31 21:38:23 UTC 2019
David H has reviewed this. I still need a second reviewer...
Dan
On 5/29/19 8:42 PM, Daniel D. Daugherty wrote:
> Ping! Anyone out there? :-)
>
> Dan
>
> On 5/28/19 8:12 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I have a fix for the following longstanding bug:
>>
>> JDK-8188872 runtime/ErrorHandling/TimeoutInErrorHandlingTest.java
>> fails intermittently
>> https://bugs.openjdk.java.net/browse/JDK-8188872
>>
>> I've included Thomas Stüfe directly since I'm modifying his code...
>>
>> This fix includes changes to the error handling code, the VM parts
>> of the test (-XX:+TestUnresponsiveErrorHandler) and the test itself.
>> The changes themselves are small, but the reasons are complicated
>> so a detailed explanation is required.
>>
>> Summary of the changes:
>>
>> - src/hotspot/share/utilities/vmError.cpp
>>   - add VMError::clear_step_start_time() and call it from the
>>     error reporting END macro.
>>     - VMError::report() is called twice: first, to generate a summary
>>       for stdout and second, to generate hs_err_pid output.
>>     - Adding clear_step_start_time() prevents interrupt_reporting_thread()
>>       from interrupting the error reporting thread between the two
>>       calls to VMError::report().
>>     - This solves the problem where hs_err_pid file creation gets
>>       interrupted and the hs_err_pid file ends up being created in
>>       /tmp/hs_err_pid...
>>   - add a STEP in VMError::report() for setting up the 'start time' for
>>     the TestUnresponsiveErrorHandler test
>>     - There is a corresponding change in VMError::report_and_die() that
>>       skips the call to record_reporting_start_time() when we are
>>       executing TestUnresponsiveErrorHandler.
>>     - This solves the problem where the error reporting thread is
>>       exposed to interrupt_reporting_thread() calls before it has
>>       reached the first STEP in VMError::report().
>>   - change VMError::check_timeout() to only call
>>     interrupt_reporting_thread() once per timeout detection for either
>>     a total reporting timeout or a step timeout:
>>     - check_timeout() is called by the WatcherThread once per second
>>       once it determines that error reporting has started. This change
>>       solves the problem where a timeout is detected, the error
>>       reporting thread takes longer than a second to do its work, and
>>       the WatcherThread calls check_timeout() (and
>>       interrupt_reporting_thread()) again, which restarts the STEP we
>>       were on from the beginning.
>> - src/hotspot/share/utilities/vmError.hpp
>>   - add clear_step_start_time()
>> - test/hotspot/jtreg/runtime/ErrorHandling/TimeoutInErrorHandlingTest.java
>>   - add support for '-Dverbose=true' to get more verbose test output
>>   - Default ERROR_LOG_TIMEOUT is 16 seconds; Solaris sets it to 3X.
>>   - dump the cmd output if we can't find the 'hs_err_pid' file
>>   - dump the cmd output if we can't open the 'hs_err_pid' file
>>   - dump the hs_err_pid file if we fail to match the patterns
>>
>> Webrev URL:
>> http://cr.openjdk.java.net/~dcubed/8188872-webrev/0-for-jdk-jdk13/
>>
>> Testing: Mach5 Tier[1-5]
>> Included the fix in my latest round of 8153224 testing
>> on Solaris-X64 where this bug reproduces quite a bit.
>>
>> Thanks, in advance, for any comments, suggestions, or questions.
>>
>> Dan
>>
>
>
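
For readers who have not looked at the webrev, the two watcher-side ideas in the summary above (clearing the step start time at the END of a report pass, and interrupting only once per detected timeout) can be sketched roughly as follows. This is a simplified model, not the vmError.cpp code: the class ReportTimeoutModel, its members, the interrupt counter, and the use of plain integer seconds with 0 meaning "not set" are all assumptions made for illustration. Only the names clear_step_start_time() and check_timeout() come from the summary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the watcher-side bookkeeping. Names and layout are
// illustrative only and do not mirror the real VMError members.
class ReportTimeoutModel {
 public:
  ReportTimeoutModel(int64_t total_timeout, int64_t step_timeout)
      : total_timeout_(total_timeout), step_timeout_(step_timeout) {
    assert(total_timeout > 0 && step_timeout > 0);
  }

  // Timestamps are plain seconds here; 0 means "not set" (an assumption).
  void record_reporting_start(int64_t now) { reporting_start_ = now; }

  // Called when a reporting step begins; re-arms the step timeout.
  void record_step_start(int64_t now) {
    step_start_ = now;
    step_timed_out_ = false;
  }

  // Called at the end of a report pass so that the watcher sees no step in
  // progress during the gap before the next pass.
  void clear_step_start_time() { step_start_ = 0; }

  // Called periodically by the watcher. Returns true only when this call
  // sends an interrupt; a timeout that was already acted on is ignored.
  bool check_timeout(int64_t now) {
    if (reporting_start_ > 0 && !reporting_timed_out_ &&
        now - reporting_start_ >= total_timeout_) {
      reporting_timed_out_ = true;
      ++interrupts_sent_;  // stands in for interrupting the reporting thread
      return true;
    }
    if (step_start_ > 0 && !step_timed_out_ &&
        now - step_start_ >= step_timeout_) {
      step_timed_out_ = true;
      ++interrupts_sent_;
      return true;
    }
    return false;
  }

  int interrupts_sent() const { return interrupts_sent_; }

 private:
  int64_t total_timeout_;
  int64_t step_timeout_;
  int64_t reporting_start_ = 0;
  int64_t step_start_ = 0;
  bool reporting_timed_out_ = false;
  bool step_timed_out_ = false;
  int interrupts_sent_ = 0;
};
```

In this model, a step that overruns its timeout triggers exactly one interrupt no matter how many further once-per-second checks happen before the step finishes, and a check made after clear_step_start_time() but before the next record_step_start() never reports a step timeout.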