RFR(XS): 8188872: runtime/ErrorHandling/TimeoutInErrorHandlingTest.java fails intermittently

Daniel D. Daugherty daniel.daugherty at oracle.com
Thu May 30 00:42:55 UTC 2019


Ping! Anyone out there? :-)

Dan

On 5/28/19 8:12 PM, Daniel D. Daugherty wrote:
> Greetings,
>
> I have a fix for the following longstanding bug:
>
>     JDK-8188872 runtime/ErrorHandling/TimeoutInErrorHandlingTest.java fails intermittently
>     https://bugs.openjdk.java.net/browse/JDK-8188872
>
> I've included Thomas Stüfe directly since I'm modifying his code...
>
> This fix includes changes to the error handling code, the VM parts
> of the test (-XX:+TestUnresponsiveErrorHandler) and the test itself.
> The changes themselves are small, but the reasons are complicated
> so a detailed explanation is required.
>
> Summary of the changes:
>
> - src/hotspot/share/utilities/vmError.cpp
>   - add VMError::clear_step_start_time() and call it from the
>     error reporting END macro.
>     - VMError::report() is called twice: first, to generate a summary
>       for stdout and second, to generate hs_err_pid output.
>     - Adding clear_step_start_time() prevents interrupt_reporting_thread()
>       from interrupting the error reporting thread between the two calls
>       to VMError::report().
>     - This solves the problem where hs_err_pid file creation gets
>       interrupted and the hs_err_pid file ends up being created in
>       /tmp/hs_err_pid...
>   - add a STEP in VMError::report() for setting up the 'start time' for
>     the TestUnresponsiveErrorHandler test
>     - There is a corresponding change in VMError::report_and_die() that
>       skips the call to record_reporting_start_time() when we are
>       executing TestUnresponsiveErrorHandler.
>     - This solves the problem where the error reporting thread is exposed
>       to interrupt_reporting_thread() calls before it has reached the
>       first STEP in VMError::report().
>   - change VMError::check_timeout() to only call
>     interrupt_reporting_thread() once per timeout detection for either a
>     total reporting timeout or a step timeout:
>     - check_timeout() is called by the WatcherThread once per second once
>       it determines that error reporting has started. This change solves
>       the problem where a timeout is detected, the error reporting thread
>       takes longer than a second to do its work, and the WatcherThread
>       calls check_timeout() (and interrupt_reporting_thread()) again,
>       which restarts the STEP we were on from the beginning.
> - src/hotspot/share/utilities/vmError.hpp
>   - add clear_step_start_time()
> - test/hotspot/jtreg/runtime/ErrorHandling/TimeoutInErrorHandlingTest.java
>   - add support for '-Dverbose=true' to get more verbose test output
>   - Default ERROR_LOG_TIMEOUT is 16 seconds; Solaris sets it to 3X.
>   - dump the cmd output if we can't find the 'hs_err_pid' file
>   - dump the cmd output if we can't open the 'hs_err_pid' file
>   - dump the hs_err_pid file if we fail to match the patterns
>
> Webrev URL: http://cr.openjdk.java.net/~dcubed/8188872-webrev/0-for-jdk-jdk13/
>
> Testing: Mach5 Tier[1-5]
>          Included the fix in my latest round of 8153224 testing
>          on Solaris-X64 where this bug reproduces quite a bit.
>
> Thanks, in advance, for any comments, suggestions, or questions.
>
> Dan
>
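
For readers following along without the webrev, the effect of clearing
the step start time can be sketched roughly as below. This is a
simplified model, not the HotSpot code: the StepTimer class,
record_step_start_time(), step_timed_out() and the millisecond
parameters are assumptions made for illustration; only the name
clear_step_start_time() comes from the change summary above. The idea
is that a watchdog which sees "no step in flight" has nothing to time
out, so it cannot interrupt the reporting thread between two steps or
between the two calls to VMError::report().

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of per-step timeout bookkeeping (not HotSpot source).
class StepTimer {
 public:
  // Called when a reporting step begins; 0 is reserved for "no step".
  void record_step_start_time(int64_t now_ms) {
    assert(now_ms > 0);
    step_start_ms_.store(now_ms);
  }

  // Called when a step ends: from here until the next step begins,
  // the watchdog must not see a stale start time.
  void clear_step_start_time() { step_start_ms_.store(0); }

  // Watchdog query: a step can only time out while one is in flight.
  bool step_timed_out(int64_t now_ms, int64_t limit_ms) const {
    const int64_t start = step_start_ms_.load();
    if (start == 0) {
      return false;  // between steps: nothing to interrupt
    }
    return now_ms - start > limit_ms;
  }

 private:
  std::atomic<int64_t> step_start_ms_{0};
};
```

Without the clear, the start time of the last finished step would stay
visible to the watchdog and could look like a step that has been running
for a long time.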

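The "interrupt only once per timeout detection" change can be sketched
in the same spirit. Again, this is a hedged illustration under stated
assumptions rather than the actual patch: TimeoutWatch, begin_step(),
interrupts_sent() and the counter standing in for
interrupt_reporting_thread() are all hypothetical, only the step timeout
is modelled (not the total reporting timeout), and the assumption that
starting a new step re-arms detection is mine, not something the summary
states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a watcher that polls once per second and must
// signal a timed-out step exactly once (not HotSpot source).
class TimeoutWatch {
 public:
  explicit TimeoutWatch(int64_t step_limit_ms) : step_limit_ms_(step_limit_ms) {
    assert(step_limit_ms > 0);
  }

  // A new step starts: remember when, and re-arm timeout detection.
  void begin_step(int64_t now_ms) {
    step_start_ms_ = now_ms;
    in_step_ = true;
    step_timeout_signalled_ = false;
  }

  // Called periodically by the watcher. Returns true only on the first
  // poll that detects a timeout for the current step.
  bool check_timeout(int64_t now_ms) {
    if (!in_step_ || step_timeout_signalled_) {
      return false;  // nothing running, or already signalled
    }
    if (now_ms - step_start_ms_ <= step_limit_ms_) {
      return false;  // still within the limit
    }
    step_timeout_signalled_ = true;
    ++interrupts_sent_;  // stand-in for interrupt_reporting_thread()
    return true;
  }

  int interrupts_sent() const { return interrupts_sent_; }

 private:
  const int64_t step_limit_ms_;
  int64_t step_start_ms_ = 0;
  bool in_step_ = false;
  bool step_timeout_signalled_ = false;
  int interrupts_sent_ = 0;
};
```

With the flag in place, later polls that arrive while the reporting
thread is still winding down the timed-out step are ignored, so the step
is not restarted from the beginning by a second interrupt.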


More information about the hotspot-runtime-dev mailing list