RFR(XS): 8188872: runtime/ErrorHandling/TimeoutInErrorHandlingTest.java fails intermittently
Daniel D. Daugherty
daniel.daugherty at oracle.com
Wed May 29 00:12:05 UTC 2019
Greetings,
I have a fix for the following longstanding bug:
JDK-8188872 runtime/ErrorHandling/TimeoutInErrorHandlingTest.java fails intermittently
https://bugs.openjdk.java.net/browse/JDK-8188872
I've included Thomas Stüfe directly since I'm modifying his code...
This fix includes changes to the error handling code, the VM parts
of the test (-XX:+TestUnresponsiveErrorHandler) and the test itself.
The changes themselves are small, but the reasons are complicated
so a detailed explanation is required.
Summary of the changes:
- src/hotspot/share/utilities/vmError.cpp
- add VMError::clear_step_start_time() and call it from the
error reporting END macro.
- VMError::report() is called twice: first, to generate a summary
for stdout and second, to generate hs_err_pid output.
- Adding clear_step_start_time() prevents interrupt_reporting_thread()
from interrupting the error reporting thread between the two calls to
VMError::report().
- This solves the problem where hs_err_pid file creation gets
interrupted and the hs_err_pid file ends up being created in
/tmp/hs_err_pid...
- add a STEP in VMError::report() for setting up the 'start time' for
the TestUnresponsiveErrorHandler test
- There is a corresponding change in VMError::report_and_die() that
skips the call to record_reporting_start_time() when we are
executing TestUnresponsiveErrorHandler.
- This solves the problem where the error reporting thread is exposed
to interrupt_reporting_thread() calls before it has reached the
first STEP in VMError::report().
- change VMError::check_timeout() to call interrupt_reporting_thread()
only once per timeout detection, for either a total reporting timeout
or a step timeout:
- check_timeout() is called by the WatcherThread once per second after
it determines that error reporting has started. This change solves
the problem where a timeout is detected but the error reporting thread
takes longer than a second to do its work, so the WatcherThread calls
check_timeout() (and interrupt_reporting_thread()) again, which
restarts the STEP we were on from the beginning.
- src/hotspot/share/utilities/vmError.hpp
- add clear_step_start_time()
- test/hotspot/jtreg/runtime/ErrorHandling/TimeoutInErrorHandlingTest.java
- add support for '-Dverbose=true' to get more verbose test output
- Default ERROR_LOG_TIMEOUT is 16 seconds; Solaris sets it to 3X.
- dump the cmd output if we can't find the 'hs_err_pid' file
- dump the cmd output if we can't open the 'hs_err_pid' file
- dump the hs_err_pid file if we fail to match the patterns
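The vmError.cpp timeout changes summarized above can be modeled in a small, self-contained C++ sketch. This is not HotSpot's code: the class ErrorReportWatchdog, the explicit millisecond timestamps passed to record_reporting_start_time()/record_step_start_time(), and the counting stand-in for interrupt_reporting_thread() are all hypothetical. The sketch illustrates two ideas: clearing the step start time at END means the gap between STEPs (such as between the two VMError::report() calls) is never timed as a step, and remembering which step was already interrupted means a slow step is interrupted once rather than on every once-per-second check.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model, not HotSpot code. Times are milliseconds and 0 means
// "not set". Only the names check_timeout(), interrupt_reporting_thread()
// and clear_step_start_time() echo the change; the rest is illustrative.
class ErrorReportWatchdog {
 public:
  ErrorReportWatchdog(int64_t total_timeout_ms, int64_t step_timeout_ms)
      : total_timeout_ms_(total_timeout_ms), step_timeout_ms_(step_timeout_ms) {
    assert(total_timeout_ms > 0 && step_timeout_ms > 0);
  }

  // Reporting thread: error reporting has started.
  void record_reporting_start_time(int64_t now) { reporting_start_.store(now); }

  // Reporting thread: a new STEP begins.
  void record_step_start_time(int64_t now) { step_start_.store(now); }

  // Reporting thread, END macro: no STEP is running (e.g. between the two
  // report() calls), so the watcher has no step to time out.
  void clear_step_start_time() { step_start_.store(0); }

  // Watcher thread, roughly once per second. Interrupts at most once per
  // detected timeout; returns true if it interrupted on this call.
  bool check_timeout(int64_t now) {
    const int64_t reporting_start = reporting_start_.load();
    if (reporting_start == 0) return false;  // reporting has not started

    if (now - reporting_start >= total_timeout_ms_) {
      if (total_timeout_handled_) return false;  // already interrupted once
      total_timeout_handled_ = true;
      interrupt_reporting_thread();
      return true;
    }

    // The step's start timestamp identifies the step; this assumes each
    // STEP records a distinct start time.
    const int64_t step_start = step_start_.load();
    if (step_start != 0 && now - step_start >= step_timeout_ms_ &&
        step_start != last_interrupted_step_start_) {
      // Remember the interrupted step so the next check does not
      // interrupt (and restart) the same step again.
      last_interrupted_step_start_ = step_start;
      interrupt_reporting_thread();
      return true;
    }
    return false;
  }

  int interrupt_count() const { return interrupts_; }

 private:
  // Stand-in for signalling the reporting thread; here it only counts.
  void interrupt_reporting_thread() { ++interrupts_; }

  const int64_t total_timeout_ms_;
  const int64_t step_timeout_ms_;
  std::atomic<int64_t> reporting_start_{0};  // written by reporting thread
  std::atomic<int64_t> step_start_{0};       // written by reporting thread
  bool total_timeout_handled_ = false;       // watcher-only state
  int64_t last_interrupted_step_start_ = 0;  // watcher-only state
  int interrupts_ = 0;
};
```

For example, with a 30-second step timeout and a step that started at t=1000 ms, check_timeout(31000) interrupts once and check_timeout(32000) does not. After clear_step_start_time(), no step timeout can fire, and only the total reporting timeout can still trigger an interrupt.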
Webrev URL:
http://cr.openjdk.java.net/~dcubed/8188872-webrev/0-for-jdk-jdk13/
Testing: Mach5 Tier[1-5]
Included the fix in my latest round of 8153224 testing
on Solaris-X64 where this bug reproduces quite a bit.
Thanks, in advance, for any comments, suggestions, or questions.
Dan
More information about the hotspot-runtime-dev mailing list