RFR(XS): 8188872: runtime/ErrorHandling/TimeoutInErrorHandlingTest.java fails intermittently

David Holmes david.holmes at oracle.com
Thu May 30 05:09:18 UTC 2019


Hi Dan,

Reading this email and the bug report really made my head hurt. I'm just 
left bemused. Are we really fixing something here or have we written a 
test that is just trying to be far too clever?

That said your changes seem relatively innocuous.

One aspect that I can't work through is what is supposed to happen when 
the error reporting thread is sent the SIGILL. What does the signal 
handler do in response to that? If it returns then we will stay in the 
infinite_sleep. If it jumps back into error reporting then we're now 
potentially in nested signal-handling contexts - in which case if the 
WatcherThread tries to interrupt us again (because we're still not 
responsive enough) won't the SIGILL be blocked because the handler is 
still active (unless we explicitly unblock it in the handler)?
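
To make the blocking question concrete, here is a minimal POSIX sketch, 
not HotSpot code. It assumes a single-threaded process and a handler 
installed with sigaction(); the helper name blocked_inside_handler() is 
made up for illustration. It only shows the default rule: while a handler 
runs, the signal that triggered it is in the blocked mask unless 
SA_NODEFER was set (or the handler unblocks it itself).

```cpp
#include <signal.h>

// Result recorded by the handler: 1 = blocked, 0 = not blocked, -1 = not run.
static volatile sig_atomic_t g_blocked_in_handler = -1;

// Handler: query (do not modify) the current signal mask and record
// whether the signal being handled is a member of it.
static void probe_handler(int signo) {
  sigset_t cur;
  sigemptyset(&cur);
  sigprocmask(SIG_BLOCK, nullptr, &cur);  // null 'set' means query only
  g_blocked_in_handler = sigismember(&cur, signo);
}

// Hypothetical helper: install probe_handler for 'signo' with the given
// sa_flags, raise the signal once, restore the old disposition, and
// report what the handler observed.
int blocked_inside_handler(int signo, int flags) {
  struct sigaction sa = {};
  struct sigaction old = {};
  sa.sa_handler = probe_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = flags;
  if (sigaction(signo, &sa, &old) != 0) {
    return -1;
  }
  g_blocked_in_handler = -1;
  raise(signo);  // handler runs before raise() returns
  sigaction(signo, &old, nullptr);
  return g_blocked_in_handler;
}
```

With flags of 0 the handler sees SIGILL as blocked; with SA_NODEFER it 
does not. Whether the VM's handler falls into either case is exactly the 
open question above.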

Thanks,
David

On 29/05/2019 10:12 am, Daniel D. Daugherty wrote:
> Greetings,
> 
> I have a fix for the following longstanding bug:
> 
>      JDK-8188872 runtime/ErrorHandling/TimeoutInErrorHandlingTest.java fails intermittently
>      https://bugs.openjdk.java.net/browse/JDK-8188872
> 
> I've included Thomas Stüfe directly since I'm modifying his code...
> 
> This fix includes changes to the error handling code, the VM parts
> of the test (-XX:+TestUnresponsiveErrorHandler) and the test itself.
> The changes themselves are small, but the reasons are complicated
> so a detailed explanation is required.
> 
> Summary of the changes:
> 
> - src/hotspot/share/utilities/vmError.cpp
>    - add VMError::clear_step_start_time() and call it from the
>      error reporting END macro.
>      - VMError::report() is called twice: first, to generate a summary
>        for stdout and second, to generate hs_err_pid output.
>      - Adding clear_step_start_time() prevents interrupt_reporting_thread()
>        from interrupting the error reporting thread between the two
>        calls to VMError::report().
>      - This solves the problem where hs_err_pid file creation gets
>        interrupted and the hs_err_pid file ends up being created in
>        /tmp/hs_err_pid...
>    - add a STEP in VMError::report() for setting up the 'start time' for
>      the TestUnresponsiveErrorHandler test
>      - There is a corresponding change in VMError::report_and_die() that
>        skips the call to record_reporting_start_time() when we are
>        executing TestUnresponsiveErrorHandler.
>      - This solves the problem where the error reporting thread is exposed
>        to interrupt_reporting_thread() calls before it has reached the
>        first STEP in VMError::report().
>    - change VMError::check_timeout() to only call
>      interrupt_reporting_thread() once per timeout detection for either
>      a total reporting timeout or a step timeout:
>      - check_timeout() is called by the WatcherThread once per second once
>        it determines that error reporting has started. This change solves
>        the problem where a timeout is detected, the error reporting thread
>        takes longer than a second to do its work, and the WatcherThread
>        calls check_timeout() (and interrupt_reporting_thread()) again,
>        which restarts the STEP we were on from the beginning.
> - src/hotspot/share/utilities/vmError.hpp
>    - add clear_step_start_time()
> - test/hotspot/jtreg/runtime/ErrorHandling/TimeoutInErrorHandlingTest.java
>    - add support for '-Dverbose=true' to get more verbose test output
>    - Default ERROR_LOG_TIMEOUT is 16 seconds; Solaris sets it to 3X.
>    - dump the cmd output if we can't find the 'hs_err_pid' file
>    - dump the cmd output if we can't open the 'hs_err_pid' file
>    - dump the hs_err_pid file if we fail to match the patterns
> 
> Webrev URL: http://cr.openjdk.java.net/~dcubed/8188872-webrev/0-for-jdk-jdk13/
> 
> Testing: Mach5 Tier[1-5]
>           Included the fix in my latest round of 8153224 testing
>           on Solaris-X64 where this bug reproduces quite a bit.
> 
> Thanks, in advance, for any comments, suggestions, or questions.
> 
> Dan
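
The "once per timeout detection" behaviour described in the summary for 
check_timeout() can be modelled roughly as follows. This is a simplified 
sketch and not the vmError.cpp code: the class TimeoutWatcher, its 
members, and the use of plain integer timestamps (with 0 meaning "not 
set") are all assumptions made for illustration. Only the names 
check_timeout() and clear_step_start_time() come from the summary.

```cpp
#include <cstdint>

// Hypothetical model of a watcher that signals each timeout at most once.
class TimeoutWatcher {
 public:
  TimeoutWatcher(int64_t step_timeout, int64_t total_timeout)
      : step_timeout_(step_timeout), total_timeout_(total_timeout) {}

  // Error reporting as a whole has started.
  void record_reporting_start_time(int64_t now) { reporting_start_ = now; }

  // A new reporting step has started; re-arm the step timeout signal.
  void record_step_start_time(int64_t now) {
    step_start_ = now;
    step_signaled_ = false;
  }

  // No step is in progress (e.g. between two report passes), so a
  // step timeout cannot fire until the next step starts.
  void clear_step_start_time() { step_start_ = 0; }

  // Called periodically by the watcher. Returns true only the first
  // time a given timeout is detected, i.e. when the caller should
  // interrupt the reporting thread.
  bool check_timeout(int64_t now) {
    if (reporting_start_ > 0 && now - reporting_start_ > total_timeout_) {
      if (total_signaled_) return false;  // already interrupted for this
      total_signaled_ = true;
      return true;
    }
    if (step_start_ > 0 && now - step_start_ > step_timeout_) {
      if (step_signaled_) return false;   // already interrupted this step
      step_signaled_ = true;
      return true;
    }
    return false;
  }

 private:
  int64_t step_timeout_;
  int64_t total_timeout_;
  int64_t reporting_start_ = 0;
  int64_t step_start_ = 0;
  bool step_signaled_ = false;
  bool total_signaled_ = false;
};
```

In this model a second check_timeout() call during the same overrunning 
step returns false, so a slow step is not interrupted and restarted a 
second time, and clearing the step start time keeps the gap between steps 
from being counted against any step.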

