RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die()

David Holmes dholmes at openjdk.org
Wed Mar 8 21:34:07 UTC 2023


On Wed, 8 Mar 2023 17:18:30 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

>> The patch fixes error reporting so that the timeout is checked even when the user specifies an OnError handler. Previously, VMError::check_timeout() ignored the timeout in this case, and so did not break a malloc() deadlock.
>> 
>> Verification (amd64/20.04LTS): the idea of the test is to crash a JVM running with an OnError handler of three successive `sleep` commands (1s, 10s, and 60s), with and without a specified timeout.
>> 
>> 
>> 16:52:17 at alex@alex-VirtualBox>( echo "
>> public class C {
>>   public static void main(String[] args) throws Throwable {
>>     while (true) Thread.sleep(1000);
>>   }
>> }
>> " >> C.java )
>> 16:57:35 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java &
>> [2] 179574
>> 17:00:19 at alex@alex-VirtualBox>kill -s SIGSEGV 179574
>> 17:00:27 at alex@alex-VirtualBox>#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x00007f7b1701ecd5 (sent by kill), pid=179574, tid=179574
>> #
>> # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # Problematic frame:
>> # C  [libpthread.so.0+0x9cd5]  __pthread_clockjoin_ex+0x255
>> #
>> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179574)
>> #
>> # An error report file with more information is saved as:
>> # /home/alex/jdk/hs_err_pid179574.log
>> #
>> # If you would like to submit a bug report, please visit:
>> #   https://bugreport.java.com/bugreport/crash.jsp
>> #
>> #
>> # -XX:OnError="sleep 1;sleep 10;sleep 60"
>> #   Executing /bin/sh -c "sleep 1" ...
>> #   Executing /bin/sh -c "sleep 10" ...
>> #   Executing /bin/sh -c "sleep 60" ...
>> 
>> [2]+  Aborted                 (core dumped) ./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java
>> 17:02:03 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:ErrorLogTimeout=5 -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java &
>> [2] 179602
>> 17:02:32 at alex@alex-VirtualBox>kill -s SIGSEGV 179602
>> 17:02:41 at alex@alex-VirtualBox>#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x00007f9d71b18cd5 (sent by kill), pid=179602, tid=179602
>> #
>> # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # Problematic frame:
>> # C  [libpthread.so.0+0x9cd5]  __pthread_clockjoin_ex+0x255
>> #
>> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179602)
>> #
>> # An error report file with more information is saved as:
>> # /home/alex/jdk/hs_err_pid179602.log
>> #
>> # If you would like to submit a bug report, please visit:
>> #   https://bugreport.java.com/bugreport/crash.jsp
>> #
>> #
>> # -XX:OnError="sleep 1;sleep 10;sleep 60"
>> #   Executing /bin/sh -c "sleep 1" ...
>> #   Executing /bin/sh -c "sleep 10" ...
>> 
>> ------ Timeout during error reporting after 11 s. ------
>> 
>> 17:02:54 at alex@alex-VirtualBox>
>> 
>> 
>> Regression (amd64/20.04LTS): `test/hotspot/jtreg/runtime/ErrorHandling` with different combinations of `-vmoption:-XX:ErrorLogTimeout=10` and `-vmoption:-XX:OnError='sleep 10'`
>
> Pinging @dholmes-ora for input and opinion

I thought we were still discussing options in JBS, so this PR seems premature to me. I agree with @tstuefe's initial comment that this seems way too complex, and I'm not even sure I can figure out the control flow here.

-------------

PR: https://git.openjdk.org/jdk/pull/12925

