RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die()
Alexey Pavlyutkin
duke at openjdk.org
Wed Mar 8 18:03:20 UTC 2023
On Wed, 8 Mar 2023 15:49:06 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:
>> The patch fixes error reporting to check the timeout when a user specifies an OnError handler. Previously, VMError::check_timeout() ignored the timeout in this case and so did not break the malloc() deadlock.
>>
>> Verification (amd64/20.04LTS): the test crashes a JVM running with an error handler of three successive `sleep` commands (1s, 10s, and 60s), with and without a specified timeout.
>>
>>
>> 16:52:17 at alex@alex-VirtualBox>( echo "
>> public class C {
>> public static void main(String[] args) throws Throwable {
>> while (true) Thread.sleep(1000);
>> }
>> }
>> " >> C.java )
>> 16:57:35 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java &
>> [2] 179574
>> 17:00:19 at alex@alex-VirtualBox>kill -s SIGSEGV 179574
>> 17:00:27 at alex@alex-VirtualBox>#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # SIGSEGV (0xb) at pc=0x00007f7b1701ecd5 (sent by kill), pid=179574, tid=179574
>> #
>> # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # Problematic frame:
>> # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255
>> #
>> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179574)
>> #
>> # An error report file with more information is saved as:
>> # /home/alex/jdk/hs_err_pid179574.log
>> #
>> # If you would like to submit a bug report, please visit:
>> # https://bugreport.java.com/bugreport/crash.jsp
>> #
>> #
>> # -XX:OnError="sleep 1;sleep 10;sleep 60"
>> # Executing /bin/sh -c "sleep 1" ...
>> # Executing /bin/sh -c "sleep 10" ...
>> # Executing /bin/sh -c "sleep 60" ...
>>
>> [2]+ Aborted (core dumped) ./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java
>> 17:02:03 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:ErrorLogTimeout=5 -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java &
>> [2] 179602
>> 17:02:32 at alex@alex-VirtualBox>kill -s SIGSEGV 179602
>> 17:02:41 at alex@alex-VirtualBox>#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # SIGSEGV (0xb) at pc=0x00007f9d71b18cd5 (sent by kill), pid=179602, tid=179602
>> #
>> # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # Problematic frame:
>> # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255
>> #
>> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179602)
>> #
>> # An error report file with more information is saved as:
>> # /home/alex/jdk/hs_err_pid179602.log
>> #
>> # If you would like to submit a bug report, please visit:
>> # https://bugreport.java.com/bugreport/crash.jsp
>> #
>> #
>> # -XX:OnError="sleep 1;sleep 10;sleep 60"
>> # Executing /bin/sh -c "sleep 1" ...
>> # Executing /bin/sh -c "sleep 10" ...
>>
>> ------ Timeout during error reporting after 11 s. ------
>>
>> 17:02:54 at alex@alex-VirtualBox>
>>
>>
>> Regression (amd64/20.04LTS): `test/hotspot/jtreg/runtime/ErrorHandling` with different combinations of `-vmoption:-XX:ErrorLogTimeout=10` and `-vmoption:-XX:OnError='sleep 10'`
>
> src/hotspot/share/utilities/vmError.cpp line 1360:
>
>> 1358:
>> 1359: namespace {
>> 1360: class ForkAndExecCheckPoint : public StackObj {
>
> Nit: "checkpoint" sounds quite specific, and nothing is checked here. Also, this guards only the OnError usages of fork_and_exec, not other possible usages, so maybe "OnErrorInProgress" or something?
ok
> src/hotspot/share/utilities/vmError.cpp line 1362:
>
>> 1360: class ForkAndExecCheckPoint : public StackObj {
>> 1361: NONCOPYABLE( ForkAndExecCheckPoint );
>> 1362: static int _in_progress;
>
> Must be volatile, I think.
sure
> src/hotspot/share/utilities/vmError.cpp line 1366:
>
>> 1364: ForkAndExecCheckPoint() {
>> 1365: assert(Atomic::load(&_in_progress) == 0, "fork_and_exec() is already in progress");
>> 1366: Atomic::store(&_in_progress, 1);
>
> I'd do a CAS.
Initially I used xchg(), but it causes a full memory fence. On the other hand, report_and_die() seems like the last place to care about performance.
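
For illustration, a CAS-based variant of the guard could look roughly like the sketch below. It is not the code in the PR: it uses std::atomic as a stand-in for HotSpot's Atomic helpers so that it compiles outside the JDK tree, it borrows the class name suggested above, and the acquired() and active() accessors are hypothetical additions.

```cpp
#include <atomic>
#include <cassert>

// Sketch only: std::atomic replaces HotSpot's Atomic:: helpers, and
// acquired()/active() are hypothetical accessors, not part of the patch.
class OnErrorInProgress {
  static std::atomic<int> _in_progress;
  bool _owner;  // true if this instance won the 0 -> 1 transition

public:
  OnErrorInProgress() {
    int expected = 0;
    // CAS: set the flag to 1 only if it is currently 0. Unlike a plain
    // load followed by a store, the check and the update cannot be
    // interleaved with another thread's update.
    _owner = _in_progress.compare_exchange_strong(expected, 1);
  }

  ~OnErrorInProgress() {
    if (_owner) {
      assert(_in_progress.load() == 1);  // only the owner resets the flag
      _in_progress.store(0);
    }
  }

  // Equivalent of NONCOPYABLE in this standalone sketch.
  OnErrorInProgress(const OnErrorInProgress&) = delete;
  OnErrorInProgress& operator=(const OnErrorInProgress&) = delete;

  bool acquired() const { return _owner; }
  static bool active() { return _in_progress.load() != 0; }
};

std::atomic<int> OnErrorInProgress::_in_progress{0};
```

In this sketch a second guard created while the first is alive reports acquired() == false rather than asserting. The reviewed code could instead assert on the CAS result to keep the "already in progress" check.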
-------------
PR: https://git.openjdk.org/jdk/pull/12925