RFR: 8253794: TestAbortVMOnSafepointTimeout never timeouts

Robbin Ehn rehn at openjdk.java.net
Fri Oct 2 07:18:08 UTC 2020


On Thu, 1 Oct 2020 19:49:20 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote:

>> The issue is that this test doesn't consider Handshake All operation.
>> Depending if/when such operation is scheduled it can lockup the VM thread.
>> And the safepoint that should timeout never happens.
>> See issue for more information.
>> 
>> So I changed the test to "try timeout" the safepoint, but if there was no safepoint (blocked by a handshake all), we
>> retry. We sleep unsafe much longer than the interval SafepointALot generates operations, which 'guarantees' we will
>> timeout if there is no handshake all. (some extreme case of kernel scheduling causing a very long context switch could
>> also make us not timeout)  Passes t1, t3, and repeat runs of the test.
>
> src/hotspot/share/prims/whitebox.cpp line 2294:
> 
>> 2292:
>> 2293: WB_ENTRY(jboolean, WB_WaitUnsafe(JNIEnv* env, jobject wb, jint time))
>> 2294:     SafepointStateTracker tracker = SafepointSynchronize::safepoint_state_tracker();
> 
> I had to go back and reread the `SafepointStateTracker` code...
> Because this JavaThread is executing and not at a safepoint, the call
> to `SafepointSynchronize::safepoint_state_tracker()` will save state
> as not-at-a-safepoint (with some safepoint_id value).

Yes. We can never return true from this function, e.g. a safepoint happened when unsafe, if we don't have a bug.

> src/hotspot/share/prims/whitebox.cpp line 2296:
> 
>> 2294:     SafepointStateTracker tracker = SafepointSynchronize::safepoint_state_tracker();
>> 2295:     os::naked_short_sleep(time);
>> 2296:     return tracker.safepoint_state_changed();
> 
> Ahhh... returns true when we've had a state change or when the
> safepoint ID has changed, but... how can the system change the
> `SafepointSynchronize::is_at_safepoint()` return value or the
> safepoint_id value while we're sleeping? The system shouldn't
> be able to go to a safepoint or change safepoint_id values while
> this calling thread is not safepoint safe.
> 
> I would think that this function would always return false, but maybe
> I'm missing something here.

Yes it is like saying return false when everything is working.
So yes it may not be so useful.

> test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java line 30:
> 
>> 28: /*
>> 29:  * @test TestAbortVMOnSafepointTimeout
>> 30:  * @summary Check if VM can kill thread which doesn't reach safepoint.
> 
> Not your bug, but this summary is wrong. Perhaps:
> `@summary Check if VM aborts when a thread doesn't reach safepoint.`

The timeout shots a SIGILL on the 'slow' thread, it does not abort (it do abort if it can't send the signal).
Test also checks that the log says we have done this.

> test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java line 47:
> 
>> 45:                 System.out.println("Waiting for safepoint");
>> 46:             }
>> 47:             System.out.println("This message would occur after some time.");
> 
> Maybe I'm missing something, but I don't see how `wb.waitUnsafe(waitTime)` is
> ever going to return anything but false so this message should never be printed.

It's not a new line. The line was there if the VM fails timeout, so it was never printed assuming a working VM.
And now it still not printed assuming a working VM.

> test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java line 71:
> 
>> 69:         ProcessBuilder pb = ProcessTools.createJavaProcessBuilder(
>> 70:                 "-XX:+UnlockDiagnosticVMOptions",
>> 71:                 "-XX:-UseBiasedLocking",
> 
> I think "-XX:-UseBiasedLocking" is specified to make sure
> that Biased Locking is disabled even in test tasks where it
> is enabled by task specific flags.

Yes. But now this test is fine with using biased locking.

> test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java line 66:
> 
>> 64:                 "-Xms64m",
>> 65:                 "TestAbortVMOnSafepointTimeout",
>> 66:                 "" + unsafe_wait
> 
> Cheap conversion from int to String?

I followed the other ones:
"-XX:SafepointTimeoutDelay=" + timeout_delay
In this case we have no name for the 'option', it thus becomes:
"" + unsafe_wait

> test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java line 79:
> 
>> 77:             }
>> 78:         }
>> 79:         output.shouldNotHaveExitValue(0);
> 
> Looks like the test doesn't require that this mesg get printed:
> `System.out.println("This message would occur after some time.");`
> 
> And it is set up to detect that the SafepointTimeout happened
> which is what we want the test to verify at the core.

The line "This message would occur after some time." should never be print if VM is working.
If the VM fails for some reason and the timeout is not performed, line:
"Timed out while spinning to reach a safepoint." is never printed and the OutputAnalyzer fails the test.
If we did timeout and it was printed we know that we didn't print the other message, since the only thread that can
timeout is the one printing that message.

The second part verifies that the SIGILL was delivered.

-------------

PR: https://git.openjdk.java.net/jdk/pull/465


More information about the hotspot-runtime-dev mailing list