RFR: 8253794: TestAbortVMOnSafepointTimeout never timeouts

David Holmes dholmes at openjdk.java.net
Fri Oct 2 04:27:04 UTC 2020


On Thu, 1 Oct 2020 20:11:46 GMT, Daniel D. Daugherty <dcubed at openjdk.org> wrote:

>> The issue is that this test doesn't consider the Handshake All operation.
>> Depending on if/when such an operation is scheduled, it can lock up the VM thread,
>> and the safepoint that should time out never happens.
>> See issue for more information.
>> 
>> So I changed the test to "try to time out" the safepoint, but if there was no safepoint (blocked by a handshake all),
>> we retry. We sleep safepoint-unsafe for much longer than the interval at which SafepointALot generates operations,
>> which 'guarantees' we will time out if there is no handshake all. (Some extreme case of kernel scheduling causing a
>> very long context switch could also make us not time out.) Passes t1, t3, and repeat runs of the test.
>
> Changes requested by dcubed (Reviewer).

Hi Robbin,

So... The old test used an "uncounted loop" (relying on internal JIT knowledge) to create looping code with no safepoint
polls, so that the thread remains safepoint-unsafe (and Patricio had to tweak the test conditions to avoid unexpected
safepoints). The new code has a WhiteBox entry that uses an internal naked_sleep, which keeps the thread _thread_in_VM
IIUC. That state is not safepoint-safe, but it is also potentially different from being _thread_in_Java. But let's just
accept that the net effect is the same: the thread will prevent a safepoint from being reached until the sleep time has
elapsed. If that time is > (GuaranteedSafepointInterval + SafepointTimeoutDelay) then we should see a safepoint timeout
and the VM abort.

Okay ... so how does that solve the problem the test currently experiences with handshakes? If we are at a handshake,
the handshake can't proceed until the sleep time expires, but when we then transition back to Java the thread will see
the handshake and the handshake will proceed. As long as the WB function returns false we will repeat the process;
eventually, when the expected safepoint is requested, we should again trigger the safepoint timeout and abort.
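
To make the reasoning above concrete, here is a toy model of the timing condition and the retry loop. It is only a
sketch and not HotSpot or test code: the class SafepointRetrySketch, the Pending enum, and both method names are
hypothetical, and the millisecond values in main are purely illustrative.

```java
import java.util.List;

// Toy model only: every name here is hypothetical and nothing below calls into the VM.
final class SafepointRetrySketch {

    // What the VM thread happens to be waiting for during one unsafe sleep.
    enum Pending { HANDSHAKE, SAFEPOINT }

    // The condition from the paragraph above:
    // sleep time > (GuaranteedSafepointInterval + SafepointTimeoutDelay).
    static boolean sleepOutlastsTimeout(long sleepMs, long intervalMs, long timeoutDelayMs) {
        return sleepMs > intervalMs + timeoutDelayMs;
    }

    // Walks a made-up schedule of pending operations, one per sleep attempt.
    // A pending handshake just completes once the thread is back in Java, so we retry.
    // A pending safepoint is where the timeout (and VM abort) is expected.
    // Returns the 1-based attempt that would abort, or -1 if no safepoint was ever pending.
    static int attemptsUntilAbort(List<Pending> schedule) {
        int attempt = 0;
        for (Pending p : schedule) {
            attempt++;
            if (p == Pending.SAFEPOINT) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Illustrative numbers, not the flag values used by the real test.
        System.out.println(sleepOutlastsTimeout(5_000, 1_000, 500));
        System.out.println(attemptsUntilAbort(
                List.of(Pending.HANDSHAKE, Pending.HANDSHAKE, Pending.SAFEPOINT)));
    }
}
```

In this model each handshake costs one extra sleep, and the abort is only expected on the first attempt where a
safepoint is the pending operation.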

But like Dan, I'm unclear how the WB function can ever return true, as the safepoint state can't change whilst the
thread is in the naked sleep. ??

Aside: rather than using "args.length > 0" to discriminate between the original and subsequent executions of the test
class, it can be clearer (IMO) to add a static nested class which has the main method that performs the actual test,
and you invoke that via ProcessTools.
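
A minimal sketch of that nested-class layout, for illustration only. To stay self-contained it launches the child with
plain java.lang.ProcessBuilder rather than the test library's ProcessTools, and the names NestedMainSketch, Test, and
childCommand are hypothetical. A real test would also pass the relevant -XX flags to the child; they are left out here.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Driver class: its main only launches the child JVM, so no "args.length > 0" check is needed.
final class NestedMainSketch {

    // The nested class owns the code that should run in the child JVM.
    static final class Test {
        public static void main(String[] args) {
            // The actual test body (the safepoint-unsafe sleep and retry) would go here.
            System.out.println("child: running the actual test body");
        }
    }

    // Builds the child command line; the nested class is addressed by its binary name (Outer$Test).
    static List<String> childCommand() {
        List<String> cmd = new ArrayList<>();
        cmd.add(Path.of(System.getProperty("java.home"), "bin", "java").toString());
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(Test.class.getName());
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        Process child = new ProcessBuilder(childCommand()).inheritIO().start();
        System.out.println("child exit code: " + child.waitFor());
    }
}
```

The point of the layout is that the driver and the test body each get their own main, so which role a given JVM plays
is decided by the class name on the command line rather than by inspecting the arguments.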

All that said, for the record, we really should have a handshake timeout mechanism, just as we have the safepoint
timeout mechanism.

Thanks,
David

-------------

PR: https://git.openjdk.java.net/jdk/pull/465


More information about the hotspot-runtime-dev mailing list