RFR (S) 8181143: Introduce diagnostic flag to abort VM on too long VM operations

Tue Dec 11 15:18:40 UTC 2018

Hi Aleksey,

this looks good to me.

Thanks!
Roman

> Hi,
> 
> Getting back to this. Replying to all three reviews in the same mail, because they share the
> answers! New patch:
>   http://cr.openjdk.java.net/~shade/8181143/webrev.04/
> 
> Testing: x86_64 build, hotspot tier1, new test
> 
> On 11/20/18 8:07 AM, Thomas Stüfe wrote:
>> I don't know whether I need 10ms resolution though - if we limit the
>> goal to catching just hanging VMOps - which would still pretty useful
>> - 1s would even be enough.
> 
> Right. New patch implements variable delay, depending on what actual delay had been requested. New
> test makes sure it works reliably.
> 
>> I think we can do with just two flags, since VMOperationTimeout and
>> VMOperationTimeoutDelay are redundant. Just go with the delay, make -1
>> the default that does nothing. Same could go for SafepointTimeout vs
>> SafepointTimeoutDelay.
> 
> Redid like this:
> 
>   diagnostic(bool, AbortVMOnVMOperationTimeout, false,                      \
>           "Abort upon failure to complete VM operation promptly")           \
>                                                                             \
>   diagnostic(intx, AbortVMOnVMOperationTimeoutDelay, 1000,                  \
>           "Delay in milliseconds for option AbortVMOnVMOperationTimeout")   \
>           range(0, max_intx)                                                \
>                                                                             \
> 
>> I also agree with others that it would make more sense were we to kill
>> the VMThread instead (e.g. just with pthread_kill(SIGILL) or similar)
>> - we do something like that in error handling (see ErrorLogTimeout).
> 
> I looked into that, and it seems too much hassle without significant benefit. For example, it would
> require drilling new platform-specific holes to support it, and it also likely to be supported on
> Linux (pthreads) only anyway? Seems to me piggybacking on periodic tasks is much simpler. In
> development use, you'd have a core file to see what is going on in VM. In "production" use you'd
> care that circuit breaker that did its job of terminating the VM.
> 
> On 11/19/18 9:35 AM, Robbin Ehn wrote:
>>  You patch seems not to be against jdk/jdk (jdk12).
> 
> Really? The new patch is definitely against jdk/jdk.
> 
>>  Without the actual core file, I don't see the hs file being very useful
>>  containing the watcherthread stacktrace. It would be good if the 'killer'
>>  signaled the VM thread and we do the error reporting from the signal handler
>>  instead in VM thread context.
> 
> Right. Looked into that too, decided it is too complicated without significant benefit. See above.
> 
> On 11/19/18 7:13 AM, David Holmes wrote:
>> You're not just introducing one diagnostic flag, your introducing the entire VM operation
>> timeout mechanism, including two product flags and one diagnostic. So the CR needs to reflect
>> that clearly and you will need a CSR request to add the two product flags. And they will need to
>> be documented.
> 
> This was "solved" by doing two diagnostic flags, see above. I think the symmetry against other
> AbortVM* flags is important, and having the boolean flag for AbortVM* is good.
> 
>> It's not safe to access vm_safepoint_description() outside the VMThread as the _cur_vm_operation 
>> could be deleted while you're trying to access its name.
> 
> Noted, removed that part.
> 
>> And as we don't have a general timer mechanism this has to use polling so you pay for a 10ms
>> task wakeup regardless of how long the timeout is.
> 
> Redone that part, see above.
> 
>> Given the limitations of the global timeout I'm not sure I see a use for the non-aborting form.
>> This could just reduce down to:
>>
>> -XX:AbortVMOpsAfter:NN
>>
>> otherwise I don't really think this carries its weight. Of course that's just my opinion.
>> Interested to hear others.
> 
> I prefer not to introduce another shape for AbortVM* options. Two diagnostic options seems to have
> much less weight on the whole thing.
> 
> -Aleksey
>