RFR(s): 8220774: Add HandshakeALot diag option

Robbin Ehn robbin.ehn at oracle.com
Thu Mar 21 16:04:16 UTC 2019


On 2019-03-21 16:11, Daniel D. Daugherty wrote:
>> Check-out v3, just full:
>> http://cr.openjdk.java.net/~rehn/8220774/v3/webrev/ 
> 
> src/hotspot/share/runtime/globals.hpp
>      No comments.
> 
> src/hotspot/share/runtime/thread.hpp
>      No comments.
> 
> src/hotspot/share/runtime/thread.cpp
>      L2954:   assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>      L2955:          (has_last_Java_frame() && java_call_counter() > 0), "wrong 
> java_sp info!");
>         Perhaps change to this:
> 
>               assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>                      (has_last_Java_frame() && java_call_counter() > 0),
>                      "unexpected frame info: has_last_frame=%d, 
> java_call_counter=%d",
>                      has_last_Java_frame(), java_call_counter());
> 
>          and consider making the same change to the nmethod version.
> 

Fixed both!

> src/hotspot/share/runtime/vmThread.hpp
>      No comments.
> 
> src/hotspot/share/runtime/vmThread.cpp
>      L449:   // Must check for handshakes first, since ops returns.
>          Perhaps:
>              // Check for handshakes first since we may need to return a VMop.
> 
>      L454:   // Check for a true cleanup first, trying to keep stats correct.
>          Perhaps:
>              // Check for a cleanup before SafepointALot to keep stats correct.
> 
>      L635:     // We want to make sure that we get to a safepoint regularly.
>          Perhaps an addition:
>                // We want to make sure that we get to a safepoint regularly
>                // even when executing VMops that don't require safepoints.
> 

Fixed above!

Publishing v4, since I need a second review:
Full:
http://cr.openjdk.java.net/~rehn/8220774/v4/webrev/
Inc:
http://cr.openjdk.java.net/~rehn/8220774/v4/inc/webrev/

Thanks, Robbin

> Okay, I now grok why VMThread::no_op_safepoint() does not need a
> 'check_time' parameter. Thanks for restoring the no_op_safepoint()
> call at the bottom of the "while(true)" in VMThread::loop().
> 
> If you decide to tweak the semantics for SafepointALot down the road,
> it would be best to do that in its own bug rather than as part of
> another bug.
> 
> Thumbs up! Your call on whether to tweak the assert or change the comments.
> 
> Dan
> 
> 
> 
> On 3/21/19 10:28 AM, Daniel D. Daugherty wrote:
>> On 3/21/19 9:57 AM, Robbin Ehn wrote:
>>> Hi Dan,
>>>
>>> On 2019-03-20 21:21, Daniel D. Daugherty wrote:
>>>>> And here is v2 for you to consider:
>>>>> http://rehn-ws.se.oracle.com/cr_mirror/8220774/v2/webrev/ 
>>>>
>>>> It's too difficult to craft my comments relative to the incremental
>>>> webrev so I'm working with the full webrev still.
>>>>
>>>>
>>>> src/hotspot/share/runtime/globals.hpp
>>>>      No comments.
>>>>
>>>> src/hotspot/share/runtime/thread.hpp
>>>>      No comments.
>>>>
>>>> src/hotspot/share/runtime/thread.cpp
>>>>      L2953:   // This checks that the thread have a correct frame state 
>>>> during a handshake
>>>>          typo: s/have/has/
>>>>          Also please add a '.' to end of the sentence.
>>>
>>> Fixed.
>>>
>>>>
>>>>      L2954:   assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>>>>      L2955:          (has_last_Java_frame() && java_call_counter() > 0), 
>>>> "wrong java_sp info!");
>>>>          Trying to make sure I understand what you are asserting:
>>>>
>>>>          1) if the thread does not have a last Java frame then it must have
>>>>             never called Java
>>>>          2) if the thread has a last Java frame, then it must have called
>>>>             Java at least once
>>>>
>>>>          I'm good with the second expression. I have some vague doubts about
>>>>          first expression. When a JavaThread is done executing Java code and
>>>>          it is on its way toward a thread exit, is there ever a time when it
>>>>          no longer has a last Java frame? I'm thinking a handshake late in
>>>>          the JavaThread's life...
>>>
>>> The assert is a copy paste from
>>> L2959 void JavaThread::nmethods_do(CodeBlobClosure* cf) {
>>>
>>> I have seen this assert trigger twice on windows, either it's wrong and should
>>> be removed from nmethods_do, or it is correct and this will help us find when
>>> this happens.
>>
>> Discussed this with Robbin via chat. I'm good with leaving
>> the assert in place. Perhaps tweak both like this:
>>
>>     assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>>            (has_last_Java_frame() && java_call_counter() > 0),
>>            "unexpected frame info: has_last_frame=%d, java_call_counter=%d",
>>            has_last_Java_frame(), java_call_counter());
>>
>>>
>>>>
>>>> src/hotspot/share/runtime/vmThread.hpp
>>>>      No comments.
>>>>
>>>> src/hotspot/share/runtime/vmThread.cpp
>>>>      L503:           MutexUnlockerEx mul(VMOperationQueue_lock, 
>>>> Mutex::_no_safepoint_check_flag);
>>>>          Please add something like this above L503:
>>>>
>>>>                      // Have to unlock VMOperationQueue_lock just in case
>>>>                      // no_op_safepoint() has to do a handshake.
>>>>
>>>>      old L619:     // We want to make sure that we get to a safepoint 
>>>> regularly.
>>>>      old L620:     //
>>>>      old L621:     if ((_cur_vm_operation = 
>>>> VMThread::no_op_safepoint(false)) != NULL) {
>>>>          This call to no_op_safepoint() is at the bottom of the "while(true)"
>>>>          loop in VMThread::loop(). Before the fix for 8219436, this line 
>>>> used to be:
>>>>
>>>>              if (VMThread::no_op_safepoint_needed(true)) {
>>>>
>>>>          and it was the only call to no_op_safepoint_needed() that passed
>>>>          'true'. It's the 'true' parameter that made the comment on L619
>>>>          correct. Why do I say that? Well, the only way we get to the
>>>>          bottom of the "while(true)" loop is if we had a vm_operation to
>>>>          perform. Let's say we had a bunch of no-safepoint vm_operations
>>>>          to perform and we just kept executing them, one after another.
>>>>          While doing this work, if our time between safepoints exceeds
>>>>          GuaranteedSafepointInterval, then this no_op_safepoint_needed(true)
>>>>          call is what would detect that we've gone too long between safepoints
>>>>          and would force one.
>>>>
>>>>          With the fix for 8219436, that call was changed to 
>>>> no_op_safepoint(false)
>>>>          and with the v2 version of this fix that call is now gone. So we no
>>>>          longer have the ability to have GuaranteedSafepointInterval work
>>>>          right when we are doing lots of non-safepoint VM operations.
>>>>
>>>>          This no_op_safepoint_needed() probe point, either with 'true' or
>>>>          'false', also gave us the opportunity to force a safepoint when
>>>>          SafepointALot is true or when 
>>>> SafepointSynchronize::is_cleanup_needed()
>>>>          returns true. In the v2 version of this fix, we lose the ability
>>>>          to SafepointALot after each VM operation. We also lose the ability
>>>>          to do cleanup() after each VM operation.
>>>
>>> And I don't think that is an issue, ICache and Monitor, which is the cleanup we
>>> do, just have just basic heuristic. If we cleanup every second or every other
>>> second makes little difference.
>>
>> My point is that you are changing the semantics of SafepointALot without
>> stating clearly that is what you are doing. The point of SafepointALot is
>> to inject a safepoint after every VMop even those VMops that have to be
>> executed at a safepoint. SafepointALot is considered a stress option for
>> a very good reason.
>>
>>
>>> SafepointALot is just diag, which causes safepoint directly after a safepoint.
>>> I'd rather have it doing one extra safepoint per GuaranteedSafepointInterval,
>>> that way you know how many safepoint you have per second, which is really useful
>>> for benchmarking. In current form it can't be used for benchmarks.
>>
>> You would never use SafepointALot in a benchmark run. It is is
>> stress option.
>>
>>>
>>>>
>>>>          I think you need to restore the "bool check_time" parameter to
>>>>          no_op_safepoint() and you need to restore this call to as:
>>>>          VMThread::no_op_safepoint_needed(true) along with the rest
>>>>          of old L618-L626.
>>>
>>> I don't need the parameter, we can always check the time.
>>> When we timeout on queue wait we will always pass the time check.
>>
>> As I pointed out, if you are doing non-safepoint VM ops, you can miss
>> a time check.
>>
>>
>>> So the difference is that I think the previous behavior was bad/buggy and you 
>>> are trying to restore it :)
>>>
>>> Check-out v3, just full:
>>> http://cr.openjdk.java.net/~rehn/8220774/v3/webrev/
>>
>> I'll post a second message with my re-review for this webrev.
>>
>>
>>>
>>> - wait on queue GuaranteedSafepointInterval
>>> - if timeout
>>>     - do handshake a lot
>>>     - we are guaranteed to pass the time check, so cleanup if needed
>>>     - else safepoint alot
>>> - if op
>>>     - after op
>>>         - do handshake a lot
>>>         - check time, if expired, cleanup if needed
>>>         - else safepoint alot
>>>
>>> I think that is what you want :)
>>> And that's what in v3.
>>>
>>> I would rather have:
>>> - wait-time = GuaranteedSafepointInterval - 
>>> SafepointTracing::time_since_last_safepoint_ms()
>>> - if wait-time > 0
>>>     - wait on queue
>>> - else or timeout
>>>     - do handshake a lot
>>>     - cleanup if needed
>>>     - else safepoint alot
>>>
>>> That way we do check for cleanup after non-safepoint in time but we do not
>>> causes extra safepoints after safepoints with SafepointALot. (I know you 
>>> think of this as feature :) )
>>
>> Not so much a feature as a stress option. I love my stress options.
>>
>> Dan
>>
>>>
>>> Thanks, Robbin
>>
> 


More information about the hotspot-runtime-dev mailing list