RFR(s): 8220774: Add HandshakeALot diag option

Robbin Ehn robbin.ehn at oracle.com
Fri Mar 22 15:29:34 UTC 2019


Hi David,

On 3/22/19 8:05 AM, David Holmes wrote:
> Hi Robbin,
> 
> This was a little more complex than I had imagined. :) A couple of comments:
> 
> src/hotspot/share/runtime/vmThread.cpp
> 
>   508           // Have to unlock VMOperationQueue_lock just in case 
> no_op_safepoint()
>   509           // has to do a handshake.
>   510           MutexUnlockerEx mul(VMOperationQueue_lock, 
> Mutex::_no_safepoint_check_flag);
>   511           if (timedout && (_cur_vm_operation = 
> VMThread::no_op_safepoint()) != NULL) {
> 
> wouldn't it be better to check timedout first and only then use the unlocker 
> then check _cur_vm_operation?

Fixed.

> 
> ---
> 
> src/hotspot/share/runtime/thread.cpp
> 
> Can has_last_Java_frame() and java_call_counter() change values between their 
> use in the assert and their use in the assert message?

No, the JavaThread is safepoint_safe here, thus must have a stable java stack.

> 
> ---
> 
> Don't you want to add some tests that exercise this? Or update existing tests 
> that use SafepointALot to also use HandshakeALot?

Fixed!

All handshakes test passed 100 iteration on each platform.
But the "8221207: Redo JDK-8218446 - SuspendAtExit hangs" should go in first.

v5 full:
http://cr.openjdk.java.net/~rehn/8220774/v5/full/webrev/
v5 inc:
http://cr.openjdk.java.net/~rehn/8220774/v5/inc/webrev/

Thanks, Robbin

> 
> Thanks,
> David
> 
> On 22/03/2019 2:04 am, Robbin Ehn wrote:
>> On 2019-03-21 16:11, Daniel D. Daugherty wrote:
>>>> Check-out v3, just full:
>>>> http://cr.openjdk.java.net/~rehn/8220774/v3/webrev/ 
>>>
>>> src/hotspot/share/runtime/globals.hpp
>>>      No comments.
>>>
>>> src/hotspot/share/runtime/thread.hpp
>>>      No comments.
>>>
>>> src/hotspot/share/runtime/thread.cpp
>>>      L2954:   assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>>>      L2955:          (has_last_Java_frame() && java_call_counter() > 0), 
>>> "wrong java_sp info!");
>>>         Perhaps change to this:
>>>
>>>               assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>>>                      (has_last_Java_frame() && java_call_counter() > 0),
>>>                      "unexpected frame info: has_last_frame=%d, 
>>> java_call_counter=%d",
>>>                      has_last_Java_frame(), java_call_counter());
>>>
>>>          and consider making the same change to the nmethod version.
>>>
>>
>> Fixed both!
>>
>>> src/hotspot/share/runtime/vmThread.hpp
>>>      No comments.
>>>
>>> src/hotspot/share/runtime/vmThread.cpp
>>>      L449:   // Must check for handshakes first, since ops returns.
>>>          Perhaps:
>>>              // Check for handshakes first since we may need to return a VMop.
>>>
>>>      L454:   // Check for a true cleanup first, trying to keep stats correct.
>>>          Perhaps:
>>>              // Check for a cleanup before SafepointALot to keep stats correct.
>>>
>>>      L635:     // We want to make sure that we get to a safepoint regularly.
>>>          Perhaps an addition:
>>>                // We want to make sure that we get to a safepoint regularly
>>>                // even when executing VMops that don't require safepoints.
>>>
>>
>> Fixed above!
>>
>> Publishing v4, since I need a second review:
>> Full:
>> http://cr.openjdk.java.net/~rehn/8220774/v4/webrev/
>> Inc:
>> http://cr.openjdk.java.net/~rehn/8220774/v4/inc/webrev/
>>
>> Thanks, Robbin
>>
>>> Okay, I now grok why VMThread::no_op_safepoint() does not need a
>>> 'check_time' parameter. Thanks for restoring the no_op_safepoint()
>>> call at the bottom of the "while(true)" in VMThread::loop().
>>>
>>> If you decide to tweak the semantics for SafepointALot down the road,
>>> it would be best to do that in its own bug rather than as part of
>>> another bug.
>>>
>>> Thumbs up! Your call on whether to tweak the assert or change the comments.
>>>
>>> Dan
>>>
>>>
>>>
>>> On 3/21/19 10:28 AM, Daniel D. Daugherty wrote:
>>>> On 3/21/19 9:57 AM, Robbin Ehn wrote:
>>>>> Hi Dan,
>>>>>
>>>>> On 2019-03-20 21:21, Daniel D. Daugherty wrote:
>>>>>>> And here is v2 for you to consider:
>>>>>>> http://rehn-ws.se.oracle.com/cr_mirror/8220774/v2/webrev/ 
>>>>>>
>>>>>> It's too difficult to craft my comments relative to the incremental
>>>>>> webrev so I'm working with the full webrev still.
>>>>>>
>>>>>>
>>>>>> src/hotspot/share/runtime/globals.hpp
>>>>>>      No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/thread.hpp
>>>>>>      No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>>      L2953:   // This checks that the thread have a correct frame state 
>>>>>> during a handshake
>>>>>>          typo: s/have/has/
>>>>>>          Also please add a '.' to end of the sentence.
>>>>>
>>>>> Fixed.
>>>>>
>>>>>>
>>>>>>      L2954:   assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>>>>>>      L2955:          (has_last_Java_frame() && java_call_counter() > 0), 
>>>>>> "wrong java_sp info!");
>>>>>>          Trying to make sure I understand what you are asserting:
>>>>>>
>>>>>>          1) if the thread does not have a last Java frame then it must have
>>>>>>             never called Java
>>>>>>          2) if the thread has a last Java frame, then it must have called
>>>>>>             Java at least once
>>>>>>
>>>>>>          I'm good with the second expression. I have some vague doubts about
>>>>>>          first expression. When a JavaThread is done executing Java code and
>>>>>>          it is on its way toward a thread exit, is there ever a time when it
>>>>>>          no longer has a last Java frame? I'm thinking a handshake late in
>>>>>>          the JavaThread's life...
>>>>>
>>>>> The assert is a copy paste from
>>>>> L2959 void JavaThread::nmethods_do(CodeBlobClosure* cf) {
>>>>>
>>>>> I have seen this assert trigger twice on windows, either it's wrong and should
>>>>> be removed from nmethods_do, or it is correct and this will help us find when
>>>>> this happens.
>>>>
>>>> Discussed this with Robbin via chat. I'm good with leaving
>>>> the assert in place. Perhaps tweak both like this:
>>>>
>>>>     assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>>>>            (has_last_Java_frame() && java_call_counter() > 0),
>>>>            "unexpected frame info: has_last_frame=%d, java_call_counter=%d",
>>>>            has_last_Java_frame(), java_call_counter());
>>>>
>>>>>
>>>>>>
>>>>>> src/hotspot/share/runtime/vmThread.hpp
>>>>>>      No comments.
>>>>>>
>>>>>> src/hotspot/share/runtime/vmThread.cpp
>>>>>>      L503:           MutexUnlockerEx mul(VMOperationQueue_lock, 
>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>>          Please add something like this above L503:
>>>>>>
>>>>>>                      // Have to unlock VMOperationQueue_lock just in case
>>>>>>                      // no_op_safepoint() has to do a handshake.
>>>>>>
>>>>>>      old L619:     // We want to make sure that we get to a safepoint 
>>>>>> regularly.
>>>>>>      old L620:     //
>>>>>>      old L621:     if ((_cur_vm_operation = 
>>>>>> VMThread::no_op_safepoint(false)) != NULL) {
>>>>>>          This call to no_op_safepoint() is at the bottom of the "while(true)"
>>>>>>          loop in VMThread::loop(). Before the fix for 8219436, this line 
>>>>>> used to be:
>>>>>>
>>>>>>              if (VMThread::no_op_safepoint_needed(true)) {
>>>>>>
>>>>>>          and it was the only call to no_op_safepoint_needed() that passed
>>>>>>          'true'. It's the 'true' parameter that made the comment on L619
>>>>>>          correct. Why do I say that? Well, the only way we get to the
>>>>>>          bottom of the "while(true)" loop is if we had a vm_operation to
>>>>>>          perform. Let's say we had a bunch of no-safepoint vm_operations
>>>>>>          to perform and we just kept executing them, one after another.
>>>>>>          While doing this work, if our time between safepoints exceeds
>>>>>>          GuaranteedSafepointInterval, then this no_op_safepoint_needed(true)
>>>>>>          call is what would detect that we've gone too long between 
>>>>>> safepoints
>>>>>>          and would force one.
>>>>>>
>>>>>>          With the fix for 8219436, that call was changed to 
>>>>>> no_op_safepoint(false)
>>>>>>          and with the v2 version of this fix that call is now gone. So we no
>>>>>>          longer have the ability to have GuaranteedSafepointInterval work
>>>>>>          right when we are doing lots of non-safepoint VM operations.
>>>>>>
>>>>>>          This no_op_safepoint_needed() probe point, either with 'true' or
>>>>>>          'false', also gave us the opportunity to force a safepoint when
>>>>>>          SafepointALot is true or when 
>>>>>> SafepointSynchronize::is_cleanup_needed()
>>>>>>          returns true. In the v2 version of this fix, we lose the ability
>>>>>>          to SafepointALot after each VM operation. We also lose the ability
>>>>>>          to do cleanup() after each VM operation.
>>>>>
>>>>> And I don't think that is an issue, ICache and Monitor, which is the 
>>>>> cleanup we
>>>>> do, just have just basic heuristic. If we cleanup every second or every other
>>>>> second makes little difference.
>>>>
>>>> My point is that you are changing the semantics of SafepointALot without
>>>> stating clearly that is what you are doing. The point of SafepointALot is
>>>> to inject a safepoint after every VMop even those VMops that have to be
>>>> executed at a safepoint. SafepointALot is considered a stress option for
>>>> a very good reason.
>>>>
>>>>
>>>>> SafepointALot is just diag, which causes safepoint directly after a safepoint.
>>>>> I'd rather have it doing one extra safepoint per GuaranteedSafepointInterval,
>>>>> that way you know how many safepoint you have per second, which is really 
>>>>> useful
>>>>> for benchmarking. In current form it can't be used for benchmarks.
>>>>
>>>> You would never use SafepointALot in a benchmark run. It is is
>>>> stress option.
>>>>
>>>>>
>>>>>>
>>>>>>          I think you need to restore the "bool check_time" parameter to
>>>>>>          no_op_safepoint() and you need to restore this call to as:
>>>>>>          VMThread::no_op_safepoint_needed(true) along with the rest
>>>>>>          of old L618-L626.
>>>>>
>>>>> I don't need the parameter, we can always check the time.
>>>>> When we timeout on queue wait we will always pass the time check.
>>>>
>>>> As I pointed out, if you are doing non-safepoint VM ops, you can miss
>>>> a time check.
>>>>
>>>>
>>>>> So the difference is that I think the previous behavior was bad/buggy and 
>>>>> you are trying to restore it :)
>>>>>
>>>>> Check-out v3, just full:
>>>>> http://cr.openjdk.java.net/~rehn/8220774/v3/webrev/
>>>>
>>>> I'll post a second message with my re-review for this webrev.
>>>>
>>>>
>>>>>
>>>>> - wait on queue GuaranteedSafepointInterval
>>>>> - if timeout
>>>>>     - do handshake a lot
>>>>>     - we are guaranteed to pass the time check, so cleanup if needed
>>>>>     - else safepoint alot
>>>>> - if op
>>>>>     - after op
>>>>>         - do handshake a lot
>>>>>         - check time, if expired, cleanup if needed
>>>>>         - else safepoint alot
>>>>>
>>>>> I think that is what you want :)
>>>>> And that's what in v3.
>>>>>
>>>>> I would rather have:
>>>>> - wait-time = GuaranteedSafepointInterval - 
>>>>> SafepointTracing::time_since_last_safepoint_ms()
>>>>> - if wait-time > 0
>>>>>     - wait on queue
>>>>> - else or timeout
>>>>>     - do handshake a lot
>>>>>     - cleanup if needed
>>>>>     - else safepoint alot
>>>>>
>>>>> That way we do check for cleanup after non-safepoint in time but we do not
>>>>> causes extra safepoints after safepoints with SafepointALot. (I know you 
>>>>> think of this as feature :) )
>>>>
>>>> Not so much a feature as a stress option. I love my stress options.
>>>>
>>>> Dan
>>>>
>>>>>
>>>>> Thanks, Robbin
>>>>
>>>


More information about the hotspot-runtime-dev mailing list