RFR(s): 8220774: Add HandshakeALot diag option
Robbin Ehn
robbin.ehn at oracle.com
Thu Mar 21 16:04:16 UTC 2019
On 2019-03-21 16:11, Daniel D. Daugherty wrote:
>> Check-out v3, just full:
>> http://cr.openjdk.java.net/~rehn/8220774/v3/webrev/
>
> src/hotspot/share/runtime/globals.hpp
> No comments.
>
> src/hotspot/share/runtime/thread.hpp
> No comments.
>
> src/hotspot/share/runtime/thread.cpp
> L2954: assert((!has_last_Java_frame() && java_call_counter() == 0) ||
> L2955:        (has_last_Java_frame() && java_call_counter() > 0), "wrong java_sp info!");
> Perhaps change to this:
>
>     assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>            (has_last_Java_frame() && java_call_counter() > 0),
>            "unexpected frame info: has_last_frame=%d, java_call_counter=%d",
>            has_last_Java_frame(), java_call_counter());
>
> and consider making the same change to the nmethod version.
>
Fixed both!
> src/hotspot/share/runtime/vmThread.hpp
> No comments.
>
> src/hotspot/share/runtime/vmThread.cpp
> L449: // Must check for handshakes first, since ops returns.
> Perhaps:
> // Check for handshakes first since we may need to return a VMop.
>
> L454: // Check for a true cleanup first, trying to keep stats correct.
> Perhaps:
> // Check for a cleanup before SafepointALot to keep stats correct.
>
> L635: // We want to make sure that we get to a safepoint regularly.
> Perhaps an addition:
> // We want to make sure that we get to a safepoint regularly
> // even when executing VMops that don't require safepoints.
>
Fixed above!
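
For context, the handshake/cleanup check being commented now reads roughly like
this (a simplified sketch, not the exact v4 code; the *_op names are
placeholders, see the webrev for the real thing):

  VM_Operation* VMThread::no_op_safepoint() {
    // Check for handshakes first since we may need to return a VMop.
    if (HandshakeALot) {
      return &handshake_alot_op;                    // placeholder name
    }
    // Check for a cleanup before SafepointALot to keep stats correct.
    long interval_ms = SafepointTracing::time_since_last_safepoint_ms();
    bool max_time_exceeded = GuaranteedSafepointInterval != 0 &&
                             interval_ms >= GuaranteedSafepointInterval;
    if (max_time_exceeded && SafepointSynchronize::is_cleanup_needed()) {
      return &cleanup_op;                           // placeholder name
    }
    if (SafepointALot) {
      return &safepoint_alot_op;                    // placeholder name
    }
    return NULL;                                    // nothing to do
  }
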
Publishing v4, since I need a second review:
Full:
http://cr.openjdk.java.net/~rehn/8220774/v4/webrev/
Inc:
http://cr.openjdk.java.net/~rehn/8220774/v4/inc/webrev/
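
In short, the loop now behaves roughly like this (simplified pseudocode, not the
exact vmThread.cpp code; wait_on_queue() is a placeholder for the real queue
wait under VMOperationQueue_lock):

  while (true) {
    VM_Operation* op = wait_on_queue(GuaranteedSafepointInterval);
    if (op == NULL) {
      // Queue wait timed out, so the time check in no_op_safepoint() is
      // guaranteed to pass: HandshakeALot, then cleanup if needed,
      // else SafepointALot.
      if ((op = no_op_safepoint()) != NULL) {
        evaluate_operation(op);
      }
    } else {
      evaluate_operation(op);
      // We want to make sure that we get to a safepoint regularly
      // even when executing VMops that don't require safepoints.
      if ((op = no_op_safepoint()) != NULL) {
        evaluate_operation(op);
      }
    }
  }
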
Thanks, Robbin
> Okay, I now grok why VMThread::no_op_safepoint() does not need a
> 'check_time' parameter. Thanks for restoring the no_op_safepoint()
> call at the bottom of the "while(true)" loop in VMThread::loop().
>
> If you decide to tweak the semantics for SafepointALot down the road,
> it would be best to do that in its own bug rather than as part of
> another bug.
>
> Thumbs up! Your call on whether to tweak the assert or change the comments.
>
> Dan
>
>
>
> On 3/21/19 10:28 AM, Daniel D. Daugherty wrote:
>> On 3/21/19 9:57 AM, Robbin Ehn wrote:
>>> Hi Dan,
>>>
>>> On 2019-03-20 21:21, Daniel D. Daugherty wrote:
>>>>> And here is v2 for you to consider:
>>>>> http://rehn-ws.se.oracle.com/cr_mirror/8220774/v2/webrev/
>>>>
>>>> It's too difficult to craft my comments relative to the incremental
>>>> webrev so I'm working with the full webrev still.
>>>>
>>>>
>>>> src/hotspot/share/runtime/globals.hpp
>>>> No comments.
>>>>
>>>> src/hotspot/share/runtime/thread.hpp
>>>> No comments.
>>>>
>>>> src/hotspot/share/runtime/thread.cpp
>>>> L2953: // This checks that the thread have a correct frame state
>>>> during a handshake
>>>> typo: s/have/has/
>>>> Also please add a '.' to end of the sentence.
>>>
>>> Fixed.
>>>
>>>>
>>>> L2954: assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>>>> L2955:        (has_last_Java_frame() && java_call_counter() > 0), "wrong java_sp info!");
>>>> Trying to make sure I understand what you are asserting:
>>>>
>>>> 1) if the thread does not have a last Java frame then it must have
>>>> never called Java
>>>> 2) if the thread has a last Java frame, then it must have called
>>>> Java at least once
>>>>
>>>> I'm good with the second expression. I have some vague doubts about
>>>> first expression. When a JavaThread is done executing Java code and
>>>> it is on its way toward a thread exit, is there ever a time when it
>>>> no longer has a last Java frame? I'm thinking a handshake late in
>>>> the JavaThread's life...
>>>
>>> The assert is a copy paste from
>>> L2959 void JavaThread::nmethods_do(CodeBlobClosure* cf) {
>>>
>>> I have seen this assert trigger twice on Windows; either it's wrong and should
>>> be removed from nmethods_do, or it is correct and this will help us find out when
>>> this happens.
>>
>> Discussed this with Robbin via chat. I'm good with leaving
>> the assert in place. Perhaps tweak both like this:
>>
>> assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>>        (has_last_Java_frame() && java_call_counter() > 0),
>>        "unexpected frame info: has_last_frame=%d, java_call_counter=%d",
>>        has_last_Java_frame(), java_call_counter());
>>
>>>
>>>>
>>>> src/hotspot/share/runtime/vmThread.hpp
>>>> No comments.
>>>>
>>>> src/hotspot/share/runtime/vmThread.cpp
>>>> L503: MutexUnlockerEx mul(VMOperationQueue_lock,
>>>> Mutex::_no_safepoint_check_flag);
>>>> Please add something like this above L503:
>>>>
>>>> // Have to unlock VMOperationQueue_lock just in case
>>>> // no_op_safepoint() has to do a handshake.
>>>>
>>>> old L619: // We want to make sure that we get to a safepoint regularly.
>>>> old L620: //
>>>> old L621: if ((_cur_vm_operation = VMThread::no_op_safepoint(false)) != NULL) {
>>>> This call to no_op_safepoint() is at the bottom of the "while(true)"
>>>> loop in VMThread::loop(). Before the fix for 8219436, this line
>>>> used to be:
>>>>
>>>> if (VMThread::no_op_safepoint_needed(true)) {
>>>>
>>>> and it was the only call to no_op_safepoint_needed() that passed
>>>> 'true'. It's the 'true' parameter that made the comment on L619
>>>> correct. Why do I say that? Well, the only way we get to the
>>>> bottom of the "while(true)" loop is if we had a vm_operation to
>>>> perform. Let's say we had a bunch of no-safepoint vm_operations
>>>> to perform and we just kept executing them, one after another.
>>>> While doing this work, if our time between safepoints exceeds
>>>> GuaranteedSafepointInterval, then this no_op_safepoint_needed(true)
>>>> call is what would detect that we've gone too long between safepoints
>>>> and would force one.
>>>>
>>>> With the fix for 8219436, that call was changed to
>>>> no_op_safepoint(false)
>>>> and with the v2 version of this fix that call is now gone. So we no
>>>> longer have the ability to have GuaranteedSafepointInterval work
>>>> right when we are doing lots of non-safepoint VM operations.
>>>>
>>>> This no_op_safepoint_needed() probe point, either with 'true' or
>>>> 'false', also gave us the opportunity to force a safepoint when
>>>> SafepointALot is true or when
>>>> SafepointSynchronize::is_cleanup_needed()
>>>> returns true. In the v2 version of this fix, we lose the ability
>>>> to SafepointALot after each VM operation. We also lose the ability
>>>> to do cleanup() after each VM operation.
>>>
>>> And I don't think that is an issue. ICache and Monitor, which are the cleanups we
>>> do, just have basic heuristics. Whether we clean up every second or every other
>>> second makes little difference.
>>
>> My point is that you are changing the semantics of SafepointALot without
>> stating clearly that is what you are doing. The point of SafepointALot is
>> to inject a safepoint after every VMop even those VMops that have to be
>> executed at a safepoint. SafepointALot is considered a stress option for
>> a very good reason.
>>
>>
>>> SafepointALot is just a diag option, which causes a safepoint directly after a safepoint.
>>> I'd rather have it do one extra safepoint per GuaranteedSafepointInterval;
>>> that way you know how many safepoints you have per second, which is really useful
>>> for benchmarking. In its current form it can't be used for benchmarks.
>>
>> You would never use SafepointALot in a benchmark run. It is a
>> stress option.
>>
>>>
>>>>
>>>> I think you need to restore the "bool check_time" parameter to
>>>> no_op_safepoint() and you need to restore this call as:
>>>> VMThread::no_op_safepoint_needed(true), along with the rest
>>>> of old L618-L626.
>>>
>>> I don't need the parameter; we can always check the time.
>>> When we time out on the queue wait we will always pass the time check.
>>
>> As I pointed out, if you are doing non-safepoint VM ops, you can miss
>> a time check.
>>
>>
>>> So the difference is that I think the previous behavior was bad/buggy and you
>>> are trying to restore it :)
>>>
>>> Check-out v3, just full:
>>> http://cr.openjdk.java.net/~rehn/8220774/v3/webrev/
>>
>> I'll post a second message with my re-review for this webrev.
>>
>>
>>>
>>> - wait on queue GuaranteedSafepointInterval
>>> - if timeout
>>> - do handshake a lot
>>> - we are guaranteed to pass the time check, so cleanup if needed
>>> - else safepoint alot
>>> - if op
>>> - after op
>>> - do handshake a lot
>>> - check time, if expired, cleanup if needed
>>> - else safepoint alot
>>>
>>> I think that is what you want :)
>>> And that's what's in v3.
>>>
>>> I would rather have:
>>> - wait-time = GuaranteedSafepointInterval -
>>> SafepointTracing::time_since_last_safepoint_ms()
>>> - if wait-time > 0
>>> - wait on queue
>>> - else or timeout
>>> - do handshake a lot
>>> - cleanup if needed
>>> - else safepoint alot
>>>
>>> That way we do check for cleanup in time after a non-safepoint op, but we do
>>> not cause extra safepoints right after safepoints with SafepointALot. (I know
>>> you think of this as a feature :) )
>>
>> Not so much a feature as a stress option. I love my stress options.
>>
>> Dan
>>
>>>
>>> Thanks, Robbin
>>
>