RFR(s): 8220774: Add HandshakeALot diag option
David Holmes
david.holmes at oracle.com
Thu Mar 21 22:01:45 UTC 2019
I will review now that the dust has settled :)
Might be a few hours.
David
On 22/03/2019 2:04 am, Robbin Ehn wrote:
> On 2019-03-21 16:11, Daniel D. Daugherty wrote:
>>> Check-out v3, just full:
>>> http://cr.openjdk.java.net/~rehn/8220774/v3/webrev/
>>
>> src/hotspot/share/runtime/globals.hpp
>> No comments.
>>
>> src/hotspot/share/runtime/thread.hpp
>> No comments.
>>
>> src/hotspot/share/runtime/thread.cpp
>> L2954: assert((!has_last_Java_frame() && java_call_counter() ==
>> 0) ||
>> L2955: (has_last_Java_frame() && java_call_counter() >
>> 0), "wrong java_sp info!");
>> Perhaps change to this:
>>
>> assert((!has_last_Java_frame() && java_call_counter() ==
>> 0) ||
>> (has_last_Java_frame() && java_call_counter() > 0),
>> "unexpected frame info: has_last_frame=%d,
>> java_call_counter=%d",
>> has_last_Java_frame(), java_call_counter());
>>
>> and consider making the same change to the nmethod version.
>>
>
> Fixed both!
>
>> src/hotspot/share/runtime/vmThread.hpp
>> No comments.
>>
>> src/hotspot/share/runtime/vmThread.cpp
>> L449: // Must check for handshakes first, since ops returns.
>> Perhaps:
>> // Check for handshakes first since we may need to return
>> a VMop.
>>
>> L454: // Check for a true cleanup first, trying to keep stats
>> correct.
>> Perhaps:
>> // Check for a cleanup before SafepointALot to keep stats
>> correct.
>>
>> L635: // We want to make sure that we get to a safepoint
>> regularly.
>> Perhaps an addition:
>> // We want to make sure that we get to a safepoint
>> regularly
>> // even when executing VMops that don't require
>> safepoints.
>>
>
> Fixed above!
>
> Publishing v4, since I need a second review:
> Full:
> http://cr.openjdk.java.net/~rehn/8220774/v4/webrev/
> Inc:
> http://cr.openjdk.java.net/~rehn/8220774/v4/inc/webrev/
>
> Thanks, Robbin
>
>> Okay, I now grok why VMThread::no_op_safepoint() does not need a
>> 'check_time' parameter. Thanks for restoring the no_op_safepoint()
>> call at the bottom of the "while(true)" in VMThread::loop().
>>
>> If you decide to tweak the semantics for SafepointALot down the road,
>> it would be best to do that in its own bug rather than as part of
>> another bug.
>>
>> Thumbs up! Your call on whether to tweak the assert or change the
>> comments.
>>
>> Dan
>>
>>
>>
>> On 3/21/19 10:28 AM, Daniel D. Daugherty wrote:
>>> On 3/21/19 9:57 AM, Robbin Ehn wrote:
>>>> Hi Dan,
>>>>
>>>> On 2019-03-20 21:21, Daniel D. Daugherty wrote:
>>>>>> And here is v2 for you to consider:
>>>>>> http://rehn-ws.se.oracle.com/cr_mirror/8220774/v2/webrev/
>>>>>
>>>>> It's too difficult to craft my comments relative to the incremental
>>>>> webrev so I'm working with the full webrev still.
>>>>>
>>>>>
>>>>> src/hotspot/share/runtime/globals.hpp
>>>>> No comments.
>>>>>
>>>>> src/hotspot/share/runtime/thread.hpp
>>>>> No comments.
>>>>>
>>>>> src/hotspot/share/runtime/thread.cpp
>>>>> L2953: // This checks that the thread have a correct frame
>>>>> state during a handshake
>>>>> typo: s/have/has/
>>>>> Also please add a '.' to end of the sentence.
>>>>
>>>> Fixed.
>>>>
>>>>>
>>>>> L2954: assert((!has_last_Java_frame() && java_call_counter()
>>>>> == 0) ||
>>>>> L2955: (has_last_Java_frame() && java_call_counter()
>>>>> > 0), "wrong java_sp info!");
>>>>> Trying to make sure I understand what you are asserting:
>>>>>
>>>>> 1) if the thread does not have a last Java frame then it
>>>>> must have
>>>>> never called Java
>>>>> 2) if the thread has a last Java frame, then it must have
>>>>> called
>>>>> Java at least once
>>>>>
>>>>> I'm good with the second expression. I have some vague
>>>>> doubts about
>>>>> first expression. When a JavaThread is done executing Java
>>>>> code and
>>>>> it is on its way toward a thread exit, is there ever a
>>>>> time when it
>>>>> no longer has a last Java frame? I'm thinking a handshake
>>>>> late in
>>>>> the JavaThread's life...
>>>>
>>>> The assert is a copy paste from
>>>> L2959 void JavaThread::nmethods_do(CodeBlobClosure* cf) {
>>>>
>>>> I have seen this assert trigger twice on windows, either it's wrong
>>>> and should
>>>> be removed from nmethods_do, or it is correct and this will help us
>>>> find when
>>>> this happens.
>>>
>>> Discussed this with Robbin via chat. I'm good with leaving
>>> the assert in place. Perhaps tweak both like this:
>>>
>>> assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>>> (has_last_Java_frame() && java_call_counter() > 0),
>>> "unexpected frame info: has_last_frame=%d,
>>> java_call_counter=%d",
>>> has_last_Java_frame(), java_call_counter());
>>>
>>>>
>>>>>
>>>>> src/hotspot/share/runtime/vmThread.hpp
>>>>> No comments.
>>>>>
>>>>> src/hotspot/share/runtime/vmThread.cpp
>>>>> L503: MutexUnlockerEx mul(VMOperationQueue_lock,
>>>>> Mutex::_no_safepoint_check_flag);
>>>>> Please add something like this above L503:
>>>>>
>>>>> // Have to unlock VMOperationQueue_lock just
>>>>> in case
>>>>> // no_op_safepoint() has to do a handshake.
>>>>>
>>>>> old L619: // We want to make sure that we get to a
>>>>> safepoint regularly.
>>>>> old L620: //
>>>>> old L621: if ((_cur_vm_operation =
>>>>> VMThread::no_op_safepoint(false)) != NULL) {
>>>>> This call to no_op_safepoint() is at the bottom of the
>>>>> "while(true)"
>>>>> loop in VMThread::loop(). Before the fix for 8219436, this
>>>>> line used to be:
>>>>>
>>>>> if (VMThread::no_op_safepoint_needed(true)) {
>>>>>
>>>>> and it was the only call to no_op_safepoint_needed() that
>>>>> passed
>>>>> 'true'. It's the 'true' parameter that made the comment on
>>>>> L619
>>>>> correct. Why do I say that? Well, the only way we get to the
>>>>> bottom of the "while(true)" loop is if we had a
>>>>> vm_operation to
>>>>> perform. Let's say we had a bunch of no-safepoint
>>>>> vm_operations
>>>>> to perform and we just kept executing them, one after
>>>>> another.
>>>>> While doing this work, if our time between safepoints exceeds
>>>>> GuaranteedSafepointInterval, then this
>>>>> no_op_safepoint_needed(true)
>>>>> call is what would detect that we've gone too long between
>>>>> safepoints
>>>>> and would force one.
>>>>>
>>>>> With the fix for 8219436, that call was changed to
>>>>> no_op_safepoint(false)
>>>>> and with the v2 version of this fix that call is now gone.
>>>>> So we no
>>>>> longer have the ability to have
>>>>> GuaranteedSafepointInterval work
>>>>> right when we are doing lots of non-safepoint VM operations.
>>>>>
>>>>> This no_op_safepoint_needed() probe point, either with
>>>>> 'true' or
>>>>> 'false', also gave us the opportunity to force a safepoint
>>>>> when
>>>>> SafepointALot is true or when
>>>>> SafepointSynchronize::is_cleanup_needed()
>>>>> returns true. In the v2 version of this fix, we lose the
>>>>> ability
>>>>> to SafepointALot after each VM operation. We also lose the
>>>>> ability
>>>>> to do cleanup() after each VM operation.
>>>>
>>>> And I don't think that is an issue, ICache and Monitor, which is the
>>>> cleanup we
>>>> do, just have just basic heuristic. If we cleanup every second or
>>>> every other
>>>> second makes little difference.
>>>
>>> My point is that you are changing the semantics of SafepointALot without
>>> stating clearly that is what you are doing. The point of
>>> SafepointALot is
>>> to inject a safepoint after every VMop even those VMops that have to be
>>> executed at a safepoint. SafepointALot is considered a stress option for
>>> a very good reason.
>>>
>>>
>>>> SafepointALot is just diag, which causes safepoint directly after a
>>>> safepoint.
>>>> I'd rather have it doing one extra safepoint per
>>>> GuaranteedSafepointInterval,
>>>> that way you know how many safepoint you have per second, which is
>>>> really useful
>>>> for benchmarking. In current form it can't be used for benchmarks.
>>>
>>> You would never use SafepointALot in a benchmark run. It is is
>>> stress option.
>>>
>>>>
>>>>>
>>>>> I think you need to restore the "bool check_time"
>>>>> parameter to
>>>>> no_op_safepoint() and you need to restore this call to as:
>>>>> VMThread::no_op_safepoint_needed(true) along with the rest
>>>>> of old L618-L626.
>>>>
>>>> I don't need the parameter, we can always check the time.
>>>> When we timeout on queue wait we will always pass the time check.
>>>
>>> As I pointed out, if you are doing non-safepoint VM ops, you can miss
>>> a time check.
>>>
>>>
>>>> So the difference is that I think the previous behavior was
>>>> bad/buggy and you are trying to restore it :)
>>>>
>>>> Check-out v3, just full:
>>>> http://cr.openjdk.java.net/~rehn/8220774/v3/webrev/
>>>
>>> I'll post a second message with my re-review for this webrev.
>>>
>>>
>>>>
>>>> - wait on queue GuaranteedSafepointInterval
>>>> - if timeout
>>>> - do handshake a lot
>>>> - we are guaranteed to pass the time check, so cleanup if needed
>>>> - else safepoint alot
>>>> - if op
>>>> - after op
>>>> - do handshake a lot
>>>> - check time, if expired, cleanup if needed
>>>> - else safepoint alot
>>>>
>>>> I think that is what you want :)
>>>> And that's what in v3.
>>>>
>>>> I would rather have:
>>>> - wait-time = GuaranteedSafepointInterval -
>>>> SafepointTracing::time_since_last_safepoint_ms()
>>>> - if wait-time > 0
>>>> - wait on queue
>>>> - else or timeout
>>>> - do handshake a lot
>>>> - cleanup if needed
>>>> - else safepoint alot
>>>>
>>>> That way we do check for cleanup after non-safepoint in time but we
>>>> do not
>>>> causes extra safepoints after safepoints with SafepointALot. (I know
>>>> you think of this as feature :) )
>>>
>>> Not so much a feature as a stress option. I love my stress options.
>>>
>>> Dan
>>>
>>>>
>>>> Thanks, Robbin
>>>
>>
More information about the hotspot-runtime-dev
mailing list