RFR(s): 8220774: Add HandshakeALot diag option
Robbin Ehn
robbin.ehn at oracle.com
Mon Mar 25 08:22:11 UTC 2019
Hi Dan,
> test/hotspot/jtreg/ProblemList.txt
> JDK-8214174 is still open. Are you going to list it in this changeset
> or close it as a dup of this bug?
I was going to close it.
>
> test/hotspot/jtreg/runtime/handshake/HandshakeTransitionTest.java
> Please update the copyright year before you push.
>
> test/hotspot/jtreg/runtime/handshake/HandshakeSuspendExitTest.java
> L45: if (i++ > _suspend_threads.length -2) {
> s/-2/- 2/
>
> L60: // Fire-up suspend thread.
> s/thread/threads/
>
> L67: // Wait for all suspend thread starting to loop.
> s/thread/threads/
Fixed.
>
> L91: for (Thread thr : _suspend_threads) {
> L92: thr.resume();
> L93: }
> So why is there an extra resume per _suspend_threads element?
> You have matched suspend() and resume() calls so what am I
> missing?
If the two threads never suspended exit first, the other threads can cause a
circular suspend and they hanging forever. One of the issues in 8214174.
>
> test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
> So you're replacing this test with HandshakeSuspendExitTest.java.
Since we are not using withebox to walk thread stack, instead using
HandshakeALot I removed Walk from name.
>
> Thumbs up!
Thanks, Robbin
>
> Dan
>
>>
>> Thanks, Robbin
>>
>>> Thanks,
>>> David
>>>
>>> On 22/03/2019 2:04 am, Robbin Ehn wrote:
>>>> On 2019-03-21 16:11, Daniel D. Daugherty wrote:
>>>>>> Check-out v3, just full:
>>>>>> http://cr.openjdk.java.net/~rehn/8220774/v3/webrev/
>>>>> src/hotspot/share/runtime/globals.hpp
>>>>> No comments.
>>>>>
>>>>> src/hotspot/share/runtime/thread.hpp
>>>>> No comments.
>>>>>
>>>>> src/hotspot/share/runtime/thread.cpp
>>>>> L2954: assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>>>>> L2955: (has_last_Java_frame() && java_call_counter() > 0),
>>>>> "wrong java_sp info!");
>>>>> Perhaps change to this:
>>>>>
>>>>> assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>>>>> (has_last_Java_frame() && java_call_counter() > 0),
>>>>> "unexpected frame info: has_last_frame=%d,
>>>>> java_call_counter=%d",
>>>>> has_last_Java_frame(), java_call_counter());
>>>>>
>>>>> and consider making the same change to the nmethod version.
>>>>>
>>>> Fixed both!
>>>>
>>>>> src/hotspot/share/runtime/vmThread.hpp
>>>>> No comments.
>>>>>
>>>>> src/hotspot/share/runtime/vmThread.cpp
>>>>> L449: // Must check for handshakes first, since ops returns.
>>>>> Perhaps:
>>>>> // Check for handshakes first since we may need to return a
>>>>> VMop.
>>>>>
>>>>> L454: // Check for a true cleanup first, trying to keep stats correct.
>>>>> Perhaps:
>>>>> // Check for a cleanup before SafepointALot to keep stats
>>>>> correct.
>>>>>
>>>>> L635: // We want to make sure that we get to a safepoint regularly.
>>>>> Perhaps an addition:
>>>>> // We want to make sure that we get to a safepoint regularly
>>>>> // even when executing VMops that don't require safepoints.
>>>>>
>>>> Fixed above!
>>>>
>>>> Publishing v4, since I need a second review:
>>>> Full:
>>>> http://cr.openjdk.java.net/~rehn/8220774/v4/webrev/
>>>> Inc:
>>>> http://cr.openjdk.java.net/~rehn/8220774/v4/inc/webrev/
>>>>
>>>> Thanks, Robbin
>>>>
>>>>> Okay, I now grok why VMThread::no_op_safepoint() does not need a
>>>>> 'check_time' parameter. Thanks for restoring the no_op_safepoint()
>>>>> call at the bottom of the "while(true)" in VMThread::loop().
>>>>>
>>>>> If you decide to tweak the semantics for SafepointALot down the road,
>>>>> it would be best to do that in its own bug rather than as part of
>>>>> another bug.
>>>>>
>>>>> Thumbs up! Your call on whether to tweak the assert or change the comments.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>> On 3/21/19 10:28 AM, Daniel D. Daugherty wrote:
>>>>>> On 3/21/19 9:57 AM, Robbin Ehn wrote:
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> On 2019-03-20 21:21, Daniel D. Daugherty wrote:
>>>>>>>>> And here is v2 for you to consider:
>>>>>>>>> http://rehn-ws.se.oracle.com/cr_mirror/8220774/v2/webrev/
>>>>>>>> It's too difficult to craft my comments relative to the incremental
>>>>>>>> webrev so I'm working with the full webrev still.
>>>>>>>>
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/globals.hpp
>>>>>>>> No comments.
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/thread.hpp
>>>>>>>> No comments.
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/thread.cpp
>>>>>>>> L2953: // This checks that the thread have a correct frame state
>>>>>>>> during a handshake
>>>>>>>> typo: s/have/has/
>>>>>>>> Also please add a '.' to end of the sentence.
>>>>>>> Fixed.
>>>>>>>
>>>>>>>> L2954: assert((!has_last_Java_frame() && java_call_counter() ==
>>>>>>>> 0) ||
>>>>>>>> L2955: (has_last_Java_frame() && java_call_counter() > 0),
>>>>>>>> "wrong java_sp info!");
>>>>>>>> Trying to make sure I understand what you are asserting:
>>>>>>>>
>>>>>>>> 1) if the thread does not have a last Java frame then it must
>>>>>>>> have
>>>>>>>> never called Java
>>>>>>>> 2) if the thread has a last Java frame, then it must have called
>>>>>>>> Java at least once
>>>>>>>>
>>>>>>>> I'm good with the second expression. I have some vague doubts
>>>>>>>> about
>>>>>>>> first expression. When a JavaThread is done executing Java
>>>>>>>> code and
>>>>>>>> it is on its way toward a thread exit, is there ever a time
>>>>>>>> when it
>>>>>>>> no longer has a last Java frame? I'm thinking a handshake late in
>>>>>>>> the JavaThread's life...
>>>>>>> The assert is a copy paste from
>>>>>>> L2959 void JavaThread::nmethods_do(CodeBlobClosure* cf) {
>>>>>>>
>>>>>>> I have seen this assert trigger twice on windows, either it's wrong and
>>>>>>> should
>>>>>>> be removed from nmethods_do, or it is correct and this will help us find
>>>>>>> when
>>>>>>> this happens.
>>>>>> Discussed this with Robbin via chat. I'm good with leaving
>>>>>> the assert in place. Perhaps tweak both like this:
>>>>>>
>>>>>> assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>>>>>> (has_last_Java_frame() && java_call_counter() > 0),
>>>>>> "unexpected frame info: has_last_frame=%d, java_call_counter=%d",
>>>>>> has_last_Java_frame(), java_call_counter());
>>>>>>
>>>>>>>> src/hotspot/share/runtime/vmThread.hpp
>>>>>>>> No comments.
>>>>>>>>
>>>>>>>> src/hotspot/share/runtime/vmThread.cpp
>>>>>>>> L503: MutexUnlockerEx mul(VMOperationQueue_lock,
>>>>>>>> Mutex::_no_safepoint_check_flag);
>>>>>>>> Please add something like this above L503:
>>>>>>>>
>>>>>>>> // Have to unlock VMOperationQueue_lock just in case
>>>>>>>> // no_op_safepoint() has to do a handshake.
>>>>>>>>
>>>>>>>> old L619: // We want to make sure that we get to a safepoint
>>>>>>>> regularly.
>>>>>>>> old L620: //
>>>>>>>> old L621: if ((_cur_vm_operation =
>>>>>>>> VMThread::no_op_safepoint(false)) != NULL) {
>>>>>>>> This call to no_op_safepoint() is at the bottom of the
>>>>>>>> "while(true)"
>>>>>>>> loop in VMThread::loop(). Before the fix for 8219436, this line
>>>>>>>> used to be:
>>>>>>>>
>>>>>>>> if (VMThread::no_op_safepoint_needed(true)) {
>>>>>>>>
>>>>>>>> and it was the only call to no_op_safepoint_needed() that passed
>>>>>>>> 'true'. It's the 'true' parameter that made the comment on L619
>>>>>>>> correct. Why do I say that? Well, the only way we get to the
>>>>>>>> bottom of the "while(true)" loop is if we had a vm_operation to
>>>>>>>> perform. Let's say we had a bunch of no-safepoint vm_operations
>>>>>>>> to perform and we just kept executing them, one after another.
>>>>>>>> While doing this work, if our time between safepoints exceeds
>>>>>>>> GuaranteedSafepointInterval, then this
>>>>>>>> no_op_safepoint_needed(true)
>>>>>>>> call is what would detect that we've gone too long between
>>>>>>>> safepoints
>>>>>>>> and would force one.
>>>>>>>>
>>>>>>>> With the fix for 8219436, that call was changed to
>>>>>>>> no_op_safepoint(false)
>>>>>>>> and with the v2 version of this fix that call is now gone. So
>>>>>>>> we no
>>>>>>>> longer have the ability to have GuaranteedSafepointInterval work
>>>>>>>> right when we are doing lots of non-safepoint VM operations.
>>>>>>>>
>>>>>>>> This no_op_safepoint_needed() probe point, either with 'true' or
>>>>>>>> 'false', also gave us the opportunity to force a safepoint when
>>>>>>>> SafepointALot is true or when
>>>>>>>> SafepointSynchronize::is_cleanup_needed()
>>>>>>>> returns true. In the v2 version of this fix, we lose the ability
>>>>>>>> to SafepointALot after each VM operation. We also lose the
>>>>>>>> ability
>>>>>>>> to do cleanup() after each VM operation.
>>>>>>> And I don't think that is an issue, ICache and Monitor, which is the
>>>>>>> cleanup we
>>>>>>> do, just have just basic heuristic. If we cleanup every second or every
>>>>>>> other
>>>>>>> second makes little difference.
>>>>>> My point is that you are changing the semantics of SafepointALot without
>>>>>> stating clearly that is what you are doing. The point of SafepointALot is
>>>>>> to inject a safepoint after every VMop even those VMops that have to be
>>>>>> executed at a safepoint. SafepointALot is considered a stress option for
>>>>>> a very good reason.
>>>>>>
>>>>>>
>>>>>>> SafepointALot is just diag, which causes safepoint directly after a
>>>>>>> safepoint.
>>>>>>> I'd rather have it doing one extra safepoint per
>>>>>>> GuaranteedSafepointInterval,
>>>>>>> that way you know how many safepoint you have per second, which is really
>>>>>>> useful
>>>>>>> for benchmarking. In current form it can't be used for benchmarks.
>>>>>> You would never use SafepointALot in a benchmark run. It is is
>>>>>> stress option.
>>>>>>
>>>>>>>> I think you need to restore the "bool check_time" parameter to
>>>>>>>> no_op_safepoint() and you need to restore this call to as:
>>>>>>>> VMThread::no_op_safepoint_needed(true) along with the rest
>>>>>>>> of old L618-L626.
>>>>>>> I don't need the parameter, we can always check the time.
>>>>>>> When we timeout on queue wait we will always pass the time check.
>>>>>> As I pointed out, if you are doing non-safepoint VM ops, you can miss
>>>>>> a time check.
>>>>>>
>>>>>>
>>>>>>> So the difference is that I think the previous behavior was bad/buggy and
>>>>>>> you are trying to restore it :)
>>>>>>>
>>>>>>> Check-out v3, just full:
>>>>>>> http://cr.openjdk.java.net/~rehn/8220774/v3/webrev/
>>>>>> I'll post a second message with my re-review for this webrev.
>>>>>>
>>>>>>
>>>>>>> - wait on queue GuaranteedSafepointInterval
>>>>>>> - if timeout
>>>>>>> - do handshake a lot
>>>>>>> - we are guaranteed to pass the time check, so cleanup if needed
>>>>>>> - else safepoint alot
>>>>>>> - if op
>>>>>>> - after op
>>>>>>> - do handshake a lot
>>>>>>> - check time, if expired, cleanup if needed
>>>>>>> - else safepoint alot
>>>>>>>
>>>>>>> I think that is what you want :)
>>>>>>> And that's what in v3.
>>>>>>>
>>>>>>> I would rather have:
>>>>>>> - wait-time = GuaranteedSafepointInterval -
>>>>>>> SafepointTracing::time_since_last_safepoint_ms()
>>>>>>> - if wait-time > 0
>>>>>>> - wait on queue
>>>>>>> - else or timeout
>>>>>>> - do handshake a lot
>>>>>>> - cleanup if needed
>>>>>>> - else safepoint alot
>>>>>>>
>>>>>>> That way we do check for cleanup after non-safepoint in time but we do not
>>>>>>> causes extra safepoints after safepoints with SafepointALot. (I know you
>>>>>>> think of this as feature :) )
>>>>>> Not so much a feature as a stress option. I love my stress options.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>> Thanks, Robbin
>
More information about the hotspot-runtime-dev
mailing list