RFR 8191890: Biased locking still uses the inferior stop the world safepoint for revocation

Thu Jun 13 20:36:30 UTC 2019

Hi Markus,

On 6/12/19 9:30 AM, Markus Gronlund wrote:
> Hi Patricio and David, Sorry for a bit of belated reply, nice work 
> here. I will comment on how it relates to JFR, specifically with the 
> Safepoint ID relation: <cut>
>>> src/hotspot/share/jfr/metadata/metadata.xml
>>>
>>> Is it the case that this event is now never generated from a
>>> safepoint? Or have you just deleted the safepoint_id from the event
>>> because it might not be at a safepoint? If the latter can't we keep
>>> it and use 0 to indicate "not at a safepoint"? I think the JFR folk
>>> need to comment on this part of the change anyway.
>> This event will be created and commited only from
>> BiasedLocking::single_revoke_with_handshake(). Now, the actual
>> handshake that revoked the bias could be executed at a safepoint only
>> if ThreadLocalHandshakes is false. But I understand that this is true
>> for all platforms so the handshake should always be executed outside
>> safepoints.
> Ok.
>
>> It would be great if JFR folks review this part.
> [MG] I support the suggestion put forth by David to keep the 
> safepoint_id field of the event (at least for now). 0 could be 
> interpreted by a client as "outside of a safepoint / not in relation 
> to a safepoint". Three main reasons for not removing it immediately: 
> 1. This event structure is already semi-published in the public 
> domain. Although our Javadoc include a disclaimer that fields in event 
> structures *could* change (recommending defensive programming on field 
> accesses), it would be civilized to not disrupt if not absolutely 
> necessary. 2. We might see issues with handshakes that are currently 
> unknown, so we might still have a fallback path to use the safepoint 
> mechanism again. 3. I would think we plan to move additional, 
> currently safepoint-based operations, to use peer-to-peer handshaking 
> in the near future. We would need to address more generically how to 
> transition the safepoint_id field for those events as well. One 
> evolution path here could be the suggestion to overlay an 
> interpretation on safepoint_id == 0 -> outside / not safepoint 
> related. Let's keep the safepoint_id field for now, please set the 
> value to 0. 
Ok, makes sense, I'll keep the safepoint_id field then. I will send v03 
with this fix after I finish testing some other changes I might make.

Thanks for looking into this Markus!

Patricio
> Thanks
> Markus
>
>
> -----Original Message-----
> From: David Holmes
> Sent: den 7 juni 2019 01:37
> To: Patricio Chilano<patricio.chilano.mateo at oracle.com>;hotspot-runtime-dev at openjdk.java.net  runtime<hotspot-runtime-dev at openjdk.java.net>
> Subject: Re: RFR 8191890: Biased locking still uses the inferior stop the world safepoint for revocation
>
> Hi Patricio,
>
> On 7/06/2019 9:19 am, Patricio Chilano wrote:
>> Hi David,
>>
>>
>> On 6/6/19 3:37 AM, David Holmes wrote:
>>> Hi Patricio,
>>>
>>> First thanks for taking this on!
>>>
>>> I have some higher-level general discussion around this before deep
>>> diving into the actual code review (not that I have much there either
>>> :)).
>>>
>>> First to clarify how biased-locking works. I'm unclear when an object
>>> can be rebiased after having its bias revoked? This particularly
>>> relates to some of your assertions (as Markus has queried) after the
>>> CAS to update the mark so that the bias is revoked, and you then
>>> re-read the mark and assert the bias has been revoked - what stops
>>> another thread from rebiasing the object in between those two
>>> statements? Is it that rebiasing cannot happen, or that it could only
>>> happen if there were an intervening safepoint which in turn cannot
>>> happen?
>> Once the bias of the object is revoked it will stay like that forever,
>> it cannot happen that it goes back to having the 0x5 pattern.
>> Also, once the bias pattern in the prototype header for a class is
>> revoked during a bulk revocation operation, if there is an object of
>> that class that still has the bias pattern, a JavaThread that wants to
>> synchronize on that object will always revoke the bias first. This is
>> why I don't check if the CAS succeeded if the prototype of the class
>> does not has the bias pattern, I just assert that the object is not
>> biased anymore.
>>
>> Below I describe the cases where an object can be rebiased.
>>
>> Once a JavaThread biases an object for the first time, there are two
>> cases that allows for that object to be rebiased:
>> 1) If the epoch in the markword becomes invalid. For this to happen a
>> bulk rebias operation is needed. This is why I do check if the CAS
>> succeeded or not for these cases, since some other JavaThread could
>> have rebiased it.
>> 2) During a full GC, objects that are biased ( some JavaThread is set
>> in the biaser bits) could have their markword be reset to 0x5. This
>> means they will become anonymously biased again and so will look as if
>> they were not biased yet. As to how this logic works: At the beginning
>> of the full GC, BiasedLocking::preserve_marks() saves all the
>> markwords for those objects that are currently locked and have a bias
>> pattern. After that, markOopDesc::must_be_preserved_with_bias() will
>> be called to decide if the markword of an object should be preserved
>> or not. If the markword contains the bias pattern it is never
>> preserved. At the end
>> BiasedLocking::restore_marks() is called to restore the marks for
>> those objects that we saved before. So this means that even if an
>> object has a valid biaser, with valid epoch, if the object is not
>> currently locked it could be reset during the GC. I'm not sure though
>> if whenever
>> markOopDesc::must_be_preserved_with_bias() returns false the garbage
>> collector always does the reset or it just means it could reset it if
>> it wants to. In any case I've seen that reset happening when doing
>> handshakes. In fact, this is one of the reasons why the handshake
>> could return that the bias was not revoked, since I don't check for
>> the anonymously biased case in RevokeOneBias.
> Thanks for that very detailed set of descriptions. I won't pretend to fully grok all the details as I'm not completely clear on the role of the "epoch" or being anonymously biased, but I'm convinced you have a full understanding of such things. :) In revoke_and_rebias it was always a struggle for me to figure out exactly when the "rebias" part could come into play.
>
>>> The main concern with a change like this (as with all the handshake
>>> changes) is what new races this may allow and whether they have all
>>> been accounted for. IIUC the handshake will still be conducted by the
>>> VMThread so that still ensures serialization wrt. safepoints (which
>>> makes it simpler to reason about things). I've looked at some of the
>>> races you anticipated (like the "resurrected" thread) and they seem
>>> to be handled correctly. I'm unable to construct other races that
>>> might be problematic (but that isn't saying a lot :) ).
>> I agree that since we are now doing the revocation outside safepoints
>> there is potential for additional races. But also one thing to note is
>> that RevokeOneBias, which contains the logic of the handshake and is
>> now replacing what we used to do at a safepoint, is not really
>> different from the initial code in revoke_and_rebias() which is done
>> outside safepoints. The handshake logic is like executing that initial
>> part but with the right JavaThread so that if the object has a valid
>> biaser, then that biaser is either ourselves or we are the VMThread
>> while the biaser is blocked, so that we can execute revoke_own_lock().
>> In fact I was thinking at some point to combine them in some method
>> (maybe try_fast_revoke()). The attempt_rebias flag and the
>> update_heuristics() in revoke_and_rebias() complicated things so I kept them separate.
>> I have also tried to think on all possible racy scenarios and couldn't
>> find additional problems beside the "resurrected thread" one (although
>> it's also not a guarantee of anything). But that's why I was thinking
>> to check this in 14, so that if there are any problems we have plenty
>> of testing time to detect them.
> Yes that is a good idea. No need to rush this into 13.
>
>>> src/hotspot/share/jfr/metadata/metadata.xml
>>>
>>> Is it the case that this event is now never generated from a
>>> safepoint? Or have you just deleted the safepoint_id from the event
>>> because it might not be at a safepoint? If the latter can't we keep
>>> it and use 0 to indicate "not at a safepoint"? I think the JFR folk
>>> need to comment on this part of the change anyway.
>> This event will be created and commited only from
>> BiasedLocking::single_revoke_with_handshake(). Now, the actual
>> handshake that revoked the bias could be executed at a safepoint only
>> if ThreadLocalHandshakes is false. But I understand that this is true
>> for all platforms so the handshake should always be executed outside
>> safepoints.
> Ok.
>
>> It would be great if JFR folks review this part.
> Try to grab Markus :)
>
> [MG]
>
>
>
>>> src/hotspot/share/runtime/biasedLocking.cpp
>>>
>>> I second Dan's comment about combining cleanup and code motion in a
>>> big change like this - it does make it much harder to spot the real
>>> difference.
>> Ok, already two objections on this so I'll revert moving the
>> heuristics part. I think I also moved clean_up_cached_monitor_info()
>> and I will double check any other movements.
>>
>>
>>> I note Dan picked up on the lack of p2i and other stuff related to
>>> the logging statements, and that you indicated they were fixed. I
>>> note that all that stuff is pre-existing so I'm unclear now whether
>>> you have fixed all the logging in the file or only the statements in
>>> the code you have changed or added? Again such cleanup may be best
>>> done separately.
>> I haven't fixed the existing log statements, only the ones Dan
>> mentioned which are in single_revoke_with_handshake(),
>> revoke_own_lock(), and in VM_HandshakeOneThread(). Ok, I can fix the
>> other ones in a cleanup later along with code movement and the removal
>> of the attemp_rebias flag which we are not using.
> Okay. To be clear I don't expect you to fix all the existing uses I just wanted to clarify which ones you had fixed.
>
>>> 640 void BiasedLocking::revoke_own_lock(oop obj, JavaThread*
>>> biased_locker) {
>>> 641   assert(!SafepointSynchronize::is_at_safepoint() ||
>>> !ThreadLocalHandshakes,
>>> 642          "if ThreadLocalHandshakes is enabled this should always
>>> be executed outside safepoints");
>>> 643   assert(Thread::current() == biased_locker ||
>>> Thread::current()->is_VM_thread(), "wrong thread");
>>> 644
>>>
>>> This is called "revoke_own_lock" but it can also be executed by the
>>> VMThread - so its not its own lock. Also we don't revoke anything
>>> related to a "lock" - we revoke a bias from the markword of an oop. I
>>> think a better name is needed.
>> Yes, I didn't really like it either. How about walk_stack_and_revoke() ?
> That sounds good to me. Roll on v2 :)
>
> Thanks,
> David
> -----
>
>>> src/hotspot/share/runtime/handshake.cpp
>>>
>>>   125       log_trace(handshake)("JavaThread " INTPTR_FORMAT " is not
>>> alive", (intptr_t)_target);
>>>
>>> Use p2i(_target) rather than cast to intptr_t.
>> Fixed.
>>
>>
>>> That's all from me.
>> Thanks for looking into this David! If you are okay with the
>> "walk_stack_and_revoke()" name then I can send v2.
>>
>>
>> Thanks!
>>
>>
>> Patricio
>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>
>>> On 30/05/2019 2:29 am, Patricio Chilano wrote:
>>>> Hi all,
>>>>
>>>> Could you review this patch that uses thread local handshakes
>>>> instead of safepoints to revoke the biases of locked objects?
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~pchilanomate/8191890/v01/webrev/
>>>> Bug:
>>>> https://bugs.openjdk.java.net/browse/JDK-8191890
>>>>
>>>>
>>>> Today whenever a JavaThread needs to revoke the bias of an object
>>>> that has been biased by another JavaThread (and where the epoch is
>>>> still valid and the prototype header of the class still has the bias
>>>> pattern) it needs to request a safepoint operation. The VMThread
>>>> inside the safepoint walks the stack of the biaser looking for lock
>>>> records associated with the biased object, and converts them to thin
>>>> locks if any are found.
>>>> This patch uses thread local handshakes instead, since we actually
>>>> only need to be able to safely walk the stack of the JavaThread that
>>>> biased the object and not other JavaThreads.
>>>>
>>>> Some notes about the patch:
>>>>
>>>> - Thanks to Robbin for initial work on this patch and for advice and
>>>> feedback!
>>>> - We still execute bulk rebias and bulk revoke operations inside
>>>> safepoints, since in those cases all the JavaThread's stacks need to
>>>> be walked to potentially update lock records.
>>>> - The method revoke_bias() was renamed to
>>>> single_revoke_at_safepoint(). This method is still kept because
>>>> there are places where we check whether we are already at safepoint
>>>> when trying to revoke. In those cases, if we are already at a
>>>> safepoint we simply end up calling this method.
>>>> - Handshakes are executed as VMOperations so the VMThread is still
>>>> involved in the revocation. This means we cannot have different
>>>> revocations being executed in parallel (same as with safepoints).
>>>> Ideally we would like to execute thread local handshakes without
>>>> needing for the VMThread to participate. However, now other
>>>> JavaThreads that do not participate in the revocation are allow to
>>>> continue making progress.
>>>>
>>>> Run several benchmarks and mostly performance seems unaffected.
>>>> Measured the average time it takes for revoking bias with a
>>>> handshake and with a safepoint and numbers are pretty similar
>>>> varying between benchmarks. Some numbers are shown below:
>>>>
>>>> specjbb2015
>>>>                        Handshakes      Safepoints Linux
>>>> 4ms            4.6ms Windows                 11ms             19ms
>>>>
>>>> startup benchmarks
>>>>                       Handshakes      Safepoints Linux
>>>> 159us             248us Windows               150us
>>>> 111us
>>>>
>>>> Overall the variation is not enough to show significant difference
>>>> in performance, considering also that revocations of a valid biaser
>>>> are usually a fraction of the overall running time of a benchmark
>>>> (specially jbb2015). In any case using handshakes allows other
>>>> JavaThreads to make progress during that time, minimizing STW
>>>> operations.
>>>>
>>>> In terms of testing, the patch passed several runs of tiers1-6 in
>>>> mach5 on Windows, Linux, MacOS and Solaris.
>>>>
>>>> Thanks!
>>>>
>>>> Patricio
>>>>