[master] RFR: 8302209: [Lilliput] Optimize fix-anon monitor owner path [v2]

Roman Kennke rkennke at openjdk.org
Tue Mar 7 18:07:42 UTC 2023


On Tue, 7 Mar 2023 17:43:15 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> > > What is the advantage of using a stub here? The code sections for aarch64 and x64 are different, and each is used in only one place, so you don't really save much by reuse here. It seems you could have just written them out as a conditional path in the various xxfastlock() functions. Then you would have saved the jumps into and out of the stub in the owner==ANONYMOUS case.
> > 
> > 
> > This is not about saving memory. The very common path here is that owner is not ANONYMOUS. Using a stub means that we forward-jump only in the very uncommon path, and not branch in the common path.
> 
> wait... now I'm confused. Sorry, I'm no compiler guy. But:
> 
> ```
> __ tst(disp_hdr, (uint64_t)(intptr_t) ANONYMOUS_OWNER);
> ...
>  __ br(Assembler::NE, stub->entry());
> ```
> Why NE? we jump into the stub if owner != ANONYMOUS?

tst is an AND instruction in disguise: it masks the value, discards the result, and sets ZF accordingly. ZF=1 if the masked bit is 0, and ZF=0 if the bit is 1. NE means ZF=0, which is the case when the ANONYMOUS bit is set. So we jump into the stub only if the owner is ANONYMOUS.
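
To make the flag logic concrete, here is a minimal sketch that models the tst/NE pair in plain C++. The mask value `kAnonymousOwnerMask` and both function names are made up for illustration; the real ANONYMOUS_OWNER constant lives in HotSpot and may have a different value.

```cpp
#include <cstdint>

// Hypothetical mask value, for illustration only; the real ANONYMOUS_OWNER
// constant is defined in HotSpot and may differ.
constexpr std::uint64_t kAnonymousOwnerMask = 1;

// Models `tst value, mask`: AND the operands, discard the result,
// and report the zero flag (true when the masked result is 0).
constexpr bool zero_flag_after_tst(std::uint64_t value, std::uint64_t mask) {
    return (value & mask) == 0;
}

// Models a branch on NE: the branch is taken when the zero flag is clear.
constexpr bool branch_taken_ne(std::uint64_t value, std::uint64_t mask) {
    return !zero_flag_after_tst(value, mask);
}

// Owner word with the anonymous bit set: ZF=0, NE holds, jump to the stub.
static_assert(branch_taken_ne(0x1, kAnonymousOwnerMask),
              "anonymous owner takes the branch");
// Owner word with the bit clear (an aligned pointer-like value): ZF=1, fall through.
static_assert(!branch_taken_ne(0x7f00aa10, kAnonymousOwnerMask),
              "regular owner falls through");
```

The two static_asserts cover both outcomes: only an owner word with the bit set takes the branch into the stub.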

> > If we were not using a stub, we would have to forward-jump in the common path and not branch in the uncommon path. In my experience, forward-jumping in the common path throws off static branch prediction and performs significantly worse. This is also mentioned in the Intel optimization guide as an important consideration.
> 
> I understand that. But you could have this done in-place too, right? Place a label at the end of the function and jump to it in the uncommon case; before the label, do a ret for the common case. This is more to make sure I understand the coding. If you prefer to do it via a stub, that's fine.

There is no ret here. This whole code section is expanded when C2 emits FastLockNode and FastUnlockNode. (To make it even more confusing, FastLockNode and FastUnlockNode are subclasses of CmpNode, and the code following them will do different things depending on whether we come out with ZF=1 or ZF=0. But this is not relevant for this discussion.) Lacking a ret, I don't see a reasonable way to achieve what I described without using a stub.
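
The layout argument can be sketched with a toy emitter. This is not the C2 or MacroAssembler API: `ToyEmitter`, `emit_unlock_sketch`, the label names and the instruction strings are all hypothetical. It only shows how deferring the uncommon code to a stub keeps the common path free of taken branches when the surrounding code is expanded inline and has no ret.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy code buffer (hypothetical, not a HotSpot class): records instruction
// strings in the order they would end up in memory.
struct ToyEmitter {
    std::vector<std::string> code;   // main instruction stream
    std::vector<std::string> stubs;  // out-of-line code, placed after the main stream

    void emit(const std::string& insn) { code.push_back(insn); }
    void defer_stub(const std::string& insn) { stubs.push_back(insn); }

    // Append all deferred stub code behind the main stream.
    void finish() {
        code.insert(code.end(), stubs.begin(), stubs.end());
        stubs.clear();
    }
};

// Emits an unlock-like sequence where only the uncommon case branches.
std::vector<std::string> emit_unlock_sketch() {
    ToyEmitter e;
    e.emit("tst owner, ANONYMOUS");
    e.emit("b.ne fix_anon_stub");   // forward branch, taken only in the uncommon case
    e.emit("continuation:");
    e.emit("...common-path unlock...");
    e.emit("...code following the node, expanded inline, no ret...");
    // Uncommon path: lives out of line and jumps back when done.
    e.defer_stub("fix_anon_stub:");
    e.defer_stub("...fix up the anonymous owner...");
    e.defer_stub("b continuation");
    e.finish();
    assert(e.stubs.empty());
    return e.code;
}
```

In the resulting stream, the common path runs straight through the not-taken `b.ne`, and the stub body sits after everything else.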


> > > Anonymous means some other thread T2 inflated my lock while I (T1) was thin-lock-waiting on the lock, right?
> > 
> > 
> > Yes, exactly.
> 
> Okay, non-anonymous is the case where a thread leaves an inflated monitor it inflated itself.

It doesn't have to have inflated that monitor itself; it only has to have locked the (possibly pre-existing) monitor itself.
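
As a rough illustration of the two cases, here is a toy model of the owner check. Everything in it is an assumption made for illustration: `ToyMonitor`, `kAnonymousOwner`, the thread-id values, and in particular the fix-up action (recording the current thread as owner), which is one reading of "fix-anon" and not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical sentinel bit; the real ANONYMOUS_OWNER value may differ.
constexpr std::uintptr_t kAnonymousOwner = 1;

// Toy monitor: only the owner word matters for this sketch.
struct ToyMonitor {
    std::uintptr_t owner;  // a thread id (low bit clear), or kAnonymousOwner
};

// Returns true if the uncommon fix-up path ran.
constexpr bool fix_anonymous_owner(ToyMonitor& m, std::uintptr_t self) {
    if ((m.owner & kAnonymousOwner) == 0) {
        return false;  // common case: the owner is already a real thread
    }
    m.owner = self;    // uncommon case (assumed): record the current thread as owner
    return true;
}

// Helper so the result can be checked at compile time.
constexpr std::uintptr_t owner_after_check(std::uintptr_t initial_owner,
                                           std::uintptr_t self) {
    ToyMonitor m{initial_owner};
    fix_anonymous_owner(m, self);
    return m.owner;
}

// Another thread inflated the lock while we held it: the owner gets fixed up.
static_assert(owner_after_check(kAnonymousOwner, 0x1000) == 0x1000, "fixed up");
// We locked the monitor ourselves: nothing to fix.
static_assert(owner_after_check(0x1000, 0x1000) == 0x1000, "unchanged");
```

The non-anonymous case returns early without touching the monitor, which matches the claim that the hot path should fall straight through.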

> And that is the hot path? Interesting. Under which circumstances does a thread inflate its own lock? Is this because your fastlocking does not work with recursion yet? So every recursion would inflate the lock right away?

Once two or more threads compete for a lock, that lock gets inflated. From then on, all locks and unlocks are done on the monitor. Yes, this is very common and hot (in workloads that use locking properly, as opposed to code that uses uncontended and/or unshared locks, which is what fast-locking is good at). Yes, recursion would inflate the lock immediately, but this is not the point here. (I do have an experimental patch to implement recursive fast-locking in https://github.com/rkennke/jdk/pull/2, I am just not sure if it's worth having this extra complexity for little gain in somewhat bad workloads.)

Roman

-------------

PR: https://git.openjdk.org/lilliput/pull/74

