[master] RFR: 8302209: [Lilliput] Optimize fix-anon monitor owner path [v2]

Wed Mar 8 07:06:41 UTC 2023

On Tue, 7 Mar 2023 18:03:46 GMT, Roman Kennke <rkennke at openjdk.org> wrote:

> > 
> > wait... now I'm confused. Sorry, I'm no compiler guy. But:
> > ```
> > __ tst(disp_hdr, (uint64_t)(intptr_t) ANONYMOUS_OWNER);
> > ...
> >  __ br(Assembler::NE, stub->entry());
> > ```
> >
> > Why NE? we jump into the stub if owner != ANONYMOUS?
> 
> tst is an `and` instruction in disguise: it masks the value and sets ZF according to the result. ZF=1 if the masked bit is 0, and ZF=0 if the bit is 1. NE means ZF=0, that is, the branch is taken when the ANONYMOUS bit is set.

Ah, okay, thanks for clarifying.
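
To make sure I have the flag logic straight, here is a minimal C++ sketch of it. It is only a model of the `tst`/`NE` pair, not HotSpot code: `kAnonymousOwner` and the helper names are made up for illustration, and the marker value is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical marker value, for illustration only; the real
// ANONYMOUS_OWNER constant is defined in HotSpot.
const std::uint64_t kAnonymousOwner = 1;

// Models the zero flag after `tst value, mask`: tst computes value & mask,
// discards the result, and sets the flag when that result is zero.
inline bool zero_flag_after_tst(std::uint64_t value, std::uint64_t mask) {
    assert(mask != 0 && "testing against an empty mask would always set the flag");
    return (value & mask) == 0;
}

// Models a branch on NE: the branch is taken when the zero flag is clear.
inline bool branch_ne_taken(bool zero_flag) {
    return !zero_flag;
}

// True when the branch to the stub would be taken for this owner word,
// i.e. when the anonymous bit is set.
inline bool jumps_to_stub(std::uint64_t owner_word) {
    return branch_ne_taken(zero_flag_after_tst(owner_word, kAnonymousOwner));
}
```

So an owner word with the anonymous bit set goes to the stub, and any owner word with that bit clear stays on the inline path.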

> 
> > > If we were not using a stub, we would have to forward-jump in the common path and not branch in the uncommon path. In my experience, forward-jumping in the common path throws off static branch prediction and performs significantly worse. This is also mentioned in the Intel optimization guide as an important consideration.
> > 
> > I understand that. But you could have this done in-place too, right? Place a label at the end of the function and jump to it in the uncommon case; before the label, do a ret for the common case. This is more to make sure I understand the coding; if you prefer to do it via a stub, that's fine.
> 
> There is no ret here. This whole code section is expanded when C2 emits FastLockNode and FastUnlockNode. (To make it even more confusing, FastLockNode and FastUnlockNode are subclasses of CmpNode, and the code following it will do different things depending on whether we come out with ZF=1 or =0. But this is not relevant for this discussion.) Lacking a ret, I don't see a reasonable way to achieve what I described without using a stub.

Okay, I get it. There is no "end of function" to push your uncommon branch to, so the only way to do this without a stub would be to create a branch in-place, and you don't want that for performance reasons.
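
For my own understanding, the layout can be sketched with a toy emitter in C++. This is not the C2 or MacroAssembler API: `ToyEmitter`, `add_stub` and the instruction strings are all hypothetical, and I am assuming that stubs are collected and placed after the main instruction stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model, not HotSpot's API: instructions are plain strings.
class ToyEmitter {
public:
    // Append an instruction to the inline (common-path) stream.
    void emit(const std::string& insn) { inline_code_.push_back(insn); }

    // Register an out-of-line stub and return its entry label. The stub
    // ends with a jump back to the given continuation label.
    std::string add_stub(const std::vector<std::string>& body,
                         const std::string& continuation) {
        assert(!continuation.empty() && "a stub must know where to return to");
        std::string entry = "stub" + std::to_string(stubs_.size());
        std::vector<std::string> stub;
        stub.push_back(entry + ":");
        stub.insert(stub.end(), body.begin(), body.end());
        stub.push_back("b " + continuation);
        stubs_.push_back(stub);
        return entry;
    }

    // Final layout: the whole inline stream first, then every stub.
    std::vector<std::string> finish() const {
        std::vector<std::string> out = inline_code_;
        for (std::size_t i = 0; i < stubs_.size(); ++i) {
            out.insert(out.end(), stubs_[i].begin(), stubs_[i].end());
        }
        return out;
    }

private:
    std::vector<std::string> inline_code_;
    std::vector<std::vector<std::string> > stubs_;
};

// Emits the shape discussed above: the common path falls straight through,
// and only the uncommon case branches away to the stub.
std::vector<std::string> emit_unlock_sketch() {
    ToyEmitter e;
    std::vector<std::string> fix_body(1, "<fix anonymous owner>");
    std::string stub = e.add_stub(fix_body, "continue");
    e.emit("tst owner, #ANONYMOUS_OWNER");
    e.emit("b.ne " + stub);
    e.emit("continue:");
    e.emit("<common-path unlock>");
    e.emit("<code following the expanded node>");
    return e.finish();
}
```

In this model the common path has no taken branch at all, and the stub body sits after everything else, which is the property an in-place branch cannot give you when there is no ret to hide behind.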

> 
> > > > Anonymous means some other thread T2 inflated my lock while I (T1) was thin-lock-waiting on the lock, right?
> > > 
> > > 
> > > Yes, exactly.
> > 
> > 
> > Okay, non-anonymous is the case where a thread leaves an inflated monitor it inflated itself.
> 
> It doesn't have to have inflated that monitor itself, it would only have locked the (possibly pre-existing) monitor itself.
> 
> > And that is the hot path? Interesting. Under which circumstances does a thread inflate its own lock? This is because your fast-locking does not work with recursion yet, right? So every recursion would inflate the lock right away?
> 
> Once 2 or more threads compete for a lock, that lock gets inflated. From there on, all locks and unlocks are done on the monitor. Yes, this is very common and hot (in workloads that use locking properly, as opposed to code that uses uncontended and/or unshared locks, which is what fast-locking is good at). Yes, recursion would inflate the lock immediately, but this is not the point here. (I do have an experimental patch to implement recursive fast-locking in [rkennke/jdk#2](https://github.com/rkennke/jdk/pull/2), I am just not sure if it's worth having this extra complexity for little gain in somewhat bad workloads.)

Again, thanks for explaining.
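
To summarize the two exit cases as I now understand them, here is a single-threaded toy model in C++. It ignores atomics, waiters and recursion entirely, and `ToyMonitor`, `ExitPath` and the marker value are hypothetical names chosen for this sketch, not the real ObjectMonitor code.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uintptr_t ThreadId;

const ThreadId kNoOwner = 0;
const ThreadId kAnonymousOwner = 1;  // hypothetical marker value

// Toy stand-in for an inflated monitor: only the owner field is modeled.
struct ToyMonitor {
    ThreadId owner;
    ToyMonitor() : owner(kNoOwner) {}
};

// Models another thread inflating a lock that is currently fast-locked by
// a thread it cannot name: the monitor starts out with an anonymous owner.
inline ToyMonitor inflate_for_unknown_holder() {
    ToyMonitor m;
    m.owner = kAnonymousOwner;
    return m;
}

// Models a thread locking an already inflated, currently free monitor.
inline void lock_inflated(ToyMonitor& m, ThreadId self) {
    assert(m.owner == kNoOwner && "toy model has no contention handling");
    m.owner = self;
}

enum class ExitPath { Common, FixAnonymous };

// Models the exit: if the owner is anonymous, the exiting thread first
// claims the monitor as its own (the uncommon path), then releases it.
inline ExitPath unlock(ToyMonitor& m, ThreadId self) {
    assert(self != kNoOwner && self != kAnonymousOwner);
    ExitPath path = ExitPath::Common;
    if (m.owner == kAnonymousOwner) {
        m.owner = self;
        path = ExitPath::FixAnonymous;
    }
    assert(m.owner == self && "only the owner may unlock");
    m.owner = kNoOwner;
    return path;
}
```

The common case is the second one in the model: the monitor already exists, the thread locked it normally, and the exit never sees the anonymous marker.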
> 
> Roman

-------------

PR: https://git.openjdk.org/lilliput/pull/74