Restrictions for lock coarsening?

Sun Jan 4 18:39:08 PST 2009

Hi Clemens,

As Dave correctly recalled, locks are not hoisted out of a loop, so that 
kind of coarsening does not occur. Hence your benchmark will not 
experience any lock coarsening.
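
To make the distinction concrete, here is a minimal sketch of the two shapes being discussed. The class and method names (LockShapes, add, abutting, looped) are made up for illustration and are not taken from your benchmark or from HotSpot. Whether the JIT actually merges the first shape depends on inlining and on the VM version, so treat the comments as a description of what is eligible, not a guarantee.

```java
public class LockShapes {
    private long total;

    // Hypothetical stand-in for a synchronized API method.
    public synchronized void add(long x) {
        total += x;
    }

    public synchronized long total() {
        return total;
    }

    // Shape 1: two abutting critical sections under the same monitor.
    // Once add() is inlined, this is the kind of pattern that coarsening
    // may merge into a single lock/unlock pair.
    public void abutting(long a, long b) {
        add(a);
        add(b);
    }

    // Shape 2: the benchmark's shape. The critical section sits inside a
    // loop, and the lock is not hoisted out of it, so every iteration
    // still pays for its own lock and unlock.
    public void looped(int n) {
        for (int i = 0; i < n; i++) {
            add(1);
        }
    }

    public static void main(String[] args) {
        LockShapes s = new LockShapes();
        s.abutting(2, 3);
        s.looped(10);
        System.out.println(s.total()); // prints 15
    }
}
```

Both methods compute the same result either way; the only difference is how many monitor enter/exit pairs the compiled code may end up executing.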

See callNode.cpp, line 1378:

Node *LockNode::Ideal(PhaseGVN *phase, bool can_reshape) {

for the code that does the lock elision/coarsening.

Cheers,
David Holmes

Clemens Eisserer said the following on 01/05/09 09:09:
> Hi Dave,
> 
> Thanks again for taking the time to answer my never-ending locking questions ;)
> 
>> Locks and monitors have precisely the same semantics with respect to the
>> java memory model.
> Glad to hear that :)
> 
>> Coarsening is a bit tricky.   IIRC we only coarsen abutting or nearly
>> abutting critical sections under the same lock but I don't think we'll hoist
>> the critical section outside of a loop.   Coarsening is, for the most part,
>> an optimization that attempts to avoid CAS / lock:cmpxchg which have a
>> local latency penalty.   In addition it gives the JIT more latitude in the
>> face of the memory model (we can avoid spilling variables back to memory
>> with larger critical sections).   That having been said,  you don't really
>> want to coarsen critical sections if there's contention.  (Recall that we
>> can capture code that's not part of a critical section inside a coarsened
>> section, thus artificially lengthening the critical section).   Ideally we'd
>> have runtime feedback and decoarsen contended critical sections but that's
>> not implemented as of today, so instead the mechanism is fairly
>> conservative.
> Well, my benchmark does execute the method in question in a loop,
> there's only one monitor in action and the benchmark is
> single-threaded so there is no contention. Could it be that coarsening
> does not occur because the synchronized method is too large or has too
> complex control flow (if/else, maybe even loops)? Or maybe coarsening
> is limited to non-OSR compilation?
> 
> In reality there won't be contention for almost all of the use-cases,
> but the API is specified to be thread-safe and therefore there is no
> way around synchronization. The reason why I am interested in
> coarsening is that CAS means quite a lot of local latency - on my
> Core2Duo 50% of the total time is spent in locking. I guess the
> problem will vanish with better CAS implementations ;)
> 
> Thanks again, Clemens