Restrictions for lock coarsening?
David Holmes - Sun Microsystems
David.Holmes at Sun.COM
Sun Jan 4 18:39:08 PST 2009
Hi Clemens,
As Dave correctly recalled, locks are not hoisted out of a loop, so that
kind of coarsening does not occur. Hence your benchmark will not
experience any lock coarsening.
See callNode.cpp, line 1378:
Node *LockNode::Ideal(PhaseGVN *phase, bool can_reshape) {
for the code that does the lock elision/coarsening.
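
To make the distinction concrete, here is a minimal Java sketch. It is
purely illustrative: the class and method names are hypothetical and are
not taken from your benchmark or from HotSpot. The first method has two
abutting critical sections on the same lock, which is the shape that is
a candidate for coarsening. The second takes the lock on every loop
iteration, which is the shape that is not hoisted. The third shows the
hoisting done by hand, which is only appropriate if holding the lock for
the whole loop is acceptable for your API.

```java
// Hypothetical illustration only; names are assumptions, not from the thread.
class CoarseningSketch {
    private final Object lock = new Object();
    private int counter;

    // Two abutting critical sections under the same lock. The JIT may merge
    // these into a single critical section, saving one unlock/lock pair.
    public int abutting() {
        synchronized (lock) { counter++; }
        synchronized (lock) { counter++; }
        return counter;
    }

    // The lock is acquired and released on every iteration. Per the
    // discussion above, the lock is not hoisted out of the loop.
    public int inLoop(int n) {
        for (int i = 0; i < n; i++) {
            synchronized (lock) { counter++; }
        }
        return counter;
    }

    // Manual hoisting: one acquisition around the whole loop. This lengthens
    // the critical section, so it can hurt if the lock is ever contended.
    public int hoistedByHand(int n) {
        synchronized (lock) {
            for (int i = 0; i < n; i++) {
                counter++;
            }
            return counter;
        }
    }
}
```

All three variants compute the same result in a single-threaded run; they
differ only in how often the lock is acquired.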
Cheers,
David Holmes
Clemens Eisserer said the following on 01/05/09 09:09:
> Hi Dave,
>
> Thanks again for taking the time to answer my never-ending locking questions ;)
>
>> Locks and monitors have precisely the same semantics with respect to the
>> java memory model.
> Glad to hear that :)
>
>> Coarsening is a bit tricky. IIRC we only coarsen abutting or nearly
>> abutting critical sections under the same lock but I don't think we'll hoist
>> the critical section outside of a loop. Coarsening is, for the most part,
>> an optimization that attempts to avoid CAS / lock:cmpxchg which have a
>> _local_ latency penalty. In addition it gives the JIT more latitude in the
>> face of the memory model (we can avoid spilling variables back to memory
>> with larger critical sections). That having been said, you don't really
>> want to coarsen critical sections if there's contention. (Recall that we
>> can capture code that's _not_ part of a critical section inside a coarsened
>> section, thus artificially lengthening the critical section). Ideally we'd
>> have runtime feedback and decoarsen contended critical sections but that's
>> not implemented as of today, so instead the mechanism is fairly
>> conservative.
> Well, my benchmark does execute the method in question in a loop,
> there's only one monitor in action and the benchmark is
> single-threaded so there is no contention. Could it be that coarsening
> does not occur because the synchronized method is too large or has too
> complex control flow (if/else, maybe even loops)? Or maybe coarsening
> is limited to non-OSR compilation?
>
> In reality there won't be contention for almost all of the use-cases,
> but the API is specified to be thread-safe and therefore there is no
> way around synchronization. The reason why I am interested in
> coarsening is that CAS means quite a lot of local latency - on my
> Core2Duo 50% of the total time is spent in locking. I guess the
> problem will vanish with better CAS implementations ;)
>
> Thanks again, Clemens