CMS parallel initial mark; vm thread sneaking in Thread::try_lock()

Thu Jun 13 11:45:09 UTC 2013

Hi,

On Thu, 2013-06-13 at 20:33 +1000, David Holmes wrote:
> Thomas,
> 
> lock-sneaking is something very specific to the VMThread to avoid 
> deadlocks at safepoints. I really don't think we want to see it promoted 
> into the API for mutexes/monitors.
> 
> I hadn't been paying this thread too much attention but now I'm starting 
> to think I need to.

  the situation is (and I should have asked the runtime team more
directly, and cc'ed the thread earlier, apologies) that in CMS there is
the need to frequently record object boundaries into a buffer for
performance reasons.

There is a contribution that changes this sampling from being done
concurrently to being done every time after a tlab refill. The problem
is that this operation involves a few reads that must be atomic; so the
change guards that operation with a critical section. It is also not
critical that the operation is executed always, it may be skipped some
times.

So the change manually implements a try_lock()/unlock() idiom using a
CAS and a global variable.

The suggestion that tripped this thread has been to use existing VM
code, in particular monitors, as they seem to have the requested
functionality. There does not seem to be need for rolling own code here.

The code in that critical section is very short, and straight line code;
the critical section boiler plate code is larger than that...

Then there has been the concern about that try_lock() may be sneaked by
the vm thread; and some suggestions to put these concerns to rest.

See
http://cr.openjdk.java.net/~hiroshi/webrevs/edenchunks/webrev.00/src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp.udiff.html ,
in the sample_eden_chunk() method.

So maybe could you confirm the statements made so far:
- Monitor::try_lock()/::unlock() is equivalent to what the change does
in this case.

(I had a look at the unlock() code, and it's a lot more complicated than
the equivalent code in the change. It boils down to the same behavior
however?)

- that new sampling code is supposed to be only every executed while the
mutator is running. Sneaking can only occur during safepoints; so it's
use here would be "safe" in that there will be no sneaking at all. There
is no case when the suggested assert(!
SafepointSynchronize::is_at_safepoint()) would fire?

Other than that some other ideas to avoid execution of the sneaking code
were thrown into the room. The assert should be fine, but it would be
better to avoid the possibility of sneaking directly. The latest
suggestion involved adding a no-sneaking-variant of try_lock()
("try_lock_no_sneak()") that is called by the original one to keep exact
semantics (except for the additional call).
Nobody intends to change anything in that code without your input :)
everyone is aware that this is tricky code with possible unintended side
effects.

The relevant thread starts here, most of it having been cc'ed to
runtime-dev:
http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-June/007510.html

Thanks for looking at this issue,
  Thomas