[Fwd: Deadlocked Thread State is RUNNABLE?]
David Holmes - Sun Microsystems
David.Holmes at Sun.COM
Tue Nov 17 17:58:18 PST 2009
Mandy Chung said the following on 11/18/09 11:36:
> It's a known bug:
>
> 6501158: Thread state is incorrect during class initialization
> procedure
>
> I recall the discussion for this bug but don't remember if we
> discussed enhancing the java.lang.management spec to cover "waiting"
> on VM internal actions.
>
> David will probably have more information about this.
I have nothing really to add beyond what is stated in the CR, but as my
main comment there was not public I have now made it public (and dropped
myself as RE) and reproduce it below.
Quite simply the code that does the "wait" is low-level in the VM and
does not come through the normal Object.wait() path that would set the
Thread.State. It can be "fixed" but there are a couple of additional
issues that also need to be addressed due to the fact that the monitor
used is not associated with a Java-level object. (The JLS was updated in
this regard.)
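
For readers who want to see the situation from the subject line, here is a minimal sketch of a class-initialization deadlock. The class names, the latch, and the 500 ms pause are all assumptions made for illustration; none of them come from the CR. Two threads each start initializing one class and then need the other, so both end up waiting inside the VM for the other's initialization to finish. On a HotSpot build affected by 6501158 the printed state would be expected to be RUNNABLE rather than WAITING, but the example deliberately does not assume that.

```java
import java.util.concurrent.CountDownLatch;

public class InitDeadlockDemo {
    // Released once both threads are inside their static initializers.
    static final CountDownLatch bothInside = new CountDownLatch(2);

    static class A {
        static { rendezvous(); B.touch(); }   // needs B while A is mid-initialization
        static void touch() { }
    }

    static class B {
        static { rendezvous(); A.touch(); }   // needs A while B is mid-initialization
        static void touch() { }
    }

    static void rendezvous() {
        bothInside.countDown();
        try {
            bothInside.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Starts the two threads and returns them once they have had time to block. */
    static Thread[] startDeadlock() {
        // Lambdas only touch A and B when run, so initialization happens in the new threads.
        Thread t1 = new Thread(() -> A.touch(), "init-A");
        Thread t2 = new Thread(() -> B.touch(), "init-B");
        t1.setDaemon(true);   // daemon threads, so the JVM can still exit
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            bothInside.await();
            t1.join(500);     // returns after the timeout, since init-A never finishes
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new Thread[] { t1, t2 };
    }

    public static void main(String[] args) {
        for (Thread t : startDeadlock()) {
            System.out.println(t.getName() + " state=" + t.getState()
                    + " alive=" + t.isAlive());
        }
    }
}
```

The latch is only there to make the deadlock deterministic: neither thread touches the other class until both initializers are in progress.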
The meta-discussion was whether we should introduce a new Thread.State
to cover this special case (waiting for class initialization), and that
discussion seemed to lean towards doing this (I suggested it and Mandy
agreed it seemed like a good idea :) ). But things did not progress from
there.
Cheers,
David
-----
From 6501158:
The submitter quotes the JLS with regard to the class initialization
procedure and how synchronization is employed. In fact hotspot does not
synchronize using the Class object monitor during class initialization -
this is to avoid denial-of-service style attacks just by explicitly
locking a Class object. The JLS is in the process of being updated to
say that a "unique initialization lock" is used for class
initialization, not necessarily the Class object's lock. This brings the
spec into line with the hotspot implementation.
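
The point about not using the Class object's monitor can be checked with a small experiment. This is a sketch only: the class names, the flag, and the two-second timeout are assumptions for illustration. The main thread holds the monitor of a not-yet-initialized class's Class object while a second thread triggers initialization. On HotSpot, as described above, the initializer is expected to complete anyway; a VM that did use the Class object's lock would instead time out and report false.

```java
public class ClassLockDemo {
    // Set by Lazy's static initializer; kept outside Lazy so that reading it
    // does not itself trigger Lazy's initialization.
    static volatile boolean lazyInitialized;

    static class Lazy {
        static { lazyInitialized = true; }
        static void touch() { }
    }

    /** Returns true if Lazy was initialized while its Class monitor was held. */
    static boolean initWhileClassLocked() {
        Class<?> mirror = Lazy.class;          // a class literal does not initialize the class
        Thread t = new Thread(() -> Lazy.touch(), "Initializer");
        t.setDaemon(true);                     // do not keep the JVM alive if it blocks
        synchronized (mirror) {                // explicitly lock the Class object
            t.start();
            try {
                t.join(2000);                  // wait at most two seconds
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            return lazyInitialized;
        }
    }

    public static void main(String[] args) {
        System.out.println("initialized while Class object was locked: "
                + initWhileClassLocked());
    }
}
```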
The reason I mention this is that the monitor that hotspot uses is
associated with the klassOop for the class. The monitor code sets
current_waiting_monitor() or current_pending_monitor() as appropriate
during wait() or monitor entry. The management code, via
ThreadService::ThreadSnapshot, gets hold of the object associated with
the monitor for a blocked thread and assumes that the object is in fact
the oop for a java.lang.Object. When the klassOop is treated as an
instance oop and queried for its own class, etc., we end up crashing
the VM.
The suggested fix correctly sets the thread state to "WAITING":
    Full thread dump Java HotSpot(TM) Tiered VM (1.7.0-internal-dh198349-fastdebug mixed mode):
    "Runner" prio=3 tid=0x08171800 nid=0xb in Object.wait() [0xcb99d000..0xcb99dbb0]
       java.lang.Thread.State: WAITING (on object monitor)
but additional changes are needed in ThreadSnapshot to discard the
non-instance oop. (It seems JvmtiEnvBase::get_current_contended_monitor
would need a similar modification.) This seems to work, and getThreadInfo
simply reports, e.g.:
    Current thread info: "Runner" Id=8 WAITING
which seems okay. And getLockInfo() returns null.
It is unclear, however, whether reporting this information actually
violates the specification for these management APIs. A thread is only
WAITING on an object monitor when performing Object.wait(), in which case
there must be an Object being waited upon and so getLockInfo() must
return non-null information. Yet that is not the case here.
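
For comparison, the ordinary case the specification has in mind can be observed through the standard java.lang.management API. The sketch below is illustrative only; the class name, thread name, and polling interval are assumptions. It parks a thread in a plain Object.wait() and reads its ThreadInfo, which in this case should report WAITING together with a non-null LockInfo describing the object being waited on.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class WaitingLockInfoDemo {

    /** Returns a ThreadInfo snapshot taken while a helper thread is in Object.wait(). */
    static ThreadInfo observeWaiter() {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    while (true) {
                        lock.wait();           // ordinary Java-level wait
                    }
                } catch (InterruptedException e) {
                    // interrupted by the observer below: fall through and exit
                }
            }
        }, "Waiter");
        waiter.setDaemon(true);
        waiter.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            // Poll until the helper thread is actually reported as WAITING.
            for (int i = 0; i < 500; i++) {
                ThreadInfo info = mx.getThreadInfo(waiter.getId());
                if (info != null && info.getThreadState() == Thread.State.WAITING) {
                    return info;
                }
                Thread.sleep(10);
            }
            throw new IllegalStateException("Waiter never reached WAITING");
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            waiter.interrupt();                // let the helper thread finish
        }
    }

    public static void main(String[] args) {
        ThreadInfo info = observeWaiter();
        System.out.println("state=" + info.getThreadState()
                + " lockInfo=" + info.getLockInfo());
    }
}
```

With the suggested fix applied to class initialization, the same query would report WAITING but a null LockInfo, which is the mismatch discussed above.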
It seems to me that while we can report the information above, it might
be better to see whether the management specification can be enhanced to
cover "waiting" on VM internal actions and to then report this
circumstance as one of those.
Note also that the existing hotspot code could already be susceptible to
a crash due to the use of the klassOop monitor for class initialization.
If the timing were just right, a call to getThreadInfo could see a
thread blocked trying to acquire this monitor (not wait upon it) and
that would be captured by the ThreadSnapshot and eventually cause a
crash. The fact that the snapshot requires a safepoint makes it less
likely that you would encounter the target thread while blocked on the
monitor, as the monitor is only held for a short period during class
initialization.
I will await discussion with the management/monitoring folk before
deciding how best to proceed with this CR.