[Fwd: Deadlocked Thread State is RUNNABLE?]

Tue Nov 17 17:58:18 PST 2009

Mandy Chung said the following on 11/18/09 11:36:
> It's a known bug:
>
> 6501158: Thread state is incorrect during class initialization
> procedure
>
> I recalled the discussion for this bug but don't remember if we
> discussed enhancing the java.lang.management spec to cover "waiting"
> on VM internal actions.
>
> David will probably have more information about this.

I have nothing really to add save what is stated in the CR, but as my 
main comment was not public I've moved it to being public (and dropped 
myself as RE) and reproduce it below.

Quite simply the code that does the "wait" is low-level in the VM and 
does not come through the normal Object.wait() path that would set the 
Thread.State. It can be "fixed" but there are a couple of additional
issues that also need to be addressed due to the fact that the monitor
used is not associated with a Java-level object. (The JLS was updated in
this regard.)
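
For anyone wanting to observe this from Java code, here is a minimal
sketch. The class and thread names (ClassInitStateProbe, Slow,
"Initializer", "Runner") are made up for illustration, and the 500 ms
sleep is only a heuristic to let the second thread reach the class
initialization wait. The state that gets printed depends on the VM
build, so no particular output is claimed here.

```java
import java.util.concurrent.CountDownLatch;

public class ClassInitStateProbe {
    // Counted down once the static initializer of Slow has started.
    static final CountDownLatch initStarted = new CountDownLatch(1);
    // Counted down by the probe to let the static initializer finish.
    static final CountDownLatch release = new CountDownLatch(1);

    // Hypothetical class whose initialization we stall on purpose.
    static class Slow {
        static int value;
        static {
            initStarted.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            value = 42;
        }
    }

    // Returns the Thread.State reported for a thread that is (most likely)
    // waiting for another thread to finish initializing Slow.
    static Thread.State observeWaiterState() {
        try {
            // First thread triggers initialization and stalls inside <clinit>.
            Thread initializer = new Thread(() -> { int v = Slow.value; }, "Initializer");
            initializer.start();
            initStarted.await();

            // Second thread touches the same class and must wait in the VM.
            Thread waiter = new Thread(() -> { int v = Slow.value; }, "Runner");
            waiter.start();
            Thread.sleep(500); // heuristic delay, not a guarantee

            Thread.State state = waiter.getState();

            // Let initialization complete so both threads can exit.
            release.countDown();
            initializer.join();
            waiter.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Runner state during class init wait: "
                + observeWaiterState());
    }
}
```

The latches are there only to make the ordering deterministic: the
"Runner" thread is not started until the initializer is known to be
inside the static initializer.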

The meta-discussion was whether we should introduce a new Thread.State 
to cover this special case (waiting for class initialization), and that 
discussion seemed to lean towards doing this (I suggested it and Mandy 
agreed it seemed like a good idea :) ) But things did not progress from 
there.

Cheers,
David
-----

From 6501158:

The submitter quotes the JLS with regard to the class initialization 
procedure and how synchronization is employed. In fact hotspot does not 
synchronize using the Class object monitor during class initialization - 
this is to avoid denial-of-service style attacks just by explicitly 
locking a Class object. The JLS is in the process of being updated to
say that a "unique initialization lock" is used for class
initialization, not necessarily the Class object's lock. This brings the
spec into line with the hotspot implementation.

The reason I mention this is that the monitor that hotspot uses is 
associated with the klassOop for the class. The monitor code sets 
current_waiting_monitor() or current_pending_monitor() as appropriate 
during wait() or monitor entry. The management code, via the
ThreadService::ThreadSnapshot, gets hold of the object associated with
the monitor for a blocked thread and assumes that the object is in fact
the oop for a java.lang.Object. When the klassOop is treated as an
instance oop and queried for its own class, etc., we end up crashing
the VM.

The suggested fix correctly sets the thread state to "WAITING":

Full thread dump Java HotSpot(TM) Tiered VM (1.7.0-internal-dh198349-fastdebug mixed mode):

"Runner" prio=3 tid=0x08171800 nid=0xb in Object.wait() [0xcb99d000..0xcb99dbb0]
    java.lang.Thread.State: WAITING (on object monitor)

but additional changes are needed in ThreadSnapshot to discard the
non-instance oop. (It seems JvmtiEnvBase::get_current_contended_monitor
would need a similar modification). This seems to work and getThreadInfo
simply reports e.g.:

Current thread info: "Runner" Id=8 WAITING

which seems okay. And getLockInfo() returns null.

It is unclear, however, whether reporting this information actually
violates the specification for these management APIs. A thread is only
WAITING when performing Object.wait(), in which case there must be an
Object being waited upon and so getLockInfo() must return non-null
information. Yet that is not the case here.
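
To make that expectation concrete, here is a small sketch of the
ordinary Object.wait() case as seen through java.lang.management. The
class name WaitLockInfoDemo and the thread name "Runner" are
illustrative only. It covers just the normal path, where a WAITING
thread has an associated lock object, and not the class initialization
case discussed above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class WaitLockInfoDemo {
    // Starts a thread that blocks in Object.wait() and returns a
    // ThreadInfo snapshot taken while that thread is WAITING.
    static ThreadInfo snapshotOfWaitingThread() {
        final Object lock = new Object();
        final boolean[] done = {false};

        Thread t = new Thread(() -> {
            synchronized (lock) {
                while (!done[0]) {
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        }, "Runner");
        t.start();

        try {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            ThreadInfo info;
            // Poll until a snapshot shows the thread in WAITING.
            do {
                Thread.sleep(10);
                info = mx.getThreadInfo(t.getId());
            } while (info == null
                    || info.getThreadState() != Thread.State.WAITING);

            // Wake the thread so it can exit cleanly.
            synchronized (lock) {
                done[0] = true;
                lock.notifyAll();
            }
            t.join();
            return info;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ThreadInfo info = snapshotOfWaitingThread();
        System.out.println(info.getThreadName() + " "
                + info.getThreadState() + " on " + info.getLockInfo());
    }
}
```

In this ordinary case getLockInfo() describes the java.lang.Object
being waited upon, which is the pairing of WAITING with a non-null
LockInfo that the class initialization case cannot provide.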

It seems to me that while we can report the information above, it might 
be better to see whether the management specification can be enhanced to 
cover "waiting" on VM internal actions and to then report this 
circumstance as one of those.

Note also that the existing hotspot code could already be susceptible to 
a crash due to the use of the klassOop monitor for class initialization. 
If the timing were just right, a call to getThreadInfo could see a 
thread blocked trying to acquire this monitor (not wait upon it) and 
that would be captured by the ThreadSnapshot and eventually cause a 
crash. The fact that the snapshot requires a safepoint makes it less 
likely that you would encounter the target thread while blocked on the 
monitor, as the monitor is only held for a short period during class 
initialization.

I will await discussion with the management/monitoring folk before 
deciding how best to proceed with this CR.