deadlock in initialization of instanceKlass

Thu Mar 22 17:09:08 PDT 2012

Hi Kris,

The parallel class loading changes likely remove this problem from JDK 7
onwards.
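
As a rough illustration of the Java-level side of those changes (a sketch
only, not HotSpot or JDK source): since Java 7 a class loader can call
ClassLoader.registerAsParallelCapable(), after which getClassLoadingLock()
hands out a dedicated lock per class name instead of the loader object
itself. ParallelLoaderSketch and lockFor() are hypothetical names used only
for this example.

```java
// Hypothetical loader for illustration; not taken from the JDK or BTrace.
class ParallelLoaderSketch extends ClassLoader {
    static {
        // Java 7+ API: opt this loader class in to per-class-name locking.
        registerAsParallelCapable();
    }

    ParallelLoaderSketch(ClassLoader parent) {
        super(parent);
    }

    // Expose the protected lock lookup so its result can be inspected.
    Object lockFor(String className) {
        return getClassLoadingLock(className);
    }

    public static void main(String[] args) {
        ParallelLoaderSketch loader =
            new ParallelLoaderSketch(ClassLoader.getSystemClassLoader());
        Object a = loader.lockFor("example.A");
        Object b = loader.lockFor("example.B");
        System.out.println("lock is the loader itself: " + (a == loader));
        System.out.println("A and B share a lock: " + (a == b));
    }
}
```

With per-name locks, two threads loading different classes through the same
loader no longer contend on the single loader monitor.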

I'm aware of other deadlock scenarios when agents get involved in the 
classloading process. There are some things that agents are allowed to 
do that simply should not be permitted <sigh>. Fixes tend not to be 
generic or else morph into parallel class loading :)

 > P.S. There's a second bug. When using "jstack -l" to diagnose the
 > problem, a fastdebug build of the VM hit an assertion [4]. The code in
 > DeadlockCycle assumed that all ObjectMonitors correspond to
 > instanceOops, which is not (yet) the case. An instanceKlass or
 > constantPool could also be directly locked with an ObjectMonitor, and
 > that's not an instanceOop.
 > I've made a patch against the current jdk8/jdk8/hotspot to fix this
 > second bug [5].

While there are a few ObjectMonitors used that are not associated with 
oops, they should not (barring bugs) be able to become part of a 
deadlock - hence the deadlock code assumes all monitors have an 
associated oop. I know we have had similar issues with JVMTI code in the 
past. I'll take a closer look and file a bug if there isn't already one. 
That said: the perm-gen removal work changes this again, as the class 
initialization ObjectMonitor is attached to an int[0] created just for 
this purpose.

Thanks,
David

On 23/03/2012 9:44 AM, Krystal Mok wrote:
> Hi,
>
> I've run into a corner case where deadlock happens during instanceKlass
> linking, which I think could have been avoided by patching the VM. It's
> mostly a JDK6-only issue, as I haven't been able to reproduce it on JDK7
> or JDK8.
>
> In the original problem, the scenario was to start a Java process with a
> BTrace script tracing from the beginning [1]. As the system started up,
> two threads were trying to load two different classes with different
> class loaders at about the same time [2]. Both of these class load
> operations were delegated to BTrace's agent for instrumentation, where
> both threads tried to create a new instance of a class called
> "com.sun.btrace.runtime.Instrumentor". This class was actually loaded
> already, but not linked/initialized yet, so this was the time to do it.
> And then a deadlock happened: one of the threads had locked the system
> class loader before trying to link the class
> "com.sun.btrace.runtime.Instrumentor", whereas the other thread went
> ahead and started linking "com.sun.btrace.runtime.Instrumentor" first,
> and for verification it needed to load new classes, which in turn
> needed to lock the system class loader.
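
The lock-order inversion described in the paragraph above can be sketched
with two plain Java monitors. This is only an analogy: Java code cannot lock
an instanceKlass, so loaderLock and klassLock below are hypothetical
stand-ins, and LockInversionSketch is not the repro from [3]. A latch forces
both threads to hold their first lock before asking for the second, and the
standard ThreadMXBean monitor-deadlock query is used to report the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockInversionSketch {
    // Starts a daemon thread that takes 'first', waits until both threads
    // hold their first lock, then blocks trying to take 'second'.
    static void start(String name, Object first, Object second,
                      CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread holds 'second'.
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit despite the deadlock
        t.start();
    }

    // Returns the number of threads the JVM reports as monitor-deadlocked.
    static int deadlockedThreadCount() {
        Object loaderLock = new Object(); // stand-in for the class loader lock
        Object klassLock = new Object();  // stand-in for the instanceKlass lock
        CountDownLatch latch = new CountDownLatch(2);
        start("loader-then-klass", loaderLock, klassLock, latch);
        start("klass-then-loader", klassLock, loaderLock, latch);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 100; i++) {
                long[] ids = mx.findMonitorDeadlockedThreads();
                if (ids != null) {
                    return ids.length;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```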
>
> A simplified minimal repro of the original problem is available [3].
> Caveat: it may take a few runs to actually see the deadlock. It
> reproduces more readily on JDK6/HS20 and is harder to hit on JDK6/HS24,
> where everything seems to be faster. I haven't been able to reproduce
> it with JDK7/HS23 or JDK8/HS24.
>
> The race in this case involves locking both an instanceKlass and the
> lock object of its defining class loader. Since an instanceKlass isn't a
> Java-level object, Java code couldn't have locked it on its own.
> So I think there's a chance of getting around this deadlock by locking
> the loader before locking the instanceKlass, for places where class
> loading may happen during the period the instanceKlass is locked. Or
> perhaps by using the complete_exit()/reenter() trick.
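
The lock-ordering idea in the paragraph above can be sketched at the Java
level as follows. This is an analogy under stated assumptions, not a VM
patch: loaderLock and klassLock are hypothetical stand-ins for the loader's
lock object and the instanceKlass monitor, and LockOrderingSketch is a name
made up for this example. Because every thread takes the two locks in the
same order, no cycle can form.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LockOrderingSketch {
    // Runs two threads that both lock "loader" before "klass" and returns
    // how many of them completed the guarded section.
    static int runBoth() {
        final Object loaderLock = new Object(); // stand-in: class loader lock
        final Object klassLock = new Object();  // stand-in: instanceKlass lock
        final AtomicInteger linked = new AtomicInteger();

        Runnable link = () -> {
            synchronized (loaderLock) {     // always acquired first
                synchronized (klassLock) {  // always acquired second
                    linked.incrementAndGet();
                }
            }
        };

        Thread t1 = new Thread(link, "thread-1");
        Thread t2 = new Thread(link, "thread-2");
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return linked.get();
    }

    public static void main(String[] args) {
        System.out.println("threads completed: " + runBoth());
    }
}
```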
>
> A pragmatic way of solving the problem is to modify the application code
> to get rid of the race in the first place. In this particular scenario,
> that'd be modifying BTrace, forcing the Instrumentor class to link early
> with a Class.forName() call. But as in the repro case, the problem seems
> pretty generic, and could hit innocent-looking Java code.
> (Of course, upgrading to JDK7 will get rid of the problem, too. Another
> good case to push for an upgrade :-)
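
A minimal sketch of the Class.forName() workaround mentioned above, assuming
the goal is to have the class fully initialized on one thread before any
concurrent class loading starts. Heavy is a hypothetical stand-in for a class
such as the agent's Instrumentor, and EagerInitSketch is a name made up for
this example; the three-argument Class.forName with initialize=true is the
standard API.

```java
class EagerInitSketch {
    // Set by Heavy's static initializer so initialization can be observed.
    static volatile boolean initialized = false;

    // Hypothetical stand-in for a class that would otherwise be
    // linked and initialized lazily, in the middle of a class load.
    static class Heavy {
        static {
            initialized = true;
        }
    }

    // Forces Heavy to be linked and initialized now, on the calling thread.
    static boolean forceInit() {
        try {
            // Heavy.class alone does not trigger initialization;
            // Class.forName(..., true, ...) does.
            Class.forName(Heavy.class.getName(), true,
                          EagerInitSketch.class.getClassLoader());
            return initialized;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("before: " + initialized);
        System.out.println("after:  " + forceInit());
    }
}
```

An agent would make such a call once during its own startup, before it
begins handling class load events from multiple threads.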
>
> I haven't made a patch to fix this just yet. Any suggestions on whether
> it'd be worthwhile to fix it in the VM or not?
>
> P.S. There's a second bug. When using "jstack -l" to diagnose the
> problem, a fastdebug build of the VM hit an assertion [4]. The code in
> DeadlockCycle assumed that all ObjectMonitors correspond to
> instanceOops, which is not (yet) the case. An instanceKlass or
> constantPool could also be directly locked with an ObjectMonitor, and
> that's not an instanceOop.
> I've made a patch against the current jdk8/jdk8/hotspot to fix this
> second bug [5].
>
> When the Permanent Generation removal work completes, this would no
> longer be a problem, because the klass hierarchy won't be oops anymore,
> and the klassKlass's will be gone. I can see that in Jon's latest patch
> [6] already, but there should still be some time before the work is
> complete. Meanwhile people may still run into this issue.
>
> Regards,
> Kris Mok
>
> [1]: https://gist.github.com/2000950#file_trace_system_gc_call.java
> [2]: https://gist.github.com/2158975#file_deadlock_stack_trace.log
> [3]: https://gist.github.com/2163070#file_test_deadlock.java
> [4]: https://gist.github.com/2163070#file_hit_assertion_in_fastdebug
> [5]: https://gist.github.com/2163070#file_fix_assertion_jdk8.patch
> [6]:
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2012-March/005441.html
>