deadlock in initialization of instanceKlass

Thu Mar 22 18:13:33 PDT 2012

Hi David,

Thanks for the quick reply. Comments inline:

On Fri, Mar 23, 2012 at 8:09 AM, David Holmes <david.holmes at oracle.com>wrote:

> Hi Kris,
>
> The parallel-classloading changes likely remove this problem from 7+
>
> I did check the value of the flag AllowParallelDefineClass, which
defaulted to false when I ran on JDK7/HS23 and JDK8/HS24. Haven't gotten
into the details of the parallel classloading changes yet, I wonder how
this flag affects the loading behavior?
I'll do some more experiments and see what change actually got rid of the
bug.

> I'm aware of other deadlock scenarios when agents get involved in the
> classloading process. There are some things that agents are allowed to do
> that simply should not be permitted <sigh>. Fixes tend not to be generic or
> else morph into parallel class loading :)
>
> I've seen some, too. Agents are powerful; sometimes folks just had to have
the way to shoot themselves in the foot ;-)
My min repro didn't use an agent. But it's easier to fall into this trap
with an agent.

So I'm not going to stick my head in and try to "fix" this bug, since the
problem doesn't show up in 7+ anyway.

>
> > P.S. There's a second bug. When using "jstack -l" to diagnose the
> > problem, a fastdebug build of the VM hit an assertion [4]. The code in
> > DeadlockCycle assumed that all ObjectMonitors correspond to
> > instanceOops, which is not (yet) the case. A instanceKlass or
> > constantPool could also be directly locked with a ObjectMonitor, and
> > that's not an instanceOop.
> > I've made a patch against the current jdk8/jdk8/hotspot to fix this
> > second bug [5].
>
> While there are a few ObjectMonitors used that are not associated with
> oops, they should not (barring bugs) be able to become part of a deadlock -
> hence the deadlock code assumes all monitors have an associated oop. I know
> we have had similar issues with JVMTI code in the past. I'll take a closer
> look and file a bug if there isn't already one. That said: the perm-gen
> removal work changes this again as the class initialization objectMonitor
> is attached to an int[0] created just for this purpose.
>
> On the int[0] stuff: yep, I saw it. The locker object for bootstrap class
loader is also an array, a "fake" int[0][]:

_system_loader_lock_obj = oopFactory::new_system_objArray(0, CHECK);

It's been like that since duke at 0...from what I can tell. Wonder why an
array was favored over something else, say, a java.lang.Byte instance,
which carries pretty much the same weight (in terms of object size)?

Thanks,
Kris Mok

> Thanks,
> David
>
>
> On 23/03/2012 9:44 AM, Krystal Mok wrote:
>
>> Hi,
>>
>> I've run into a corner case where deadlock happens during instanceKlass
>> linking, which I think could have been avoided by patching the VM. It's
>> mostly a JDK6-only issue, as I haven't been able to reproduce it on JDK7
>> or JDK8.
>>
>> In the original problem, the scenario was to start a Java process with a
>> BTrace script tracing from the beginning [1]. As the system started up,
>> two threads were trying to load two different classes with different
>> class loaders at about the same time [2]. Both of these class load
>> operations were delegated to BTrace's agent for instrumentation, where
>> both threads tried to create a new instance of a class called
>> "com.sun.btrace.runtime.**Instrumentor". This class was actually loaded
>> already, but not linked/initialized yet, so this was the time to do it.
>> And then a deadlock happened: One of the threads has locked the system
>> class loader before trying to link the class of
>> "com.sun.btrace.runtime.**Instrumentor"; where as the other thread went
>> ahead and started linking the class of
>> "com.sun.btrace.runtime.**Instrumentor" first, and for verification it
>> needed to load new classes, which in turn needed to lock the system
>> class loader.
>>
>> A simplified minimal repro of the original problem is available [3].
>> Caveat: it may take a few runs to actually see the deadlock. I've had a
>> better chance of reproducing it on JDK6/HS20, but harder on JDK6/HS24,
>> where everything seem to be faster. Haven't been able to reproduce it
>> with JDK7/HS23 or JDK8/HS24.
>>
>> The race in this case involves locking both an instanceKlass and the
>> lock object of its defining class loader. Since an instanceKlass isn't a
>> Java-level object, Java code couldn't have locked it on its own.
>> So I think there's a chance of getting around this deadlock by locking
>> the loader before locking the instanceKlass, for places where class
>> loading may happen during the period the instanceKlass is locked. Or
>> perhaps the complete_exit()/reenter() dance trick.
>>
>> A pragmatic way of solving the problem is to modify the application code
>> to get rid of the race in the first place. In this particular scenario,
>> that'd be modifying BTrace, forcing the Instrumentor class to link early
>> with a Class.forName() call. But as in the repro case, the problem seems
>> pretty generic, and could hit innocent-looking Java code.
>> (Of course, upgrading to JDK7 will get rid of the problem, too. Another
>> good case to push for an upgrade :-)
>>
>> I haven't made a patch to fix this just yet. Any suggestions on whether
>> it'd be worthwhile to fix it in the VM or not?
>>
>> P.S. There's a second bug. When using "jstack -l" to diagnose the
>> problem, a fastdebug build of the VM hit an assertion [4]. The code in
>> DeadlockCycle assumed that all ObjectMonitors correspond to
>> instanceOops, which is not (yet) the case. A instanceKlass or
>> constantPool could also be directly locked with a ObjectMonitor, and
>> that's not an instanceOop.
>> I've made a patch against the current jdk8/jdk8/hotspot to fix this
>> second bug [5].
>>
>> When the Permanent Generation removal work completes, this would no
>> longer be a problem, because the klass hierarchy won't be oops anymore,
>> and the klassKlass's will be gone. I can see that in Jon's latest patch
>> [6] already, but there should still be some time before the work is
>> complete. Meanwhile people may still run into this issue.
>>
>> Regards,
>> Kris Mok
>>
>> [1]: https://gist.github.com/**2000950#file_trace_system_gc_**call.java<https://gist.github.com/2000950#file_trace_system_gc_call.java>
>> [2]: https://gist.github.com/**2158975#file_deadlock_stack_**trace.log<https://gist.github.com/2158975#file_deadlock_stack_trace.log>
>> [3]: https://gist.github.com/**2163070#file_test_deadlock.**java<https://gist.github.com/2163070#file_test_deadlock.java>
>> [4]: https://gist.github.com/**2163070#file_hit_assertion_in_**fastdebug<https://gist.github.com/2163070#file_hit_assertion_in_fastdebug>
>> [5]: https://gist.github.com/**2163070#file_fix_assertion_**jdk8.patch<https://gist.github.com/2163070#file_fix_assertion_jdk8.patch>
>> [6]:
>> http://mail.openjdk.java.net/**pipermail/hotspot-dev/2012-**
>> March/005441.html<http://mail.openjdk.java.net/pipermail/hotspot-dev/2012-March/005441.html>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-dev/attachments/20120323/9cd86382/attachment.html