[13] RFR(M): 8163511: Allocation of compile task fails with assert: "Leaking compilation tasks?"
Tobias Hartmann
tobias.hartmann at oracle.com
Wed Feb 27 12:13:50 UTC 2019
Hi Vladimir,
thanks for the review!
On 26.02.19 19:36, Vladimir Kozlov wrote:
> My only concern is CompileTask::select_for_compilation() where you may have small window between
> releasing weak reference and adding strong. Can we add strong reference fist before removing weak?
>
> With concurrent class unloading we may hit the assert in select_for_compilation() I think. May be
> make the method get task as argument and return NULL if is_unloaded() is true. What do you think?
I thought we should be safe because we can't have a safepoint after the is_unloaded() check in
CompilationPolicy::policy()->select_task() and this assert (there is a NoSafepointVerifier in the
caller). But I'm not sure what the guarantees are with concurrent class unloading.
I've changed the code to create the (local) handle before removing the weak handles:
http://cr.openjdk.java.net/~thartmann/8163511/webrev.01/
Now I'm wondering what will happen if concurrent class unloading kicks in right after the
is_unloaded() check and before we create the method_holder handle.
I think someone from GC/runtime (Erik Österlund?) should have a look at this.
> Do you kept output format for flag -XX:+PrintCompilation and changed only hs_err output? Or you
> changed both?
I've changed both because both are using CompileTask::print -> CompileTask::print_impl. What about
guarding the additional output by -XX:+Verbose (see new webrev)?
Best regards,
Tobias
P.S. I'll be out until Wednesday next week.
> On 2/26/19 8:39 AM, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8163511
>> http://cr.openjdk.java.net/~thartmann/8163511/webrev.00/
>>
>> We hit an assert while trying to allocate a compile task because the compile queue already contains
>> 10.000 tasks. The root cause is that the compiler threads are not able to keep up with the high
>> number of compilation requests. The (internal) test that triggers this runs many different tests in
>> the same VM by using different class loaders. Many of these tests use the same framework and
>> therefore the same methods are enqueued for compilation again and again (up to 100x).
>>
>> The bug shows up more often when running with -XX:-TieredCompilation which is due to a lower number
>> of compiler threads and a different threshold policy that does not remove stale (unused) methods
>> from the compile queue. But the same problem can and does happen with TieredCompilation enabled (for
>> example, we never remove "old" methods that were executed a lot, even if they are stale).
>>
>> After a test has been executed, the class loader becomes dead and therefore all test classes/methods
>> should be unloaded. However, a compile task in the queue keeps the referenced method alive through a
>> global handle (see CompileTask::initialize -> JNIHandles::make_global) until compilation finished or
>> the task was removed from the queue for another reason. This causes compilation of thousands of dead
>> methods.
>>
>> The regression test TestOverloadCompileQueues.java triggers this reliably for the C1 and C2 compile
>> queues. The blue lines show the growth of the number of tasks in the queue until we hit the assert,
>> the red lines show the growth with the fix:
>> http://cr.openjdk.java.net/~thartmann/8163511/8163511_results.pdf
>>
>> The proposed fix uses weak handles for the compile tasks that don't prevent unloading but can be
>> used to check if the referenced method is still alive (CompileTask::is_unloaded()). Unloaded tasks
>> are aggressively removed with both threshold policies. If the method is still alive when the task is
>> selected for compilation, the weak references are replaced by strong references (to avoid unloading
>> during compilation which would crash the compiler).
>>
>> Of course, overloading the queue is still possible if classes are not unloaded fast enough or not at
>> all (in the regression test, simply remove the System.gc() call). But I think this is very unlikely
>> for real world applications and this assert does not affect product builds.
>>
>> The fix also changes the printing of compile tasks such that the time when they were enqueued and
>> the time when compilation started is shown in the hs_err file:
>>
>> Threads with active compile tasks:
>> C2 CompilerThread0 22446 18824 200 5372 4 java.lang.reflect.Method::<init> (68 bytes)
>> C1 CompilerThread0 22446 21474 199 3377 2 jdk.internal.loader.URLClassPath$3::run (172 bytes)
>>
>> I'm currently running extended testing.
>>
>> Thanks,
>> Tobias
>>
More information about the hotspot-dev
mailing list