[13] RFR(M): 8163511: Allocation of compile task fails with assert: "Leaking compilation tasks?"

Tue Mar 5 08:09:41 UTC 2019

Hi Tobias,

> On 27 Feb 2019, at 13:13, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
> 
> Hi Vladimir,
> 
> thanks for the review!
> 
>> On 26.02.19 19:36, Vladimir Kozlov wrote:
>> My only concern is CompileTask::select_for_compilation() where you may have small window between
>> releasing weak reference and adding strong. Can we add strong reference fist before removing weak?
>> 
>> With concurrent class unloading we may hit the assert in select_for_compilation() I think. May be
>> make the method get task as argument and return NULL if is_unloaded() is true. What do you think?
> 
> I thought we should be safe because we can't have a safepoint after the is_unloaded() check in
> CompilationPolicy::policy()->select_task() and this assert (there is a NoSafepointVerifier in the
> caller). But I'm not sure what the guarantees are with concurrent class unloading.
> 
> I've changed the code to create the (local) handle before removing the weak handles:
> http://cr.openjdk.java.net/~thartmann/8163511/webrev.01/
> 
> Now I'm wondering what will happen if concurrent class unloading kicks in right after the
> is_unloaded() check and before we create the method_holder handle.
> 
> I think someone from GC/runtime (Erik Österlund?) should have a look at this.

Liveness of objects only change in safepoints. So if you are in the vm thread state, and ask if a holder is alive, its result is stable.

With concurrent class unloading, marking terminates in a safepoint. At that snapshot in time, it is determined what is alive (strongly and through finalizers) and dead (although it might be lazily evaluated during concurrent execution). When that safepoint is released, that view will be consistent.

As there are seemingly no safepoint checks in this code, all liveness information will be consistent and none of the imagined races exist with concurrent class unloading. It behaves as if it was all done in a safepoint, even though it was not.

The key is that the klass_holder liveness (as well as the jweaks you use) is determined using a phantom load via the access API. That phantom load gives consistent results in the vm thread state as-if reference processing was done in a safepoint. The phantom loads can lazily compute what the value would have been, had reference processing been done in a safepoint, as it has stable liveness information from that safepoint. Then concurrent reference processing races for producing the same values.

Basically there is nothing to worry about here as far as I can see. Looks good. Although freeing the jweaks in one place instead of two would make the code easier to follow. Up to you though and I don’t need another webrev for that.

Thanks,
/Erik

> 
>> Do you kept output format for flag -XX:+PrintCompilation and changed only hs_err output? Or you
>> changed both?
> 
> I've changed both because both are using CompileTask::print -> CompileTask::print_impl. What about
> guarding the additional output by -XX:+Verbose (see new webrev)?
> 
> Best regards,
> Tobias
> 
> P.S. I'll be out until Wednesday next week.
> 
> 
>>> On 2/26/19 8:39 AM, Tobias Hartmann wrote:
>>> Hi,
>>> 
>>> please review the following patch:
>>> https://bugs.openjdk.java.net/browse/JDK-8163511
>>> http://cr.openjdk.java.net/~thartmann/8163511/webrev.00/
>>> 
>>> We hit an assert while trying to allocate a compile task because the compile queue already contains
>>> 10.000 tasks. The root cause is that the compiler threads are not able to keep up with the high
>>> number of compilation requests. The (internal) test that triggers this runs many different tests in
>>> the same VM by using different class loaders. Many of these tests use the same framework and
>>> therefore the same methods are enqueued for compilation again and again (up to 100x).
>>> 
>>> The bug shows up more often when running with -XX:-TieredCompilation which is due to a lower number
>>> of compiler threads and a different threshold policy that does not remove stale (unused) methods
>>> from the compile queue. But the same problem can and does happen with TieredCompilation enabled (for
>>> example, we never remove "old" methods that were executed a lot, even if they are stale).
>>> 
>>> After a test has been executed, the class loader becomes dead and therefore all test classes/methods
>>> should be unloaded. However, a compile task in the queue keeps the referenced method alive through a
>>> global handle (see CompileTask::initialize -> JNIHandles::make_global) until compilation finished or
>>> the task was removed from the queue for another reason. This causes compilation of thousands of dead
>>> methods.
>>> 
>>> The regression test TestOverloadCompileQueues.java triggers this reliably for the C1 and C2 compile
>>> queues. The blue lines show the growth of the number of tasks in the queue until we hit the assert,
>>> the red lines show the growth with the fix:
>>> http://cr.openjdk.java.net/~thartmann/8163511/8163511_results.pdf
>>> 
>>> The proposed fix uses weak handles for the compile tasks that don't prevent unloading but can be
>>> used to check if the referenced method is still alive (CompileTask::is_unloaded()). Unloaded tasks
>>> are aggressively removed with both threshold policies. If the method is still alive when the task is
>>> selected for compilation, the weak references are replaced by strong references (to avoid unloading
>>> during compilation which would crash the compiler).
>>> 
>>> Of course, overloading the queue is still possible if classes are not unloaded fast enough or not at
>>> all (in the regression test, simply remove the System.gc() call). But I think this is very unlikely
>>> for real world applications and this assert does not affect product builds.
>>> 
>>> The fix also changes the printing of compile tasks such that the time when they were enqueued and
>>> the time when compilation started is shown in the hs_err file:
>>> 
>>> Threads with active compile tasks:
>>> C2 CompilerThread0    22446 18824 200 5372   4   java.lang.reflect.Method::<init> (68 bytes)
>>> C1 CompilerThread0    22446 21474 199 3377   2   jdk.internal.loader.URLClassPath$3::run (172 bytes)
>>> 
>>> I'm currently running extended testing.
>>> 
>>> Thanks,
>>> Tobias
>>>