[15] RFR(S): 8230402: Allocation of compile task fails with assert: "Leaking compilation tasks?"

Vladimir Kozlov vladimir.kozlov at oracle.com
Wed Apr 29 01:43:02 UTC 2020


On 4/28/20 8:36 AM, Christian Hagedorn wrote:
> Hi Vladimir
> 
>>>> Maybe we should mark methods which are removed from the queue, use counter decay, or use other mechanisms to prevent 
>>>> methods from being put back into the queue immediately because their counters are high. You may not need to remove 
>>>> half of the queue in such a case.
>>>
>>> You mean we could, for example, just reset the invocation and backedge counters of the methods removed from the queue? 
>>> This would probably be beneficial in a more general case than in my test case, where each method is only executed 
>>> twice. As for the number of tasks to drop, it was just a best guess. We could also choose to drop fewer, but it is 
>>> probably hard to determine an optimal value in general.
>>
>> Another thought: instead of removing tasks from the queue, maybe we should not put new tasks on the queue when it 
>> becomes almost full (but continue profiling methods). For that we need a parameter (or diagnostic flag) instead of the 
>> hard-coded 10000.
> 
> That also sounds reasonable. But then we might miss new hot methods while the queue could contain many cold methods.

The hotter a method is, the sooner it will be put on the queue for compilation. The only case I can think of is 
recompilation of a hot method due to deoptimization. Maybe spending more time in the interpreter is not a bad thing.

> 
>> We are not using counter decay in tiered mode because we would lose/corrupt profiling data with it. We should 
>> avoid that. I just gave an example of what could be done.
> 
> Okay.
> 
>> One concern I have is that before, this was a check in the debug VM only. Now we are putting a limitation on 
>> compilations in the product VM, which may affect performance in some cases. We should check that.
> 
> I ran some standard benchmarks and have not observed any regressions. I also ran these benchmarks with the 
> original code and the assert replaced by a guarantee. It was never hit (i.e. the new code never executed and thus had 
> no influence). It could still affect other benchmarks and programs in some unexpected way, but I think it is very 
> unlikely that a normal program hits the threshold.
> 
> Therefore, I think it is a trade-off between the complexity of the solution and the likelihood that those special cases 

I completely agree with you on this.

> occur. So, to summarize all the current options:
> 1) Just remove the assert. But then we miss cases where we actually have duplicates in the queue.
> 2) webrev.01 to drop half of the tasks. We can check for duplicates before dropping. We can also change the number of 
> tasks to drop.
>      a) do it only in debug builds. The performance impact would not be significant.
>      b) do it also in product builds. A new limitation that might impact performance on some other 
> benchmarks/programs, but I think it is not very likely.
> 3a/b) Do the same without excluding WhiteBoxAPI and/or UseJVMCICompiler
> 4) Use your suggestion to stop enqueuing tasks at a certain threshold, with a parameter or flag. A simple solution, but 
> it might miss some new hot methods, and it needs a different trigger for the duplicate check to avoid running it too 
> many times.
>      a) do it only in debug builds. The performance impact would not be significant.
>      b) do it also in product builds. Performance might be impacted on some other benchmarks/programs, but I think 
> that is not very likely.
> 5a/b) Do the same without excluding WhiteBoxAPI and/or UseJVMCICompiler

There should be no duplicates in the queue - we check for that:
http://hg.openjdk.java.net/jdk/jdk/file/06745527c7b8/src/hotspot/share/compiler/compileBroker.cpp#l1120
Unless we screwed up the JVM_ACC_QUEUED bit setting.
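
As a side note, the duplicate prevention works roughly like this (a heavily simplified stand-alone sketch with invented types and names, not the actual HotSpot code; the real check tests the JVM_ACC_QUEUED access-flag bit on the method before creating a task):

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical, simplified stand-ins for HotSpot's Method and CompileQueue.
struct Method {
    std::string name;
    bool queued_for_compilation = false;  // models the JVM_ACC_QUEUED bit
};

struct CompileQueue {
    std::deque<Method*> tasks;

    // If the method is already marked as queued, no second task is created.
    bool add(Method* m) {
        if (m->queued_for_compilation) {
            return false;  // duplicate - method is already in a compile queue
        }
        m->queued_for_compilation = true;
        tasks.push_back(m);
        return true;
    }

    // When a compilation finishes (or is aborted), the bit is cleared so the
    // method can be enqueued again later, e.g. after a deoptimization.
    void remove_and_complete(Method* m) {
        for (auto it = tasks.begin(); it != tasks.end(); ++it) {
            if (*it == m) { tasks.erase(it); break; }
        }
        m->queued_for_compilation = false;
    }
};
```

So duplicates can only appear if the bit is set or cleared incorrectly somewhere.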

I think the original assert was added by 8040798 to check that a task is always put back into _task_free_list when we 
finish (or abort) a compilation.

The real case is when all C2 compiler threads hang (or take a very long time to compile) and no progress is made on 
compiling other methods in the queue. But it should be very rare that all compiler threads hang, and we can catch such 
cases with other checks.

I don't think we should have different behavior (removing tasks from the queue, or not putting tasks on the queue) in 
product and debug builds. If we do it, we have to do it in both. We mostly use fastdebug builds for testing - they 
should execute code as close to the product build as possible.

Based on all that, I would go with 1). As you pointed out in the bug report, we are not observing this assert anymore 
(only with the hand-made test).

Thanks,
Vladimir

> 
> What do you think?
> 
> Best regards,
> Christian
> 
>>>
>>>>
>>>> On 4/24/20 7:37 AM, Christian Hagedorn wrote:
>>>>> Hi
>>>>>
>>>>> Please review the following patch:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8230402
>>>>> http://cr.openjdk.java.net/~chagedorn/8230402/webrev.00/
>>>>>
>>>>> This assert was hit very intermittently in an internal test until jdk-14+19. The test was changed afterwards and 
>>>>> the assert was not observed to fail anymore. However, the problem of having too many tasks in the queue is still 
>>>>> present (i.e. the compile queue grows too quickly and the compiler(s) are too slow to catch up). The assert can 
>>>>> easily be hit by creating many class loaders which load many methods that are immediately compiled due to a low 
>>>>> compilation threshold, as used in runA() in the testcase.
>>>>>
>>>>> Therefore, I suggest tackling this problem with a general solution that drops half of the compilation tasks in 
>>>>> CompileQueue::add() when a queue size of 10000 is reached and none of the other conditions of this assert hold (no 
>>>>> Whitebox or JVMCI compiler). With tiered compilation, the tasks with the lowest method weight(), or whose methods 
>>>>> are unloaded, are removed from the queue (without altering the order of the remaining tasks). Without tiered 
>>>>> compilation (i.e. SimpleCompPolicy), the tasks at the tail of the queue are removed. An additional verification in 
>>>>> debug builds should ensure that there are no duplicated tasks. I assume that part of the purpose of the original 
>>>>> assert was to detect such duplicates.
>>>>>
>>>>> Thank you!
>>>>>
>>>>> Best regards,
>>>>> Christian
>>>>>
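
For illustration, the drop-half policy from webrev.01 described above could be sketched roughly as follows (a stand-alone approximation with invented names and types, not the actual webrev code; weight stands in for the method's weight() used by tiered compilation):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified compile task: weight models the method's hotness.
struct Task {
    int id;
    int weight;
};

// Once the queue reaches max_size, drop the lowest-weight half of the tasks,
// keeping the relative order of the surviving tasks intact.
void drop_cold_half(std::vector<Task>& queue, std::size_t max_size) {
    if (queue.size() < max_size) return;

    // Find the median weight as the drop threshold.
    std::vector<int> weights;
    weights.reserve(queue.size());
    for (const Task& t : queue) weights.push_back(t.weight);
    std::size_t half = weights.size() / 2;
    std::nth_element(weights.begin(), weights.begin() + half, weights.end());
    int threshold = weights[half];

    // Keep tasks at or above the threshold; drop at most `half` cold tasks.
    std::vector<Task> kept;
    kept.reserve(queue.size() - half);
    std::size_t dropped = 0;
    for (const Task& t : queue) {
        if (dropped < half && t.weight < threshold) {
            ++dropped;  // cold task - removed from the queue
            continue;
        }
        kept.push_back(t);
    }
    queue.swap(kept);
}
```

The real patch additionally removes tasks whose methods were unloaded and verifies in debug builds that no duplicates remain.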


More information about the hotspot-compiler-dev mailing list