[15] RFR(S): 8230402: Allocation of compile task fails with assert: "Leaking compilation tasks?"
Vladimir Kozlov
vladimir.kozlov at oracle.com
Wed Apr 29 17:07:39 UTC 2020
webrev.02 looks good to me.
Thanks,
Vladimir
On 4/29/20 2:26 AM, Christian Hagedorn wrote:
> Hi Vladimir
>
> On 29.04.20 03:43, Vladimir Kozlov wrote:
>> On 4/28/20 8:36 AM, Christian Hagedorn wrote:
>>> Hi Vladimir
>>>
>>>>>> Maybe we should mark methods which are removed from the queue, or use counter decay, or some other mechanism to
>>>>>> prevent methods from being put back into the queue immediately because their counters are high. You may not need
>>>>>> to remove half of the queue in such a case.
>>>>>
>>>>> You mean we could, for example, just reset the invocation and backedge counters of methods removed from the queue?
>>>>> This would probably be beneficial in a more general case than in my test case, where each method is only executed
>>>>> twice. As for the number of tasks to drop, it was just a best guess. We can also choose to drop fewer, but it is
>>>>> probably hard to determine the best value in general.
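>>>>>
>>>>> A minimal sketch of what I mean (hypothetical helper; assuming the counters are reachable via MethodCounters,
>>>>> names approximated):
>>>>>
>>>>>   // Called for each method whose task is dropped from the queue. Resetting the
>>>>>   // counters means the method has to become hot again before it is re-enqueued,
>>>>>   // instead of being put back immediately because its counters are still high.
>>>>>   static void reset_counters_after_drop(Method* m) {
>>>>>     MethodCounters* mcs = m->method_counters();
>>>>>     if (mcs != NULL) {
>>>>>       mcs->invocation_counter()->reset();
>>>>>       mcs->backedge_counter()->reset();
>>>>>     }
>>>>>   }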
>>>>
>>>> Another thought: instead of removing tasks from the queue, maybe we should not put new tasks on the queue when it
>>>> becomes almost full (but continue profiling the methods). For that we would need a parameter (or diagnostic flag)
>>>> instead of the hard-coded 10000.
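>>>>
>>>> Roughly like this (just a sketch; CompileQueueMaxSize is a hypothetical diagnostic flag replacing the hard-coded
>>>> 10000, and helper names are approximated):
>>>>
>>>>   // In CompileQueue::add(): refuse new tasks once the queue is almost full. The
>>>>   // method keeps being profiled in the interpreter and can be proposed again later.
>>>>   if (_size >= CompileQueueMaxSize) {
>>>>     task->method()->clear_queued_for_compilation(); // allow a later re-enqueue
>>>>     CompileTask::free(task);
>>>>     return;
>>>>   }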
>>>
>>> That also sounds reasonable. But then we might miss new hot methods while the queue could contain many cold ones.
>>
>> The hotter a method, the sooner it will be put on the queue for compilation. The only case I can think of is
>> recompilation of a hot method after deoptimization. Maybe spending more time in the interpreter is not a bad thing.
>
> Yes, that seems okay.
>
>>>
>>>> We are not using counter decay in tiered mode because we lose/corrupt profiling data with it. We should avoid
>>>> that. I just gave an example of what could be done.
>>>
>>> Okay.
>>>
>>>> One concern I have is that this used to be a check in the debug VM only. Now we are putting a limitation on
>>>> compilations in the product VM, which may affect performance in some cases. We should check that.
>>>
>>> I ran some standard benchmarks and did not observe any regressions. However, I also ran these benchmarks with the
>>> original code and replaced the assert with a guarantee. It was never hit (i.e. the new code never executed and thus
>>> had no influence). It could still affect other benchmarks and programs in some unexpected way, but I think it is
>>> very unlikely for a normal program to hit the threshold.
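>>>
>>> For reference, the check is roughly of this shape (simplified; the real condition also exempts WhiteBox and JVMCI
>>> runs, and the counter name is approximated):
>>>
>>>   assert(task_count < 10000, "Leaking compilation tasks?");
>>>
>>> For the benchmark runs, the assert becomes a guarantee, which fires in product builds as well:
>>>
>>>   guarantee(task_count < 10000, "Leaking compilation tasks?");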
>>>
>>> Therefore, I think it is kind of a trade-off between the complexity of the solution and the likelihood that those
>>> special cases occur.
>>
>> I completely agree with you on this.
>>
>>> So, to summarize all the current options:
>>> 1) Just remove the assert. But then we miss cases where we actually have duplicates in the queue.
>>> 2) webrev.01 to drop half of the tasks. We can check for duplicates before dropping, and we can also change the
>>> number of tasks to drop.
>>> a) do it only in debug builds. The performance impact would not be significant.
>>> b) do it also in product builds. A new limitation that might impact performance on some other benchmarks/programs,
>>> but I think that is not very likely.
>>> 3a/b) Do the same without excluding WhiteBoxAPI and/or UseJVMCICompiler
>>> 4) Use your suggestion to stop enqueuing tasks at a certain threshold, with a parameter or flag. A simple solution,
>>> but it might miss some new hot methods, and it needs a different trigger for the duplicate check to avoid running
>>> it too many times.
>>> a) do it only in debug builds. The performance impact would not be significant.
>>> b) do it also in product builds. Performance might be impacted on some other benchmarks/programs, but I think
>>> that is not very likely.
>>> 5a/b) Do the same without excluding WhiteBoxAPI and/or UseJVMCICompiler
>>
>> There should be no duplicates in the queue - we check for that:
>> http://hg.openjdk.java.net/jdk/jdk/file/06745527c7b8/src/hotspot/share/compiler/compileBroker.cpp#l1120
>> Unless we screwed up the JVM_ACC_QUEUED bit setting.
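>>
>> The idea behind that check, roughly (just a sketch; see the link above for the actual code):
>>
>>   // A method whose JVM_ACC_QUEUED access-flag bit is set already has a task in the queue.
>>   if (method->queued_for_compilation()) {
>>     return; // duplicate - do not create a second task
>>   }
>>   method->set_queued_for_compilation(); // cleared again when the task finishes or is aborted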
>
> I see, then it might not really be necessary to check for duplicates yet again.
>
>> I think the original assert was added in 8040798 to check that a task is always put back into _task_free_list when
>> we finish (or abort) a compilation.
>
> Thanks for clearing that up. I was not aware of that intention. I first thought its only purpose was to catch
> strange things inside the compile queue.
>
>> I don't think we should have different behavior (removing tasks from the queue, or not putting tasks on the queue)
>> in product and debug builds. If we do it, we have to do it in both. We mostly use the fastdebug build for testing -
>> it should execute code as close to the product build as possible.
>
> I agree with that. Doing it only in debug builds would make the behavior diverge too much.
>
>> The real case is when all C2 compiler threads hang (or take a very long time to compile) and no progress is made on
>> compiling the other methods in the queue. But it should be very rare that all compiler threads hang, and we can
>> catch such cases with other checks.
>>
>> Based on all that, I would go with 1). As you pointed out in the bug report, we are not observing this assert
>> anymore (only with the hand-made test).
>
> Thank you for explaining it in more detail. If we have other checks that can detect a situation where all compiler
> threads are hanging, then we are probably fine with just removing the assert.
>
> I updated my webrev with that option. I left the updated stress test there:
> http://cr.openjdk.java.net/~chagedorn/8230402/webrev.02/
>
> What do others think?
>
> Best regards,
> Christian
>
>> Thanks,
>> Vladimir
>>
>>>
>>> What do you think?
>>>
>>> Best regards,
>>> Christian
>>>
>>>>>
>>>>>>
>>>>>> On 4/24/20 7:37 AM, Christian Hagedorn wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> Please review the following patch:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8230402
>>>>>>> http://cr.openjdk.java.net/~chagedorn/8230402/webrev.00/
>>>>>>>
>>>>>>> This assert was hit very intermittently in an internal test until jdk-14+19. The test was changed afterwards,
>>>>>>> and the assert was not observed to fail anymore. However, the problem of having too many tasks in the queue is
>>>>>>> still present (i.e. the compile queue grows too quickly and the compiler(s) are too slow to catch up). The
>>>>>>> assert can easily be hit by creating many class loaders which load many methods that are immediately compiled
>>>>>>> due to a low compilation threshold, as done in runA() in the testcase.
>>>>>>>
>>>>>>> Therefore, I suggest tackling this problem with a general solution that drops half of the compilation tasks in
>>>>>>> CompileQueue::add() when a queue size of 10000 is reached and none of the other conditions of this assert hold
>>>>>>> (no WhiteBox or JVMCI compiler). For tiered compilation, the tasks with the lowest method weight() or with
>>>>>>> unloaded methods are removed from the queue (without altering the order of the remaining tasks). Without tiered
>>>>>>> compilation (i.e. SimpleCompPolicy), tasks are removed from the tail of the queue. An additional verification
>>>>>>> in debug builds should ensure that there are no duplicated tasks. I assume that part of the purpose of the
>>>>>>> original assert was to detect such duplicates.
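>>>>>>>
>>>>>>> A minimal sketch of the tiered case (names approximated; is_unloaded() and weight() stand in for the actual
>>>>>>> checks in the webrev):
>>>>>>>
>>>>>>>   // Drop the n_to_drop tasks with the lowest weight (preferring unloaded ones)
>>>>>>>   // without reordering the remaining tasks. Quadratic, but it only runs in the
>>>>>>>   // rare case that the queue ever grows to 10000 entries.
>>>>>>>   void CompileQueue::drop_cold_tasks(int n_to_drop) {
>>>>>>>     for (int i = 0; i < n_to_drop; i++) {
>>>>>>>       CompileTask* victim = NULL;
>>>>>>>       for (CompileTask* t = _first; t != NULL; t = t->next()) {
>>>>>>>         if (t->is_unloaded()) { victim = t; break; }
>>>>>>>         if (victim == NULL || t->weight() < victim->weight()) {
>>>>>>>           victim = t;
>>>>>>>         }
>>>>>>>       }
>>>>>>>       if (victim == NULL) return; // queue is empty
>>>>>>>       remove(victim); // unlink from the doubly linked list
>>>>>>>     }
>>>>>>>   }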
>>>>>>>
>>>>>>> Thank you!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Christian
>>>>>>>