[15] RFR(S): 8230402: Allocation of compile task fails with assert: "Leaking compilation tasks?"

Thu Apr 30 06:46:05 UTC 2020

Thank you Vladimir for your review!

Best regards,
Christian

On 29.04.20 19:07, Vladimir Kozlov wrote:
> webrev.02 looks good to me.
> 
> Thanks,
> Vladimir
> 
> On 4/29/20 2:26 AM, Christian Hagedorn wrote:
>> Hi Vladimir
>>
>> On 29.04.20 03:43, Vladimir Kozlov wrote:
>>> On 4/28/20 8:36 AM, Christian Hagedorn wrote:
>>>> Hi Vladimir
>>>>
>>>>>>> May be we should mark methods which are removed from queue or use 
>>>>>>> counters decay or use other mechanisms to prevent methods be put 
>>>>>>> back into queue immediately because their counters are high. You 
>>>>>>> may not need to remove half of queue in such case.
>>>>>>
>>>>>> You mean we could, for example, just reset the invocation and 
>>>>>> backedge counters of removed methods from the queue? This would 
>>>>>> probably be beneficial in a more general case than in my test case 
>>>>>> where each method is only executed twice. About the number of 
>>>>>> tasks to drop, it was just a best guess. We can also choose to 
>>>>>> drop fewer. But it is probably hard to determine a best value in 
>>>>>> general.
>>>>>
>>>>> An other thought. Instead of removing tasks from queue may be we 
>>>>> should not put new tasks on queue when it become almost full (but 
>>>>> continue profiling methods). For that we need a parameter (or 
>>>>> diagnostic flag) instead of 10000 number.
>>>>
>>>> That also sounds reasonable. But then we might miss on new hot 
>>>> methods while the queue could contain many cold methods.
>>>
>>> The hotter a method the sooner it will be put on queue for 
>>> compilation. The only case I can think of is recompilation of hot 
>>> method due to deoptimization. May be spending more time in 
>>> Interpreter is not bad thing.
>>
>> Yes, that seems okay.
>>
>>>>
>>>>> We are not using counters decay in Tiered mode because we are 
>>>>> loosing/corrupting profiling data with it. We should avoid this. I 
>>>>> just gave an example of what could be done.
>>>>
>>>> Okay.
>>>>
>>>>> One concern I have is that before it was check in debug VM. Now we 
>>>>> putting limitation on compilations in product VM which may affect 
>>>>> performance in some cases. We should check that.
>>>>
>>>> I ran some standard benchmarks and have not observed any 
>>>> regressions. However, I also ran these benchmarks with the original 
>>>> code and substituted the assert by a guarantee. It was never hit 
>>>> (i.e. the new code never executed and thus had no influence). It 
>>>> could still affect other benchmarks and programs in some unexpected 
>>>> way. But I think it is very unlikely to hit the threshold in a 
>>>> normal program.
>>>>
>>>> Therefore, I think it is kinda a trade-off between complexity of the 
>>>> solution and likelihood that those special cases 
>>>
>>> I completely agree with you on this.
>>>
>>>> occur. So, to summarize all the current options:
>>>> 1) Just remove the assert. But then we miss cases where we actually 
>>>> have duplicates in the queue
>>>> 2) webrev.01 to drop half of the tasks. We can check for duplicates 
>>>> before dropping. Can also change the number of tasks to drop.
>>>>      a) do it only in debug builds. Performance would not be 
>>>> significant.
>>>>      b) do it also in product builds. New limitation that might 
>>>> impact performance on some other benchmarks/programs but I think 
>>>> it's not very likely.
>>>> 3a/b) Do the same without excluding WhiteBoxAPI and/or UseJVMCICompiler
>>>> 4) Use your suggestion to stop enqueuing tasks at a certain 
>>>> threshold with a parameter or flag. Simple solution but might miss 
>>>> some new hot methods and it needs a different trigger to check for 
>>>> duplicates to avoid checking it too many times.
>>>>      a) do it only in debug builds. Performance would not be 
>>>> significant.
>>>>      b) do it also in product builds. Performance might be impacted 
>>>> on some other benchmarks/programs but I think not very likely.
>>>> 5a/b) Do the same without excluding WhiteBoxAPI and/or UseJVMCICompiler
>>>
>>> There should be no duplicates in queue - we check for that:
>>> http://hg.openjdk.java.net/jdk/jdk/file/06745527c7b8/src/hotspot/share/compiler/compileBroker.cpp#l1120 
>>>
>>> Unless we screwed up JVM_ACC_QUEUED bit setting.
>>
>> I see, then it might not really be necessary to check for duplicates 
>> yet again.
>>
>>> I think original assert was added in 8040798 to check that a task is 
>>> always put into _task_free_list when we finish compilation (or abort 
>>> compilation).
>>
>> Thanks for clearing that up. I was not aware of that intention. I 
>> first thought it had only the purpose of finding strange things inside 
>> the compile queue.
>>
>>> I don't think we should have different behavior (remove tasks from 
>>> queue or not put task in queue) in product and debug builds. If we do 
>>> that we have to do in both. We use mostly fastdebug build for testing 
>>> - we should execute the same code as in product as close as possible.
>>
>> I agree with that. Doing it only in debug builds leads to a too 
>> different behavior.
>>
>>> The real case is when all C2 compiler threads hangs (or takes very 
>>> long time to compile) and no progress is done on compiling other 
>>> methods in queue. But it should be very rare that all compiling 
>>> threads hangs. And we can catch such cases by other checks.
>>> > Based on all that I would go with 1).  As you pointed in bug report we
>>> not observing this assert anymore (only with hand made test).
>>
>> Thank you for explaining it in more detail. When we have other checks 
>> that can detect such a situation where all compiling threads are 
>> hanging then we are probably fine by just removing that assert.
>>
>> I updated my webrev with that option. I left the updated stress test 
>> there:
>> http://cr.openjdk.java.net/~chagedorn/8230402/webrev.02/
>>
>> What do others think?
>>
>> Best regards,
>> Christian
>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> What do you think?
>>>>
>>>> Best regards,
>>>> Christian
>>>>
>>>>>>
>>>>>>>
>>>>>>> On 4/24/20 7:37 AM, Christian Hagedorn wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Please review the following patch:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8230402
>>>>>>>> http://cr.openjdk.java.net/~chagedorn/8230402/webrev.00/
>>>>>>>>
>>>>>>>> This assert was hit very intermittently in an internal test 
>>>>>>>> until jdk-14+19. The test was changed afterwards and the assert 
>>>>>>>> was not observed to fail anymore. However, the problem of having 
>>>>>>>> too many tasks in the queue is still present (i.e. the compile 
>>>>>>>> queue is growing too quickly and the compiler(s) too slow to 
>>>>>>>> catch up). This assert can easily be hit by creating many class 
>>>>>>>> loaders which load many methods which are immediately compiled 
>>>>>>>> by setting a low compilation threshold as used in runA() in the 
>>>>>>>> testcase.
>>>>>>>>
>>>>>>>> Therefore, I suggest to tackle this problem with a general 
>>>>>>>> solution to drop half of the compilation tasks in 
>>>>>>>> CompileQueue::add() when a queue size of 10000 is reached and 
>>>>>>>> none of the other conditions of this assert hold (no Whitebox or 
>>>>>>>> JVMCI compiler). For tiered compilation, the tasks with the 
>>>>>>>> lowest method weight() or which are unloaded are removed from 
>>>>>>>> the queue (without altering the order of the remaining tasks in 
>>>>>>>> the queue). Without tiered compilation (i.e. SimpleCompPolicy), 
>>>>>>>> the tasks from the tail of the queue are removed. An additional 
>>>>>>>> verification in debug builds should ensure that there are no 
>>>>>>>> duplicated tasks. I assume that part of the reason of the 
>>>>>>>> original assert was to detect such duplicates.
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Christian
>>>>>>>>