[15] RFR(S): 8230402: Allocation of compile task fails with assert: "Leaking compilation tasks?"
Christian Hagedorn
christian.hagedorn at oracle.com
Thu Apr 30 06:46:05 UTC 2020
Thank you Vladimir for your review!
Best regards,
Christian
On 29.04.20 19:07, Vladimir Kozlov wrote:
> webrev.02 looks good to me.
>
> Thanks,
> Vladimir
>
> On 4/29/20 2:26 AM, Christian Hagedorn wrote:
>> Hi Vladimir
>>
>> On 29.04.20 03:43, Vladimir Kozlov wrote:
>>> On 4/28/20 8:36 AM, Christian Hagedorn wrote:
>>>> Hi Vladimir
>>>>
>>>>>>> May be we should mark methods which are removed from queue or use
>>>>>>> counters decay or use other mechanisms to prevent methods be put
>>>>>>> back into queue immediately because their counters are high. You
>>>>>>> may not need to remove half of queue in such case.
>>>>>>
>>>>>> You mean we could, for example, just reset the invocation and
>>>>>> backedge counters of removed methods from the queue? This would
>>>>>> probably be beneficial in a more general case than in my test case
>>>>>> where each method is only executed twice. About the number of
>>>>>> tasks to drop, it was just a best guess. We can also choose to
>>>>>> drop fewer. But it is probably hard to determine a best value in
>>>>>> general.
>>>>>
>>>>> An other thought. Instead of removing tasks from queue may be we
>>>>> should not put new tasks on queue when it become almost full (but
>>>>> continue profiling methods). For that we need a parameter (or
>>>>> diagnostic flag) instead of 10000 number.
>>>>
>>>> That also sounds reasonable. But then we might miss on new hot
>>>> methods while the queue could contain many cold methods.
>>>
>>> The hotter a method the sooner it will be put on queue for
>>> compilation. The only case I can think of is recompilation of hot
>>> method due to deoptimization. May be spending more time in
>>> Interpreter is not bad thing.
>>
>> Yes, that seems okay.
>>
>>>>
>>>>> We are not using counters decay in Tiered mode because we are
>>>>> loosing/corrupting profiling data with it. We should avoid this. I
>>>>> just gave an example of what could be done.
>>>>
>>>> Okay.
>>>>
>>>>> One concern I have is that before it was check in debug VM. Now we
>>>>> putting limitation on compilations in product VM which may affect
>>>>> performance in some cases. We should check that.
>>>>
>>>> I ran some standard benchmarks and have not observed any
>>>> regressions. However, I also ran these benchmarks with the original
>>>> code and substituted the assert by a guarantee. It was never hit
>>>> (i.e. the new code never executed and thus had no influence). It
>>>> could still affect other benchmarks and programs in some unexpected
>>>> way. But I think it is very unlikely to hit the threshold in a
>>>> normal program.
>>>>
>>>> Therefore, I think it is kinda a trade-off between complexity of the
>>>> solution and likelihood that those special cases
>>>
>>> I completely agree with you on this.
>>>
>>>> occur. So, to summarize all the current options:
>>>> 1) Just remove the assert. But then we miss cases where we actually
>>>> have duplicates in the queue
>>>> 2) webrev.01 to drop half of the tasks. We can check for duplicates
>>>> before dropping. Can also change the number of tasks to drop.
>>>> a) do it only in debug builds. Performance would not be
>>>> significant.
>>>> b) do it also in product builds. New limitation that might
>>>> impact performance on some other benchmarks/programs but I think
>>>> it's not very likely.
>>>> 3a/b) Do the same without excluding WhiteBoxAPI and/or UseJVMCICompiler
>>>> 4) Use your suggestion to stop enqueuing tasks at a certain
>>>> threshold with a parameter or flag. Simple solution but might miss
>>>> some new hot methods and it needs a different trigger to check for
>>>> duplicates to avoid checking it too many times.
>>>> a) do it only in debug builds. Performance would not be
>>>> significant.
>>>> b) do it also in product builds. Performance might be impacted
>>>> on some other benchmarks/programs but I think not very likely.
>>>> 5a/b) Do the same without excluding WhiteBoxAPI and/or UseJVMCICompiler
>>>
>>> There should be no duplicates in queue - we check for that:
>>> http://hg.openjdk.java.net/jdk/jdk/file/06745527c7b8/src/hotspot/share/compiler/compileBroker.cpp#l1120
>>>
>>> Unless we screwed up JVM_ACC_QUEUED bit setting.
>>
>> I see, then it might not really be necessary to check for duplicates
>> yet again.
>>
>>> I think original assert was added in 8040798 to check that a task is
>>> always put into _task_free_list when we finish compilation (or abort
>>> compilation).
>>
>> Thanks for clearing that up. I was not aware of that intention. I
>> first thought it had only the purpose of finding strange things inside
>> the compile queue.
>>
>>> I don't think we should have different behavior (remove tasks from
>>> queue or not put task in queue) in product and debug builds. If we do
>>> that we have to do in both. We use mostly fastdebug build for testing
>>> - we should execute the same code as in product as close as possible.
>>
>> I agree with that. Doing it only in debug builds leads to a too
>> different behavior.
>>
>>> The real case is when all C2 compiler threads hangs (or takes very
>>> long time to compile) and no progress is done on compiling other
>>> methods in queue. But it should be very rare that all compiling
>>> threads hangs. And we can catch such cases by other checks.
>>> > Based on all that I would go with 1). As you pointed in bug report we
>>> not observing this assert anymore (only with hand made test).
>>
>> Thank you for explaining it in more detail. When we have other checks
>> that can detect such a situation where all compiling threads are
>> hanging then we are probably fine by just removing that assert.
>>
>> I updated my webrev with that option. I left the updated stress test
>> there:
>> http://cr.openjdk.java.net/~chagedorn/8230402/webrev.02/
>>
>> What do others think?
>>
>> Best regards,
>> Christian
>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> What do you think?
>>>>
>>>> Best regards,
>>>> Christian
>>>>
>>>>>>
>>>>>>>
>>>>>>> On 4/24/20 7:37 AM, Christian Hagedorn wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Please review the following patch:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8230402
>>>>>>>> http://cr.openjdk.java.net/~chagedorn/8230402/webrev.00/
>>>>>>>>
>>>>>>>> This assert was hit very intermittently in an internal test
>>>>>>>> until jdk-14+19. The test was changed afterwards and the assert
>>>>>>>> was not observed to fail anymore. However, the problem of having
>>>>>>>> too many tasks in the queue is still present (i.e. the compile
>>>>>>>> queue is growing too quickly and the compiler(s) too slow to
>>>>>>>> catch up). This assert can easily be hit by creating many class
>>>>>>>> loaders which load many methods which are immediately compiled
>>>>>>>> by setting a low compilation threshold as used in runA() in the
>>>>>>>> testcase.
>>>>>>>>
>>>>>>>> Therefore, I suggest to tackle this problem with a general
>>>>>>>> solution to drop half of the compilation tasks in
>>>>>>>> CompileQueue::add() when a queue size of 10000 is reached and
>>>>>>>> none of the other conditions of this assert hold (no Whitebox or
>>>>>>>> JVMCI compiler). For tiered compilation, the tasks with the
>>>>>>>> lowest method weight() or which are unloaded are removed from
>>>>>>>> the queue (without altering the order of the remaining tasks in
>>>>>>>> the queue). Without tiered compilation (i.e. SimpleCompPolicy),
>>>>>>>> the tasks from the tail of the queue are removed. An additional
>>>>>>>> verification in debug builds should ensure that there are no
>>>>>>>> duplicated tasks. I assume that part of the reason of the
>>>>>>>> original assert was to detect such duplicates.
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Christian
>>>>>>>>
More information about the hotspot-compiler-dev
mailing list