[15] RFR(S): 8230402: Allocation of compile task fails with assert: "Leaking compilation tasks?"
Christian Hagedorn
christian.hagedorn at oracle.com
Wed Apr 29 09:26:30 UTC 2020
Hi Vladimir
On 29.04.20 03:43, Vladimir Kozlov wrote:
> On 4/28/20 8:36 AM, Christian Hagedorn wrote:
>> Hi Vladimir
>>
>>>>> Maybe we should mark methods which are removed from the queue, or
>>>>> use counter decay, or use other mechanisms to prevent methods from
>>>>> being put back into the queue immediately because their counters are
>>>>> high. You may not need to remove half of the queue in such a case.
>>>>
>>>> You mean we could, for example, just reset the invocation and
>>>> backedge counters of the methods removed from the queue? This would
>>>> probably be beneficial in a more general case than in my test case,
>>>> where each method is only executed twice. As for the number of tasks
>>>> to drop, it was just a best guess. We can also choose to drop fewer.
>>>> But it is probably hard to determine a best value in general.
>>>
>>> Another thought: instead of removing tasks from the queue, maybe we
>>> should not put new tasks on the queue when it becomes almost full (but
>>> continue profiling methods). For that we need a parameter (or
>>> diagnostic flag) instead of the hard-coded 10000.
>>
>> That also sounds reasonable. But then we might miss new hot methods
>> while the queue could still contain many cold methods.
>
> The hotter a method is, the sooner it will be put on the queue for
> compilation. The only case I can think of is recompilation of a hot
> method due to deoptimization. Maybe spending more time in the
> interpreter is not a bad thing.
Yes, that seems okay.
>>
>>> We are not using counter decay in tiered mode because we are
>>> losing/corrupting profiling data with it. We should avoid this. I
>>> just gave an example of what could be done.
>>
>> Okay.
>>
>>> One concern I have is that before, this was a check in the debug VM.
>>> Now we are putting a limitation on compilations in the product VM,
>>> which may affect performance in some cases. We should check that.
>>
>> I ran some standard benchmarks and have not observed any regressions.
>> However, I also ran these benchmarks with the original code with the
>> assert replaced by a guarantee. It was never hit (i.e. the new
>> code never executed and thus had no influence). It could still affect
>> other benchmarks and programs in some unexpected way. But I think it
>> is very unlikely to hit the threshold in a normal program.
>>
>> Therefore, I think it is a trade-off between the complexity of the
>> solution and the likelihood that those special cases
>
> I completely agree with you on this.
>
>> occur. So, to summarize all the current options:
>> 1) Just remove the assert. But then we miss cases where we actually
>> have duplicates in the queue.
>> 2) webrev.01 to drop half of the tasks. We can check for duplicates
>> before dropping. We can also change the number of tasks to drop.
>> a) do it only in debug builds. The performance impact would not be
>> significant.
>> b) do it also in product builds. A new limitation that might impact
>> performance on some other benchmarks/programs, but I think that is
>> not very likely.
>> 3a/b) Do the same without excluding WhiteBoxAPI and/or UseJVMCICompiler.
>> 4) Use your suggestion to stop enqueuing tasks at a certain threshold
>> with a parameter or flag. A simple solution, but it might miss some
>> new hot methods, and it needs a different trigger to check for
>> duplicates to avoid checking too often.
>> a) do it only in debug builds. The performance impact would not be
>> significant.
>> b) do it also in product builds. Performance might be impacted on
>> some other benchmarks/programs, but I think that is not very likely.
>> 5a/b) Do the same without excluding WhiteBoxAPI and/or UseJVMCICompiler.
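For illustration, option 4 could be sketched roughly as follows. This is not HotSpot's actual code: CompileQueueSizeLimit stands in for the proposed parameter/diagnostic flag, and the minimal CompileTask/CompileQueue types are placeholders for the real ones in compileBroker.

```cpp
#include <cstddef>

// Hypothetical sketch of option 4: refuse to enqueue once the queue is
// "almost full", instead of removing tasks later. The method then simply
// keeps running (and profiling) in the interpreter.
static const size_t CompileQueueSizeLimit = 10000;  // placeholder flag

struct CompileTask {
  CompileTask* next = nullptr;
};

class CompileQueue {
  CompileTask* _first = nullptr;
  size_t _size = 0;
public:
  // Returns false when the task was rejected because the queue is full;
  // the caller can simply continue interpreting the method.
  bool add(CompileTask* task) {
    if (_size >= CompileQueueSizeLimit) {
      return false;               // queue almost full: drop the request
    }
    task->next = _first;          // simplified list handling
    _first = task;
    _size++;
    return true;
  }
  size_t size() const { return _size; }
};
```

The trade-off mentioned above is visible here: a hot method that becomes compilation-worthy while the queue is saturated is silently rejected and has to be re-triggered later.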
>
> There should be no duplicates in the queue - we check for that:
> http://hg.openjdk.java.net/jdk/jdk/file/06745527c7b8/src/hotspot/share/compiler/compileBroker.cpp#l1120
>
> Unless we screwed up the JVM_ACC_QUEUED bit setting.
I see, then it might not really be necessary to check for duplicates yet
again.
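For illustration, the queued-bit check referenced above works roughly like this. This is a simplified sketch, not HotSpot's real implementation: the Method struct and helper names are placeholders for the actual JVM_ACC_QUEUED access-flag handling.

```cpp
// Sketch of duplicate prevention via a per-method "queued" bit: the bit
// is set when a method is enqueued and cleared when its compile task is
// finished or aborted, so a second enqueue attempt is rejected up front.
struct Method {
  bool _queued = false;           // stands in for JVM_ACC_QUEUED

  bool mark_queued() {            // returns false if already queued
    if (_queued) return false;
    _queued = true;
    return true;
  }
  void clear_queued() { _queued = false; }
};

bool enqueue_for_compilation(Method* m) {
  if (!m->mark_queued()) {
    return false;   // already in the queue: no duplicate task is created
  }
  // ... allocate a CompileTask and append it to the compile queue ...
  return true;
}
```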
> I think the original assert was added in 8040798 to check that a task
> is always put back into _task_free_list when we finish a compilation
> (or abort it).
Thanks for clearing that up. I was not aware of that intention. I first
thought its only purpose was to find strange things inside the compile
queue.
> I don't think we should have different behavior (removing tasks from
> the queue or not putting tasks on the queue) in product and debug
> builds. If we do that, we have to do it in both. We mostly use the
> fastdebug build for testing - it should execute code as close to
> product as possible.
I agree with that. Doing it only in debug builds would make the behavior
diverge too much.
> The real case is when all C2 compiler threads hang (or take a very long
> time to compile) and no progress is made on compiling other methods in
> the queue. But it should be very rare that all compiler threads hang.
> And we can catch such cases with other checks.
>
> Based on all that, I would go with 1). As you pointed out in the bug
> report, we are not observing this assert anymore (only with the
> hand-made test).
Thank you for explaining it in more detail. If we have other checks that
can detect a situation where all compiler threads are hanging, then we
are probably fine with just removing that assert.
I updated my webrev with that option. I left the updated stress test there:
http://cr.openjdk.java.net/~chagedorn/8230402/webrev.02/
What do others think?
Best regards,
Christian
> Thanks,
> Vladimir
>
>>
>> What do you think?
>>
>> Best regards,
>> Christian
>>
>>>>
>>>>>
>>>>> On 4/24/20 7:37 AM, Christian Hagedorn wrote:
>>>>>> Hi
>>>>>>
>>>>>> Please review the following patch:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8230402
>>>>>> http://cr.openjdk.java.net/~chagedorn/8230402/webrev.00/
>>>>>>
>>>>>> This assert was hit very intermittently in an internal test until
>>>>>> jdk-14+19. The test was changed afterwards and the assert was not
>>>>>> observed to fail anymore. However, the problem of having too many
>>>>>> tasks in the queue is still present (i.e. the compile queue grows
>>>>>> too quickly and the compiler(s) are too slow to catch up).
>>>>>> The assert can easily be hit by creating many class loaders which
>>>>>> load many methods that are immediately compiled due to a low
>>>>>> compilation threshold, as done in runA() in the test case.
>>>>>>
>>>>>> Therefore, I suggest tackling this problem with a general
>>>>>> solution: drop half of the compilation tasks in
>>>>>> CompileQueue::add() when a queue size of 10000 is reached and none
>>>>>> of the other conditions of this assert hold (no WhiteBox or JVMCI
>>>>>> compiler). For tiered compilation, the tasks with the lowest
>>>>>> method weight() or whose methods are unloaded are removed from the
>>>>>> queue (without altering the order of the remaining tasks). Without
>>>>>> tiered compilation (i.e. SimpleCompPolicy), the tasks are removed
>>>>>> from the tail of the queue. An additional verification in debug
>>>>>> builds should ensure that there are no duplicated tasks. I assume
>>>>>> that detecting such duplicates was part of the purpose of the
>>>>>> original assert.
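The dropping strategy for tiered compilation described above could be sketched roughly like this. This is illustrative only, not the webrev code: the Task struct, its weight field, and the drop-half policy are placeholders.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Placeholder for a compile task with a precomputed method weight.
struct Task {
  int id;
  int weight;
};

// When the queue hits the limit, keep only the half of the tasks with
// the highest weight, preserving the relative order of the survivors.
void drop_half_lowest_weight(std::vector<Task>& queue) {
  size_t keep = queue.size() / 2;
  if (keep == 0) return;

  // Find the weight cutoff: the keep-th largest weight in the queue.
  std::vector<int> weights;
  for (const Task& t : queue) weights.push_back(t.weight);
  std::nth_element(weights.begin(), weights.begin() + keep - 1,
                   weights.end(), std::greater<int>());
  int cutoff = weights[keep - 1];

  // Keep tasks at or above the cutoff, in their original queue order;
  // the size cap handles ties at the cutoff weight.
  std::vector<Task> survivors;
  for (const Task& t : queue) {
    if (t.weight >= cutoff && survivors.size() < keep) {
      survivors.push_back(t);
    }
  }
  queue.swap(survivors);
}
```

Note how the two-pass approach (select a cutoff, then filter in order) keeps the remaining tasks in their original queue order, matching the "without altering the order" requirement above.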
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Best regards,
>>>>>> Christian
>>>>>>