RFR: 8348960: [leyden] compiler/c1/TestConcurrentPatching.java is stuck [v2]
Vladimir Ivanov
vlivanov at openjdk.org
Thu Jan 30 20:06:30 UTC 2025
On Thu, 30 Jan 2025 18:49:40 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
>> This is seen in GHA, and reproduces well on my machine as well:
>>
>>
>> $ CONF=linux-x86_64-server-fastdebug make images test TEST=compiler/c1/TestConcurrentPatching.java
>> <stuck, timeout>
>>
>>
>> The test runs with `-Xcomp`. gdb "thread apply all bt" shows the compilers are idle. Supplying `-XX:-UseLockFreeCompileQueues` makes the test pass. I believe there is a bug in `UseLockFreeCompileQueues` in the leyden repo.
>>
>> The comment hopefully explains what happens here. This is a corner case that seems to reproduce on a test that runs `-Xcomp` with very few compilations.
>>
>> Additional testing:
>> - [x] GHA
>> - [x] Linux x86_64 server fastdebug, `compiler/c1/TestConcurrentPatching.java`, 100x
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
> Avoid recursion in more bullet-proof way
Thinking more about it, does the bug arise in `CompileQueue::free_all()` due to the absence of a `purge_stale_task()` call? The only other place where `transfer_pending()` is used is `CompileQueue::get()`, but I don't see how it can avoid calling `purge_stale_task()`.
Would a `purge_stale_task()` call in `CompileQueue::free_all()` fix the problem as well?
Or are compiler threads simply stuck in `while (_first == nullptr) { ... }` in `CompileQueue::get()`, waiting for more compilations while the stale task queue remains non-empty?
Overall, I'd prefer an explicit call to `purge_stale_task()` rather than making every `transfer_pending()` call a point where the MCQ lock can be released.
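To make the preference above concrete, here is a toy model of the two responsibilities kept apart. It is a sketch only, not the leyden code: `ToyCompileQueue`, `ToyTask`, the `stale` flag, and the use of `std::mutex` in place of the MCQ lock are all assumptions made for illustration, and the real `transfer_pending()` and `purge_stale_task()` may behave differently. The point it tries to show is that the transfer step never drops the lock, while the purge step is the single, visible place where the lock is released.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Toy model only: none of these types or members are HotSpot code.
struct ToyTask {
  int  id;
  bool stale;  // hypothetical flag: task should be dropped, not compiled
};

class ToyCompileQueue {
 public:
  std::mutex mcq_lock;  // stands in for the MCQ lock

  // Producers add tasks without taking mcq_lock. A separate mutex
  // stands in for the lock-free pending list.
  void push_pending(ToyTask t) {
    std::lock_guard<std::mutex> g(_pending_lock);
    _pending.push_back(t);
  }

  // Caller holds mcq_lock; this function never releases it.
  void transfer_pending(const std::unique_lock<std::mutex>& held) {
    assert(held.owns_lock() && held.mutex() == &mcq_lock);
    std::vector<ToyTask> batch;
    {
      std::lock_guard<std::mutex> g(_pending_lock);
      batch.swap(_pending);
    }
    for (const ToyTask& t : batch) {
      if (t.stale) {
        _stale.push_back(t);
      } else {
        _ready.push_back(t);
      }
    }
  }

  // The only place where mcq_lock is dropped: stale tasks are detached
  // under the lock, freed with the lock released, then the lock is retaken.
  std::size_t purge_stale_task(std::unique_lock<std::mutex>& held) {
    assert(held.owns_lock() && held.mutex() == &mcq_lock);
    std::vector<ToyTask> doomed;
    doomed.swap(_stale);
    held.unlock();
    std::size_t freed = doomed.size();
    doomed.clear();
    held.lock();
    return freed;
  }

  // Drains the queue; the purge is an explicit, separate step.
  void free_all() {
    std::unique_lock<std::mutex> held(mcq_lock);
    transfer_pending(held);
    _ready.clear();
    purge_stale_task(held);
  }

  // Callers are expected to hold mcq_lock when reading these counts.
  std::size_t ready_count() const { return _ready.size(); }
  std::size_t stale_count() const { return _stale.size(); }

 private:
  std::mutex           _pending_lock;
  std::vector<ToyTask> _pending;
  std::deque<ToyTask>  _ready;
  std::vector<ToyTask> _stale;
};
```

In this model a caller that needs the lock held across the transfer can rely on that, and any caller that wants stale tasks freed has to say so by calling `purge_stale_task()` itself, as `free_all()` does here.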
-------------
PR Comment: https://git.openjdk.org/leyden/pull/30#issuecomment-2625427093
PR Comment: https://git.openjdk.org/leyden/pull/30#issuecomment-2625448784