RFR: 8263551: Provide shared lock-free FIFO queue implementation [v2]
Man Cao
manc at openjdk.java.net
Tue Mar 30 08:27:47 UTC 2021
On Mon, 29 Mar 2021 07:56:19 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
>> Man Cao has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Address comment and add a gtest.
>
> src/hotspot/share/utilities/lockFreeQueue.inline.hpp line 100:
>
>> 98: template<typename T, T* volatile* (*next_ptr)(T&)>
>> 99: template<bool use_rcu>
>> 100: T* LockFreeQueue<T, next_ptr>::pop() {
>
> On further consideration I don't think this `use_rcu` conditionalized `pop` is the right path. The current behavior (with the embedded critical section) was for the specific use case in `G1DirtyCardQueueSet`. But for a general tool, I think a different approach is needed. I think better would be to not provide `pop()` at all, and instead provide `try_pop()`, which has a tri-status result: success, lost a race, or lost to an in-progress operation. So something like:
> enum class LockFreeQueuePopStatus {
> success,
> lost_race,
> operation_in_progress
> };
>
> // Member of LockFreeQueue<T, ...>
> // Executes the body of the old pop loop once, with appropriate
> // adjustments to return value and returning rather than retrying.
> Pair<LockFreeQueuePopStatus, T*> try_pop();
> Then let the specific use-case determine the context in which try_pop should be called and how to handle the various possible results.
>
> This eliminates `ConditionalCriticalSection` (which seems strange). This also eliminates the `G1DirtyCardQueueSet`-specific subclass of `LockFreeQueue`. Instead we have (private) `G1DirtyCardQueueSet::pop_queue()`:
> BufferNode* G1DirtyCardQueueSet::pop_queue() {
> using Status = LockFreeQueuePopStatus;
> Thread* current_thread = Thread::current();
> while (true) {
> GlobalCounter::CriticalSection cs(current_thread);
> Pair<Status, BufferNode*> pop_result = _completed.try_pop();
> switch (pop_result.first) {
> case Status::success: return pop_result.second;
> case Status::operation_in_progress: return nullptr;
> case Status::lost_race: break; // Try again.
> }
> }
> }
> I'm also not sure whether the G1 case actually needs the critical section inside the loop anymore. That might be a holdover from an earlier version where the operation-in-progress case did just loop to try again. It definitely does need a critical section though; the life cycle management for the BufferNodes depends on it.
Thanks for the suggestion, and yes this makes sense. Done. Could you double-check if the updated comments are appropriate?
My only concern is that the difference between operation_in_progress and lost_race may be too subtle for most client code. I suppose most client code can just do "retry until succeeded" like in the test.
I don't expect these two cases to differ much in run-time. Is there any performance data to show that the operation_in_progress case indeed takes much longer for retrying? I checked review threads for JDK-8237143 and JDK-8238867 but didn't find any.
Anyway, this CR should use the tri-status approach (and the intra-loop critical section approach), in order to avoid behavior change.
-------------
PR: https://git.openjdk.java.net/jdk/pull/2986
More information about the hotspot-gc-dev
mailing list