RFR: 8263551: Provide shared lock-free FIFO queue implementation [v2]

Tue Mar 30 19:11:42 UTC 2021

On Tue, 30 Mar 2021 14:44:11 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:

>> Thanks for the suggestion, and yes this makes sense. Done. Could you double-check if the updated comments are appropriate?
>> 
>> My only concern is that the difference between operation_in_progress and lost_race may be too subtle for most client code. I suppose most client code can just do "retry until succeeded" like in the test.
>> I don't expect these two cases to differ much in run-time. Is there any performance data to show that the operation_in_progress case indeed takes much longer for retrying? I checked review threads for JDK-8237143 and JDK-8238867 but didn't find any.
>> 
>> Anyway, this CR should use the tri-status approach (and the intra-loop critical section approach), in order to avoid behavior change.
>
> In the "lost-race" cases, there was cmpxchg contention that the current thread lost. Retry is a reasonable thing to do, though that's up to the application. Success on retry is of course not guaranteed, because there could again be contention with some other thread, but at least somebody is making progress.
> 
> In the "operation-in-progress" cases the queue is in a state where the current operation cannot make progress until a different thread changes the state. If that other thread happens to have been descheduled or something like that, it could be a (comparatively) very long time before that happens, with lots of expensive spinning by a retrying try_pop.
> 
> So I think the comment for that state needs some more work.  Maybe something like this?
> 
> "An in-progress concurrent operation interfered with taking what had been the only remaining element in the queue. A concurrent try_pop may have already claimed it, but not completely updated the queue.  Alternatively, a concurrent push/append may have not yet linked the new entry(s) to the former sole entry. Retrying the try_pop will continue to fail in this way until that other thread has updated the queue's internal structure."
> 
> That was sufficient for the G1DCQS usage, so I stopped trying to do better.  (I don't even know if it's possible to do better, though I have some old notes on an idea.) But it's pretty ugly for a general utility, which is why I left this queue class buried inside G1DCQS rather than hoisting it out and generalizing it.

Thanks for the explanation. Updated as suggested.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2986