RFR (S): 8165859: gcTaskThread is missing volatile qualifier and barriers for _time_stamps

Kim Barrett kim.barrett at oracle.com
Tue Sep 20 19:47:03 UTC 2016


> On Sep 20, 2016, at 10:36 AM, Carsten Varming <varming at gmail.com> wrote:
> 
> Dear Erik,
> 
> See inline.
> 
> On Tue, Sep 20, 2016 at 4:49 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
>> I believe Kim is referring precisely to the release in the fence before the compare-and-exchange in that comment you refer to, rather than the membar Store* after the fence.
>> 
> In this case, no data written prior to a successful exchange on _time_stamps is being passed to the reader of _time_stamps, so the fence before the exchange is not needed. In fact, it seems to me that a relaxed atomic cmpxchg (using memory_order_relaxed) is sufficient for the update, and a plain read is sufficient to read _time_stamps.

There is the initialization of the contents of the new time_stamps
array.  Presently there seems to be none (so there is nothing to
publish, though I suspect reading uninitialized entries would be
undefined behavior), but that could/should change.
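
To make that concrete, here is a minimal sketch in standard C++11
atomics of the one-time publication under discussion (illustrative
names and types, not the actual HotSpot code).  It shows where
release ordering would matter if the array contents were initialized
before publication, and where Carsten's relaxed CAS would suffice if
they were not:

    #include <atomic>
    #include <cstddef>

    struct TimeStamp { long entry; long exit; };   // illustrative only

    static std::atomic<TimeStamp*> g_time_stamps{nullptr};

    // One-time lazy publication of a freshly allocated array.  If the
    // entries are initialized before publication (as they could/should
    // be), the successful CAS needs release semantics so a racing
    // reader cannot observe uninitialized entries; if nothing is
    // written before the exchange, a fully relaxed CAS would suffice.
    TimeStamp* lazy_time_stamps(std::size_t n) {
      TimeStamp* ts = g_time_stamps.load(std::memory_order_acquire);
      if (ts == nullptr) {
        TimeStamp* fresh = new TimeStamp[n]();     // value-initialized
        TimeStamp* expected = nullptr;
        if (g_time_stamps.compare_exchange_strong(
                expected, fresh,
                std::memory_order_acq_rel,         // publish on success
                std::memory_order_acquire)) {      // see winner on failure
          ts = fresh;
        } else {
          delete[] fresh;                          // lost the race
          ts = expected;                           // use the winner's array
        }
      }
      return ts;
    }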

On the other hand, now that I'm looking at this again, why is this
code using cmpxchg_ptr at all?  [Carsten - Thanks for making me
look again.]

The _time_stamps array is per-GCTaskThread.  The only potential
concurrent access is in print_task_time_stamps; indeed, there is a
comment in GCTaskThread::run about that, right before the (plain)
increment of _time_stamp_index, indicating that the ordering there is
important.  If that's actually true, then the plain increment is also
wrong, as is the corresponding read.
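
For illustration, a hedged sketch (again in C++11 atomics rather than
HotSpot's Atomic/OrderAccess primitives) of what a correct
single-writer producer/consumer pairing on the index would look like,
with illustrative names:

    #include <atomic>

    struct TimeStamp { long entry; long exit; };     // illustrative only

    static TimeStamp stamps[64];                     // illustrative capacity
    static std::atomic<unsigned> stamp_index{0};     // single writer assumed

    // Writer (the GC task thread): fill in the entry, then publish it
    // by storing the new index with release semantics.  A plain
    // increment would allow the larger index to become visible before
    // the entry's data, which is exactly the ordering the comment in
    // GCTaskThread::run warns about.
    void record(long entry, long exit) {
      unsigned i = stamp_index.load(std::memory_order_relaxed);
      stamps[i].entry = entry;
      stamps[i].exit  = exit;
      stamp_index.store(i + 1, std::memory_order_release);
    }

    // Reader (a concurrent print_task_time_stamps, if that can
    // happen): an acquire load of the index guarantees every entry
    // below it is fully written.
    void print_all(void (*print_one)(const TimeStamp&)) {
      unsigned n = stamp_index.load(std::memory_order_acquire);
      for (unsigned i = 0; i != n; ++i) {
        print_one(stamps[i]);
      }
    }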

But it doesn't look to me as if print_task_time_stamps is actually
called concurrently with a running task.  The only caller I found is
GCTaskManager::print_task_time_stamps, which seems to only be called
at the end of a scavenge or parallel compaction to report statistics,
presumably with all the tasks complete.

The more I look at this, the more confused I become.  At this point I
think I have to retract my "looks good" and send this back to Erik for
re-examination.

If there isn't any concurrent access, then cmpxchg_ptr isn't needed.

If there is concurrent access, using locks would substantially
simplify the analysis and implementation, and since it's a one-time
(per-thread) lock, there's no real benefit to being lock-free.
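
A sketch of the lock-based alternative, using standard C++11
facilities rather than HotSpot's Mutex machinery (names are
illustrative):

    #include <cstddef>
    #include <mutex>

    struct TimeStamp { long entry; long exit; };   // illustrative only

    static TimeStamp*     g_time_stamps = nullptr;
    static std::once_flag g_time_stamps_once;

    // std::call_once both serializes the one-time initialization and
    // publishes its result to every caller, so there are no fences or
    // CAS loops to reason about, and the lock is only ever contended
    // during that single initialization.
    TimeStamp* time_stamps(std::size_t n) {
      std::call_once(g_time_stamps_once, [n] {
        g_time_stamps = new TimeStamp[n]();        // value-initialized
      });
      return g_time_stamps;
    }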

I also wonder why print_task_time_stamps calls time_stamp_at, given
that it's already got the _time_stamps and could just iterate over it.

Unfortunately, I'm coming to believe there's some questionable code
here, and throwing a volatile and maybe a barrier or two at it doesn't
look like an improvement to me.

>> I would prefer to have explicit memory ordering between the linked producer and consumer pair, as Kim suggests. This is how the JMM implements sequential consistency - cooperatively with release_store_fence and load_acquire pairs on the same data.
>> 
> Hotspot is a C++ program. Mimicking the JMM in Hotspot code is misguided (IMHO). In April, migrating the barriers towards the Java model was suggested (http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html), but the conclusion was a small move towards C++ 11 barriers (https://bugs.openjdk.java.net/browse/JDK-8155949). It would be great if there could be general consensus about the direction of changes to the memory barriers and their use in Hotspot (cc hotspot-runtime-dev as this was discussed there recently).

I’m hoping C++11 atomics is the direction we take.
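
For reference, the paired-ordering idiom Erik describes maps fairly
directly onto C++11 atomics.  A minimal sketch of such a pair
(hypothetical flag and payload names; note that HotSpot's
release_store_fence is strictly stronger than the plain release store
shown here):

    #include <atomic>

    static int data = 0;                  // payload, written before publish
    static std::atomic<bool> ready{false};

    // Producer: write the payload, then publish with a release store
    // (roughly HotSpot's OrderAccess::release_store).
    void produce() {
      data = 42;
      ready.store(true, std::memory_order_release);
    }

    // Consumer: an acquire load of the flag (roughly
    // OrderAccess::load_acquire) guarantees the payload write is
    // visible.
    int consume() {
      while (!ready.load(std::memory_order_acquire)) {
        // spin; real code would yield or block
      }
      return data;                        // guaranteed to read 42
    }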



