RFR: 8247213: G1: Reduce usage of volatile in favour of Atomic operations

Tue Jun 23 13:06:38 UTC 2020

On 14/06/2020 10:51, Kim Barrett wrote:

 >
 > I think having a class for atomic members and variables is good, using
 > the compiler to help keep us honest about what we're doing. So I
 > generally like this change, despite the number of comments below.
 >
 > First a few general comments, then a bunch of detailed comments on
 > specific changes.
 >
 > 
------------------------------------------------------------------------------
 >
 > I kind of wish this had been broken up into smaller chunks.  Looking
 > at a large number of changes involving atomic operations all in one go
 > is pretty tiring.
 >

I could have broken it down in smaller chunks, sorry. I wanted to show 
it as a whole.

 > 
------------------------------------------------------------------------------
 >
 > I hope that this will not be a G1-specific usage, nor even a
 > GC-specific usage, but rather will eventually be used throughout
 > HotSpot. If not, then I'm not sure this change should be made; I
 > wouldn't like having the differing styles of volatile members in some
 > code and AtomicValue in other code (other than as a temporary state as
 > changes are incrementally made). (That doesn't mean that I think all
 > atomic operations should involve AtomicValue, and the static Atomic
 > functions are no longer needed. I think there are some places where
 > forcing things into the former mold is problematic and the Atomic
 > functions will still have direct uses.)

I have a change waiting for zgc as well. I want feedback on this review 
before sending that change out. I am planning to fix this for parallel 
and serial as well, but I do not plan to do anything outside the "gc" 
folder.

 >
 > 
------------------------------------------------------------------------------
 >
 > The name of the class has generated some discussion. (Full disclosure:
 > I suggested AtomicValue.) "AtomicValue" follows the style guide of
 > using CamelCase for type names. But there are lots of
 > counter-examples. "Atomic" is a bit strange, since then we'd have a
 > class that is both a thin wrapper with member functions and an
 > AllStatic with similarly named static member functions taking an
 > additional explicit argument. The alternative would be to give this
 > class the "Atomic" name and rename the existing "Atomic", but that
 > would be pretty disruptive. atomic is short and aligns nicely with
 > Atomic, but names that differ only in case may be considered a
 > problem.  Brevity doesn't seem super critical here; the type name is
 > mostly used as the type for data members and global variables, and
 > rarely appears in function signatures.

The name is okay with me, I am also open for another name (it is hard to 
get something everyone likes).

 > 
------------------------------------------------------------------------------
 >
 > There are a number of places where plain arithmetic was being
 > performed on relevant variables (mostly pre/post increment/decrement).
 > I think most of these were intentionally not using Atomic arithmetic,
 > because they are in single-threaded (mutex or safepoint) contexts.
 >
 > This change currently transforms those all into atomic inc/add/&etc.
 > I've been waffling about this, and have noted occurrences in the
 > detailed comments below. There is a (probably often negligable, though
 > maybe not always) performance cost to that transformation, but that's
 > not what concerns me. When I see an atomic operation, I expect there's
 > something interesting happening and ordering or atomicity is
 > important. Using atomic operations unnecessarily breaks that
 > expectation, making the code less clear. That doesn't mean we should
 > use pre/post increment/decrement operators and such for "non-atomic"
 > uses though; we could have other names that are more explicit. (Note
 > that atomic arithmetic operations with relaxed memory order are not
 > the same thing as non-atomic arithmetic operations.)

Its a trade-off, when you see the atomic operations you might (wrongly) 
believe the atomicity being important, but if you do not see the atomic 
operations you get the urge to prove that the code is only executed in 
serial phases. If speed is a concern you can always load it into a local 
variable, do calculations and store it back in the AtomicValue.

 >
 > Also note that during review I found one place where I think the use
 > of non-atomic arithmetic may have been a mistake, in which case the
 > change to atomic arithmetic would be correct and necessary.
 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/runtime/atomic.hpp
 >    42 #include "utilities/debug.hpp"
 >
 > Out of order.

Will fix.

 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/runtime/atomic.hpp
 >   434 template<typename D, typename T, typename PlatformOp>
 >   435 struct Atomic::StoreImpl<
 >   436   D, T,
 >   437   PlatformOp,
 >   438   typename EnableIf<(IsIntegral<D>::value || 
IsRegisteredEnum<D>::value) && (IsIntegral<T>::value || 
IsRegisteredEnum<T>::value)>::type>
 >
 > ...
 >   442     D value = new_value;
 >
 > Allowing implicit conversions here was discussed and rejected during
 > the templatization of Atomic.  While I think allowing some conversions
 > here would be good, I think this is much too loose.  I think a change
 > of this type should be dealt with separately.
 >
 > Also, the line length for the EnableIf line seems rather excessive to me.

I will revert this change, and add casts/1u etc to make everything compile.

 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/runtime/atomic.hpp
 >   943   AtomicValue(const AtomicValue<D>& a); // = delete
 >
 > I think copy-assign should also be forbidden.  Why not use NONCOPYABLE?

Will fix.

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/runtime/atomic.hpp
 >   946     STATIC_ASSERT(sizeof(AtomicValue<D>) == sizeof(D)); // 
ensure the same memory layout
 >
 > I think there are more requirements than same-size.
 > But maybe they fall out of usage, since the Atomic operations will
 > provide the additional requirements.

If you have a suggestion, I will look at it.

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/runtime/atomic.hpp
 >   949   inline void inc(atomic_memory_order 
order=memory_order_conservative) {
 >
 > I think there should be whitespace around "=".

Will fix.

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/runtime/atomic.hpp
 >   940 class AtomicValue {
 >
 > I think I'd like the order of the load/store operations rearranged:
 >
 > load, load_acquire
 > store, release_store, release_store_fence
 >
 > I also think I'd like inc/dec near add/sub.2

Will try that.

 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/runtime/atomic.hpp
 > 1007   template<typename T>
 > 1008   static AtomicValue<T>* cast_to_atomic_ptr(T* nonatomic);
 >
 > Declared but not defined.

Refactoring artifact, will remove, thanks for finding it.

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/runtime/atomic.hpp
 > 1011 template<typename T>
 > 1012 static AtomicValue<T>* cast_to_atomic_ptr(T* nonatomic) {
 >
 > I would really like to not have this.  I'd like to see how far we can
 > get without it.

I try to use it as little as possible.

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/runtime/atomic.hpp
 >   940 class AtomicValue {
 >
 > Should it be default constructible? If so, should the default
 > constructor guarantee value initialization? I think having a trivial
 > default constructor is needed to avoid UB in some uses. See discussion
 > of G1BOT's _offset_array below.

I prefer not having it default constructable if I do not have to. Better 
to force the user to set the value IMO, but if it creates UB I will have 
to think again

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/runtime/atomic.hpp
 >   957   inline D load() const {
 >
 >   961   template <typename T>
 >   962   inline void store(T store_value) {
 >
 > I wonder if load_relaxed and relaxed_store or something like that
 > might be better names?

I am convinced that following the names of the AllStatic Atomic is best. 
I have no opinion about that name though, but if that is to be changed, 
it is better to do it as a separate change IMO.

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1BlockOffsetTable.cpp
 >   45   _offset_array = cast_to_atomic_ptr((u_char*) 
bot_reserved.start());
 >
 > This is currently the only use of cast_to_atomic_ptr. It leads to UB
 > since the (non-trivial) constructor for AtomicValue is never called
 > for any of the objects in the array. I think the call to
 > memset_with_concurrent_readers is also UB. I *think* having a trivial
 > default constructor solves the UB problem, but that isn't possible
 > until C++11.

I will have to look further into this...

 >
 > Rather than just casting to a pointer to AtomicValue<>, this could
 > iterate over the array memory, doing placement new construction of the
 > elements.  If we gave AtomicValue<> a non-trivial but empty default
 > constructor (so not initializing the value), then such an iteration
 > should be optimized away.  We'd still have the UB from memset_with_cr,
 > though that could use an iteration with the one-arg constructor instead.
 > Or ignore the technical UB and just keep using memset_with_cr.
 >
 > And that gets rid of the one use (so far) of cast_to_atomic_ptr.

But it is useful to have until all code is fixed, and fixing all code at 
the same time is hard. I do not want to have UB, so I have to look 
further into this

 >
 > The problem, of course, is that having a trivial/empty default
 > constructor means one could unintentionally construct without
 > initialization. I think a private trivial default constructor would
 > deal with that, though again that requires C++11.
 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1CodeCacheRemSet.cpp
 >   39 AtomicValue<G1CodeRootSetTable*> 
G1CodeRootSetTable::_purge_list(NULL);
 >
 > Extra space between type and name.

Will fix.

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1CollectedHeap.cpp
 > 2023       _old_marking_cycles_started.load() == 
_old_marking_cycles_completed.load() + 1,
 >
 > Mis-indented.

Will fix.

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1CollectedHeap.cpp
 > 2027   _old_marking_cycles_started.inc();
 >
 > This was _old_marking_cycles_started++. The membars introduced by
 > using inc() aren't needed here.  It is sufficient here to use
 > _old_marking_cycles_started.store(_old_marking_cycles_started.load() + 1)
 > This is the only write, and it's in the vmthread in a safepoint.
 >
 > Similarly in increment_old_marking_cycles_completed, except here we
 > have the only write serialized by holding G1OldGCCount_lock.
 >
 > I also think _old_marking_cycles_started might not need any special
 > handling at all.
 >
 > The additional membars don't really matter for performance, since
 > these operations are very rare.  But they catch they eye and make one
 > wonder what is being ordered here and why.  Maybe some commentary
 > would suffice?

I kind of feel the opposite. If I see an atomically incremented 
variable, it feels fine (although maybe not as fast as possible), if I 
see a store(load() + 1) it feels unsafe and that it could be a potential 
bug, and I must check that the code is only used in serial contexts.

Not sure what to prefer here.

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1CollectedHeap.cpp
 > 4677 void G1CollectedHeap::increase_used(size_t bytes) {
 > 4681 void G1CollectedHeap::decrease_used(size_t bytes) {
 >
 > I think these functions are mutexed, so the additional membars from
 > using AtomicValue<>::add/sub aren't needed.

Do you think there is a performance problem? Otherwise I would like to 
keep all operations atomic.

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1CollectionSet.cpp
 > 129 
_collection_set_regions[_collection_set_cur_length.fetch_and_add((size_t)1)] 
= hr->hrm_index();
 >
 > I think unnecessary membars.
 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1CollectionSet.cpp
 > 191   size_t len = _collection_set_cur_length.load();
 > 192   OrderAccess::loadload();
 >
 > This could just be a load_acquire().
 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1CollectionSet.cpp
 > 332   _collection_set_cur_length.inc();
 >
 > I think unnecessary membars.
 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1ConcurrentMark.cpp
 >   179   _chunks_in_chunk_list.inc();
 >   199     _chunks_in_chunk_list.dec();
 >
 > I think unnecessary membars.
 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1ConcurrentMark.hpp
 > 466   AtomicValue<HeapWord*>* _top_at_rebuild_starts;
 >
 > [pre-existing]
 > Not sure, but I think this doesn't need to be atomic / didn't need to
 > have a volatile qualifier in there.
 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1FullGCAdjustTask.hpp
 > 42   AtomicValue<uint>             _references_done; // Atomic 
counter / bool
 >
 > Member name no longer aligned with others adjacent.
 >

Will fix.

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1ParallelCleaning.hpp
 > 32   AtomicValue<int>       _cleaning_claimed;
 >
 > [pre-existing]
 > This looks like it ought to be bool rather than int, and change values
 > from 1/0 to true/false.
 >

I rather fix this as a separate issue: 
https://bugs.openjdk.java.net/browse/JDK-8248149

 >------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1RegionsOnNodes.cpp
 > 44     _count_per_node[node_index].inc();
 >
 > Previously not atomic; was that a bug?  If not a bug, then not clear
 > why these array elements were volatile.
 >

I do not know.

 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/g1SurvivorRegions.cpp
 > 67   _used_bytes.add(used_bytes);
 >
 > I think unnecessary membars.
 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/heapRegionManager.cpp
 > 663   for (uint i = 0; i < _n_regions; ++i) {
 > 664     new (&_claims[i]) AtomicValue<uint>(Unclaimed);
 > 665   }
 >
 > Even if AtomicValue has a trivial default destructor (as discussed
 > above) to avoid UB when using memset, this change is an improvement.
 > The old memset code only works because Unclaimed == 0.
 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/heapRegionRemSet.cpp
 >   54       fl = _free_list.load();
 >
 > `fl = res;` instead?
 >
 > 
------------------------------------------------------------------------------
 > src/hotspot/share/gc/g1/heapRegionType.hpp
 >   94   AtomicValue<Tag> _tag;
 >
 > It's not immediately obvious why _tag needs to be atomic. Unless I've
 > missed something, there don't appear to be anything other than relaxed
 > load/store operations.  It's not even obvious that those might occur
 > concurrently.
 >
 > 
------------------------------------------------------------------------------
 >

As a general reply to the questions about unnecessary membars and 
atomicity: I think it is good to just change everything to use atomics 
even if it is too conservative. I believe this change to be quite big 
(as you noted), and you thought I should break it down into smaller 
parts. Why not simply do the changes, most of which are quite 
conservative, let me take a look at the UB issues that you found, and 
then later after this change is committed anyone can remove atomicity if 
it bothers them?

For me it is not obvious when I can use volatile (therefore this 
change). I know it is used to force the compiler not to optimise away 
load and stores, but whether I can --- portably --- rely on (even) load 
and stores not to word tear is beyond my knowledge. Therefore, I think 
it is better to be conservative, and fix performance problems if they 
appear. I did not notice any visible performance degradations.

Also, if we do smaller optimization changes from something conservative 
it is easier to revert them, rather than to revert this big change or to 
do small fixes on it.

Thanks,
Leo