RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot
David Holmes
david.holmes at oracle.com
Fri Oct 13 00:55:53 UTC 2017
Hi Kim,
Very detailed analysis! A few things have already been updated by Coleen.
Many of the issues with possibly incorrect/inappropriate types really
need to be dealt with separately by their component teams - they go
beyond the basic renaming.
Similarly, any ABA issues - which are likely non-issues, but are not
clearly documented as such - should be handled separately, as should
the potential race you highlight below - though to be honest I
couldn't match your statements with the code as shown.
Thanks,
David
On 13/10/2017 9:17 AM, Kim Barrett wrote:
>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote:
>>
>> Summary: With the new template functions these are unnecessary.
>>
>> The changes are mostly s/_ptr// and removing the cast to return type. There weren't many types that needed to be improved to match the template version of the function. Some notes:
>> 1. replaced CASPTR with Atomic::cmpxchg() in mutex.cpp, rearranging arguments.
>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. I disliked the first name because it's not explicit from the callers that there's an underlying CAS. If people want to fight, I'll remove the function and use cmpxchg because there are only a couple of places where this is a little nicer.
>> 3. Added Atomic::sub()
>>
>> Tested with JPRT, mach5 tier1-5 on linux, windows and solaris.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.01/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8188220
>>
>> Thanks,
>> Coleen
>
> I looked harder at the potential ABA problems, and believe they are
> okay. There can be multiple threads doing pushes, and there can be
> multiple threads doing pops, but not both at the same time.
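>
> For the record, the classic hazard with these cmpxchg-based lists
> looks like this (a minimal sketch with hypothetical names, not code
> from the webrev):
>
>    Node* head = _list;          // Thread1 reads head A (list is A->B->C)
>    Node* next = head->next();   // next == B
>    // Thread2 now pops A, pops B, then pushes A back; the list is
>    // A->C and B may have been freed or reused.  Thread1's CAS below
>    // still succeeds, since it only compares the head pointer, and
>    // installs the stale B as the new head.
>    Atomic::cmpxchg(next, &_list, head);
>
> which is why it is only safe when pushes and pops cannot interleave.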
>
> ------------------------------------------------------------------------------
> src/hotspot/cpu/zero/cppInterpreter_zero.cpp
> 279 if (Atomic::cmpxchg(monitor, lockee->mark_addr(), disp) != disp) {
>
> How does this work? monitor and disp seem to have unrelated
> types. Given that this is zero-specific code, maybe this hasn't been
> tested?
>
> Similarly here:
> 423 if (Atomic::cmpxchg(header, rcvr->mark_addr(), lock) != lock) {
>
> ------------------------------------------------------------------------------
> src/hotspot/share/asm/assembler.cpp
> 239 dcon->value_fn = cfn;
>
> Is it actually safe to remove the atomic update? If multiple threads
> performing the assignment *are* possible (and I don't understand the
> context yet, so don't know the answer to that), then a bare non-atomic
> assignment is a race, i.e. undefined behavior.
>
> Regardless of that, I think the CAST_FROM_FN_PTR should be retained.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/classfile/classLoaderData.cpp
> 167 Chunk* head = (Chunk*) OrderAccess::load_acquire(&_head);
>
> I think the cast to Chunk* is no longer needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/classfile/classLoaderData.cpp
> 946 ClassLoaderData* old = Atomic::cmpxchg(cld, cld_addr, (ClassLoaderData*)NULL);
> 947 if (old != NULL) {
> 948 delete cld;
> 949 // Returns the data.
> 950 return old;
> 951 }
>
> That could instead be
>
>    if (!Atomic::cmpxchg_if_null(cld, cld_addr)) {
>      delete cld;       // Lost the race.
>      return *cld_addr; // Use the winner's value.
>    }
>
> And apparently the caller of CLDG::add doesn't care whether the
> returned CLD has actually been added to the graph yet. If that's not
> true, then there's a bug here, since a race loser might return a
> winner's value before the winner has actually done the insertion.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/classfile/verifier.cpp
> 71 static void* verify_byte_codes_fn() {
> 72 if (OrderAccess::load_acquire(&_verify_byte_codes_fn) == NULL) {
> 73 void *lib_handle = os::native_java_library();
> 74 void *func = os::dll_lookup(lib_handle, "VerifyClassCodesForMajorVersion");
> 75 OrderAccess::release_store(&_verify_byte_codes_fn, func);
> 76 if (func == NULL) {
> 77 _is_new_verify_byte_codes_fn = false;
> 78 func = os::dll_lookup(lib_handle, "VerifyClassCodes");
> 79 OrderAccess::release_store(&_verify_byte_codes_fn, func);
> 80 }
> 81 }
> 82 return (void*)_verify_byte_codes_fn;
> 83 }
>
> [pre-existing]
>
> I think this code has race problems; a caller could unexpectedly and
> inappropriately return NULL. Consider the case where there is no
> VerifyClassCodesForMajorVersion, but there is VerifyClassCodes.
>
> The variable is initially NULL.
>
> Both Thread1 and Thread2 reach line 73, having both seen a NULL value
> for the variable.
>
> Thread1 reaches line 80, setting the variable to VerifyClassCodes.
>
> Thread2 reaches line 76, resetting the variable to NULL.
>
> Thread1 reads the now (momentarily) NULL value and returns it.
>
> I think the first release_store should be conditional on func != NULL.
> Also, the usage of _is_new_verify_byte_codes_fn seems suspect.
> And a minor additional nit: the cast in the return is unnecessary.
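>
> E.g. publishing only the final answer would close the race (untested
> sketch, same names as the current code):
>
>    void* func = os::dll_lookup(lib_handle, "VerifyClassCodesForMajorVersion");
>    if (func == NULL) {
>      _is_new_verify_byte_codes_fn = false;
>      func = os::dll_lookup(lib_handle, "VerifyClassCodes");
>    }
>    // Single release_store with the final answer; no transient NULL
>    // can be observed once a non-NULL value has been published.
>    OrderAccess::release_store(&_verify_byte_codes_fn, func);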
>
> ------------------------------------------------------------------------------
> src/hotspot/share/code/nmethod.cpp
> 1664 nmethod* observed_mark_link = _oops_do_mark_link;
> 1665 if (observed_mark_link == NULL) {
> 1666 // Claim this nmethod for this thread to mark.
> 1667 if (Atomic::cmpxchg_if_null(NMETHOD_SENTINEL, &_oops_do_mark_link)) {
>
> With these changes, the only use of observed_mark_link is in the if.
> I'm not sure that variable is really useful anymore, e.g. just use
>
> if (_oops_do_mark_link == NULL) {
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp
>
> In CMSCollector::par_take_from_overflow_list, if BUSY and prefix were
> of type oopDesc*, I think there would be a whole lot fewer casts and
> cast_to_oop's. Later on, I think suffix_head, observed_overflow_list,
> and curr_overflow_list could also be oopDesc* instead of oop to
> eliminate more casts.
>
> And some similar changes in CMSCollector::par_push_on_overflow_list.
>
> And similarly in parNewGeneration.cpp, in push_on_overflow_list and
> take_from_overflow_list_work.
>
> As noted in the comments for JDK-8165857, the lists and "objects"
> involved here aren't really oops, but rather the shattered remains of
> oops. The suggestion there was to use HeapWord* and carry through the
> fanout; what was actually done was to change _overflow_list to
> oopDesc* to minimize fanout, even though that's kind of lying to the
> type system. Now, with the cleanup of cmpxchg_ptr and such, we're
> paying the price of doing the minimal thing back then.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp
> 7960 Atomic::add(-n, &_num_par_pushes);
>
> Atomic::sub
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/cms/parNewGeneration.cpp
> 1455 Atomic::add(-n, &_num_par_pushes);
>
> Atomic::sub
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/dirtyCardQueue.cpp
> 283 void* actual = Atomic::cmpxchg(next, &_cur_par_buffer_node, nd);
> ...
> 289 nd = static_cast<BufferNode*>(actual);
>
> Change actual's type to BufferNode* and remove the cast on line 289.
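>
> I.e. (sketch):
>
>    BufferNode* actual = Atomic::cmpxchg(next, &_cur_par_buffer_node, nd);
>    ...
>    nd = actual;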
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp
>
> [pre-existing]
> 3499 old = (CompiledMethod*)_postponed_list;
>
> I think that cast is only needed because
> G1CodeCacheUnloadingTask::_postponed_list is incorrectly typed as
> "volatile CompiledMethod*", when I think it ought to be
> "CompiledMethod* volatile".
>
> I think G1CodeCacheUnloadingTask::_claimed_nmethod is similarly
> mis-typed, with a similarly unneeded cast:
> 3530 first = (CompiledMethod*)_claimed_nmethod;
>
> and another for _postponed_list here:
> 3552 claim = (CompiledMethod*)_postponed_list;
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1HotCardCache.cpp
> 77 jbyte* previous_ptr = (jbyte*)Atomic::cmpxchg(card_ptr,
>
> I think the cast of the cmpxchg result is no longer needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp
> 254 char* touch_addr = (char*)Atomic::add(actual_chunk_size, &_cur_addr) - actual_chunk_size;
>
> I think the cast of the add result is no longer needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1StringDedup.cpp
> 213 return (size_t)Atomic::add(partition_size, &_next_bucket) - partition_size;
>
> I think the cast of the add result is no longer needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/heapRegionRemSet.cpp
> 200 PerRegionTable* res =
> 201 Atomic::cmpxchg(nxt, &_free_list, fl);
>
> Please remove the line break, now that the code has been simplified.
>
> But wait, doesn't this alloc exhibit classic ABA problems? I *think*
> this works because alloc and bulk_free are called in different phases,
> never overlapping.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/sparsePRT.cpp
> 295 SparsePRT* res =
> 296 Atomic::cmpxchg(sprt, &_head_expanded_list, hd);
> and
> 307 SparsePRT* res =
> 308 Atomic::cmpxchg(next, &_head_expanded_list, hd);
>
> I'd rather not have the line breaks in these either.
>
> And get_from_expanded_list also appears to have classic ABA problems.
> I *think* this works because add_to_expanded_list and
> get_from_expanded_list are called in different phases, never
> overlapping.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskqueue.inline.hpp
> 262 return (size_t) Atomic::cmpxchg((intptr_t)new_age._data,
> 263 (volatile intptr_t *)&_data,
> 264 (intptr_t)old_age._data);
>
> This should be
>
> return Atomic::cmpxchg(new_age._data, &_data, old_age._data);
>
> ------------------------------------------------------------------------------
> src/hotspot/share/interpreter/bytecodeInterpreter.cpp
> This doesn't have any casts, which I think is correct.
> 708 if (Atomic::cmpxchg(header, rcvr->mark_addr(), mark) == mark) {
>
> but these do.
> 718 if (Atomic::cmpxchg((void*)new_header, rcvr->mark_addr(), mark) == mark) {
> 737 if (Atomic::cmpxchg((void*)new_header, rcvr->mark_addr(), header) == header) {
>
> I'm not sure how the ones with casts even compile? mark_addr() seems
> to be a markOop*, which is a markOopDesc**, where markOopDesc is a
> class. void* is not implicitly convertible to markOopDesc*.
>
> Hm, this entire file is #ifdef CC_INTERP. Is this zero-only code? Or
> something like that?
>
> Similarly here:
> 906 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) {
> and
> 917 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) {
> 935 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) {
>
> and here:
> 1847 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) {
> 1858 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) {
> 1878 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) {
>
> ------------------------------------------------------------------------------
> src/hotspot/share/memory/metaspace.cpp
> 1502 size_t value = OrderAccess::load_acquire(&_capacity_until_GC);
> ...
> 1537 return (size_t)Atomic::sub((intptr_t)v, &_capacity_until_GC);
>
> These and other uses of _capacity_until_GC suggest that variable's
> type should be size_t rather than intptr_t. Note that I haven't done
> a careful check of uses to see if there are any places where such a
> change would cause problems.
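>
> With that retyping the casts would go away too, e.g. (sketch,
> assuming no use relies on signed arithmetic):
>
>    static volatile size_t _capacity_until_GC;
>    ...
>    size_t value = OrderAccess::load_acquire(&_capacity_until_GC);
>    ...
>    return Atomic::sub(v, &_capacity_until_GC);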
>
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/constantPool.cpp
> 229 OrderAccess::release_store((Klass* volatile *)adr, k);
> 246 OrderAccess::release_store((Klass* volatile *)adr, k);
> 514 OrderAccess::release_store((Klass* volatile *)adr, k);
>
> Casts are not needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/constantPool.hpp
> 148 volatile intptr_t adr = OrderAccess::load_acquire(obj_at_addr_raw(which));
>
> [pre-existing]
> Why is adr declared volatile?
>
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/cpCache.cpp
> 157 intx newflags = (value & parameter_size_mask);
> 158 Atomic::cmpxchg(newflags, &_flags, (intx)0);
>
> This is a nice demonstration of why I wanted to include some value
> preserving integral conversions in cmpxchg, rather than requiring
> exact type matching in the integral case. There have been some others
> that I haven't commented on. Apparently we (I) got away with
> including such conversions in Atomic::add, which I'd forgotten about.
> And see comment regarding Atomic::sub below.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/cpCache.hpp
> 139 volatile Metadata* _f1; // entry specific metadata field
>
> [pre-existing]
> I suspect the type should be Metadata* volatile. And that would
> eliminate the need for the cast here:
>
> 339 Metadata* f1_ord() const { return (Metadata *)OrderAccess::load_acquire(&_f1); }
>
> I don't know if there are any other changes needed or desirable around
> _f1 usage.
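>
> The retyped declaration and accessor would look something like
> (sketch):
>
>    Metadata* volatile _f1;  // entry specific metadata field
>    ...
>    Metadata* f1_ord() const { return OrderAccess::load_acquire(&_f1); }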
>
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/method.hpp
> 139 volatile address from_compiled_entry() const { return OrderAccess::load_acquire(&_from_compiled_entry); }
> 140 volatile address from_compiled_entry_no_trampoline() const;
> 141 volatile address from_interpreted_entry() const{ return OrderAccess::load_acquire(&_from_interpreted_entry); }
>
> [pre-existing]
> The volatile qualifiers here seem suspect to me.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/oop.inline.hpp
> 391 narrowOop old = (narrowOop)Atomic::xchg(val, (narrowOop*)dest);
>
> Cast of return type is not needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/prims/jni.cpp
>
> [pre-existing]
>
> copy_jni_function_table should be using Copy::disjoint_words_atomic.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/prims/jni.cpp
>
> [pre-existing]
>
> 3892 // We're about to use Atomic::xchg for synchronization. Some Zero
> 3893 // platforms use the GCC builtin __sync_lock_test_and_set for this,
> 3894 // but __sync_lock_test_and_set is not guaranteed to do what we want
> 3895 // on all architectures. So we check it works before relying on it.
> 3896 #if defined(ZERO) && defined(ASSERT)
> 3897 {
> 3898 jint a = 0xcafebabe;
> 3899 jint b = Atomic::xchg(0xdeadbeef, &a);
> 3900 void *c = &a;
> 3901 void *d = Atomic::xchg(&b, &c);
> 3902 assert(a == (jint) 0xdeadbeef && b == (jint) 0xcafebabe, "Atomic::xchg() works");
> 3903 assert(c == &b && d == &a, "Atomic::xchg() works");
> 3904 }
> 3905 #endif // ZERO && ASSERT
>
> It seems rather strange to be testing Atomic::xchg() here, rather than
> as part of unit testing Atomic? Fail unit testing => don't try to
> use...
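>
> E.g. a gtest along these lines could carry the same checks (sketch,
> not part of the webrev):
>
>    TEST(AtomicTest, xchg) {
>      jint a = 0xcafebabe;
>      jint b = Atomic::xchg((jint)0xdeadbeef, &a);
>      EXPECT_EQ((jint)0xdeadbeef, a);
>      EXPECT_EQ((jint)0xcafebabe, b);
>
>      void* c = &a;
>      void* d = Atomic::xchg((void*)&b, &c);
>      EXPECT_EQ((void*)&b, c);
>      EXPECT_EQ((void*)&a, d);
>    }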
>
> ------------------------------------------------------------------------------
> src/hotspot/share/prims/jvmtiRawMonitor.cpp
> 130 if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) {
> 142 if (_owner == NULL && Atomic::cmpxchg_if_null((void*)Self, &_owner)) {
>
> I think these casts aren't needed. _owner is void*, and Self is
> Thread*, which is implicitly convertible to void*.
>
> Similarly here, for the THREAD argument:
> 280 Contended = Atomic::cmpxchg((void*)THREAD, &_owner, (void*)NULL);
> 283 Contended = Atomic::cmpxchg((void*)THREAD, &_owner, (void*)NULL);
>
> ------------------------------------------------------------------------------
> src/hotspot/share/prims/jvmtiRawMonitor.hpp
>
> This file is in the webrev, but seems to be unchanged.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/atomic.hpp
> 520 template<typename I, typename D>
> 521 inline D Atomic::sub(I sub_value, D volatile* dest) {
> 522 STATIC_ASSERT(IsPointer<D>::value || IsIntegral<D>::value);
> 523 // Assumes two's complement integer representation.
> 524 #pragma warning(suppress: 4146)
> 525 return Atomic::add(-sub_value, dest);
> 526 }
>
> I'm pretty sure this implementation is incorrect. I think it produces
> the wrong result when I and D are both unsigned integer types and
> sizeof(I) < sizeof(D).
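>
> For example (hypothetical values): with I == uint32_t and
> D == uint64_t, sub_value == 1 gives -sub_value == 0xFFFFFFFF, which
> is then zero-extended to 0x00000000FFFFFFFF when widened to D for
> the add:
>
>    volatile uint64_t v = 10;
>    Atomic::sub(1u, &v);   // adds 2^32 - 1 instead of subtracting 1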
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/mutex.cpp
> 304 intptr_t v = Atomic::cmpxchg((intptr_t)_LBIT, &_LockWord.FullWord, (intptr_t)0); // agro ...
>
> _LBIT should probably be intptr_t, rather than an enum. Note that the
> enum type is unused. The old value here is another place where an
> implicit widening of same signedness would have been nice. (Such
> implicit widening doesn't work for enums, since it's unspecified
> whether they default to signed or unsigned representation, and
> implementations differ.)
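>
> I.e. something like (sketch, assuming the current enum value):
>
>    static const intptr_t _LBIT = 1;
>    ...
>    intptr_t v = Atomic::cmpxchg(_LBIT, &_LockWord.FullWord, (intptr_t)0); // agro ...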
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/mutex.hpp
>
> [pre-existing]
>
> I think the Address member of the SplitWord union is unused. Looking
> at AcquireOrPush (and others), I'm wondering whether it *should* be
> used there, or whether just using intptr_t casts and doing integral
> arithmetic (as is presently being done) is easier and clearer.
>
> Also the _LSBINDEX macro probably ought to be defined in mutex.cpp
> rather than polluting the global namespace. And technically, that
> name is a reserved identifier.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/objectMonitor.cpp
> 252 void * cur = Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL);
> 409 if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) {
> 1983 ox = (Thread*)Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL);
>
> I think the casts of Self aren't needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/objectMonitor.cpp
> 995 if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) {
> 1020 if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) {
>
> I think the casts of THREAD aren't needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/objectMonitor.hpp
> 254 markOopDesc* volatile* header_addr();
>
> Why isn't this volatile markOop* ?
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/synchronizer.cpp
> 242 Atomic::cmpxchg_if_null((void*)Self, &(m->_owner))) {
>
> I think the cast of Self isn't needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/synchronizer.cpp
> 992 for (; block != NULL; block = (PaddedEnd<ObjectMonitor> *)next(block)) {
> 1734 for (; block != NULL; block = (PaddedEnd<ObjectMonitor> *)next(block)) {
>
> [pre-existing]
> All calls to next() pass a PaddedEnd<ObjectMonitor>* and cast the
> result. How about moving all that behavior into next()?
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/synchronizer.cpp
> 1970 if (monitor > (ObjectMonitor *)&block[0] &&
> 1971 monitor < (ObjectMonitor *)&block[_BLOCKSIZE]) {
>
> [pre-existing]
> Are the casts needed here? I think PaddedEnd<ObjectMonitor> is
> derived from ObjectMonitor, so implicit conversions should apply.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/synchronizer.hpp
> 28 #include "memory/padded.hpp"
> 163 static PaddedEnd<ObjectMonitor> * volatile gBlockList;
>
> I was going to suggest as an alternative just making gBlockList a file
> scoped variable in synchronizer.cpp, since it isn't used outside of
> that file. Except that it is referenced by vmStructs. Curses!
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/thread.cpp
> 4707 intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, (intptr_t)0);
>
> This and other places suggest LOCKBIT should be defined as intptr_t,
> rather than as an enum value. The MuxBits enum type is unused.
>
> And the cast of 0 is another case where implicit widening would be nice.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/services/mallocSiteTable.cpp
> 261 bool MallocSiteHashtableEntry::atomic_insert(const MallocSiteHashtableEntry* entry) {
> 262 return Atomic::cmpxchg_if_null(entry, (const MallocSiteHashtableEntry**)&_next);
> 263 }
>
> I think the problem here that is leading to the cast is that
> atomic_insert is taking a const T*. Note that its only caller passes
> a non-const T*.
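>
> I.e. (sketch, assuming _next is a plain MallocSiteHashtableEntry*):
>
>    bool MallocSiteHashtableEntry::atomic_insert(MallocSiteHashtableEntry* entry) {
>      return Atomic::cmpxchg_if_null(entry, &_next);
>    }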
>
> ------------------------------------------------------------------------------
>