RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot
Kim Barrett
kim.barrett at oracle.com
Thu Oct 12 23:17:38 UTC 2017
> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote:
>
> Summary: With the new template functions these are unnecessary.
>
> The changes are mostly s/_ptr// and removing the cast to return type. There weren't many types that needed to be improved to match the template version of the function. Some notes:
> 1. replaced CASPTR with Atomic::cmpxchg() in mutex.cpp, rearranging arguments.
> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. I disliked the first name because it's not explicit from the callers that there's an underlying cas. If people want to fight, I'll remove the function and use cmpxchg because there are only a couple places where this is a little nicer.
> 3. Added Atomic::sub()
>
> Tested with JPRT, mach5 tier1-5 on linux, windows and solaris.
>
> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.01/webrev
> bug link https://bugs.openjdk.java.net/browse/JDK-8188220
>
> Thanks,
> Coleen
I looked harder at the potential ABA problems, and believe they are
okay. There can be multiple threads doing pushes, and there can be
multiple threads doing pops, but not both at the same time.
------------------------------------------------------------------------------
src/hotspot/cpu/zero/cppInterpreter_zero.cpp
279 if (Atomic::cmpxchg(monitor, lockee->mark_addr(), disp) != disp) {
How does this work? monitor and disp seem like they have unrelated
types? Given that this is zero-specific code, maybe this hasn't been
tested?
Similarly here:
423 if (Atomic::cmpxchg(header, rcvr->mark_addr(), lock) != lock) {
------------------------------------------------------------------------------
src/hotspot/share/asm/assembler.cpp
239 dcon->value_fn = cfn;
Is it actually safe to remove the atomic update? If multiple threads
performing the assignment *are* possible (and I don't understand the
context yet, so don't know the answer to that), then a bare non-atomic
assignment is a race, i.e. undefined behavior.
Regardless of that, I think the CAST_FROM_FN_PTR should be retained.
------------------------------------------------------------------------------
src/hotspot/share/classfile/classLoaderData.cpp
167 Chunk* head = (Chunk*) OrderAccess::load_acquire(&_head);
I think the cast to Chunk* is no longer needed.
------------------------------------------------------------------------------
src/hotspot/share/classfile/classLoaderData.cpp
946 ClassLoaderData* old = Atomic::cmpxchg(cld, cld_addr, (ClassLoaderData*)NULL);
947 if (old != NULL) {
948 delete cld;
949 // Returns the data.
950 return old;
951 }
That could instead be
if (!Atomic::replace_if_null(cld, cld_addr)) {
delete cld; // Lost the race.
return *cld_addr; // Use the winner's value.
}
And apparently the caller of CLDG::add doesn't care whether the
returned CLD has actually been added to the graph yet. If that's not
true, then there's a bug here, since a race loser might return a
winner's value before the winner has actually done the insertion.
------------------------------------------------------------------------------
src/hotspot/share/classfile/verifier.cpp
71 static void* verify_byte_codes_fn() {
72 if (OrderAccess::load_acquire(&_verify_byte_codes_fn) == NULL) {
73 void *lib_handle = os::native_java_library();
74 void *func = os::dll_lookup(lib_handle, "VerifyClassCodesForMajorVersion");
75 OrderAccess::release_store(&_verify_byte_codes_fn, func);
76 if (func == NULL) {
77 _is_new_verify_byte_codes_fn = false;
78 func = os::dll_lookup(lib_handle, "VerifyClassCodes");
79 OrderAccess::release_store(&_verify_byte_codes_fn, func);
80 }
81 }
82 return (void*)_verify_byte_codes_fn;
83 }
[pre-existing]
I think this code has race problems; a caller could unexpectedly and
inappropriately return NULL. Consider the case where there is no
VerifyClassCodesForMajorVersion, but there is VerifyClassCodes.
The variable is initially NULL.
Both Thread1 and Thread2 reach line 73, having both seen a NULL value
for the variable.
Thread1 reaches line 80, setting the variable to VerifyClassCodes.
Thread2 reaches line 76, resetting the variable to NULL.
Thread1 reads the now (momentarily) NULL value and returns it.
I think the first release_store should be conditional on func != NULL.
Also, the usage of _is_new_verify_byte_codes_fn seems suspect.
And a minor additional nit: the cast in the return is unnecessary.
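For example, one shape that fix could take (a sketch only, not tested;
it drops the first release_store entirely rather than making it
conditional, and doesn't address the _is_new_verify_byte_codes_fn
question):
  static void* verify_byte_codes_fn() {
    if (OrderAccess::load_acquire(&_verify_byte_codes_fn) == NULL) {
      void* lib_handle = os::native_java_library();
      void* func = os::dll_lookup(lib_handle, "VerifyClassCodesForMajorVersion");
      if (func == NULL) {
        _is_new_verify_byte_codes_fn = false;
        func = os::dll_lookup(lib_handle, "VerifyClassCodes");
      }
      // Single publication point; the variable can no longer flip back
      // to NULL after a non-NULL value has been published.
      OrderAccess::release_store(&_verify_byte_codes_fn, func);
    }
    return _verify_byte_codes_fn;
  }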
------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.cpp
1664 nmethod* observed_mark_link = _oops_do_mark_link;
1665 if (observed_mark_link == NULL) {
1666 // Claim this nmethod for this thread to mark.
1667 if (Atomic::cmpxchg_if_null(NMETHOD_SENTINEL, &_oops_do_mark_link)) {
With these changes, the only use of observed_mark_link is in the if.
I'm not sure that variable is really useful anymore, e.g. just use
if (_oops_do_mark_link == NULL) {
------------------------------------------------------------------------------
src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp
In CMSCollector::par_take_from_overflow_list, if BUSY and prefix were
of type oopDesc*, I think there would be a whole lot fewer casts and
cast_to_oop's. Later on, I think suffix_head, observed_overflow_list,
and curr_overflow_list could also be oopDesc* instead of oop to
eliminate more casts.
And some similar changes in CMSCollector::par_push_on_overflow_list.
And similarly in parNewGeneration.cpp, in push_on_overflow_list and
take_from_overflow_list_work.
As noted in the comments for JDK-8165857, the lists and "objects"
involved here aren't really oops, but rather the shattered remains of
oops. The suggestion there was to use HeapWord* and carry through the
fanout; what was actually done was to change _overflow_list to
oopDesc* to minimize fanout, even though that's kind of lying to the
type system. Now, with the cleanup of cmpxchg_ptr and such, we're
paying the price of doing the minimal thing back then.
------------------------------------------------------------------------------
src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp
7960 Atomic::add(-n, &_num_par_pushes);
Atomic::sub
------------------------------------------------------------------------------
src/hotspot/share/gc/cms/parNewGeneration.cpp
1455 Atomic::add(-n, &_num_par_pushes);
Atomic::sub
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/dirtyCardQueue.cpp
283 void* actual = Atomic::cmpxchg(next, &_cur_par_buffer_node, nd);
...
289 nd = static_cast<BufferNode*>(actual);
Change actual's type to BufferNode* and remove the cast on line 289.
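I.e. something like (illustrative only):
  BufferNode* actual = Atomic::cmpxchg(next, &_cur_par_buffer_node, nd);
  ...
  nd = actual;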
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1CollectedHeap.cpp
[pre-existing]
3499 old = (CompiledMethod*)_postponed_list;
I think that cast is only needed because
G1CodeCacheUnloadingTask::_postponed_list is incorrectly typed as
"volatile CompiledMethod*", when I think it ought to be
"CompiledMethod* volatile".
I think G1CodeCacheUnloadingTask::_claimed_nmethod is similarly mis-typed,
with a similar cast that should not be needed:
3530 first = (CompiledMethod*)_claimed_nmethod;
and another for _postponed_list here:
3552 claim = (CompiledMethod*)_postponed_list;
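For reference, the distinction (illustrative declarations, not the
actual field names):
  volatile CompiledMethod* _a;  // pointer to volatile CompiledMethod
  CompiledMethod* volatile _b;  // volatile pointer to CompiledMethod;
                                // this is the form Atomic/OrderAccess
                                // operate on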
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1HotCardCache.cpp
77 jbyte* previous_ptr = (jbyte*)Atomic::cmpxchg(card_ptr,
I think the cast of the cmpxchg result is no longer needed.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp
254 char* touch_addr = (char*)Atomic::add(actual_chunk_size, &_cur_addr) - actual_chunk_size;
I think the cast of the add result is no longer needed.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/g1StringDedup.cpp
213 return (size_t)Atomic::add(partition_size, &_next_bucket) - partition_size;
I think the cast of the add result is no longer needed.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/heapRegionRemSet.cpp
200 PerRegionTable* res =
201 Atomic::cmpxchg(nxt, &_free_list, fl);
Please remove the line break, now that the code has been simplified.
But wait, doesn't this alloc exhibit classic ABA problems? I *think*
this works because alloc and bulk_free are called in different phases,
never overlapping.
------------------------------------------------------------------------------
src/hotspot/share/gc/g1/sparsePRT.cpp
295 SparsePRT* res =
296 Atomic::cmpxchg(sprt, &_head_expanded_list, hd);
and
307 SparsePRT* res =
308 Atomic::cmpxchg(next, &_head_expanded_list, hd);
I'd rather not have the line breaks in these either.
And get_from_expanded_list also appears to have classic ABA problems.
I *think* this works because add_to_expanded_list and
get_from_expanded_list are called in different phases, never
overlapping.
------------------------------------------------------------------------------
src/hotspot/share/gc/shared/taskqueue.inline.hpp
262 return (size_t) Atomic::cmpxchg((intptr_t)new_age._data,
263 (volatile intptr_t *)&_data,
264 (intptr_t)old_age._data);
This should be
return Atomic::cmpxchg(new_age._data, &_data, old_age._data);
------------------------------------------------------------------------------
src/hotspot/share/interpreter/bytecodeInterpreter.cpp
This doesn't have any casts, which I think is correct.
708 if (Atomic::cmpxchg(header, rcvr->mark_addr(), mark) == mark) {
but these do.
718 if (Atomic::cmpxchg((void*)new_header, rcvr->mark_addr(), mark) == mark) {
737 if (Atomic::cmpxchg((void*)new_header, rcvr->mark_addr(), header) == header) {
I'm not sure how the ones with casts even compile? mark_addr() seems
to be a markOop*, which is a markOopDesc**, where markOopDesc is a
class. void* is not implicitly convertible to markOopDesc*.
Hm, this entire file is #ifdef CC_INTERP. Is this zero-only code? Or
something like that?
Similarly here:
906 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) {
and
917 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) {
935 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) {
and here:
1847 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) {
1858 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) {
1878 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) {
------------------------------------------------------------------------------
src/hotspot/share/memory/metaspace.cpp
1502 size_t value = OrderAccess::load_acquire(&_capacity_until_GC);
...
1537 return (size_t)Atomic::sub((intptr_t)v, &_capacity_until_GC);
These and other uses of _capacity_until_GC suggest that variable's
type should be size_t rather than intptr_t. Note that I haven't done
a careful check of uses to see if there are any places where such a
change would cause problems.
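For example, assuming v is already a size_t, line 1537 could then become
just:
  return Atomic::sub(v, &_capacity_until_GC);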
------------------------------------------------------------------------------
src/hotspot/share/oops/constantPool.cpp
229 OrderAccess::release_store((Klass* volatile *)adr, k);
246 OrderAccess::release_store((Klass* volatile *)adr, k);
514 OrderAccess::release_store((Klass* volatile *)adr, k);
Casts are not needed.
------------------------------------------------------------------------------
src/hotspot/share/oops/constantPool.hpp
148 volatile intptr_t adr = OrderAccess::load_acquire(obj_at_addr_raw(which));
[pre-existing]
Why is adr declared volatile?
------------------------------------------------------------------------------
src/hotspot/share/oops/cpCache.cpp
157 intx newflags = (value & parameter_size_mask);
158 Atomic::cmpxchg(newflags, &_flags, (intx)0);
This is a nice demonstration of why I wanted to include some value
preserving integral conversions in cmpxchg, rather than requiring
exact type matching in the integral case. There have been some others
that I haven't commented on. Apparently we (I) got away with
including such conversions in Atomic::add, which I'd forgotten about.
And see comment regarding Atomic::sub below.
------------------------------------------------------------------------------
src/hotspot/share/oops/cpCache.hpp
139 volatile Metadata* _f1; // entry specific metadata field
[pre-existing]
I suspect the type should be Metadata* volatile. And that would
eliminate the need for the cast here:
339 Metadata* f1_ord() const { return (Metadata *)OrderAccess::load_acquire(&_f1); }
I don't know if there are any other changes needed or desirable around
_f1 usage.
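The sketch I have in mind for the field and accessor (assuming nothing
else depends on the current declaration):
  Metadata* volatile _f1;       // entry specific metadata field
  ...
  Metadata* f1_ord() const { return OrderAccess::load_acquire(&_f1); }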
------------------------------------------------------------------------------
src/hotspot/share/oops/method.hpp
139 volatile address from_compiled_entry() const { return OrderAccess::load_acquire(&_from_compiled_entry); }
140 volatile address from_compiled_entry_no_trampoline() const;
141 volatile address from_interpreted_entry() const{ return OrderAccess::load_acquire(&_from_interpreted_entry); }
[pre-existing]
The volatile qualifiers here seem suspect to me.
------------------------------------------------------------------------------
src/hotspot/share/oops/oop.inline.hpp
391 narrowOop old = (narrowOop)Atomic::xchg(val, (narrowOop*)dest);
Cast of return type is not needed.
------------------------------------------------------------------------------
src/hotspot/share/prims/jni.cpp
[pre-existing]
copy_jni_function_table should be using Copy::disjoint_words_atomic.
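Something along these lines, perhaps (names and the size computation are
illustrative, not taken from jni.cpp):
  Copy::disjoint_words_atomic((HeapWord*)src_table, (HeapWord*)dst_table,
                              sizeof(struct JNINativeInterface_) / HeapWordSize);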
------------------------------------------------------------------------------
src/hotspot/share/prims/jni.cpp
[pre-existing]
3892 // We're about to use Atomic::xchg for synchronization. Some Zero
3893 // platforms use the GCC builtin __sync_lock_test_and_set for this,
3894 // but __sync_lock_test_and_set is not guaranteed to do what we want
3895 // on all architectures. So we check it works before relying on it.
3896 #if defined(ZERO) && defined(ASSERT)
3897 {
3898 jint a = 0xcafebabe;
3899 jint b = Atomic::xchg(0xdeadbeef, &a);
3900 void *c = &a;
3901 void *d = Atomic::xchg(&b, &c);
3902 assert(a == (jint) 0xdeadbeef && b == (jint) 0xcafebabe, "Atomic::xchg() works");
3903 assert(c == &b && d == &a, "Atomic::xchg() works");
3904 }
3905 #endif // ZERO && ASSERT
It seems rather strange to be testing Atomic::xchg() here, rather than
as part of unit testing Atomic? Fail unit testing => don't try to
use...
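For example, a rough sketch of such a test under the hotspot gtest
framework (test and fixture names are made up):
  // in a test file that includes runtime/atomic.hpp and unittest.hpp
  TEST(AtomicTest, xchg) {
    jint a = 0xcafebabe;
    jint b = Atomic::xchg((jint) 0xdeadbeef, &a);
    EXPECT_EQ((jint) 0xdeadbeef, a);
    EXPECT_EQ((jint) 0xcafebabe, b);

    void* c = &a;
    void* d = Atomic::xchg((void*)&b, &c);
    EXPECT_EQ((void*)&b, c);
    EXPECT_EQ((void*)&a, d);
  }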
------------------------------------------------------------------------------
src/hotspot/share/prims/jvmtiRawMonitor.cpp
130 if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) {
142 if (_owner == NULL && Atomic::cmpxchg_if_null((void*)Self, &_owner)) {
I think these casts aren't needed. _owner is void*, and Self is
Thread*, which is implicitly convertible to void*.
Similarly here, for the THREAD argument:
280 Contended = Atomic::cmpxchg((void*)THREAD, &_owner, (void*)NULL);
283 Contended = Atomic::cmpxchg((void*)THREAD, &_owner, (void*)NULL);
------------------------------------------------------------------------------
src/hotspot/share/prims/jvmtiRawMonitor.hpp
This file is in the webrev, but seems to be unchanged.
------------------------------------------------------------------------------
src/hotspot/share/runtime/atomic.hpp
520 template<typename I, typename D>
521 inline D Atomic::sub(I sub_value, D volatile* dest) {
522 STATIC_ASSERT(IsPointer<D>::value || IsIntegral<D>::value);
523 // Assumes two's complement integer representation.
524 #pragma warning(suppress: 4146)
525 return Atomic::add(-sub_value, dest);
526 }
I'm pretty sure this implementation is incorrect. I think it produces
the wrong result when I and D are both unsigned integer types and
sizeof(I) < sizeof(D).
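A concrete instance of the problem (not from the webrev):
  // I = uint32_t, D = uint64_t, subtracting 1 from 10.
  uint32_t sub_value = 1;
  volatile uint64_t dest = 10;
  // -sub_value is 0xFFFFFFFF as a uint32_t; Atomic::add then widens it
  // to uint64_t (0x00000000FFFFFFFF), so the "subtraction" yields
  // 10 + 0xFFFFFFFF == 0x100000009 rather than 9.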
------------------------------------------------------------------------------
src/hotspot/share/runtime/mutex.cpp
304 intptr_t v = Atomic::cmpxchg((intptr_t)_LBIT, &_LockWord.FullWord, (intptr_t)0); // agro ...
_LBIT should probably be intptr_t, rather than an enum. Note that the
enum type is unused. The old value here is another place where an
implicit widening of same signedness would have been nice. (Such
implicit widening doesn't work for enums, since it's unspecified
whether they default to signed or unsigned representation, and
implementations differ.)
------------------------------------------------------------------------------
src/hotspot/share/runtime/mutex.hpp
[pre-existing]
I think the Address member of the SplitWord union is unused. Looking
at AcquireOrPush (and others), I'm wondering whether it *should* be
used there, or whether just using intptr_t casts and doing integral
arithmetic (as is presently being done) is easier and clearer.
Also the _LSBINDEX macro probably ought to be defined in mutex.cpp
rather than polluting the global namespace. And technically, that
name is a reserved word.
------------------------------------------------------------------------------
src/hotspot/share/runtime/objectMonitor.cpp
252 void * cur = Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL);
409 if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) {
1983 ox = (Thread*)Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL);
I think the casts of Self aren't needed.
------------------------------------------------------------------------------
src/hotspot/share/runtime/objectMonitor.cpp
995 if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) {
1020 if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) {
I think the casts of THREAD aren't needed.
------------------------------------------------------------------------------
src/hotspot/share/runtime/objectMonitor.hpp
254 markOopDesc* volatile* header_addr();
Why isn't this volatile markOop* ?
------------------------------------------------------------------------------
src/hotspot/share/runtime/synchronizer.cpp
242 Atomic::cmpxchg_if_null((void*)Self, &(m->_owner))) {
I think the cast of Self isn't needed.
------------------------------------------------------------------------------
src/hotspot/share/runtime/synchronizer.cpp
992 for (; block != NULL; block = (PaddedEnd<ObjectMonitor> *)next(block)) {
1734 for (; block != NULL; block = (PaddedEnd<ObjectMonitor> *)next(block)) {
[pre-existing]
All calls to next() pass a PaddedEnd<ObjectMonitor>* and cast the
result. How about moving all that behavior into next().
------------------------------------------------------------------------------
src/hotspot/share/runtime/synchronizer.cpp
1970 if (monitor > (ObjectMonitor *)&block[0] &&
1971 monitor < (ObjectMonitor *)&block[_BLOCKSIZE]) {
[pre-existing]
Are the casts needed here? I think PaddedEnd<ObjectMonitor> is
derived from ObjectMonitor, so implicit conversions should apply.
------------------------------------------------------------------------------
src/hotspot/share/runtime/synchronizer.hpp
28 #include "memory/padded.hpp"
163 static PaddedEnd<ObjectMonitor> * volatile gBlockList;
I was going to suggest as an alternative just making gBlockList a file
scoped variable in synchronizer.cpp, since it isn't used outside of
that file. Except that it is referenced by vmStructs. Curses!
------------------------------------------------------------------------------
src/hotspot/share/runtime/thread.cpp
4707 intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, (intptr_t)0);
This and other places suggest LOCKBIT should be defined as intptr_t,
rather than as an enum value. The MuxBits enum type is unused.
And the cast of 0 is another case where implicit widening would be nice.
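With LOCKBIT as an intptr_t constant that line could read (assuming the
current value stays 1; the cast of 0 would still be needed absent
implicit widening):
  static const intptr_t LOCKBIT = 1;
  ...
  intptr_t w = Atomic::cmpxchg(LOCKBIT, Lock, (intptr_t)0);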
------------------------------------------------------------------------------
src/hotspot/share/services/mallocSiteTable.cpp
261 bool MallocSiteHashtableEntry::atomic_insert(const MallocSiteHashtableEntry* entry) {
262 return Atomic::cmpxchg_if_null(entry, (const MallocSiteHashtableEntry**)&_next);
263 }
I think the problem here that is leading to the cast is that
atomic_insert is taking a const T*. Note that its only caller passes
a non-const T*.
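A sketch of that change (assuming _next is declared
MallocSiteHashtableEntry* volatile, and that the const qualification
isn't needed elsewhere):
  bool MallocSiteHashtableEntry::atomic_insert(MallocSiteHashtableEntry* entry) {
    return Atomic::cmpxchg_if_null(entry, &_next);
  }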
------------------------------------------------------------------------------