RFR (L, but tedious) 8188220: Remove Atomic::*_ptr() uses and overloads from hotspot
David Holmes
david.holmes at oracle.com
Fri Oct 13 00:55:53 UTC 2017
Hi Kim,
Very detailed analysis! A few things have already been updated by Coleen.
Many of the issues with possibly incorrect/inappropriate types really
need to be dealt with separately by their component teams - they go
beyond the basic renaming.
Similarly, any ABA issues - which are likely non-issues, but are not
clearly documented as such - should be handled separately, as should
the potential race you highlight below - though to be honest I
couldn't match your statements with the code as shown.
Thanks,
David
On 13/10/2017 9:17 AM, Kim Barrett wrote:
>> On Oct 10, 2017, at 6:01 PM, coleen.phillimore at oracle.com wrote:
>>
>> Summary: With the new template functions these are unnecessary.
>>
>> The changes are mostly s/_ptr// and removing the cast to return type. There weren't many types that needed to be improved to match the template version of the function. Some notes:
>> 1. replaced CASPTR with Atomic::cmpxchg() in mutex.cpp, rearranging arguments.
>> 2. renamed Atomic::replace_if_null to Atomic::cmpxchg_if_null. I disliked the first name because it's not explicit from the callers that there's an underlying CAS. If people want to fight, I'll remove the function and use cmpxchg because there are only a couple of places where this is a little nicer.
>> 3. Added Atomic::sub()
>>
>> Tested with JPRT, mach5 tier1-5 on linux, windows and solaris.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8188220.01/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8188220
>>
>> Thanks,
>> Coleen
>
> I looked harder at the potential ABA problems, and believe they are
> okay. There can be multiple threads doing pushes, and there can be
> multiple threads doing pops, but not both at the same time.
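>
> For the record, the classic hazard with these cmpxchg-based lists
> looks like this (a minimal sketch with hypothetical names, not code
> from the webrev):
>
>    Node* head = _list;          // Thread1 reads head A (list is A->B->C)
>    Node* next = head->next();   // next == B
>    // Thread2 now pops A, pops B, then pushes A back; the list is
>    // A->C and B may have been freed or reused.  Thread1's CAS below
>    // still succeeds, since it only compares the head pointer, and
>    // installs the stale B as the new head.
>    Atomic::cmpxchg(next, &_list, head);
>
> which is why it is only safe when pushes and pops cannot interleave.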
>
> ------------------------------------------------------------------------------
> src/hotspot/cpu/zero/cppInterpreter_zero.cpp
> 279 if (Atomic::cmpxchg(monitor, lockee->mark_addr(), disp) != disp) {
>
> How does this work? monitor and disp seem to have unrelated
> types. Given that this is zero-specific code, maybe this hasn't been
> tested?
>
> Similarly here:
> 423 if (Atomic::cmpxchg(header, rcvr->mark_addr(), lock) != lock) {
>
> ------------------------------------------------------------------------------
> src/hotspot/share/asm/assembler.cpp
> 239 dcon->value_fn = cfn;
>
> Is it actually safe to remove the atomic update? If multiple threads
> performing the assignment *are* possible (and I don't understand the
> context yet, so don't know the answer to that), then a bare non-atomic
> assignment is a race, i.e. undefined behavior.
>
> Regardless of that, I think the CAST_FROM_FN_PTR should be retained.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/classfile/classLoaderData.cpp
> 167 Chunk* head = (Chunk*) OrderAccess::load_acquire(&_head);
>
> I think the cast to Chunk* is no longer needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/classfile/classLoaderData.cpp
> 946 ClassLoaderData* old = Atomic::cmpxchg(cld, cld_addr, (ClassLoaderData*)NULL);
> 947 if (old != NULL) {
> 948 delete cld;
> 949 // Returns the data.
> 950 return old;
> 951 }
>
> That could instead be
>
>    if (!Atomic::cmpxchg_if_null(cld, cld_addr)) {
>      delete cld;       // Lost the race.
>      return *cld_addr; // Use the winner's value.
>    }
>
> And apparently the caller of CLDG::add doesn't care whether the
> returned CLD has actually been added to the graph yet. If that's not
> true, then there's a bug here, since a race loser might return a
> winner's value before the winner has actually done the insertion.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/classfile/verifier.cpp
> 71 static void* verify_byte_codes_fn() {
> 72 if (OrderAccess::load_acquire(&_verify_byte_codes_fn) == NULL) {
> 73 void *lib_handle = os::native_java_library();
> 74 void *func = os::dll_lookup(lib_handle, "VerifyClassCodesForMajorVersion");
> 75 OrderAccess::release_store(&_verify_byte_codes_fn, func);
> 76 if (func == NULL) {
> 77 _is_new_verify_byte_codes_fn = false;
> 78 func = os::dll_lookup(lib_handle, "VerifyClassCodes");
> 79 OrderAccess::release_store(&_verify_byte_codes_fn, func);
> 80 }
> 81 }
> 82 return (void*)_verify_byte_codes_fn;
> 83 }
>
> [pre-existing]
>
> I think this code has race problems; a caller could unexpectedly and
> inappropriately return NULL. Consider the case where there is no
> VerifyClassCodesForMajorVersion, but there is VerifyClassCodes.
>
> The variable is initially NULL.
>
> Both Thread1 and Thread2 reach line 73, having both seen a NULL value
> for the variable.
>
> Thread1 reaches line 80, setting the variable to VerifyClassCodes.
>
> Thread2 reaches line 76, resetting the variable to NULL.
>
> Thread1 reads the now (momentarily) NULL value and returns it.
>
> I think the first release_store should be conditional on func != NULL.
> Also, the usage of _is_new_verify_byte_codes_fn seems suspect.
> And a minor additional nit: the cast in the return is unnecessary.
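>
> E.g. publishing only the final answer would close the race (untested
> sketch, same names as the current code):
>
>    void* func = os::dll_lookup(lib_handle, "VerifyClassCodesForMajorVersion");
>    if (func == NULL) {
>      _is_new_verify_byte_codes_fn = false;
>      func = os::dll_lookup(lib_handle, "VerifyClassCodes");
>    }
>    // Single release_store with the final answer; no transient NULL
>    // can be observed once a non-NULL value has been published.
>    OrderAccess::release_store(&_verify_byte_codes_fn, func);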
>
> ------------------------------------------------------------------------------
> src/hotspot/share/code/nmethod.cpp
> 1664 nmethod* observed_mark_link = _oops_do_mark_link;
> 1665 if (observed_mark_link == NULL) {
> 1666 // Claim this nmethod for this thread to mark.
> 1667 if (Atomic::cmpxchg_if_null(NMETHOD_SENTINEL, &_oops_do_mark_link)) {
>
> With these changes, the only use of observed_mark_link is in the if.
> I'm not sure that variable is really useful anymore, e.g. just use
>
> if (_oops_do_mark_link == NULL) {
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp
>
> In CMSCollector::par_take_from_overflow_list, if BUSY and prefix were
> of type oopDesc*, I think there would be a whole lot fewer casts and
> cast_to_oop's. Later on, I think suffix_head, observed_overflow_list,
> and curr_overflow_list could also be oopDesc* instead of oop to
> eliminate more casts.
>
> And some similar changes in CMSCollector::par_push_on_overflow_list.
>
> And similarly in parNewGeneration.cpp, in push_on_overflow_list and
> take_from_overflow_list_work.
>
> As noted in the comments for JDK-8165857, the lists and "objects"
> involved here aren't really oops, but rather the shattered remains of
> oops. The suggestion there was to use HeapWord* and carry through the
> fanout; what was actually done was to change _overflow_list to
> oopDesc* to minimize fanout, even though that's kind of lying to the
> type system. Now, with the cleanup of cmpxchg_ptr and such, we're
> paying the price of doing the minimal thing back then.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/cms/concurrentMarkSweepGeneration.cpp
> 7960 Atomic::add(-n, &_num_par_pushes);
>
> Atomic::sub
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/cms/parNewGeneration.cpp
> 1455 Atomic::add(-n, &_num_par_pushes);
>
> Atomic::sub
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/dirtyCardQueue.cpp
> 283 void* actual = Atomic::cmpxchg(next, &_cur_par_buffer_node, nd);
> ...
> 289 nd = static_cast<BufferNode*>(actual);
>
> Change actual's type to BufferNode* and remove the cast on line 289.
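>
> I.e. (sketch):
>
>    BufferNode* actual = Atomic::cmpxchg(next, &_cur_par_buffer_node, nd);
>    ...
>    nd = actual;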
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp
>
> [pre-existing]
> 3499 old = (CompiledMethod*)_postponed_list;
>
> I think that cast is only needed because
> G1CodeCacheUnloadingTask::_postponed_list is incorrectly typed as
> "volatile CompiledMethod*", when I think it ought to be
> "CompiledMethod* volatile".
>
> I think G1CodeCacheUnloadingTask::_claimed_nmethod is similarly
> mis-typed, with a similarly unneeded cast:
> 3530 first = (CompiledMethod*)_claimed_nmethod;
>
> and another for _postponed_list here:
> 3552 claim = (CompiledMethod*)_postponed_list;
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1HotCardCache.cpp
> 77 jbyte* previous_ptr = (jbyte*)Atomic::cmpxchg(card_ptr,
>
> I think the cast of the cmpxchg result is no longer needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1PageBasedVirtualSpace.cpp
> 254 char* touch_addr = (char*)Atomic::add(actual_chunk_size, &_cur_addr) - actual_chunk_size;
>
> I think the cast of the add result is no longer needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/g1StringDedup.cpp
> 213 return (size_t)Atomic::add(partition_size, &_next_bucket) - partition_size;
>
> I think the cast of the add result is no longer needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/heapRegionRemSet.cpp
> 200 PerRegionTable* res =
> 201 Atomic::cmpxchg(nxt, &_free_list, fl);
>
> Please remove the line break, now that the code has been simplified.
>
> But wait, doesn't this alloc exhibit classic ABA problems? I *think*
> this works because alloc and bulk_free are called in different phases,
> never overlapping.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/g1/sparsePRT.cpp
> 295 SparsePRT* res =
> 296 Atomic::cmpxchg(sprt, &_head_expanded_list, hd);
> and
> 307 SparsePRT* res =
> 308 Atomic::cmpxchg(next, &_head_expanded_list, hd);
>
> I'd rather not have the line breaks in these either.
>
> And get_from_expanded_list also appears to have classic ABA problems.
> I *think* this works because add_to_expanded_list and
> get_from_expanded_list are called in different phases, never
> overlapping.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/gc/shared/taskqueue.inline.hpp
> 262 return (size_t) Atomic::cmpxchg((intptr_t)new_age._data,
> 263 (volatile intptr_t *)&_data,
> 264 (intptr_t)old_age._data);
>
> This should be
>
> return Atomic::cmpxchg(new_age._data, &_data, old_age._data);
>
> ------------------------------------------------------------------------------
> src/hotspot/share/interpreter/bytecodeInterpreter.cpp
> This doesn't have any casts, which I think is correct.
> 708 if (Atomic::cmpxchg(header, rcvr->mark_addr(), mark) == mark) {
>
> but these do.
> 718 if (Atomic::cmpxchg((void*)new_header, rcvr->mark_addr(), mark) == mark) {
> 737 if (Atomic::cmpxchg((void*)new_header, rcvr->mark_addr(), header) == header) {
>
> I'm not sure how the ones with casts even compile? mark_addr() seems
> to be a markOop*, which is a markOopDesc**, where markOopDesc is a
> class. void* is not implicitly convertible to markOopDesc*.
>
> Hm, this entire file is #ifdef CC_INTERP. Is this zero-only code? Or
> something like that?
>
> Similarly here:
> 906 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) {
> and
> 917 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) {
> 935 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) {
>
> and here:
> 1847 if (Atomic::cmpxchg(header, lockee->mark_addr(), mark) == mark) {
> 1858 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), mark) == mark) {
> 1878 if (Atomic::cmpxchg((void*)new_header, lockee->mark_addr(), header) == header) {
>
> ------------------------------------------------------------------------------
> src/hotspot/share/memory/metaspace.cpp
> 1502 size_t value = OrderAccess::load_acquire(&_capacity_until_GC);
> ...
> 1537 return (size_t)Atomic::sub((intptr_t)v, &_capacity_until_GC);
>
> These and other uses of _capacity_until_GC suggest that variable's
> type should be size_t rather than intptr_t. Note that I haven't done
> a careful check of uses to see if there are any places where such a
> change would cause problems.
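>
> With that retyping the casts would go away too, e.g. (sketch,
> assuming no use relies on signed arithmetic):
>
>    static volatile size_t _capacity_until_GC;
>    ...
>    size_t value = OrderAccess::load_acquire(&_capacity_until_GC);
>    ...
>    return Atomic::sub(v, &_capacity_until_GC);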
>
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/constantPool.cpp
> 229 OrderAccess::release_store((Klass* volatile *)adr, k);
> 246 OrderAccess::release_store((Klass* volatile *)adr, k);
> 514 OrderAccess::release_store((Klass* volatile *)adr, k);
>
> Casts are not needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/constantPool.hpp
> 148 volatile intptr_t adr = OrderAccess::load_acquire(obj_at_addr_raw(which));
>
> [pre-existing]
> Why is adr declared volatile?
>
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/cpCache.cpp
> 157 intx newflags = (value & parameter_size_mask);
> 158 Atomic::cmpxchg(newflags, &_flags, (intx)0);
>
> This is a nice demonstration of why I wanted to include some value
> preserving integral conversions in cmpxchg, rather than requiring
> exact type matching in the integral case. There have been some others
> that I haven't commented on. Apparently we (I) got away with
> including such conversions in Atomic::add, which I'd forgotten about.
> And see comment regarding Atomic::sub below.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/cpCache.hpp
> 139 volatile Metadata* _f1; // entry specific metadata field
>
> [pre-existing]
> I suspect the type should be Metadata* volatile. And that would
> eliminate the need for the cast here:
>
> 339 Metadata* f1_ord() const { return (Metadata *)OrderAccess::load_acquire(&_f1); }
>
> I don't know if there are any other changes needed or desirable around
> _f1 usage.
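>
> The retyped declaration and accessor would look something like
> (sketch):
>
>    Metadata* volatile _f1;  // entry specific metadata field
>    ...
>    Metadata* f1_ord() const { return OrderAccess::load_acquire(&_f1); }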
>
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/method.hpp
> 139 volatile address from_compiled_entry() const { return OrderAccess::load_acquire(&_from_compiled_entry); }
> 140 volatile address from_compiled_entry_no_trampoline() const;
> 141 volatile address from_interpreted_entry() const{ return OrderAccess::load_acquire(&_from_interpreted_entry); }
>
> [pre-existing]
> The volatile qualifiers here seem suspect to me.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/oops/oop.inline.hpp
> 391 narrowOop old = (narrowOop)Atomic::xchg(val, (narrowOop*)dest);
>
> Cast of return type is not needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/prims/jni.cpp
>
> [pre-existing]
>
> copy_jni_function_table should be using Copy::disjoint_words_atomic.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/prims/jni.cpp
>
> [pre-existing]
>
> 3892 // We're about to use Atomic::xchg for synchronization. Some Zero
> 3893 // platforms use the GCC builtin __sync_lock_test_and_set for this,
> 3894 // but __sync_lock_test_and_set is not guaranteed to do what we want
> 3895 // on all architectures. So we check it works before relying on it.
> 3896 #if defined(ZERO) && defined(ASSERT)
> 3897 {
> 3898 jint a = 0xcafebabe;
> 3899 jint b = Atomic::xchg(0xdeadbeef, &a);
> 3900 void *c = &a;
> 3901 void *d = Atomic::xchg(&b, &c);
> 3902 assert(a == (jint) 0xdeadbeef && b == (jint) 0xcafebabe, "Atomic::xchg() works");
> 3903 assert(c == &b && d == &a, "Atomic::xchg() works");
> 3904 }
> 3905 #endif // ZERO && ASSERT
>
> It seems rather strange to be testing Atomic::xchg() here, rather than
> as part of unit testing Atomic? Fail unit testing => don't try to
> use...
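>
> E.g. a gtest along these lines could carry the same checks (sketch,
> not part of the webrev):
>
>    TEST(AtomicTest, xchg) {
>      jint a = 0xcafebabe;
>      jint b = Atomic::xchg((jint)0xdeadbeef, &a);
>      EXPECT_EQ((jint)0xdeadbeef, a);
>      EXPECT_EQ((jint)0xcafebabe, b);
>
>      void* c = &a;
>      void* d = Atomic::xchg((void*)&b, &c);
>      EXPECT_EQ((void*)&b, c);
>      EXPECT_EQ((void*)&a, d);
>    }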
>
> ------------------------------------------------------------------------------
> src/hotspot/share/prims/jvmtiRawMonitor.cpp
> 130 if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) {
> 142 if (_owner == NULL && Atomic::cmpxchg_if_null((void*)Self, &_owner)) {
>
> I think these casts aren't needed. _owner is void*, and Self is
> Thread*, which is implicitly convertible to void*.
>
> Similarly here, for the THREAD argument:
> 280 Contended = Atomic::cmpxchg((void*)THREAD, &_owner, (void*)NULL);
> 283 Contended = Atomic::cmpxchg((void*)THREAD, &_owner, (void*)NULL);
>
> ------------------------------------------------------------------------------
> src/hotspot/share/prims/jvmtiRawMonitor.hpp
>
> This file is in the webrev, but seems to be unchanged.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/atomic.hpp
> 520 template<typename I, typename D>
> 521 inline D Atomic::sub(I sub_value, D volatile* dest) {
> 522 STATIC_ASSERT(IsPointer<D>::value || IsIntegral<D>::value);
> 523 // Assumes two's complement integer representation.
> 524 #pragma warning(suppress: 4146)
> 525 return Atomic::add(-sub_value, dest);
> 526 }
>
> I'm pretty sure this implementation is incorrect. I think it produces
> the wrong result when I and D are both unsigned integer types and
> sizeof(I) < sizeof(D).
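>
> For example (hypothetical values): with I == uint32_t and
> D == uint64_t, sub_value == 1 gives -sub_value == 0xFFFFFFFF, which
> is then zero-extended to 0x00000000FFFFFFFF when widened to D for
> the add:
>
>    volatile uint64_t v = 10;
>    Atomic::sub(1u, &v);   // adds 2^32 - 1 instead of subtracting 1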
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/mutex.cpp
> 304 intptr_t v = Atomic::cmpxchg((intptr_t)_LBIT, &_LockWord.FullWord, (intptr_t)0); // agro ...
>
> _LBIT should probably be intptr_t, rather than an enum. Note that the
> enum type is unused. The old value here is another place where an
> implicit widening of same signedness would have been nice. (Such
> implicit widening doesn't work for enums, since it's unspecified
> whether they default to signed or unsigned representation, and
> implementations differ.)
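>
> I.e. something like (sketch, assuming the current enum value):
>
>    static const intptr_t _LBIT = 1;
>    ...
>    intptr_t v = Atomic::cmpxchg(_LBIT, &_LockWord.FullWord, (intptr_t)0); // agro ...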
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/mutex.hpp
>
> [pre-existing]
>
> I think the Address member of the SplitWord union is unused. Looking
> at AcquireOrPush (and others), I'm wondering whether it *should* be
> used there, or whether just using intptr_t casts and doing integral
> arithmetic (as is presently being done) is easier and clearer.
>
> Also the _LSBINDEX macro probably ought to be defined in mutex.cpp
> rather than polluting the global namespace. And technically, that
> name is a reserved identifier.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/objectMonitor.cpp
> 252 void * cur = Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL);
> 409 if (Atomic::cmpxchg_if_null((void*)Self, &_owner)) {
> 1983 ox = (Thread*)Atomic::cmpxchg((void*)Self, &_owner, (void*)NULL);
>
> I think the casts of Self aren't needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/objectMonitor.cpp
> 995 if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) {
> 1020 if (!Atomic::cmpxchg_if_null((void*)THREAD, &_owner)) {
>
> I think the casts of THREAD aren't needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/objectMonitor.hpp
> 254 markOopDesc* volatile* header_addr();
>
> Why isn't this volatile markOop* ?
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/synchronizer.cpp
> 242 Atomic::cmpxchg_if_null((void*)Self, &(m->_owner))) {
>
> I think the cast of Self isn't needed.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/synchronizer.cpp
> 992 for (; block != NULL; block = (PaddedEnd<ObjectMonitor> *)next(block)) {
> 1734 for (; block != NULL; block = (PaddedEnd<ObjectMonitor> *)next(block)) {
>
> [pre-existing]
> All calls to next() pass a PaddedEnd<ObjectMonitor>* and cast the
> result. How about moving all that behavior into next()?
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/synchronizer.cpp
> 1970 if (monitor > (ObjectMonitor *)&block[0] &&
> 1971 monitor < (ObjectMonitor *)&block[_BLOCKSIZE]) {
>
> [pre-existing]
> Are the casts needed here? I think PaddedEnd<ObjectMonitor> is
> derived from ObjectMonitor, so implicit conversions should apply.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/synchronizer.hpp
> 28 #include "memory/padded.hpp"
> 163 static PaddedEnd<ObjectMonitor> * volatile gBlockList;
>
> I was going to suggest as an alternative just making gBlockList a file
> scoped variable in synchronizer.cpp, since it isn't used outside of
> that file. Except that it is referenced by vmStructs. Curses!
>
> ------------------------------------------------------------------------------
> src/hotspot/share/runtime/thread.cpp
> 4707 intptr_t w = Atomic::cmpxchg((intptr_t)LOCKBIT, Lock, (intptr_t)0);
>
> This and other places suggest LOCKBIT should be defined as intptr_t,
> rather than as an enum value. The MuxBits enum type is unused.
>
> And the cast of 0 is another case where implicit widening would be nice.
>
> ------------------------------------------------------------------------------
> src/hotspot/share/services/mallocSiteTable.cpp
> 261 bool MallocSiteHashtableEntry::atomic_insert(const MallocSiteHashtableEntry* entry) {
> 262 return Atomic::cmpxchg_if_null(entry, (const MallocSiteHashtableEntry**)&_next);
> 263 }
>
> I think the problem here that is leading to the cast is that
> atomic_insert is taking a const T*. Note that its only caller passes
> a non-const T*.
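>
> I.e. (sketch, assuming _next is a plain MallocSiteHashtableEntry*):
>
>    bool MallocSiteHashtableEntry::atomic_insert(MallocSiteHashtableEntry* entry) {
>      return Atomic::cmpxchg_if_null(entry, &_next);
>    }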
>
> ------------------------------------------------------------------------------
>