RFR: 8186838: Generalize Atomic::inc/dec with templates

Fri Sep 1 15:52:01 UTC 2017

The only Atomic::inc* that I found in product code that wasn't printing 
statistics or exception cases was mostly in G1 and one interesting case 
in objectMonitor and safepointing, where a lot of other CAS operations 
already have been done.  I'm willing to bet this platform specific 
optimization has no value.   I would vote removal, pending examination 
of these places.

share/vm/gc/g1/dirtyCardQueue.cpp

   if (result) {
     assert_fully_consumed(node, buffer_size());
     Atomic::inc(&_processed_buffers_mut);
   }
...
       Atomic::inc(&_processed_buffers_rs_thread);*
*
share/vm/gc/g1/heapRegionRemSet.cpp

           Atomic::inc(&_occupied);
   Atomic::inc(&_n_coarsenings);

share/vm/runtime/objectMonitor.cpp

ObjectMonitor::enter()

   // Prevent deflation at STW-time.  See deflate_idle_monitors() and 
is_busy().
   // Ensure the object-monitor relationship remains stable while 
there's contention.
   Atomic::inc(&_count);

share/vm/runtime/safepoint.cpp

       if (is_synchronizing()) {
          Atomic::inc (&TryingToBlock) ;
       }

share/vm/code/nmethod.cpp
nmethodLocker

   Atomic::inc(&nm->_lock_count);

Coleen

On 9/1/17 11:23 AM, Erik Österlund wrote:
> Hi Coleen,
>
> On 2017-09-01 16:42, coleen.phillimore at oracle.com wrote:
>>
>>
>> On 9/1/17 10:15 AM, Erik Österlund wrote:
>>> Hi Andrew,
>>>
>>> On 2017-09-01 15:41, Andrew Haley wrote:
>>>> On 31/08/17 13:45, Erik Österlund wrote:
>>>>> Hi everyone,
>>>>>
>>>>> Bug ID:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>
>>>>> The time has come for the next step in generalizing Atomic with
>>>>> templates. Today I will focus on Atomic::inc/dec.
>>>>>
>>>>> I have tried to mimic the new Kim style that seems to have been
>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>>>> structure looks like this:
>>>>>
>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object
>>>>> that performs some basic type checks.
>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can 
>>>>> define
>>>>> the operation arbitrarily for a given platform. The default
>>>>> implementation if not specialized for a platform is to call 
>>>>> Atomic::add.
>>>>> So only platforms that want to do something different than that as an
>>>>> optimization have to provide a specialization.
>>>>> Layer 3) Platforms that decide to specialize 
>>>>> PlatformInc/PlatformDec to
>>>>> be more optimized may inherit from a helper class
>>>>> IncUsingConstant/DecUsingConstant. This helper helps performing the
>>>>> necessary computation what the increment/decrement should be after
>>>>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation 
>>>>> then
>>>>> only needs to define an inc/dec member function, and will then get 
>>>>> all
>>>>> the context information necessary to generate a more optimized
>>>>> implementation. Easy peasy.
>>>> I wanted to say something nice, but I honestly can't.  I am dismayed.
>>>
>>> Okay.
>>>
>>>> I hoped that inc/dec would turn out to be much simpler than the
>>>> cmpxchg functions: I think they should, because they don't have to
>>>> deal with the complexity of potentially three different types.
>>>> Instead we have, again, a large and complex patch.
>>>>
>>>> Even on AArch64, which should be the simplest case because Atomic::inc
>>>> can be defined as
>>>>
>>>> template <typename T1>
>>>> inc(T1 *dest) {
>>>>    return __sync_add_and_fetch(dest, 1);
>>>> }
>>>
>>> AArch64 is indeed the simplest case. It does not have a 
>>> specialization in my patch. It simply expresses Atomic::inc in terms 
>>> of Atomic::add.
>>>
>>>> or something similar, we have
>>>>
>>>> Atomic::inc<int>
>>>> Atomic::IncImpl<int, void>::operator()
>>>> Atomic::PlatformInc<4ul, IntegralConstant<int, 1> >::operator()
>>>> Atomic::add<int, int>
>>>> Atomic::AddImpl<int, int, void>::operator()
>>>> Atomic::AddAndFetch<Atomic::PlatformAdd<4ul> >::operator()<int, int>
>>>> Atomic::PlatformAdd<4ul>::add_and_fetch<int, int>
>>>> __sync_add_and_fetch
>>>>
>>>> I quite understand that it isn't so easy on some systems, and they
>>>> need a generic form that explodes into four different calls, one for
>>>> each size of integer.  I completely accept that it will be more
>>>> complex for everything else.  But is it necessary to have so much code
>>>> for something so simple?  This is a 1400 line patch. Granted, much of
>>>> it is simply moving stuff around, but despite the potential of
>>>> template code to simplify the implementation we have a more complex
>>>> solution than we had before.
>>>>
>>>> I ask you, is this the simplest solution that you believe is possible?
>>>
>>> It is not the simplest solution I can think of. The simplest 
>>> solution I can think of is to remove all specialized versions of 
>>> Atomic::inc/dec and just have it call Atomic::add directly. That 
>>> would remove the optimizations we have today, for whatever reason we 
>>> have them. It would lead to slightly more conservative fencing on 
>>> PPC/S390, and would lead to slightly less optimal machine encoding 
>>> on x86 (without immediate values in the instructions). But it would 
>>> be simpler for sure. I did not put any judgement into whether our 
>>> existing optimizations are worthwhile or not. But if you want to 
>>> prioritize simplicity, removing those optimizations is one possible 
>>> solution. Would you prefer that?
>>
>> I wonder if you could remove the linux x86 asm code for inc/dec, 
>> recode it to use add, and do a dev submit run against your patch? 
>> While we're discussing this.
>
> Okay, I will try that.
>
> /Erik
>
>> thanks,
>> Coleen
>>
>>>
>>> Thanks,
>>> /Erik
>>
>