RFR: 8186838: Generalize Atomic::inc/dec with templates

Fri Sep 1 21:57:14 UTC 2017

On 2/09/2017 12:15 AM, Erik Österlund wrote:
> Hi Andrew,
> 
> On 2017-09-01 15:41, Andrew Haley wrote:
>> On 31/08/17 13:45, Erik Österlund wrote:
>>> Hi everyone,
>>>
>>> Bug ID:
>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>
>>> The time has come for the next step in generalizing Atomic with
>>> templates. Today I will focus on Atomic::inc/dec.
>>>
>>> I have tried to mimic the new Kim style that seems to have been
>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>> structure looks like this:
>>>
>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object
>>> that performs some basic type checks.
>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define
>>> the operation arbitrarily for a given platform. The default
>>> implementation if not specialized for a platform is to call Atomic::add.
>>> So only platforms that want to do something different than that as an
>>> optimization have to provide a specialization.
>>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to
>>> be more optimized may inherit from a helper class
>>> IncUsingConstant/DecUsingConstant. This helper helps performing the
>>> necessary computation what the increment/decrement should be after
>>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation then
>>> only needs to define an inc/dec member function, and will then get all
>>> the context information necessary to generate a more optimized
>>> implementation. Easy peasy.
>> I wanted to say something nice, but I honestly can't.  I am dismayed.
> 
> Okay.
> 
>> I hoped that inc/dec would turn out to be much simpler than the
>> cmpxchg functions: I think they should, because they don't have to
>> deal with the complexity of potentially three different types.
>> Instead we have, again, a large and complex patch.
>>
>> Even on AArch64, which should be the simplest case because Atomic::inc
>> can be defined as
>>
>> template <typename T1>
>> inc(T1 *dest) {
>>    return __sync_add_and_fetch(dest, 1);
>> }
> 
> AArch64 is indeed the simplest case. It does not have a specialization 
> in my patch. It simply expresses Atomic::inc in terms of Atomic::add.
> 
>> or something similar, we have
>>
>> Atomic::inc<int>
>> Atomic::IncImpl<int, void>::operator()
>> Atomic::PlatformInc<4ul, IntegralConstant<int, 1> >::operator()
>> Atomic::add<int, int>
>> Atomic::AddImpl<int, int, void>::operator()
>> Atomic::AddAndFetch<Atomic::PlatformAdd<4ul> >::operator()<int, int>
>> Atomic::PlatformAdd<4ul>::add_and_fetch<int, int>
>> __sync_add_and_fetch
>>
>> I quite understand that it isn't so easy on some systems, and they
>> need a generic form that explodes into four different calls, one for
>> each size of integer.  I completely accept that it will be more
>> complex for everything else.  But is it necessary to have so much code
>> for something so simple?  This is a 1400 line patch.  Granted, much of
>> it is simply moving stuff around, but despite the potential of
>> template code to simplify the implementation we have a more complex
>> solution than we had before.
>>
>> I ask you, is this the simplest solution that you believe is possible?
> 
> It is not the simplest solution I can think of. The simplest solution I 
> can think of is to remove all specialized versions of Atomic::inc/dec 
> and just have it call Atomic::add directly. That would remove the 

I don't think this is the source of complexity that screams for 
simplification. It is all the template stuff. I can't get right into 
this right now (it's Saturday) but I still don't see why we have such 
seemingly different things in cmpxchg, add and inc/dec when the basic 
jobs at each level are the same. Maybe it is just use of different names 
that is confusing me.

David
-----

> optimizations we have today, for whatever reason we have them. It would 
> lead to slightly more conservative fencing on PPC/S390, and would lead 
> to slightly less optimal machine encoding on x86 (without immediate 
> values in the instructions). But it would be simpler for sure. I did not 
> put any judgement into whether our existing optimizations are worthwhile 
> or not. But if you want to prioritize simplicity, removing those 
> optimizations is one possible solution. Would you prefer that?
> 
> Thanks,
> /Erik