RFR: 8186838: Generalize Atomic::inc/dec with templates

Erik Österlund erik.osterlund at oracle.com
Fri Sep 1 15:23:49 UTC 2017


Hi Coleen,

On 2017-09-01 16:42, coleen.phillimore at oracle.com wrote:
>
>
> On 9/1/17 10:15 AM, Erik Österlund wrote:
>> Hi Andrew,
>>
>> On 2017-09-01 15:41, Andrew Haley wrote:
>>> On 31/08/17 13:45, Erik Österlund wrote:
>>>> Hi everyone,
>>>>
>>>> Bug ID:
>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>
>>>> The time has come for the next step in generalizing Atomic with
>>>> templates. Today I will focus on Atomic::inc/dec.
>>>>
>>>> I have tried to mimic the new Kim style that seems to have been
>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>>> structure looks like this:
>>>>
>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object
>>>> that performs some basic type checks.
>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define
>>>> the operation arbitrarily for a given platform. The default
>>>> implementation if not specialized for a platform is to call 
>>>> Atomic::add.
>>>> So only platforms that want to do something different than that as an
>>>> optimization have to provide a specialization.
>>>> Layer 3) Platforms that decide to specialize 
>>>> PlatformInc/PlatformDec to
>>>> be more optimized may inherit from a helper class
>>>> IncUsingConstant/DecUsingConstant. This helper helps performing the
>>>> necessary computation what the increment/decrement should be after
>>>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation then
>>>> only needs to define an inc/dec member function, and will then get all
>>>> the context information necessary to generate a more optimized
>>>> implementation. Easy peasy.
>>> I wanted to say something nice, but I honestly can't.  I am dismayed.
>>
>> Okay.
>>
>>> I hoped that inc/dec would turn out to be much simpler than the
>>> cmpxchg functions: I think they should, because they don't have to
>>> deal with the complexity of potentially three different types.
>>> Instead we have, again, a large and complex patch.
>>>
>>> Even on AArch64, which should be the simplest case because Atomic::inc
>>> can be defined as
>>>
>>> template <typename T1>
>>> inc(T1 *dest) {
>>>    return __sync_add_and_fetch(dest, 1);
>>> }
>>
>> AArch64 is indeed the simplest case. It does not have a 
>> specialization in my patch. It simply expresses Atomic::inc in terms 
>> of Atomic::add.
>>
>>> or something similar, we have
>>>
>>> Atomic::inc<int>
>>> Atomic::IncImpl<int, void>::operator()
>>> Atomic::PlatformInc<4ul, IntegralConstant<int, 1> >::operator()
>>> Atomic::add<int, int>
>>> Atomic::AddImpl<int, int, void>::operator()
>>> Atomic::AddAndFetch<Atomic::PlatformAdd<4ul> >::operator()<int, int>
>>> Atomic::PlatformAdd<4ul>::add_and_fetch<int, int>
>>> __sync_add_and_fetch
>>>
>>> I quite understand that it isn't so easy on some systems, and they
>>> need a generic form that explodes into four different calls, one for
>>> each size of integer.  I completely accept that it will be more
>>> complex for everything else.  But is it necessary to have so much code
>>> for something so simple?  This is a 1400 line patch.  Granted, much of
>>> it is simply moving stuff around, but despite the potential of
>>> template code to simplify the implementation we have a more complex
>>> solution than we had before.
>>>
>>> I ask you, is this the simplest solution that you believe is possible?
>>
>> It is not the simplest solution I can think of. The simplest solution 
>> I can think of is to remove all specialized versions of 
>> Atomic::inc/dec and just have it call Atomic::add directly. That 
>> would remove the optimizations we have today, for whatever reason we 
>> have them. It would lead to slightly more conservative fencing on 
>> PPC/S390, and would lead to slightly less optimal machine encoding on 
>> x86 (without immediate values in the instructions). But it would be 
>> simpler for sure. I did not put any judgement into whether our 
>> existing optimizations are worthwhile or not. But if you want to 
>> prioritize simplicity, removing those optimizations is one possible 
>> solution. Would you prefer that?
>
> I wonder if you could remove the linux x86 asm code for inc/dec, 
> recode it to use add, and do a dev submit run against your patch? 
> While we're discussing this.

Okay, I will try that.

/Erik

> thanks,
> Coleen
>
>>
>> Thanks,
>> /Erik
>



More information about the hotspot-dev mailing list