RFR: 8186838: Generalize Atomic::inc/dec with templates

Erik Österlund erik.osterlund at oracle.com
Fri Sep 1 14:15:57 UTC 2017


Hi Andrew,

On 2017-09-01 15:41, Andrew Haley wrote:
> On 31/08/17 13:45, Erik Österlund wrote:
>> Hi everyone,
>>
>> Bug ID:
>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>
>> Webrev:
>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>
>> The time has come for the next step in generalizing Atomic with
>> templates. Today I will focus on Atomic::inc/dec.
>>
>> I have tried to mimic the new Kim style that seems to have been
>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>> structure looks like this:
>>
>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object
>> that performs some basic type checks.
>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define
>> the operation arbitrarily for a given platform. The default
>> implementation if not specialized for a platform is to call Atomic::add.
>> So only platforms that want to do something different than that as an
>> optimization have to provide a specialization.
>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to
>> be more optimized may inherit from a helper class
>> IncUsingConstant/DecUsingConstant. This helper helps performing the
>> necessary computation what the increment/decrement should be after
>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation then
>> only needs to define an inc/dec member function, and will then get all
>> the context information necessary to generate a more optimized
>> implementation. Easy peasy.
> I wanted to say something nice, but I honestly can't.  I am dismayed.

Okay.

> I hoped that inc/dec would turn out to be much simpler than the
> cmpxchg functions: I think they should, because they don't have to
> deal with the complexity of potentially three different types.
> Instead we have, again, a large and complex patch.
>
> Even on AArch64, which should be the simplest case because Atomic::inc
> can be defined as
>
> template <typename T1>
> inc(T1 *dest) {
>    return __sync_add_and_fetch(dest, 1);
> }

AArch64 is indeed the simplest case. It does not have a specialization 
in my patch. It simply expresses Atomic::inc in terms of Atomic::add.

> or something similar, we have
>
> Atomic::inc<int>
> Atomic::IncImpl<int, void>::operator()
> Atomic::PlatformInc<4ul, IntegralConstant<int, 1> >::operator()
> Atomic::add<int, int>
> Atomic::AddImpl<int, int, void>::operator()
> Atomic::AddAndFetch<Atomic::PlatformAdd<4ul> >::operator()<int, int>
> Atomic::PlatformAdd<4ul>::add_and_fetch<int, int>
> __sync_add_and_fetch
>
> I quite understand that it isn't so easy on some systems, and they
> need a generic form that explodes into four different calls, one for
> each size of integer.  I completely accept that it will be more
> complex for everything else.  But is it necessary to have so much code
> for something so simple?  This is a 1400 line patch.  Granted, much of
> it is simply moving stuff around, but despite the potential of
> template code to simplify the implementation we have a more complex
> solution than we had before.
>
> I ask you, is this the simplest solution that you believe is possible?

It is not the simplest solution I can think of. The simplest solution I 
can think of is to remove all specialized versions of Atomic::inc/dec 
and just have it call Atomic::add directly. That would remove the 
optimizations we have today, for whatever reason we have them. It would 
lead to slightly more conservative fencing on PPC/S390, and would lead 
to slightly less optimal machine encoding on x86 (without immediate 
values in the instructions). But it would be simpler for sure. I did not 
put any judgement into whether our existing optimizations are worthwhile 
or not. But if you want to prioritize simplicity, removing those 
optimizations is one possible solution. Would you prefer that?

Thanks,
/Erik


More information about the hotspot-dev mailing list