RFR: 8186838: Generalize Atomic::inc/dec with templates

Mon Sep 4 21:59:12 UTC 2017

Hi Erik,

On 4/09/2017 5:15 PM, Erik Österlund wrote:
> Hi David,
> 
> On 2017-09-04 03:24, David Holmes wrote:
>> Hi Erik,
>>
>> On 1/09/2017 8:49 PM, Erik Österlund wrote:
>>> Hi David,
>>>
>>> The shared structure for all operations is the following:
>>>
>>> An Atomic::something call creates a SomethingImpl function object 
>>> that performs some basic type checking and then forwards the call 
>>> straight to a PlatformSomething function object. This 
>>> PlatformSomething object could decide to do anything. But to make 
>>> life easier, it may inherit from a shared SomethingHelper function 
>>> object with CRTP that calls back into the PlatformSomething function 
>>> object to emit inline assembly.
>>
>> Right, but! Lets look at some details.
>>
>> Atomic::add
>>   AddImpl
>>     PlatformAdd<size>
>>       FetchAndAdd
>>       AddAndFetch
>>       add_using_helper
>>
>> Atomic::cmpxchg
>>   CmpxchgImpl
>>     PlatformCmpxchg<size>
>>       cmpxchg_using_helper
>>
>> Atomic::inc
>>   IncImpl
>>     PlatformInc<size, Adjustment>
>>       IncUsingConstant
>>
>> Why is it that the simplest operation (inc/dec) has the most complex 
>> platform template definition? Why do we need Adjustment? You 
>> previously said "Adjustment represents the increment/decrement value 
>> as an IntegralConstant - your template friend for passing around a 
>> constant with both a specified type and value in templates". But add 
>> passes around values and doesn't need this. Further inc/dec don't need 
>> to pass anything around anywhere - inc adds 1, dec subtracts 1! This 
>> "1" does not need to appear anywhere in the API or get passed across 
>> layers - the only place this "1" becomes evident is in the actual 
>> platform asm that does the logic of "add 1" or "subtract 1".
>>
>> My understanding from previous discussions is that much of the 
>> template machinations was to deal with type management for "dest" and 
>> the values being passed around. But here, for inc/dec there are no 
>> values being passed so we don't have to make "dest" type-compatible 
>> with any value.
> 
> Dealing with different types being passed in is one part of the problem 
> - a problem that almost all operations seems to have. But Atomic::add 
> and inc/dec have more problems to deal with.
> 
> The Atomic::add operation has two more problems that cmpxchg does not have.
> 1) It needs to scale pointer arithmetic. So if you have a P* and you add 
> it by 2, then you really add the underlying value by 2 * sizeof(P), and 
> the scaled addend needs to be of the right type - the type of the 
> destination for integral types and ptrdiff_t for pointers. This is 
> similar semantics to ++pointer.

I'll address this below - but yes I overlooked this aspect.

> 2) It connects backends with different semantics - either fetch_and_add 
> or add_and_fetch to a common public interface with add_and_fetch semantics.

Not at all clear why this has to manifest in the upper/middle layers 
instead of being handled by the actual lowest-layer ??

> This is the reason that Atomic::add might appear more complicated than 
> Atomic::cmpxchg. Because Atomic::cmpxchg only had the different type 
> problems to deal with - no pointer arithmetics.
> 
> The reason why Atomic::inc/dec looks more complicated than Atomic::add 
> is that it needs to preserve the pointer arithmetic as constants rather 
> than values, because the scaled addend is embedded in the inline 
> assembly as immediate values. Therefore it passes around an 
> IntegralConstant that embeds both the type and size of the addend. And 
> it is not just 1/-1. For integral destinations the constant used is 1/-1 
> of the type stored at the destination. For pointers the constant is 
> ptrdiff_t with a value representing the size of the element pointed to.

This is insanely complicated (I think that counts as 'accidental 
complexity' per Andrew's comment ;-) ). Pointer arithmetic is a 
basic/fundamental part of C/C++, yet this template stuff has to jump 
through multiple inverted hoops to do something the language "just 
does"! All this complexity to manage a conversion addend -> addend * 
sizeof(*dest) ??

And the fact that inc/dec are simpler than add, yet result in far more 
complicated templates because the simpler addend is a constant, is just 
as unfathomable to me!

> Having said that - I am not opposed to simply removing the 
> specializations of inc/dec if we are scared of the complexity of passing 
> this constant to the platform layer. After running a bunch of benchmarks 
> over the weekend, it showed no significant regressions after removal. 
> Now of course that might not tell the full story - it could have missed 
> that some critical operation in the JVM takes longer. But I would be 
> very surprised if that was the case.

I can imagine we use an "add immediate" form for inc/dec of 1, do we 
actually use that for other values? I would expect inc_ptr/dec_ptr to 
always translate to add_ptr, with no special case for when ptr is char* 
and so we only add/sub 1. ??

Thanks,
David

> Thanks,
> /Erik
> 
>>
>> Cheers,
>> David
>> -----
>>
>>> Hope this explanation helps understanding the intended structure of 
>>> this work.
>>>
>>> Thanks,
>>> /Erik
>>>
>>> On 2017-09-01 12:34, David Holmes wrote:
>>>> Hi Erik,
>>>>
>>>> I just wanted to add that I would expect the cmpxchg, add and inc, 
>>>> Atomic API's to all require similar basic structure for manipulating 
>>>> types/values etc, yet all three seem to have quite different 
>>>> structures that I find very confusing. I'm still at a loss to fathom 
>>>> the CRTP and the hoops we seemingly have to jump through just to add 
>>>> or subtract 1!!!
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>> On 1/09/2017 7:29 PM, Erik Österlund wrote:
>>>>> Hi David,
>>>>>
>>>>> On 2017-09-01 02:49, David Holmes wrote:
>>>>>> Hi Erik,
>>>>>>
>>>>>> Sorry but this one is really losing me.
>>>>>>
>>>>>> What is the role of Adjustment ??
>>>>>
>>>>> Adjustment represents the increment/decrement value as an 
>>>>> IntegralConstant - your template friend for passing around a 
>>>>> constant with both a specified type and value in templates. The 
>>>>> type of the increment/decrement is the type of the destination when 
>>>>> the destination is an integral type, otherwise if it is a pointer 
>>>>> type, the increment/decrement type is ptrdiff_t.
>>>>>
>>>>>> How are inc/dec anything but "using constant" ??
>>>>>
>>>>> I was also a bit torn on that name (I assume you are referring to 
>>>>> IncUsingConstant/DecUsingConstant). It was hard to find a name that 
>>>>> depicted what this platform helper does. I considered calling the 
>>>>> helper something with immediate in the name because it is really 
>>>>> used to embed the constant as immediate values in inline assembly 
>>>>> today. But then again that seemed too specific, as it is not 
>>>>> completely obvious platform specializations will use it in that 
>>>>> way. One might just want to specialize this to send it into some 
>>>>> compiler Atomic::inc intrinsic for example. Do you have any other 
>>>>> preferred names? Here are a few possible names for IncUsingConstant:
>>>>>
>>>>> IncUsingScaledConstant
>>>>> IncUsingAdjustedConstant
>>>>> IncUsingPlatformHelper
>>>>>
>>>>> Any favourites?
>>>>>
>>>>>> Why do we special case jshort??
>>>>>
>>>>> To be consistent with the special case of Atomic::add on jshort. Do 
>>>>> you want it removed?
>>>>>
>>>>>> This is indecipherable to normal people ;-)
>>>>>>
>>>>>>  This()->template inc<I, Adjustment>(dest);
>>>>>>
>>>>>> For something as trivial as adding or subtracting 1 the template 
>>>>>> machinations here are just mind boggling!
>>>>>
>>>>> This uses the CRTP (Curiously Recurring Template Pattern) C++ 
>>>>> idiom. The idea is to devirtualize a virtual call by passing in the 
>>>>> derived type as a template parameter to a base class, and then let 
>>>>> the base class static_cast to the derived class to devirtualize the 
>>>>> call. I hope this explanation sheds some light on what is going on. 
>>>>> The same CRTP idiom was used in the Atomic::add implementation in a 
>>>>> similar fashion.
>>>>>
>>>>> I will add some comments describing this in the next round after 
>>>>> Coleen replies.
>>>>>
>>>>> Thanks for looking at this.
>>>>>
>>>>> /Erik
>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> David
>>>>>>
>>>>>> On 31/08/2017 10:45 PM, Erik Österlund wrote:
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Bug ID:
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>>>
>>>>>>> Webrev:
>>>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>>>
>>>>>>> The time has come for the next step in generalizing Atomic with 
>>>>>>> templates. Today I will focus on Atomic::inc/dec.
>>>>>>>
>>>>>>> I have tried to mimic the new Kim style that seems to have been 
>>>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the 
>>>>>>> structure looks like this:
>>>>>>>
>>>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function 
>>>>>>> object that performs some basic type checks.
>>>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can 
>>>>>>> define the operation arbitrarily for a given platform. The 
>>>>>>> default implementation if not specialized for a platform is to 
>>>>>>> call Atomic::add. So only platforms that want to do something 
>>>>>>> different than that as an optimization have to provide a 
>>>>>>> specialization.
>>>>>>> Layer 3) Platforms that decide to specialize 
>>>>>>> PlatformInc/PlatformDec to be more optimized may inherit from a 
>>>>>>> helper class IncUsingConstant/DecUsingConstant. This helper helps 
>>>>>>> performing the necessary computation what the increment/decrement 
>>>>>>> should be after pointer scaling using CRTP. The 
>>>>>>> PlatformInc/PlatformDec operation then only needs to define an 
>>>>>>> inc/dec member function, and will then get all the context 
>>>>>>> information necessary to generate a more optimized 
>>>>>>> implementation. Easy peasy.
>>>>>>>
>>>>>>> It is worth noticing that the generalized Atomic::dec operation 
>>>>>>> assumes a two's complement integer machine and potentially sends 
>>>>>>> the unary negative of a potentially unsigned type to Atomic::add. 
>>>>>>> I have the following comments about this:
>>>>>>> 1) We already assume in other code that two's complement integers 
>>>>>>> must be present.
>>>>>>> 2) A machine that does not have two's complement integers may 
>>>>>>> still simply provide a specialization that solves the problem in 
>>>>>>> a different way.
>>>>>>> 3) The alternative that does not make assumptions about that 
>>>>>>> would use the good old IntegerTypes::cast_to_signed 
>>>>>>> metaprogramming stuff, and I seem to recall we thought that was a 
>>>>>>> bit too involved and complicated.
>>>>>>> This is the reason why I have chosen to use unary minus on the 
>>>>>>> potentially unsigned type in the shared helper code that sends 
>>>>>>> the decrement as an addend to Atomic::add.
>>>>>>>
>>>>>>> It would also be nice if somebody with access to PPC and s390 
>>>>>>> machines could try out the relevant changes there so I do not 
>>>>>>> accidentally break those platforms. I have blind-coded the 
>>>>>>> addition of the immediate values passed in to the inline assembly 
>>>>>>> in a way that I think looks like it should work.
>>>>>>>
>>>>>>> Testing:
>>>>>>> RBT hs-tier3, JPRT --testset hotspot
>>>>>>>
>>>>>>> Thanks,
>>>>>>> /Erik
>>>>>
>>>
>