RFR: 8186838: Generalize Atomic::inc/dec with templates

Tue Sep 19 23:19:55 UTC 2017

Erik,  This looks like a very nice cleanup!
Thanks,
Coleen

On 9/18/17 4:43 AM, Erik Österlund wrote:
> Hi,
>
> After some off-list discussions I have made a new version with the 
> following improvements:
>
> 1) Added some comments describing the constraints on the types passed 
> in to inc/dec (integral or pointer, and pointers are scaled).
> 2) Removed inc_ptr/dec_ptr and all uses of it. None of these actually 
> used pointers, only pointer sized integers. So I thought removing 
> these overloads and the unnecessary confusion caused by them would 
> make it easier to review this change.
> 3) Renamed the typedef in the body representing the addend to be 
> called I instead of T to be consistent with the convention Kim 
> introduced.
>
> Full webrev:
> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.02/
>
> Incremental webrev:
> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01_02/
>
> Thanks,
> /Erik
>
> On 2017-09-05 12:07, Erik Österlund wrote:
>> Hi David,
>>
>> On 2017-09-04 23:59, David Holmes wrote:
>>> Hi Erik,
>>>
>>> On 4/09/2017 5:15 PM, Erik Österlund wrote:
>>>> Hi David,
>>>>
>>>> On 2017-09-04 03:24, David Holmes wrote:
>>>>> Hi Erik,
>>>>>
>>>>> On 1/09/2017 8:49 PM, Erik Österlund wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> The shared structure for all operations is the following:
>>>>>>
>>>>>> An Atomic::something call creates a SomethingImpl function object 
>>>>>> that performs some basic type checking and then forwards the call 
>>>>>> straight to a PlatformSomething function object. This 
>>>>>> PlatformSomething object could decide to do anything. But to make 
>>>>>> life easier, it may inherit from a shared SomethingHelper 
>>>>>> function object with CRTP that calls back into the 
>>>>>> PlatformSomething function object to emit inline assembly.
>>>>>
>>>>> Right, but! Lets look at some details.
>>>>>
>>>>> Atomic::add
>>>>>   AddImpl
>>>>>     PlatformAdd<size>
>>>>>       FetchAndAdd
>>>>>       AddAndFetch
>>>>>       add_using_helper
>>>>>
>>>>> Atomic::cmpxchg
>>>>>   CmpxchgImpl
>>>>>     PlatformCmpxchg<size>
>>>>>       cmpxchg_using_helper
>>>>>
>>>>> Atomic::inc
>>>>>   IncImpl
>>>>>     PlatformInc<size, Adjustment>
>>>>>       IncUsingConstant
>>>>>
>>>>> Why is it that the simplest operation (inc/dec) has the most 
>>>>> complex platform template definition? Why do we need Adjustment? 
>>>>> You previously said "Adjustment represents the increment/decrement 
>>>>> value as an IntegralConstant - your template friend for passing 
>>>>> around a constant with both a specified type and value in 
>>>>> templates". But add passes around values and doesn't need this. 
>>>>> Further inc/dec don't need to pass anything around anywhere - inc 
>>>>> adds 1, dec subtracts 1! This "1" does not need to appear anywhere 
>>>>> in the API or get passed across layers - the only place this "1" 
>>>>> becomes evident is in the actual platform asm that does the logic 
>>>>> of "add 1" or "subtract 1".
>>>>>
>>>>> My understanding from previous discussions is that much of the 
>>>>> template machinations was to deal with type management for "dest" 
>>>>> and the values being passed around. But here, for inc/dec there 
>>>>> are no values being passed so we don't have to make "dest" 
>>>>> type-compatible with any value.
>>>>
>>>> Dealing with different types being passed in is one part of the 
>>>> problem - a problem that almost all operations seems to have. But 
>>>> Atomic::add and inc/dec have more problems to deal with.
>>>>
>>>> The Atomic::add operation has two more problems that cmpxchg does 
>>>> not have.
>>>> 1) It needs to scale pointer arithmetic. So if you have a P* and 
>>>> you add it by 2, then you really add the underlying value by 2 * 
>>>> sizeof(P), and the scaled addend needs to be of the right type - 
>>>> the type of the destination for integral types and ptrdiff_t for 
>>>> pointers. This is similar semantics to ++pointer.
>>>
>>> I'll address this below - but yes I overlooked this aspect.
>>>
>>>> 2) It connects backends with different semantics - either 
>>>> fetch_and_add or add_and_fetch to a common public interface with 
>>>> add_and_fetch semantics.
>>>
>>> Not at all clear why this has to manifest in the upper/middle layers 
>>> instead of being handled by the actual lowest-layer ??
>>
>> It could have been addressed in the lowest layer indeed. I suppose 
>> Kim found it nicer to do that on a higher level while you find it 
>> nicer to do it on a lower level. I have no opinion here.
>>
>>>
>>>> This is the reason that Atomic::add might appear more complicated 
>>>> than Atomic::cmpxchg. Because Atomic::cmpxchg only had the 
>>>> different type problems to deal with - no pointer arithmetics.
>>>>
>>>> The reason why Atomic::inc/dec looks more complicated than 
>>>> Atomic::add is that it needs to preserve the pointer arithmetic as 
>>>> constants rather than values, because the scaled addend is embedded 
>>>> in the inline assembly as immediate values. Therefore it passes 
>>>> around an IntegralConstant that embeds both the type and size of 
>>>> the addend. And it is not just 1/-1. For integral destinations the 
>>>> constant used is 1/-1 of the type stored at the destination. For 
>>>> pointers the constant is ptrdiff_t with a value representing the 
>>>> size of the element pointed to.
>>>
>>> This is insanely complicated (I think that counts as 'accidental 
>>> complexity' per Andrew's comment ;-) ). Pointer arithmetic is a 
>>> basic/fundamental part of C/C++, yet this template stuff has to jump 
>>> through multiple inverted hoops to do something the language "just 
>>> does"! All this complexity to manage a conversion addend -> addend * 
>>> sizeof(*dest) ??
>>
>> Okay.
>>
>>> And the fact that inc/dec are simpler than add, yet result in far 
>>> more complicated templates because the simpler addend is a constant, 
>>> is just as unfathomable to me!
>>
>> My latest proposal is to nuke the Atomic::inc/dec specializations and 
>> make it call Atomic::add. Any objections on that? It is arguably 
>> simpler, and then we can leave the complexity discussion behind.
>>
>>>> Having said that - I am not opposed to simply removing the 
>>>> specializations of inc/dec if we are scared of the complexity of 
>>>> passing this constant to the platform layer. After running a bunch 
>>>> of benchmarks over the weekend, it showed no significant 
>>>> regressions after removal. Now of course that might not tell the 
>>>> full story - it could have missed that some critical operation in 
>>>> the JVM takes longer. But I would be very surprised if that was the 
>>>> case.
>>>
>>> I can imagine we use an "add immediate" form for inc/dec of 1, do we 
>>> actually use that for other values? I would expect inc_ptr/dec_ptr 
>>> to always translate to add_ptr, with no special case for when ptr is 
>>> char* and so we only add/sub 1. ??
>>
>> Yes we currently only inc/sub by 1.
>>
>> Thanks,
>> /Erik
>>
>>> Thanks,
>>> David
>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>>>
>>>>> Cheers,
>>>>> David
>>>>> -----
>>>>>
>>>>>> Hope this explanation helps understanding the intended structure 
>>>>>> of this work.
>>>>>>
>>>>>> Thanks,
>>>>>> /Erik
>>>>>>
>>>>>> On 2017-09-01 12:34, David Holmes wrote:
>>>>>>> Hi Erik,
>>>>>>>
>>>>>>> I just wanted to add that I would expect the cmpxchg, add and 
>>>>>>> inc, Atomic API's to all require similar basic structure for 
>>>>>>> manipulating types/values etc, yet all three seem to have quite 
>>>>>>> different structures that I find very confusing. I'm still at a 
>>>>>>> loss to fathom the CRTP and the hoops we seemingly have to jump 
>>>>>>> through just to add or subtract 1!!!
>>>>>>>
>>>>>>> Cheers,
>>>>>>> David
>>>>>>>
>>>>>>> On 1/09/2017 7:29 PM, Erik Österlund wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> On 2017-09-01 02:49, David Holmes wrote:
>>>>>>>>> Hi Erik,
>>>>>>>>>
>>>>>>>>> Sorry but this one is really losing me.
>>>>>>>>>
>>>>>>>>> What is the role of Adjustment ??
>>>>>>>>
>>>>>>>> Adjustment represents the increment/decrement value as an 
>>>>>>>> IntegralConstant - your template friend for passing around a 
>>>>>>>> constant with both a specified type and value in templates. The 
>>>>>>>> type of the increment/decrement is the type of the destination 
>>>>>>>> when the destination is an integral type, otherwise if it is a 
>>>>>>>> pointer type, the increment/decrement type is ptrdiff_t.
>>>>>>>>
>>>>>>>>> How are inc/dec anything but "using constant" ??
>>>>>>>>
>>>>>>>> I was also a bit torn on that name (I assume you are referring 
>>>>>>>> to IncUsingConstant/DecUsingConstant). It was hard to find a 
>>>>>>>> name that depicted what this platform helper does. I considered 
>>>>>>>> calling the helper something with immediate in the name because 
>>>>>>>> it is really used to embed the constant as immediate values in 
>>>>>>>> inline assembly today. But then again that seemed too specific, 
>>>>>>>> as it is not completely obvious platform specializations will 
>>>>>>>> use it in that way. One might just want to specialize this to 
>>>>>>>> send it into some compiler Atomic::inc intrinsic for example. 
>>>>>>>> Do you have any other preferred names? Here are a few possible 
>>>>>>>> names for IncUsingConstant:
>>>>>>>>
>>>>>>>> IncUsingScaledConstant
>>>>>>>> IncUsingAdjustedConstant
>>>>>>>> IncUsingPlatformHelper
>>>>>>>>
>>>>>>>> Any favourites?
>>>>>>>>
>>>>>>>>> Why do we special case jshort??
>>>>>>>>
>>>>>>>> To be consistent with the special case of Atomic::add on 
>>>>>>>> jshort. Do you want it removed?
>>>>>>>>
>>>>>>>>> This is indecipherable to normal people ;-)
>>>>>>>>>
>>>>>>>>>  This()->template inc<I, Adjustment>(dest);
>>>>>>>>>
>>>>>>>>> For something as trivial as adding or subtracting 1 the 
>>>>>>>>> template machinations here are just mind boggling!
>>>>>>>>
>>>>>>>> This uses the CRTP (Curiously Recurring Template Pattern) C++ 
>>>>>>>> idiom. The idea is to devirtualize a virtual call by passing in 
>>>>>>>> the derived type as a template parameter to a base class, and 
>>>>>>>> then let the base class static_cast to the derived class to 
>>>>>>>> devirtualize the call. I hope this explanation sheds some light 
>>>>>>>> on what is going on. The same CRTP idiom was used in the 
>>>>>>>> Atomic::add implementation in a similar fashion.
>>>>>>>>
>>>>>>>> I will add some comments describing this in the next round 
>>>>>>>> after Coleen replies.
>>>>>>>>
>>>>>>>> Thanks for looking at this.
>>>>>>>>
>>>>>>>> /Erik
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>> On 31/08/2017 10:45 PM, Erik Österlund wrote:
>>>>>>>>>> Hi everyone,
>>>>>>>>>>
>>>>>>>>>> Bug ID:
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>>>>>>
>>>>>>>>>> Webrev:
>>>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>>>>>>
>>>>>>>>>> The time has come for the next step in generalizing Atomic 
>>>>>>>>>> with templates. Today I will focus on Atomic::inc/dec.
>>>>>>>>>>
>>>>>>>>>> I have tried to mimic the new Kim style that seems to have 
>>>>>>>>>> been universally accepted. Like Atomic::add and 
>>>>>>>>>> Atomic::cmpxchg, the structure looks like this:
>>>>>>>>>>
>>>>>>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() 
>>>>>>>>>> function object that performs some basic type checks.
>>>>>>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that 
>>>>>>>>>> can define the operation arbitrarily for a given platform. 
>>>>>>>>>> The default implementation if not specialized for a platform 
>>>>>>>>>> is to call Atomic::add. So only platforms that want to do 
>>>>>>>>>> something different than that as an optimization have to 
>>>>>>>>>> provide a specialization.
>>>>>>>>>> Layer 3) Platforms that decide to specialize 
>>>>>>>>>> PlatformInc/PlatformDec to be more optimized may inherit from 
>>>>>>>>>> a helper class IncUsingConstant/DecUsingConstant. This helper 
>>>>>>>>>> helps performing the necessary computation what the 
>>>>>>>>>> increment/decrement should be after pointer scaling using 
>>>>>>>>>> CRTP. The PlatformInc/PlatformDec operation then only needs 
>>>>>>>>>> to define an inc/dec member function, and will then get all 
>>>>>>>>>> the context information necessary to generate a more 
>>>>>>>>>> optimized implementation. Easy peasy.
>>>>>>>>>>
>>>>>>>>>> It is worth noticing that the generalized Atomic::dec 
>>>>>>>>>> operation assumes a two's complement integer machine and 
>>>>>>>>>> potentially sends the unary negative of a potentially 
>>>>>>>>>> unsigned type to Atomic::add. I have the following comments 
>>>>>>>>>> about this:
>>>>>>>>>> 1) We already assume in other code that two's complement 
>>>>>>>>>> integers must be present.
>>>>>>>>>> 2) A machine that does not have two's complement integers may 
>>>>>>>>>> still simply provide a specialization that solves the problem 
>>>>>>>>>> in a different way.
>>>>>>>>>> 3) The alternative that does not make assumptions about that 
>>>>>>>>>> would use the good old IntegerTypes::cast_to_signed 
>>>>>>>>>> metaprogramming stuff, and I seem to recall we thought that 
>>>>>>>>>> was a bit too involved and complicated.
>>>>>>>>>> This is the reason why I have chosen to use unary minus on 
>>>>>>>>>> the potentially unsigned type in the shared helper code that 
>>>>>>>>>> sends the decrement as an addend to Atomic::add.
>>>>>>>>>>
>>>>>>>>>> It would also be nice if somebody with access to PPC and s390 
>>>>>>>>>> machines could try out the relevant changes there so I do not 
>>>>>>>>>> accidentally break those platforms. I have blind-coded the 
>>>>>>>>>> addition of the immediate values passed in to the inline 
>>>>>>>>>> assembly in a way that I think looks like it should work.
>>>>>>>>>>
>>>>>>>>>> Testing:
>>>>>>>>>> RBT hs-tier3, JPRT --testset hotspot
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> /Erik
>>>>>>>>
>>>>>>
>>>>
>>
>