RFR: 8186838: Generalize Atomic::inc/dec with templates

Wed Sep 20 07:15:17 UTC 2017

Hi David,

> On 20 Sep 2017, at 02:05, David Holmes <david.holmes at oracle.com> wrote:
> 
> Hi Erik,
> 
> It's a Thumbs Up from me!

Glad to hear it.

> On 18/09/2017 6:43 PM, Erik Österlund wrote:
>> Hi,
>> After some off-list discussions I have made a new version with the following improvements:
>> 1) Added some comments describing the constraints on the types passed in to inc/dec (integral or pointer, and pointers are scaled).
> 
> Looks fine. Though from previous discussions isn't it the case that we never actually do any of those ptr inc/dec operations?

At the moment we do not do it - true, but perhaps tomorrow somebody might, and then I don’t want the user of the API to get surprised.

> 
>> 2) Removed inc_ptr/dec_ptr and all uses of it. None of these actually used pointers, only pointer sized integers. So I thought removing these overloads and the unnecessary confusion caused by them would make it easier to review this change.
> 
> Yes! Thank you. This was extremely confusing and misleading. I really should try and dig into the history of this ... but don't have time :(

:)

> 
>> 3) Renamed the typedef in the body representing the addend to be called I instead of T to be consistent with the convention Kim introduced.
> 
> Ok.

Thanks for the review.

/Erik

> 
> Thanks,
> David
> -----
> 
>> Full webrev:
>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.02/
>> Incremental webrev:
>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.01_02/
>> Thanks,
>> /Erik
>> On 2017-09-05 12:07, Erik Österlund wrote:
>>> Hi David,
>>> 
>>> On 2017-09-04 23:59, David Holmes wrote:
>>>> Hi Erik,
>>>> 
>>>> On 4/09/2017 5:15 PM, Erik Österlund wrote:
>>>>> Hi David,
>>>>> 
>>>>> On 2017-09-04 03:24, David Holmes wrote:
>>>>>> Hi Erik,
>>>>>> 
>>>>>> On 1/09/2017 8:49 PM, Erik Österlund wrote:
>>>>>>> Hi David,
>>>>>>> 
>>>>>>> The shared structure for all operations is the following:
>>>>>>> 
>>>>>>> An Atomic::something call creates a SomethingImpl function object that performs some basic type checking and then forwards the call straight to a PlatformSomething function object. This PlatformSomething object could decide to do anything. But to make life easier, it may inherit from a shared SomethingHelper function object with CRTP that calls back into the PlatformSomething function object to emit inline assembly.
>>>>>> 
>>>>>> Right, but! Lets look at some details.
>>>>>> 
>>>>>> Atomic::add
>>>>>>   AddImpl
>>>>>>     PlatformAdd<size>
>>>>>>       FetchAndAdd
>>>>>>       AddAndFetch
>>>>>>       add_using_helper
>>>>>> 
>>>>>> Atomic::cmpxchg
>>>>>>   CmpxchgImpl
>>>>>>     PlatformCmpxchg<size>
>>>>>>       cmpxchg_using_helper
>>>>>> 
>>>>>> Atomic::inc
>>>>>>   IncImpl
>>>>>>     PlatformInc<size, Adjustment>
>>>>>>       IncUsingConstant
>>>>>> 
>>>>>> Why is it that the simplest operation (inc/dec) has the most complex platform template definition? Why do we need Adjustment? You previously said "Adjustment represents the increment/decrement value as an IntegralConstant - your template friend for passing around a constant with both a specified type and value in templates". But add passes around values and doesn't need this. Further inc/dec don't need to pass anything around anywhere - inc adds 1, dec subtracts 1! This "1" does not need to appear anywhere in the API or get passed across layers - the only place this "1" becomes evident is in the actual platform asm that does the logic of "add 1" or "subtract 1".
>>>>>> 
>>>>>> My understanding from previous discussions is that much of the template machinations was to deal with type management for "dest" and the values being passed around. But here, for inc/dec there are no values being passed so we don't have to make "dest" type-compatible with any value.
>>>>> 
>>>>> Dealing with different types being passed in is one part of the problem - a problem that almost all operations seems to have. But Atomic::add and inc/dec have more problems to deal with.
>>>>> 
>>>>> The Atomic::add operation has two more problems that cmpxchg does not have.
>>>>> 1) It needs to scale pointer arithmetic. So if you have a P* and you add it by 2, then you really add the underlying value by 2 * sizeof(P), and the scaled addend needs to be of the right type - the type of the destination for integral types and ptrdiff_t for pointers. This is similar semantics to ++pointer.
>>>> 
>>>> I'll address this below - but yes I overlooked this aspect.
>>>> 
>>>>> 2) It connects backends with different semantics - either fetch_and_add or add_and_fetch to a common public interface with add_and_fetch semantics.
>>>> 
>>>> Not at all clear why this has to manifest in the upper/middle layers instead of being handled by the actual lowest-layer ??
>>> 
>>> It could have been addressed in the lowest layer indeed. I suppose Kim found it nicer to do that on a higher level while you find it nicer to do it on a lower level. I have no opinion here.
>>> 
>>>> 
>>>>> This is the reason that Atomic::add might appear more complicated than Atomic::cmpxchg. Because Atomic::cmpxchg only had the different type problems to deal with - no pointer arithmetics.
>>>>> 
>>>>> The reason why Atomic::inc/dec looks more complicated than Atomic::add is that it needs to preserve the pointer arithmetic as constants rather than values, because the scaled addend is embedded in the inline assembly as immediate values. Therefore it passes around an IntegralConstant that embeds both the type and size of the addend. And it is not just 1/-1. For integral destinations the constant used is 1/-1 of the type stored at the destination. For pointers the constant is ptrdiff_t with a value representing the size of the element pointed to.
>>>> 
>>>> This is insanely complicated (I think that counts as 'accidental complexity' per Andrew's comment ;-) ). Pointer arithmetic is a basic/fundamental part of C/C++, yet this template stuff has to jump through multiple inverted hoops to do something the language "just does"! All this complexity to manage a conversion addend -> addend * sizeof(*dest) ??
>>> 
>>> Okay.
>>> 
>>>> And the fact that inc/dec are simpler than add, yet result in far more complicated templates because the simpler addend is a constant, is just as unfathomable to me!
>>> 
>>> My latest proposal is to nuke the Atomic::inc/dec specializations and make it call Atomic::add. Any objections on that? It is arguably simpler, and then we can leave the complexity discussion behind.
>>> 
>>>>> Having said that - I am not opposed to simply removing the specializations of inc/dec if we are scared of the complexity of passing this constant to the platform layer. After running a bunch of benchmarks over the weekend, it showed no significant regressions after removal. Now of course that might not tell the full story - it could have missed that some critical operation in the JVM takes longer. But I would be very surprised if that was the case.
>>>> 
>>>> I can imagine we use an "add immediate" form for inc/dec of 1, do we actually use that for other values? I would expect inc_ptr/dec_ptr to always translate to add_ptr, with no special case for when ptr is char* and so we only add/sub 1. ??
>>> 
>>> Yes we currently only inc/sub by 1.
>>> 
>>> Thanks,
>>> /Erik
>>> 
>>>> Thanks,
>>>> David
>>>> 
>>>>> Thanks,
>>>>> /Erik
>>>>> 
>>>>>> 
>>>>>> Cheers,
>>>>>> David
>>>>>> -----
>>>>>> 
>>>>>>> Hope this explanation helps understanding the intended structure of this work.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> /Erik
>>>>>>> 
>>>>>>> On 2017-09-01 12:34, David Holmes wrote:
>>>>>>>> Hi Erik,
>>>>>>>> 
>>>>>>>> I just wanted to add that I would expect the cmpxchg, add and inc, Atomic API's to all require similar basic structure for manipulating types/values etc, yet all three seem to have quite different structures that I find very confusing. I'm still at a loss to fathom the CRTP and the hoops we seemingly have to jump through just to add or subtract 1!!!
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> David
>>>>>>>> 
>>>>>>>> On 1/09/2017 7:29 PM, Erik Österlund wrote:
>>>>>>>>> Hi David,
>>>>>>>>> 
>>>>>>>>> On 2017-09-01 02:49, David Holmes wrote:
>>>>>>>>>> Hi Erik,
>>>>>>>>>> 
>>>>>>>>>> Sorry but this one is really losing me.
>>>>>>>>>> 
>>>>>>>>>> What is the role of Adjustment ??
>>>>>>>>> 
>>>>>>>>> Adjustment represents the increment/decrement value as an IntegralConstant - your template friend for passing around a constant with both a specified type and value in templates. The type of the increment/decrement is the type of the destination when the destination is an integral type, otherwise if it is a pointer type, the increment/decrement type is ptrdiff_t.
>>>>>>>>> 
>>>>>>>>>> How are inc/dec anything but "using constant" ??
>>>>>>>>> 
>>>>>>>>> I was also a bit torn on that name (I assume you are referring to IncUsingConstant/DecUsingConstant). It was hard to find a name that depicted what this platform helper does. I considered calling the helper something with immediate in the name because it is really used to embed the constant as immediate values in inline assembly today. But then again that seemed too specific, as it is not completely obvious platform specializations will use it in that way. One might just want to specialize this to send it into some compiler Atomic::inc intrinsic for example. Do you have any other preferred names? Here are a few possible names for IncUsingConstant:
>>>>>>>>> 
>>>>>>>>> IncUsingScaledConstant
>>>>>>>>> IncUsingAdjustedConstant
>>>>>>>>> IncUsingPlatformHelper
>>>>>>>>> 
>>>>>>>>> Any favourites?
>>>>>>>>> 
>>>>>>>>>> Why do we special case jshort??
>>>>>>>>> 
>>>>>>>>> To be consistent with the special case of Atomic::add on jshort. Do you want it removed?
>>>>>>>>> 
>>>>>>>>>> This is indecipherable to normal people ;-)
>>>>>>>>>> 
>>>>>>>>>>  This()->template inc<I, Adjustment>(dest);
>>>>>>>>>> 
>>>>>>>>>> For something as trivial as adding or subtracting 1 the template machinations here are just mind boggling!
>>>>>>>>> 
>>>>>>>>> This uses the CRTP (Curiously Recurring Template Pattern) C++ idiom. The idea is to devirtualize a virtual call by passing in the derived type as a template parameter to a base class, and then let the base class static_cast to the derived class to devirtualize the call. I hope this explanation sheds some light on what is going on. The same CRTP idiom was used in the Atomic::add implementation in a similar fashion.
>>>>>>>>> 
>>>>>>>>> I will add some comments describing this in the next round after Coleen replies.
>>>>>>>>> 
>>>>>>>>> Thanks for looking at this.
>>>>>>>>> 
>>>>>>>>> /Erik
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> David
>>>>>>>>>> 
>>>>>>>>>> On 31/08/2017 10:45 PM, Erik Österlund wrote:
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>> 
>>>>>>>>>>> Bug ID:
>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>>>>>>> 
>>>>>>>>>>> Webrev:
>>>>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>>>>>>> 
>>>>>>>>>>> The time has come for the next step in generalizing Atomic with templates. Today I will focus on Atomic::inc/dec.
>>>>>>>>>>> 
>>>>>>>>>>> I have tried to mimic the new Kim style that seems to have been universally accepted. Like Atomic::add and Atomic::cmpxchg, the structure looks like this:
>>>>>>>>>>> 
>>>>>>>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function object that performs some basic type checks.
>>>>>>>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can define the operation arbitrarily for a given platform. The default implementation if not specialized for a platform is to call Atomic::add. So only platforms that want to do something different than that as an optimization have to provide a specialization.
>>>>>>>>>>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec to be more optimized may inherit from a helper class IncUsingConstant/DecUsingConstant. This helper helps performing the necessary computation what the increment/decrement should be after pointer scaling using CRTP. The PlatformInc/PlatformDec operation then only needs to define an inc/dec member function, and will then get all the context information necessary to generate a more optimized implementation. Easy peasy.
>>>>>>>>>>> 
>>>>>>>>>>> It is worth noticing that the generalized Atomic::dec operation assumes a two's complement integer machine and potentially sends the unary negative of a potentially unsigned type to Atomic::add. I have the following comments about this:
>>>>>>>>>>> 1) We already assume in other code that two's complement integers must be present.
>>>>>>>>>>> 2) A machine that does not have two's complement integers may still simply provide a specialization that solves the problem in a different way.
>>>>>>>>>>> 3) The alternative that does not make assumptions about that would use the good old IntegerTypes::cast_to_signed metaprogramming stuff, and I seem to recall we thought that was a bit too involved and complicated.
>>>>>>>>>>> This is the reason why I have chosen to use unary minus on the potentially unsigned type in the shared helper code that sends the decrement as an addend to Atomic::add.
>>>>>>>>>>> 
>>>>>>>>>>> It would also be nice if somebody with access to PPC and s390 machines could try out the relevant changes there so I do not accidentally break those platforms. I have blind-coded the addition of the immediate values passed in to the inline assembly in a way that I think looks like it should work.
>>>>>>>>>>> 
>>>>>>>>>>> Testing:
>>>>>>>>>>> RBT hs-tier3, JPRT --testset hotspot
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> /Erik
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>>