RFR: 8186838: Generalize Atomic::inc/dec with templates
Erik Österlund
erik.osterlund at oracle.com
Tue Sep 5 10:07:26 UTC 2017
Hi David,
On 2017-09-04 23:59, David Holmes wrote:
> Hi Erik,
>
> On 4/09/2017 5:15 PM, Erik Österlund wrote:
>> Hi David,
>>
>> On 2017-09-04 03:24, David Holmes wrote:
>>> Hi Erik,
>>>
>>> On 1/09/2017 8:49 PM, Erik Österlund wrote:
>>>> Hi David,
>>>>
>>>> The shared structure for all operations is the following:
>>>>
>>>> An Atomic::something call creates a SomethingImpl function object
>>>> that performs some basic type checking and then forwards the call
>>>> straight to a PlatformSomething function object. This
>>>> PlatformSomething object could decide to do anything. But to make
>>>> life easier, it may inherit from a shared SomethingHelper function
>>>> object with CRTP that calls back into the PlatformSomething
>>>> function object to emit inline assembly.
>>>
>>> Right, but! Lets look at some details.
>>>
>>> Atomic::add
>>> AddImpl
>>> PlatformAdd<size>
>>> FetchAndAdd
>>> AddAndFetch
>>> add_using_helper
>>>
>>> Atomic::cmpxchg
>>> CmpxchgImpl
>>> PlatformCmpxchg<size>
>>> cmpxchg_using_helper
>>>
>>> Atomic::inc
>>> IncImpl
>>> PlatformInc<size, Adjustment>
>>> IncUsingConstant
>>>
>>> Why is it that the simplest operation (inc/dec) has the most complex
>>> platform template definition? Why do we need Adjustment? You
>>> previously said "Adjustment represents the increment/decrement value
>>> as an IntegralConstant - your template friend for passing around a
>>> constant with both a specified type and value in templates". But add
>>> passes around values and doesn't need this. Further inc/dec don't
>>> need to pass anything around anywhere - inc adds 1, dec subtracts 1!
>>> This "1" does not need to appear anywhere in the API or get passed
>>> across layers - the only place this "1" becomes evident is in the
>>> actual platform asm that does the logic of "add 1" or "subtract 1".
>>>
>>> My understanding from previous discussions is that much of the
>>> template machinations was to deal with type management for "dest"
>>> and the values being passed around. But here, for inc/dec there are
>>> no values being passed so we don't have to make "dest"
>>> type-compatible with any value.
>>
>> Dealing with different types being passed in is one part of the
>> problem - a problem that almost all operations seems to have. But
>> Atomic::add and inc/dec have more problems to deal with.
>>
>> The Atomic::add operation has two more problems that cmpxchg does not
>> have.
>> 1) It needs to scale pointer arithmetic. So if you have a P* and you
>> add it by 2, then you really add the underlying value by 2 *
>> sizeof(P), and the scaled addend needs to be of the right type - the
>> type of the destination for integral types and ptrdiff_t for
>> pointers. This is similar semantics to ++pointer.
>
> I'll address this below - but yes I overlooked this aspect.
>
>> 2) It connects backends with different semantics - either
>> fetch_and_add or add_and_fetch to a common public interface with
>> add_and_fetch semantics.
>
> Not at all clear why this has to manifest in the upper/middle layers
> instead of being handled by the actual lowest-layer ??
It could have been addressed in the lowest layer indeed. I suppose Kim
found it nicer to do that on a higher level while you find it nicer to
do it on a lower level. I have no opinion here.
>
>> This is the reason that Atomic::add might appear more complicated
>> than Atomic::cmpxchg. Because Atomic::cmpxchg only had the different
>> type problems to deal with - no pointer arithmetics.
>>
>> The reason why Atomic::inc/dec looks more complicated than
>> Atomic::add is that it needs to preserve the pointer arithmetic as
>> constants rather than values, because the scaled addend is embedded
>> in the inline assembly as immediate values. Therefore it passes
>> around an IntegralConstant that embeds both the type and size of the
>> addend. And it is not just 1/-1. For integral destinations the
>> constant used is 1/-1 of the type stored at the destination. For
>> pointers the constant is ptrdiff_t with a value representing the size
>> of the element pointed to.
>
> This is insanely complicated (I think that counts as 'accidental
> complexity' per Andrew's comment ;-) ). Pointer arithmetic is a
> basic/fundamental part of C/C++, yet this template stuff has to jump
> through multiple inverted hoops to do something the language "just
> does"! All this complexity to manage a conversion addend -> addend *
> sizeof(*dest) ??
Okay.
> And the fact that inc/dec are simpler than add, yet result in far more
> complicated templates because the simpler addend is a constant, is
> just as unfathomable to me!
My latest proposal is to nuke the Atomic::inc/dec specializations and
make it call Atomic::add. Any objections on that? It is arguably
simpler, and then we can leave the complexity discussion behind.
>> Having said that - I am not opposed to simply removing the
>> specializations of inc/dec if we are scared of the complexity of
>> passing this constant to the platform layer. After running a bunch of
>> benchmarks over the weekend, it showed no significant regressions
>> after removal. Now of course that might not tell the full story - it
>> could have missed that some critical operation in the JVM takes
>> longer. But I would be very surprised if that was the case.
>
> I can imagine we use an "add immediate" form for inc/dec of 1, do we
> actually use that for other values? I would expect inc_ptr/dec_ptr to
> always translate to add_ptr, with no special case for when ptr is
> char* and so we only add/sub 1. ??
Yes we currently only inc/sub by 1.
Thanks,
/Erik
> Thanks,
> David
>
>> Thanks,
>> /Erik
>>
>>>
>>> Cheers,
>>> David
>>> -----
>>>
>>>> Hope this explanation helps understanding the intended structure of
>>>> this work.
>>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>> On 2017-09-01 12:34, David Holmes wrote:
>>>>> Hi Erik,
>>>>>
>>>>> I just wanted to add that I would expect the cmpxchg, add and inc,
>>>>> Atomic API's to all require similar basic structure for
>>>>> manipulating types/values etc, yet all three seem to have quite
>>>>> different structures that I find very confusing. I'm still at a
>>>>> loss to fathom the CRTP and the hoops we seemingly have to jump
>>>>> through just to add or subtract 1!!!
>>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>> On 1/09/2017 7:29 PM, Erik Österlund wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 2017-09-01 02:49, David Holmes wrote:
>>>>>>> Hi Erik,
>>>>>>>
>>>>>>> Sorry but this one is really losing me.
>>>>>>>
>>>>>>> What is the role of Adjustment ??
>>>>>>
>>>>>> Adjustment represents the increment/decrement value as an
>>>>>> IntegralConstant - your template friend for passing around a
>>>>>> constant with both a specified type and value in templates. The
>>>>>> type of the increment/decrement is the type of the destination
>>>>>> when the destination is an integral type, otherwise if it is a
>>>>>> pointer type, the increment/decrement type is ptrdiff_t.
>>>>>>
>>>>>>> How are inc/dec anything but "using constant" ??
>>>>>>
>>>>>> I was also a bit torn on that name (I assume you are referring to
>>>>>> IncUsingConstant/DecUsingConstant). It was hard to find a name
>>>>>> that depicted what this platform helper does. I considered
>>>>>> calling the helper something with immediate in the name because
>>>>>> it is really used to embed the constant as immediate values in
>>>>>> inline assembly today. But then again that seemed too specific,
>>>>>> as it is not completely obvious platform specializations will use
>>>>>> it in that way. One might just want to specialize this to send it
>>>>>> into some compiler Atomic::inc intrinsic for example. Do you have
>>>>>> any other preferred names? Here are a few possible names for
>>>>>> IncUsingConstant:
>>>>>>
>>>>>> IncUsingScaledConstant
>>>>>> IncUsingAdjustedConstant
>>>>>> IncUsingPlatformHelper
>>>>>>
>>>>>> Any favourites?
>>>>>>
>>>>>>> Why do we special case jshort??
>>>>>>
>>>>>> To be consistent with the special case of Atomic::add on jshort.
>>>>>> Do you want it removed?
>>>>>>
>>>>>>> This is indecipherable to normal people ;-)
>>>>>>>
>>>>>>> This()->template inc<I, Adjustment>(dest);
>>>>>>>
>>>>>>> For something as trivial as adding or subtracting 1 the template
>>>>>>> machinations here are just mind boggling!
>>>>>>
>>>>>> This uses the CRTP (Curiously Recurring Template Pattern) C++
>>>>>> idiom. The idea is to devirtualize a virtual call by passing in
>>>>>> the derived type as a template parameter to a base class, and
>>>>>> then let the base class static_cast to the derived class to
>>>>>> devirtualize the call. I hope this explanation sheds some light
>>>>>> on what is going on. The same CRTP idiom was used in the
>>>>>> Atomic::add implementation in a similar fashion.
>>>>>>
>>>>>> I will add some comments describing this in the next round after
>>>>>> Coleen replies.
>>>>>>
>>>>>> Thanks for looking at this.
>>>>>>
>>>>>> /Erik
>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> David
>>>>>>>
>>>>>>> On 31/08/2017 10:45 PM, Erik Österlund wrote:
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> Bug ID:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>>>>>>
>>>>>>>> The time has come for the next step in generalizing Atomic with
>>>>>>>> templates. Today I will focus on Atomic::inc/dec.
>>>>>>>>
>>>>>>>> I have tried to mimic the new Kim style that seems to have been
>>>>>>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the
>>>>>>>> structure looks like this:
>>>>>>>>
>>>>>>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function
>>>>>>>> object that performs some basic type checks.
>>>>>>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can
>>>>>>>> define the operation arbitrarily for a given platform. The
>>>>>>>> default implementation if not specialized for a platform is to
>>>>>>>> call Atomic::add. So only platforms that want to do something
>>>>>>>> different than that as an optimization have to provide a
>>>>>>>> specialization.
>>>>>>>> Layer 3) Platforms that decide to specialize
>>>>>>>> PlatformInc/PlatformDec to be more optimized may inherit from a
>>>>>>>> helper class IncUsingConstant/DecUsingConstant. This helper
>>>>>>>> helps performing the necessary computation what the
>>>>>>>> increment/decrement should be after pointer scaling using CRTP.
>>>>>>>> The PlatformInc/PlatformDec operation then only needs to define
>>>>>>>> an inc/dec member function, and will then get all the context
>>>>>>>> information necessary to generate a more optimized
>>>>>>>> implementation. Easy peasy.
>>>>>>>>
>>>>>>>> It is worth noticing that the generalized Atomic::dec operation
>>>>>>>> assumes a two's complement integer machine and potentially
>>>>>>>> sends the unary negative of a potentially unsigned type to
>>>>>>>> Atomic::add. I have the following comments about this:
>>>>>>>> 1) We already assume in other code that two's complement
>>>>>>>> integers must be present.
>>>>>>>> 2) A machine that does not have two's complement integers may
>>>>>>>> still simply provide a specialization that solves the problem
>>>>>>>> in a different way.
>>>>>>>> 3) The alternative that does not make assumptions about that
>>>>>>>> would use the good old IntegerTypes::cast_to_signed
>>>>>>>> metaprogramming stuff, and I seem to recall we thought that was
>>>>>>>> a bit too involved and complicated.
>>>>>>>> This is the reason why I have chosen to use unary minus on the
>>>>>>>> potentially unsigned type in the shared helper code that sends
>>>>>>>> the decrement as an addend to Atomic::add.
>>>>>>>>
>>>>>>>> It would also be nice if somebody with access to PPC and s390
>>>>>>>> machines could try out the relevant changes there so I do not
>>>>>>>> accidentally break those platforms. I have blind-coded the
>>>>>>>> addition of the immediate values passed in to the inline
>>>>>>>> assembly in a way that I think looks like it should work.
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>> RBT hs-tier3, JPRT --testset hotspot
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> /Erik
>>>>>>
>>>>
>>
More information about the hotspot-dev
mailing list