RFR: 8186838: Generalize Atomic::inc/dec with templates

Fri Sep 1 12:51:55 UTC 2017

On 9/1/17 4:40 AM, Erik Österlund wrote:
> Hi Coleen,
>
> Thank you for taking your time to review this.
>
> On 2017-09-01 02:03, coleen.phillimore at oracle.com wrote:
>>
>> Hi, I'm trying to parse the templates to review this but maybe it's 
>> convention but decoding these with parameters that are single capital 
>> letters make reading the template very difficult.  There are already 
>> a lot of non-alphanumeric characters.   When the letter is T, that is 
>> expected by convention, but D or especially I makes it really hard.   
>> Can these be normalized to all use T when there is only one template 
>> parameter?  It'll be clear that T* is a pointer and T is an integer 
>> without having it be P.
>
> I apologize the names of the template parameters are hard to 
> understand. For what it's worth, I am only consistently applying Kim's 
> conventions here. It seemed like a bad idea to violate conventions 
> already set up - that would arguably be more confusing.
>
> The convention from earlier work by Kim is:
> D: Type of destination
> I: Operand type that has to be an integral type
> P: Operand type that is a pointer element type
> T: Generic operand type, may be integral or pointer type
>
> Personally, I do not mind this convention. It is more specific and 
> annotates things we know about the type into the name of the type.
>
> Do you want me to:
>
> 1) Keep the convention, now that I have explained what the convention 
> is and why it is your friend

It is not my friend.  It's not helpful.   I have to go through multiple 
non-alphabetic characters looking for the letter I or the letter P to 
mentally make the substitution of the template type.

> 2) Break the convention for this change only making the naming 
> inconsistent

Break it for this changeset and we'll fix it later for the earlier work 
from Kim.  I don't remember P and I in Kim's changeset but realized 
while looking at your changeset, this was one thing that makes these 
templates slower and more difficult to read.

In the case of cmpxchg templates with a source, destination and original 
values, it was necessary to have more than T be the template type, 
although unsatisfying, because it turned out that the types couldn't be 
the same.

> 3) Change the convention throughout consistently, including all 
> earlier work from Kim
>
>>
>> +template<typename I>
>> +struct Atomic::IncImpl<I, typename 
>> EnableIf<IsIntegral<I>::value>::type> VALUE_OBJ_CLASS_SPEC {
>> + void operator()(I volatile* dest) const {
>> + typedef IntegralConstant<I, I(1)> Adjustment;
>> + typedef PlatformInc<sizeof(I), Adjustment> PlatformOp;
>> + PlatformOp()(dest);
>> + }
>> +};
>>
>> This one isn't as difficult, because it's short, but it would be 
>> faster to understand with T.
>>
>> +template<typename T>
>> +struct Atomic::IncImpl<T, typename 
>> EnableIf<IsIntegral<I>::value>::type> VALUE_OBJ_CLASS_SPEC {
>> + void operator()(T volatile* dest) const {
>> + typedef IntegralConstant<T, T(1)> Adjustment;
>> + typedef PlatformInc<sizeof(T), Adjustment> PlatformOp;
>> + PlatformOp()(dest);
>> + }
>> +};
>>
>> +template<>
>> +struct Atomic::IncImpl<jshort> VALUE_OBJ_CLASS_SPEC {
>> + void operator()(jshort volatile* dest) const {
>> + add(jshort(1), dest);
>> + }
>> +};
>>
>>
>> Did I already ask if this could be changed to u2 rather than jshort?  
>> Or is that the follow-on RFE?
>
> That is a follow-on RFE.

Good.  I think that's the one that I assigned to myself.
>
>> +// Helper for platforms wanting a constant adjustment.
>> +template<size_t byte_size, typename Adjustment>
>> +struct Atomic::IncUsingConstant VALUE_OBJ_CLASS_SPEC {
>> + typedef PlatformInc<byte_size, Adjustment> Derived;
>>
>>
>> I can't find the caller of this.  Is it really a lot faster than 
>> having the platform independent add(1, T) / add(-1, T) to make all 
>> this code worth having?  How is this called?  I couldn't parse the 
>> trick.  Atomic::inc() is always a "constant adjustment" so I'm 
>> confused about what the comment means and what motivates all the asm 
>> code.   Do these platform implementations exist because they don't 
>> have twos complement for integer representation?  really?
>
> This is used by some x86, PPC and s390 platforms. Personally I 
> question its usefulness for x86. I believe it might be one of those 
> things were we ran some benchmarks a decade ago and concluded that it 
> was slightly faster to have a slimmed path for Atomic::inc rather than 
> reusing Atomic::add.

Yes, there are a lot of optimizations that we slog along in the code 
base because they might have either theoretically or measurably made 
some difference in something we don't have anymore.

>
> I did not initially want to bring this up as it seems like none of my 
> business, but now that the question has been asked about differences, 
> I could not help but notice the advertised "leading sync" convention 
> of Atomic::inc on PPC is not respected. That is, there is no "sync" 
> fence before the atomic increment, as required by the specified 
> semantics. There is not even a leading "lwsync". The corresponding 
> Atomic::add operation though, does have leading lwsync (unlike 
> Atomic::inc). Now this should arguably be reinforced to sync rather 
> than lwsync to respect the advertised semantics of both Atomic::add 
> and Atomic::inc on PPC. Hopefully that statement will not turn into a 
> long unrelated mailing thread...

Could you file an bug with this observation?
>
> Conclusively though, there is definitely a substantial difference in 
> the fencing comparing the PPC implementation of Atomic::inc to 
> Atomic::add. Whether either one of them conforms to intended semantics 
> or not is a different matter - one that I was hoping not to have to 
> deal with in this RFE as I am merely templateifying what was already 
> there, without judging the existing specializations. And it is my 
> observation that as the code looks now, we would incur a bunch of more 
> fencing compared to what the code does today on PPC.
>

Completely understand.   How are these called exactly though?  I 
couldn't figure it out.

>> Also, the function name This() is really disturbing and distracting.  
>> Can it be called some verb() representing what it does?  
>> cast_to_derived()?
>>
>> + template<typename I>
>> + void operator()(I volatile* dest) const {
>> + This()->template inc<I, Adjustment>(dest);
>> + }
>>
>
> Yes, I will change the name accordingly as you suggest.
>
>> I didn't know you could put "template" there. 
>
> It is required to put the template keyword before the member function 
> name when calling a template member function with explicit template 
> parameters (as opposed to implicitly inferred template parameters) on 
> a template type.

I thought you could just stay inc<type,type>() in the call, but my C++ 
template vocabularly is minimal.
>
>> What does this call?
>
> This calls the platform-defined intrinsic that is defined in the 
> platform files - the one that contains the inline assembly.

How?  I don't see how...  :(
>
>> Rather than I for integer case, and P for pointer case, can you add a 
>> one line comment above this like:
>> // Helper for integer types
>> and
>> // Helper for pointer types
>
> Or perhaps we could do both? Nevertheless, I will add these comments. 
> But as per the discussion above, I would be happy if we could keep the 
> convention that Kim has already set up for the template type names.
>
>> Small local comments would be really helpful for many of these 
>> functions.   Just to get more english words in there...  Since Kim's 
>> on vacation can you help me understand this code and add comments so 
>> I remember the reasons for some of this?
>
> Sure - I will decorate the code with some comments to help 
> understanding. I will send an updated webrev when I get your reply 
> regarding the typename naming convention verdict.

That's my opinion anyway.   David might have the opposite opinion.

Thanks,
Coleen

>
> Thanks for the review!
>
> /Erik
>
>>
>> Thanks!
>> Coleen
>>
>>
>> On 8/31/17 8:45 AM, Erik Österlund wrote:
>>> Hi everyone,
>>>
>>> Bug ID:
>>> https://bugs.openjdk.java.net/browse/JDK-8186838
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~eosterlund/8186838/webrev.00/
>>>
>>> The time has come for the next step in generalizing Atomic with 
>>> templates. Today I will focus on Atomic::inc/dec.
>>>
>>> I have tried to mimic the new Kim style that seems to have been 
>>> universally accepted. Like Atomic::add and Atomic::cmpxchg, the 
>>> structure looks like this:
>>>
>>> Layer 1) Atomic::inc/dec calls an IncImpl()/DecImpl() function 
>>> object that performs some basic type checks.
>>> Layer 2) IncImpl/DecImpl calls PlatformInc/PlatformDec that can 
>>> define the operation arbitrarily for a given platform. The default 
>>> implementation if not specialized for a platform is to call 
>>> Atomic::add. So only platforms that want to do something different 
>>> than that as an optimization have to provide a specialization.
>>> Layer 3) Platforms that decide to specialize PlatformInc/PlatformDec 
>>> to be more optimized may inherit from a helper class 
>>> IncUsingConstant/DecUsingConstant. This helper helps performing the 
>>> necessary computation what the increment/decrement should be after 
>>> pointer scaling using CRTP. The PlatformInc/PlatformDec operation 
>>> then only needs to define an inc/dec member function, and will then 
>>> get all the context information necessary to generate a more 
>>> optimized implementation. Easy peasy.
>>>
>>> It is worth noticing that the generalized Atomic::dec operation 
>>> assumes a two's complement integer machine and potentially sends the 
>>> unary negative of a potentially unsigned type to Atomic::add. I have 
>>> the following comments about this:
>>> 1) We already assume in other code that two's complement integers 
>>> must be present.
>>> 2) A machine that does not have two's complement integers may still 
>>> simply provide a specialization that solves the problem in a 
>>> different way.
>>> 3) The alternative that does not make assumptions about that would 
>>> use the good old IntegerTypes::cast_to_signed metaprogramming stuff, 
>>> and I seem to recall we thought that was a bit too involved and 
>>> complicated.
>>> This is the reason why I have chosen to use unary minus on the 
>>> potentially unsigned type in the shared helper code that sends the 
>>> decrement as an addend to Atomic::add.
>>>
>>> It would also be nice if somebody with access to PPC and s390 
>>> machines could try out the relevant changes there so I do not 
>>> accidentally break those platforms. I have blind-coded the addition 
>>> of the immediate values passed in to the inline assembly in a way 
>>> that I think looks like it should work.
>>>
>>> Testing:
>>> RBT hs-tier3, JPRT --testset hotspot
>>>
>>> Thanks,
>>> /Erik
>>
>