RFR (M): 8184334: Generalizing Atomic with templates

Thu Jul 27 14:54:46 UTC 2017

Hi Andrew,

Thank you for having another look at this. I do see appeal with allowing 
platforms to specialize before truncating types. I also recognize that 
your implementation is simpler. So thank you for your efforts helping 
out here.

However... I do think that the simplification comes at a cost. Since we 
are now introducing templates that accept any types, I would like to 
protect the user of the API from doing things that are not a good idea. 
That was one of the key reasons why the IntegerTypes casting mechanisms 
were employed. To safely cast and sanity check the passed in types to 
make sense.

Take for example a user passing in to your cmpxchg method a new_value of 
float, a destination of volatile int*, and an expected value of float. 
This clearly makes no sense, and I would like the Atomic API to not 
allow passing in things that clearly do not make sense. The proposed 
mechanism will explicitly convert the expected and new values to ints, 
which will truncate their values under the hood, resulting in a 
different value being written to what was expected. If the assembly used 
a load link + store conditional pair like e.g. PPC linux does, then the 
passed in truncated expected value will lead to false negatives where 
the CAS fails due to the passed in expected value being seen due to 
being truncated in the explicit conversion, and similarly it will lead 
to false positives where a CAS succeeds despite not having the expected 
value in memory.

Similarly, I believe similar problems may apply when different integer 
types of with different sizes are passed in that would not be implicitly 
convertible, but sure are explicitly convertible (with possibly 
unintentional truncation). You mentioned the case of extending a 32 bit 
integer to a 64 bit integer, which indeed is fine. Unfortunately, the 
reverse is not fine yet possible in this API. For example, if the 
destination is of type volatile jint* and the passed in expected value 
is a jlong, then that is clearly wrong. Yet the API will accept these 
types and explicitly convert the jlong to jint, truncating it in the 
process without warning. This is the kind of thing that the casting 
mechanism me and Kim worked on prevents you from doing accidentally.

So while my solution truncated type information that this solution does 
not, it seems like this solution would instead silently truncate values 
passed in where my solution would not, making it less safe to use. Only 
at runtime would you (if lucky) find out that the types used were not 
compatible, and silently messed up your data.

All in all, I do like the idea of allowing platforms that do not want to 
rely on -fno-strict-aliasing to keep as much type information as 
possible, and give it to the compiler. Having said that, I do think the 
API should protect users from messing up instead of silently accepting 
not compliant types and truncating their values.

How would you feel about a mechanism where we allow each platform to 
specialize on the top level with all type information available, but by 
default send everything in to the Atomic implementation I have proposed 
that canonicalizes and sanity checks all values. Then we have solved a 
bunch of existing problems, not introduced any new problems, yet are 
future proof for allowing arbitrary specialization on each platform to 
allow further improving the solution on platforms that have cooperative 
compilers that allow such improvements, on a platform by platform basis.

What do you think?

Thanks,
/Erik

On 2017-07-26 15:22, Andrew Haley wrote:
> I've been sketching out an alternative way to handle this.  I've got
> my flak jacket ready, so here goes...
>
> My goals are:
>
>    Simplicity; with this comes, hopefully, clarity.  Within reason, we
>    should minimize the extra code this brings in.
>
>    Type safety where possible.  I'm aware that we compile HotSpot with
>    -fno-strict-aliasing or its equivalent, but I think we can do this
>    job without relying on that.  The more information we expose to the
>    compiler, the better code it can generate.
>
>    Something that plays nicely with compiler builtins, where
>    appropriate.  It must also be able to handle legacy
>    assembly-language definitions.
>
> [ Correctness, hopefully, goes without saying. ]
>
> I've done all the work on cmpxchg, because that is the most complex of
> the operations: I believe that if cmpxchg can work well, the rest will
> be relatively simple.
>
> ----------------------------------------------------------------------
>
> We should expose the type information of the original operands all the
> way to the back ends, and not cast pointer types if we can possibly
> avoid it.  So, in this design every back end defines a template function
> to match this declaration:
>
>    template <typename T, typename U, typename V>
>    inline static U cmpxchg(T exchange_value, volatile U* dest, V compare_value,
>                            cmpxchg_memory_order order = memory_order_conservative);
>
> [ Note: the types of the operand, the compare value, and the exchange
> value are distinct.  This is a design decision I took from Erik's
> patch, but I don't like it: it seems to me that we should insist on
> the same types for the operands.  This requires a few changes in
> HotSpot, but arguably these are bugs anyway.  I tried this, and had to
> fix eight places, all trivial.  We should do it as a cleanup. ]
>
> First, the simplest possible case.  In some modern GCC targets (e.g
> Aarch64) we can trivially define our generic cmpxchg using a
> compiler builtin, like so:
>
> template <typename T, typename U, typename V>
> inline U Atomic::cmpxchg(T exchange_value, volatile U* dest, V compare_value,
>                           cmpxchg_memory_order order) {
>    return __sync_val_compare_and_swap(dest, compare_value, exchange_value);
> }
>
> That's everything we need, although I'd do some optimization for
> the memory_order.
>
> With older targets and those where we need to support assembly
> language, we'll have to use some template magic to select an
> appropriate assembly fragment:
>
> template <typename T, typename U, typename V>
> inline U Atomic::cmpxchg(T exchange_value, volatile U* dest, V compare_value,
>                           cmpxchg_memory_order order) {
>    return CmpxchgHelper<T, U, V, sizeof (U)>()(exchange_value, dest,
>                                                compare_value, order);
> }
>
> Here's one such fragment, for the 64-bit case:
>
> template <typename T, typename U, typename V>
> struct Atomic::CmpxchgHelper<T, U, V, 8> {
>    U operator()(T exchange_value, volatile U* dest, V compare_value,
>                 cmpxchg_memory_order order) {
>      U result;
>      __asm__ __volatile__ ("lock cmpxchgq %1,(%3)"
>                            : "=a" (result)
>                            : "r" ((U)exchange_value), "a" ((U)compare_value), "r" (dest)
>                            : "cc", "memory");
>      return result;
>    }
> };
>
> The other cases are very similar, and I won't describe them here.
>
> Note that we're using plain casts to convert the compare and exchange
> values: this allows trivial cases such as extending a plain integer
> constant to a 64-bit type.
>
> [ Note: The new Helper Classes are necessary because the antique
> version of C++ we use to build HotSpot doesn't permit the partial
> specialization of templates.  This is merely annoying syntax and
> doesn't affect the generated code. ]
>
> Where there are back-end functions that cannot be genericized in this
> way (perhaps because they always take a void * or an intptr_t *) we
> can do whatever casting is necessary in the back-end fragments.  There
> doesn't need to be any casting in the shared code.
>
> This proposal does not handle floating-point operands because none are
> currently needed, but I believe that the correct way to do it involves
> explicit specializations for the types double and float.  These can be
> in the shared code in atomic.h if we ever need them.
>
> [ Note: Supporting floating-point operands is problematic because it's
> hard to get the types right.  If T, U, and V can all be different
> types then we need to know exactly what is meant by cmpxchg() where
> the object in memory is an integer type and the compare or exchange
> values are of floating-point type.  IMO, we should reject nonsense
> like that at compile time.  Forcing T, U, and V to be the same type
> allows us to do that. ]
>
> ----------------------------------------------------------------------
>
> This solution doesn't require any new helper classes.  It does,
> however, require changes to the back ends which convert methods to
> helper classes.  This is additional work now, but the result IMO will
> be simpler, easier to maintain, and more future proof.
>
> Do you know of any operating systems, compilers, or targets where this
> won't work?  Or of any particular difficulty doing this?  I know we'll
> have to get buy-in from the back end maintainers, but right now in the
> JDK 10 release cycle is the time to get things like this done.
>