[vector] Questions on addSaturate and subSaturate

Wed Mar 28 22:32:34 UTC 2018

Hi Razvan,

I have not paid much attention to the implementations of the saturated addition and subtraction. After more than a cursory glance I don’t think they are correct, and they don’t make sense on FP types.

For byte it should be:

  byte a = …
  byte b = ...
  byte r = (byte) Math.max(Math.min(a + b, Byte.MAX_VALUE), Byte.MIN_VALUE);
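
To make the byte case concrete, here is a minimal sketch of both operations. The class and method names are hypothetical, chosen only for illustration, and are not the Vector API's methods. It relies on Java promoting byte operands to int, so the intermediate sum or difference cannot overflow before it is clamped.

```java
final class SaturatingByte {
    // Hypothetical helper, not part of any existing API.
    static byte addSaturate(byte a, byte b) {
        int sum = a + b; // operands are promoted to int, so this cannot overflow
        return (byte) Math.max(Math.min(sum, Byte.MAX_VALUE), Byte.MIN_VALUE);
    }

    // Hypothetical helper, not part of any existing API.
    static byte subSaturate(byte a, byte b) {
        int diff = a - b; // also computed in int
        return (byte) Math.max(Math.min(diff, Byte.MAX_VALUE), Byte.MIN_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(addSaturate((byte) 100, (byte) 100));   // 127
        System.out.println(addSaturate((byte) -100, (byte) -100)); // -128
        System.out.println(subSaturate((byte) -100, (byte) 100));  // -128
        System.out.println(addSaturate((byte) 3, (byte) 4));       // 7
    }
}
```

The same pattern should carry over to short by swapping in Short.MAX_VALUE and Short.MIN_VALUE.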

It’s a little more complicated for int or long since under/overflow needs to be managed, e.g. if the inputs are of the same sign but the result is of the opposite sign, then clamp to the bound in the direction of the inputs' sign. (int values could be promoted to long values to avoid overflow.)
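
A rough sketch of both approaches for int and long follows. The class and method names are hypothetical, and this is only one way to express the sign rule described above, not the current implementation. For subtraction the analogous condition is assumed to be: the inputs differ in sign and the result's sign differs from the first input's.

```java
final class SaturatingInt {
    // Sign check: overflow occurred iff a and b share a sign and r has the other sign.
    static int addSaturate(int a, int b) {
        int r = a + b; // may wrap around
        if (((a ^ r) & (b ^ r)) < 0) {
            return a < 0 ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        }
        return r;
    }

    // Alternative for int: widen to long, add, then clamp to the int range.
    static int addSaturateViaLong(int a, int b) {
        long wide = (long) a + (long) b; // cannot overflow in long
        return (int) Math.max(Math.min(wide, Integer.MAX_VALUE), Integer.MIN_VALUE);
    }

    // Subtraction overflows iff a and b differ in sign and r's sign differs from a's.
    static int subSaturate(int a, int b) {
        int r = a - b; // may wrap around
        if (((a ^ b) & (a ^ r)) < 0) {
            return a < 0 ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        }
        return r;
    }

    // long has no wider primitive to promote to, so only the sign check applies.
    static long addSaturate(long a, long b) {
        long r = a + b; // may wrap around
        if (((a ^ r) & (b ^ r)) < 0) {
            return a < 0 ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return r;
    }
}
```

For example, addSaturate(Integer.MAX_VALUE, 1) stays at Integer.MAX_VALUE and addSaturate(Integer.MIN_VALUE, -1) stays at Integer.MIN_VALUE, while inputs of opposite sign can never overflow and pass through unchanged.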

I don’t know how important they are, although I would note the VNNI instructions also have saturated fused multiply-and-accumulate versions.

Paul.

> On Mar 28, 2018, at 12:47 PM, Lupusoru, Razvan A <razvan.a.lupusoru at intel.com> wrote:
> 
> Hi everyone,
> 
> I am looking into implementation of addSaturate and subSaturate and I am trying to understand intended semantics.
> 
> Judging just from the method names, my expectation for integral types is that in case of overflow the result will max out at the largest value, and in case of underflow it will stop at the min value instead of wrapping around.
> 
> However, I have a couple of questions based on current implementation.
> 
> * Should these methods be defined for FP types? If yes, what would be the intended semantics for FP?
> 
> * Should these methods take care of both overflow and underflow? Currently it seems that the implementation only takes care of overflow.
> 
> * For subword types, does the saturation use the appropriate max/min values for that type? (Currently Integer.MAX_VALUE is used.)
> 
> Any advice on the intended semantics and whether these are valid use cases for the API would be valuable.
> 
> Thanks,
> Razvan