VectorAPI: SubAll intrinsics for byte, short, float and double

Tue May 1 11:33:05 UTC 2018

Hi Paul,

From my understanding, subAll() can be reduced to addAll().neg() for integral types. Horizontal arithmetic is allowed in this case because the order of operations does not matter; in fact, the add reduction we transform subAll into does exactly that. For FP, you are right that we need to keep the order because of the limited precision. In that case, I am not aware of a data-parallel approach we could use to speed up the computation.
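
A minimal scalar sketch of the integral case, not taken from the patch: it assumes subAll means 0 - v[0] - v[1] - ... and uses hypothetical helper names (subAllSequential, subAllViaAdd). Because Java int arithmetic wraps modulo 2^32, negating the add reduction gives the same bits as subtracting element by element, even when the intermediate values overflow.

```java
final class SubAllIntegral {
    // Reference order (assumed semantics): 0 - v[0] - v[1] - ..., wrapping on overflow.
    static int subAllSequential(int[] v) {
        int acc = 0;
        for (int x : v) {
            acc -= x;
        }
        return acc;
    }

    // Derived form: run an add reduction, then negate the scalar result once.
    static int subAllViaAdd(int[] v) {
        int sum = 0;
        for (int x : v) {
            sum += x;
        }
        return -sum;
    }

    public static void main(String[] args) {
        // Values chosen so that the running sum overflows along the way.
        int[] v = {Integer.MAX_VALUE, 7, -3, Integer.MIN_VALUE, 42};
        System.out.println(subAllSequential(v) == subAllViaAdd(v)); // prints "true"
    }
}
```

The same argument does not carry over to float and double, where each intermediate result is rounded.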

Let me know if I missed your point.

Thanks,

Jp

-----Original Message-----
From: Paul Sandoz [mailto:paul.sandoz at oracle.com] 
Sent: Monday, April 30, 2018 4:51 PM
To: Halimi, Jean-Philippe <jean-philippe.halimi at intel.com>
Cc: panama-dev at openjdk.java.net
Subject: Re: VectorAPI: SubAll intrinsics for byte, short, float and double

Hi Jp,

Looks ok. Can we derive subAll from addAll().neg()? The additional negation might be an acceptable cost, but I am uncertain of the FP behavior.

IIUC, for reductive addition or subtraction, the accumulated value is kept in the first lane of the destination register and the src lane element to subtract is shuffled down for each iteration. In effect it preserves the sequential order, but I wonder if there are faster data-parallel approaches if we are relaxed about rounding producing different results?
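
A minimal scalar sketch of that trade-off, under stated assumptions: the helper names (addAllSequential, addAllPairwise) are hypothetical, the input length is assumed to be a power of two, and the pairwise loop only models what a log2(n)-step lane-wise reduction would compute rather than how the intrinsic is generated. The sequential form mirrors the accumulate-in-first-lane order described above; the pairwise form combines halves and can round differently.

```java
final class FloatReductionOrder {
    // Sequential order: (((0 + v[0]) + v[1]) + v[2]) + ...
    static float addAllSequential(float[] v) {
        float acc = 0.0f;
        for (float x : v) {
            acc += x;
        }
        return acc;
    }

    // Pairwise (tree) order: add the upper half onto the lower half, then repeat.
    // Assumes v.length is a power of two; works on a copy so the input is untouched.
    static float addAllPairwise(float[] v) {
        float[] t = v.clone();
        for (int half = t.length / 2; half >= 1; half /= 2) {
            for (int i = 0; i < half; i++) {
                t[i] += t[i + half];
            }
        }
        return t[0];
    }

    public static void main(String[] args) {
        // Near 1e8 adjacent floats are 8 apart, so 1e8f + 1.0f rounds back to 1e8f.
        float[] v = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println(addAllSequential(v)); // prints "1.0"
        System.out.println(addAllPairwise(v));   // prints "2.0"
    }
}
```

With this input the pairwise order happens to give the mathematically exact sum, while the sequential order loses one of the small terms; on other inputs it can go the other way. The point is only that the two orders are not interchangeable bit for bit.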

Thanks,
Paul.

> On Apr 30, 2018, at 10:10 AM, Halimi, Jean-Philippe <jean-philippe.halimi at intel.com> wrote:
> 
> Hi all,
>
> I would like to share a patch adding support for the subAll intrinsic for byte, short, long, float and double types in VectorAPI.
>
> Could you please review the following two patches?
> 
> http://cr.openjdk.java.net/~jphalimi/webrev_subAll_FP_v1.1/
> 
> http://cr.openjdk.java.net/~jphalimi/webrev_subAll_BS_v1.0/
>
> Thank you,
>
> Jp
>