RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes
Sandhya Viswanathan
sviswanathan at openjdk.org
Thu Sep 12 23:17:14 UTC 2024
On Thu, 22 Aug 2024 18:21:50 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:
> API shapes are good!
>
> I see you intrinsified `selectFrom` which, IIUC, optimally generates C2 nodes that are functionally equivalent to the Java expression `v.rearrange(this.toShuffle())`. That way we can better generate an optimal set of instructions?
>
> Do you know what deficiencies there that blocks us from compiling the expression down to the same set of instructions as the intrinsic? Not suggesting we do that here, just for future reference.
Yes, I intrinsified it to generate an optimal set of instructions. In the expression `v.rearrange(this.toShuffle())`, we first do a partial wrap as part of `this.toShuffle()` and then a full wrap as part of `rearrange`. In the intrinsic, I do only the full wrap. Without the intrinsic, if for whatever reason the JIT does not move `this.toShuffle()` out of the loop, we incur the additional overhead of the partial wrap in the hot code path.
I saw this happen when the following code is run as part of a JMH benchmark rather than called from a standalone Java program with a loop:
    var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
    for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) {
        var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
        index.selectFrom(inpvect).intoArray(byteres, j);
    }
In this case, the observed performance difference between the intrinsic and non-intrinsic versions is about 20%.
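
To illustrate why the partial wrap is redundant work on the non-intrinsic path, here is a minimal scalar sketch in plain Java. It is not the JDK implementation: the class and method names are hypothetical, and the partial-wrap rule (valid indexes pass through, out-of-range indexes map into [-VLENGTH, -1]) is my reading of the VectorShuffle documentation, so treat it as an assumption. The full wrap is modeled as a floor modulus by the vector length.

```java
// Scalar model only; names are hypothetical and not part of the Vector API.
class WrapIndexSketch {
    // Full wrap: force any index into [0, vlength - 1].
    static int fullWrap(int index, int vlength) {
        return Math.floorMod(index, vlength);
    }

    // Partial wrap (assumed semantics): valid indexes pass through unchanged,
    // out-of-range indexes become "exceptional" values in [-vlength, -1].
    static int partialWrap(int index, int vlength) {
        if (index >= 0 && index < vlength) {
            return index;
        }
        return Math.floorMod(index, vlength) - vlength;
    }

    // Models v.rearrange(this.toShuffle()): partial wrap, then full wrap.
    // Assumes indexes.length == src.length.
    static byte[] rearrangeViaShuffle(byte[] indexes, byte[] src) {
        int vlength = src.length;
        byte[] res = new byte[vlength];
        for (int i = 0; i < vlength; i++) {
            int partial = partialWrap(indexes[i], vlength);
            res[i] = src[fullWrap(partial, vlength)];
        }
        return res;
    }

    // Models the intrinsic path: a single full wrap per lane.
    // Assumes indexes.length == src.length.
    static byte[] selectFromDirect(byte[] indexes, byte[] src) {
        int vlength = src.length;
        byte[] res = new byte[vlength];
        for (int i = 0; i < vlength; i++) {
            res[i] = src[fullWrap(indexes[i], vlength)];
        }
        return res;
    }
}
```

Both methods select the same lanes, because the partial wrap changes an index only by a multiple of the vector length; the two-step path simply does extra work per lane when it is not hoisted out of the loop.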
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2305521441