[vectorIntrinsics+compress] RFR: 8276083: Incremental patch to further optimize new compress/expand APIs over X86
Paul Sandoz
psandoz at openjdk.java.net
Thu Oct 28 16:23:30 UTC 2021
On Thu, 28 Oct 2021 16:13:49 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-VectorBits.java.template line 936:
>>
>>> 934: @ForceInline
>>> 935: public $masktype$ compress() {
>>> 936: if (VLENGTH < 4) {
>>
>> This impacts the code of *every* vector and across *every* architecture. I realize it's dead code in many cases, but we need to find a better way to express and manage this. IMHO this is not a maintainable solution.
>>
>> The fallback would be an obvious place for this logic.
>
> We can push this into the fallback path, but in that case C2 will not be able to inline the fallback logic during a failed lazy intrinsification attempt. I collected perf data with and without this and could clearly see a benefit (around 1.5x) from keeping this outside the fallback. Given that the conditions are guarded by constant expressions, only one of the paths will be JIT-compiled.
>
> I think we do support 2-byte vector loads for AARCH64, but currently the minimum vector size over X86 is 4 bytes.
> I agree with you: given this is an optimization on a slow path, it's OK to compromise some gain to keep consistency across architectures.
Yes, I think that is reasonable for now. We might be able to revisit this later. In general, improving the fallback path is an area we have yet to focus on.
I think the most important aspects to focus on at the moment are the common cases, where I presume the vector length is likely to be >= 4.
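
For readers following the thread, here is a minimal sketch of the constant-guarded pattern under discussion. It is not the code from X-VectorBits.java.template: the class MaskCompressSketch, the helpers compressTiny and compressGeneral, and the use of a long as a stand-in for mask bits are all hypothetical, and VLENGTH is fixed at 8 purely for illustration. The only point is that when the guard is a constant, just one branch needs to be compiled for a given vector shape.

```java
// Hypothetical illustration only; names and representation are assumptions,
// not the panama-vector implementation.
final class MaskCompressSketch {
    // Stand-in for a per-shape lane count. Because it is a constant,
    // the guard in compress() can be folded and one branch dropped.
    static final int VLENGTH = 8;

    // Moves the set lanes of a mask down to the lowest positions,
    // so the result has its first bitCount(laneBits) lanes set.
    static long compress(long laneBits) {
        if (VLENGTH < 4) {
            // Special case for very short vectors (dead code when VLENGTH >= 4).
            return compressTiny(laneBits);
        }
        return compressGeneral(laneBits);
    }

    // Lane-by-lane loop, assumed here as the short-vector strategy.
    static long compressTiny(long laneBits) {
        long out = 0;
        int next = 0;
        for (int i = 0; i < VLENGTH; i++) {
            if ((laneBits & (1L << i)) != 0) {
                out |= 1L << next++;
            }
        }
        return out;
    }

    // Count the set lanes, then set that many low bits.
    static long compressGeneral(long laneBits) {
        int count = Long.bitCount(laneBits & ((1L << VLENGTH) - 1));
        return (1L << count) - 1;
    }

    public static void main(String[] args) {
        // Lanes 1, 2, 5 and 7 are set, so four low lanes come back set.
        System.out.println(Long.toBinaryString(compress(0b1010_0110L))); // prints 1111
    }
}
```

Both helpers compute the same result; the sketch only shows where such a guard sits relative to the general path. Moving the special case into the fallback, as agreed above, would remove the guard from compress() itself.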
-------------
PR: https://git.openjdk.java.net/panama-vector/pull/157