RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]
Emanuel Peter
epeter at openjdk.org
Fri Jan 5 10:05:23 UTC 2024
On Fri, 5 Jan 2024 07:03:34 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307:
>>
>>> 5305: assert(bt == T_LONG || bt == T_DOUBLE, "");
>>> 5306: vmovmskpd(rtmp, mask, vec_enc);
>>> 5307: shlq(rtmp, 5);
>>
>> Might this need to be 6? If I understand correctly, you want a 64-bit stride, hence 2^6, right?
>> If that is correct, your tests did not catch it, so you need a regression test anyway.
>
> This computes the byte offset from the start of the table. Both the integer and the long permute tables have the same row size: 8 int elements vs 4 long elements, i.e. 32 bytes per row.
Ah, I understand now. Maybe leave a comment for that?
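For readers following the thread, here is a minimal sketch of the arithmetic behind that answer. It is not the HotSpot code: the class and method names are hypothetical, and the only facts taken from the discussion are the shift by 5 and the row layouts (8 int elements or 4 long elements per row).

```java
public class PermuteTableOffset {
    // Row size in bytes, derived from the layouts named in the thread:
    // 8 ints * 4 bytes == 4 longs * 8 bytes == 32 bytes.
    static final int INT_ROW_BYTES = 8 * Integer.BYTES;
    static final int LONG_ROW_BYTES = 4 * Long.BYTES;

    // Byte offset of the table row selected by a lane mask.
    // Because each row is 32 bytes, row index * 32 == row index << 5.
    static long byteOffset(int maskBits) {
        return ((long) maskBits) << 5;
    }

    public static void main(String[] args) {
        System.out.println("int row bytes:  " + INT_ROW_BYTES);
        System.out.println("long row bytes: " + LONG_ROW_BYTES);
        // Mask 0b0101 selects row 5 of the table.
        System.out.println("offset for mask 0b0101: " + byteOffset(0b0101));
    }
}
```

The shift amount depends on the row size in bytes, not on the element width, which is why 5 applies to both the int and the long table.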
>> test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 76:
>>
>>> 74: longinCol = new long[size];
>>> 75: longoutCol = new long[size];
>>> 76: lpivot = size / 2;
>>
>> I'd be interested to see what happens if you move the "density" of accepted elements up or down. Would simple branch prediction be faster if the density is low enough, i.e. if we take almost no elements?
>>
>> Though maybe that is not a compiler problem but a user problem?
>
> Included a fuzzy filter microbenchmark with varying mask density.
> ![image](https://github.com/openjdk/jdk/assets/59989778/a6af21cc-36c0-4503-aeb3-e66b862da2e1)
You are using `VectorMask<Integer> pred = VectorMask.fromLong(ispecies, maskctr++);`.
That systematically iterates over all masks, which is nice for a correctness test.
But that uses different densities within a single test run, right? The average over the loop is still `50%`, correct?
I was thinking more of a run where the percentage over the whole loop is low, say below `1%`. That might get us to a point where branch prediction makes the non-vectorized code faster. What do you think?
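To make the suggestion concrete, here is a rough sketch of how a fixed, low mask density could be generated, together with a scalar baseline to compare against. All names (`SparseMaskSketch`, `randomMaskBits`, `scalarFilter`) are hypothetical and not part of the PR, and the `v < pivot` predicate is only an assumption about what the column filter does.

```java
import java.util.Random;

public class SparseMaskSketch {
    // Builds mask bits where each of the first `lanes` bits is set
    // independently with probability `density` (0.0 to 1.0).
    static long randomMaskBits(Random rng, int lanes, double density) {
        long bits = 0L;
        for (int i = 0; i < lanes; i++) {
            if (rng.nextDouble() < density) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Scalar baseline: copies elements below the pivot into `out`
    // and returns how many were kept. The branch is well predicted
    // when almost no element passes.
    static int scalarFilter(int[] in, int[] out, int pivot) {
        int kept = 0;
        for (int v : in) {
            if (v < pivot) {
                out[kept++] = v;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int lanes = 8;          // e.g. 8 int lanes in a 256-bit vector
        int setBits = 0;
        int iterations = 100_000;
        for (int i = 0; i < iterations; i++) {
            setBits += Long.bitCount(randomMaskBits(rng, lanes, 0.01));
        }
        System.out.println("observed density: "
                + (double) setBits / ((long) iterations * lanes));
    }
}
```

The generated bits could be passed to `VectorMask.fromLong(ispecies, bits)` in place of `maskctr++`, so that the density stays roughly constant over the whole loop instead of averaging out to `50%`.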
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442670411
PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442676633