RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]

Mon Jan 8 06:09:24 UTC 2024

On Fri, 5 Jan 2024 09:45:11 GMT, Emanuel Peter <epeter at openjdk.org> wrote:

> You are using `VectorMask<Integer> pred = VectorMask.fromLong(ispecies, maskctr++);`. That basically systematically iterates over all masks, which is nice for a correctness test. But that would use different density inside one test run, right? The average over the loop is still at `50%`, correct?
> 
> I was thinking more a run where the percentage over the whole loop is lower than maybe `1%`. That would get us to a point where maybe the branch prediction of non-vectorized code might be faster, what do you think?

An imperative (scalar) compression loop checks every mask bit to decide whether the corresponding lane is compressed. Masks with a low or a high density of set bits should therefore show similar performance.
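
For illustration, here is a minimal sketch of the kind of scalar loop meant above. The class and method names are hypothetical and are not taken from the PR or its benchmark; the mask is assumed to be passed as a `long` bit set, as with `VectorMask.fromLong`, so the source array is assumed to have at most 64 elements.

```java
import java.util.Arrays;

public class ScalarCompressSketch {
    // Hypothetical helper: copies src[i] to the front of dst for every set
    // bit i of mask, zeroes the remaining lanes, and returns the number of
    // lanes copied. Assumes src.length <= 64 and dst.length >= src.length.
    static int compress(int[] src, long mask, int[] dst) {
        int j = 0;
        for (int i = 0; i < src.length; i++) {
            // One mask-bit test per lane, however many bits are set.
            if (((mask >>> i) & 1L) != 0) {
                dst[j++] = src[i];
            }
        }
        Arrays.fill(dst, j, dst.length, 0);
        return j;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40, 50, 60, 70, 80};
        int[] dst = new int[src.length];
        int n = compress(src, 0b10100101L, dst);
        System.out.println(n + " " + Arrays.toString(dst));
    }
}
```

The loop visits every lane once, so its trip count does not depend on how many mask bits are set.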

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1444196848