[vectorIntrinsics+mask] RFR: 8273057: [vector] New VectorAPI "SelectiveStore"

Joshua Zhu jzhu at openjdk.java.net
Fri Aug 27 09:22:08 UTC 2021


Hi,

I would like to propose a new VectorAPI, "Selective Store/Load", and
share my implementation. Alibaba's internal databases are currently
adopting VectorAPI, and they need "Selective Store" for acceleration.

My proposed VectorAPI is declared as below [1]:
    int selectiveIntoArray($type$[] a, int offset, VectorMask<$Boxtype$> m);

The active elements (those whose bit is set in the mask) are stored
contiguously into the array "a", starting at a[offset]. If N is the
true count of the mask, the elements from a[offset+N] up to (but not
including) a[offset+laneCount] are left unchanged. The return value is
the number of elements stored into the array, so "offset + return
value" is the offset to use in the next iteration.
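
To make the semantics above concrete, here is a minimal scalar sketch of
what the operation computes. It is not the patch's implementation: the
class name is hypothetical, and the vector lanes and the mask are modeled
as plain int[] and boolean[] arrays purely for illustration.

```java
// Scalar model of the proposed semantics (illustration only).
// "lanes" stands in for the vector's elements and "mask" for the
// VectorMask; both are assumptions made for this sketch.
final class SelectiveStoreSketch {
    static int selectiveIntoArray(int[] lanes, boolean[] mask, int[] a, int offset) {
        int n = 0; // number of active lanes stored so far
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i]) {
                // Active lanes are packed contiguously from a[offset].
                a[offset + n] = lanes[i];
                n++;
            }
        }
        // Elements at a[offset + n] and beyond are never written.
        return n;
    }
}
```

With lanes {1, 2, 3, 4} and mask {true, false, true, false}, this model
writes 1 and 3 to a[offset] and a[offset+1] and returns 2.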

![image](https://user-images.githubusercontent.com/70769035/131103779-a9958779-6ac4-4471-9dba-106da41797a1.png)

This API will be used in the following manner [2]:
    tld.conflict_cnt = 0;
    for (int i = 0; i < ARRAY_LENGTH; i += INT_PREFERRED_SPECIES.length()) {
      IntVector av = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input1, i);
      IntVector bv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input2, i);
      IntVector cv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_index, i);
      VectorMask<Integer> mask = av.compare(VectorOperators.NE, bv);
      tld.conflict_cnt += cv.selectiveIntoArray(tld.conflict_array, tld.conflict_cnt, mask);
    }
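
For comparison, the loop above should produce the same result as the
scalar loop sketched below: it collects the index values at every
position where the two inputs differ. The class, method and parameter
names are hypothetical and only mirror the fields used in the benchmark
snippet.

```java
// Scalar equivalent of the vectorized conflict scan (illustration only).
final class ConflictScanSketch {
    static int collectConflicts(int[] input1, int[] input2, int[] index, int[] conflictArray) {
        int conflictCnt = 0;
        for (int i = 0; i < input1.length; i++) {
            // Same predicate as av.compare(VectorOperators.NE, bv).
            if (input1[i] != input2[i]) {
                // Append the matching index value; conflictCnt is the running offset.
                conflictArray[conflictCnt++] = index[i];
            }
        }
        return conflictCnt;
    }
}
```

The data-dependent branch inside this loop is what the selective store
replaces with a single packed store per vector.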

My patch includes the following changes:
    * Selective Store VectorAPI for Long & Int
    * Assembler: add x86 instruction "VPCOMPRESSD" and "VPCOMPRESSQ"
    * Instruction selection: vselective_store; kmask_truecount (true count of kregister)
    * Add node "StoreVectorSelective"
    * Add a new parameter "is_selective" to inline_vector_mem_masked_operation()
      to distinguish the masked version from the selective version
    * jtreg cases
    * JMH benchmark
      
TODO items I will implement:
    * Selective Store for other types
    * Selective Load
    * Potential optimizations, such as: when the mask is allTrue, selectiveIntoArray() -> intoArray()
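
The allTrue optimization in the last item can be illustrated with a
scalar model. This is only a sketch of the idea, not the planned
compiler transformation: the names are hypothetical, lanes and mask are
modeled as arrays, and System.arraycopy stands in for a plain
intoArray().

```java
// Scalar model of the allTrue fast path (illustration only).
final class AllTrueFastPathSketch {
    static boolean allTrue(boolean[] mask) {
        for (boolean b : mask) {
            if (!b) {
                return false;
            }
        }
        return true;
    }

    static int selectiveIntoArray(int[] lanes, boolean[] mask, int[] a, int offset) {
        if (allTrue(mask)) {
            // Every lane is active, so packing is a no-op: do a plain store.
            System.arraycopy(lanes, 0, a, offset, lanes.length);
            return lanes.length;
        }
        // General case: pack only the active lanes.
        int n = 0;
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i]) {
                a[offset + n++] = lanes[i];
            }
        }
        return n;
    }
}
```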

Test:
    * Passed the VectorAPI jtreg cases.
    * JMH benchmark results evaluating the API's performance in a real Alibaba scenario:
        UseAVX=3; thread number = 8; conflict data percentage: 20% (i.e. 20% of mask bits are true)
        http://cr.openjdk.java.net/~jzhu/8273057/jmh_benchmark_result.pdf

[1] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-13cc2d6ec18e487ddae05cda671bdb6bb7ffd42ff7bc51a2e00c8c5e622bd55dR4667
[2] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-951d02bd72a931ac34bc85d1d4e656a14f8943e143fc9282b36b9c76c1893c0cR144
[3] failed to inline (intrinsic) by https://github.com/JoshuaZhuwj/panama-vector/blob/922c4ba69219211a423f7e099d7a4c63e600c133/src/hotspot/cpu/x86/x86.ad#L1769

Best Regards,
Joshua

-------------

Commit messages:
 - 8273057: [vector] New VectorAPI "SelectiveStore"

Changes: https://git.openjdk.java.net/panama-vector/pull/114/files
 Webrev: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=114&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273057
  Stats: 1432 lines in 90 files changed: 1423 ins; 0 del; 9 mod
  Patch: https://git.openjdk.java.net/panama-vector/pull/114.diff
  Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/114/head:pull/114

PR: https://git.openjdk.java.net/panama-vector/pull/114

