[vectorIntrinsics+mask] RFR: 8273057: [vector] New VectorAPI "SelectiveStore"
Joshua Zhu
jzhu at openjdk.java.net
Fri Aug 27 09:22:08 UTC 2021
Hi,
I would like to propose a new VectorAPI "Selective Store/Load" and share my
implementation. Alibaba's internal databases are currently adopting the
VectorAPI and need "Selective Store" for acceleration.
My proposed VectorAPI is declared as below [1]:
int selectiveIntoArray($type$[] a, int offset, VectorMask<$Boxtype$> m);
The active elements (those whose corresponding bit is set in the mask) are
stored contiguously into the array "a", starting at a[offset]. If N is the
true count of the mask, the elements from a[offset+N] up to
a[offset+laneCount] are left unchanged. The return value is the number of
elements stored into the array, so "offset + return value" is the offset
for the next iteration.
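To make the intended semantics concrete, here is a scalar sketch of what the operation does for one vector. It is not the code in the patch: the class name, the method shape, and the use of int[]/boolean[] as stand-ins for the vector lanes and the mask are assumptions made for illustration only.

```java
// Scalar model of the proposed selective store (illustrative only).
// "lanes" stands in for the vector contents and "mask" for the VectorMask;
// both are hypothetical stand-ins, not types used by the patch.
class SelectiveStoreReference {
    static int selectiveIntoArray(int[] lanes, boolean[] mask, int[] a, int offset) {
        int stored = 0;
        for (int lane = 0; lane < lanes.length; lane++) {
            if (mask[lane]) {
                // Active lanes are packed next to each other, starting at a[offset].
                a[offset + stored] = lanes[lane];
                stored++;
            }
        }
        // Elements of "a" beyond offset + stored are never written.
        return stored;
    }
}
```

For example, with lanes {10, 11, 12, 13}, mask {T, F, F, T} and offset 1 into an array of six -1 values, this returns 2 and leaves the array as {-1, 10, 13, -1, -1, -1}.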

This API is intended to be used in the following manner [2]:
    tld.conflict_cnt = 0;
    for (int i = 0; i < ARRAY_LENGTH; i += INT_PREFERRED_SPECIES.length()) {
        IntVector av = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input1, i);
        IntVector bv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input2, i);
        IntVector cv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_index, i);
        VectorMask<Integer> mask = av.compare(VectorOperators.NE, bv);
        tld.conflict_cnt += cv.selectiveIntoArray(tld.conflict_array, tld.conflict_cnt, mask);
    }
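For comparison, a scalar sketch of what the loop above computes is shown here: it gathers the index values at every position where the two inputs differ, using the running count as the next store offset. The class name, the LANES constant (a stand-in for INT_PREFERRED_SPECIES.length()) and the method signature are hypothetical, and the sketch assumes the input length is a multiple of LANES.

```java
// Scalar equivalent of the vectorized conflict scan (illustrative only).
class ConflictScanSketch {
    // Hypothetical stand-in for INT_PREFERRED_SPECIES.length().
    static final int LANES = 4;

    static int collectConflicts(int[] input1, int[] input2, int[] index, int[] conflictArray) {
        int conflictCnt = 0;
        // One outer iteration corresponds to one vector-sized chunk.
        for (int i = 0; i + LANES <= input1.length; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                // Models compare(NE) followed by the selective store of "cv".
                if (input1[i + lane] != input2[i + lane]) {
                    conflictArray[conflictCnt] = index[i + lane];
                    conflictCnt++;
                }
            }
        }
        return conflictCnt;
    }
}
```

The vectorized version replaces the inner loop and its data-dependent branch with one compare and one selective store per chunk.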
My patch includes the following changes:
* Selective Store VectorAPI for Long & Int
* Assembler: add x86 instruction "VPCOMPRESSD" and "VPCOMPRESSQ"
* Instruction selection: vselective_store; kmask_truecount (true count of kregister)
* Add node "StoreVectorSelective"
* Add a new parameter "is_selective" to inline_vector_mem_masked_operation()
to distinguish the Masked version from the Selective version
* jtreg cases
* JMH benchmark
TODO parts I will implement:
* Selective Store for other types
* Selective Load
* Some potential optimizations, such as: when the mask is allTrue, SelectiveIntoArray() -> IntoArray()
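The allTrue optimization in the last item could look roughly like the scalar sketch below: when every mask bit is set, the selective store degenerates into a plain contiguous store. This only models the idea; the class and method names are hypothetical, and the real change would be made in the intrinsic or the compiler rather than in Java code like this.

```java
// Scalar model of the allTrue fast path (illustrative only).
class AllTrueFastPathSketch {
    static boolean allTrue(boolean[] mask) {
        for (boolean bit : mask) {
            if (!bit) {
                return false;
            }
        }
        return true;
    }

    static int selectiveStore(int[] lanes, boolean[] mask, int[] a, int offset) {
        if (allTrue(mask)) {
            // Every lane is active: equivalent to an ordinary intoArray().
            System.arraycopy(lanes, 0, a, offset, lanes.length);
            return lanes.length;
        }
        // General case: pack only the active lanes.
        int stored = 0;
        for (int lane = 0; lane < lanes.length; lane++) {
            if (mask[lane]) {
                a[offset + stored] = lanes[lane];
                stored++;
            }
        }
        return stored;
    }
}
```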
Test:
* Passed VectorAPI jtreg cases.
* JMH benchmark results evaluating the API's performance in a real Alibaba scenario.
Settings: UseAVX=3; thread number = 8; conflict data percentage: 20% (that is, 20% of mask bits are true)
http://cr.openjdk.java.net/~jzhu/8273057/jmh_benchmark_result.pdf
[1] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-13cc2d6ec18e487ddae05cda671bdb6bb7ffd42ff7bc51a2e00c8c5e622bd55dR4667
[2] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-951d02bd72a931ac34bc85d1d4e656a14f8943e143fc9282b36b9c76c1893c0cR144
[3] failed to inline (intrinsic) by https://github.com/JoshuaZhuwj/panama-vector/blob/922c4ba69219211a423f7e099d7a4c63e600c133/src/hotspot/cpu/x86/x86.ad#L1769
Best Regards,
Joshua
-------------
Commit messages:
- 8273057: [vector] New VectorAPI "SelectiveStore"
Changes: https://git.openjdk.java.net/panama-vector/pull/114/files
Webrev: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=114&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8273057
Stats: 1432 lines in 90 files changed: 1423 ins; 0 del; 9 mod
Patch: https://git.openjdk.java.net/panama-vector/pull/114.diff
Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/114/head:pull/114
PR: https://git.openjdk.java.net/panama-vector/pull/114