[vectorIntrinsics+mask] RFR: 8273057: [vector] New VectorAPI "SelectiveStore"

Paul Sandoz paul.sandoz at oracle.com
Fri Aug 27 19:20:54 UTC 2021


Hi Joshua,

Thank you for exploring this area. I am impressed at the level of knowledge you have of HotSpot.

Before diving into the code I would prefer to discuss the design, to determine if this is the best way to support your use-case. I would like to explore whether there are underlying primitives we can compose to support it.

If possible I would like to leverage the existing mask load/store primitives we have, and if necessary make some changes, rather than add more.

We already have general mask-accepting scatter/gather store/load methods. (I have always been a bit uncertain whether we have the right signatures for these methods, and whether they are necessary if we can use shuffles.)

To use the scatter store method today for your use-case we would have to:

- compute an int[] array from the set lanes of the mask, M say
- compute the “prefix” mask, PM, from the set lanes of M
- store the vector using the int[] array and PM.

Another alternative is to:

- compute a “compression” shuffle, S, from the set lanes of the mask, M
- apply S to the vector, producing a compressed vector, CV
- compute the “prefix” mask, PM, from the set lanes of M
- store CV using PM

In either case the loop index value is increased by the true count of M.
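
To make the second alternative concrete, here is a minimal scalar sketch in plain Java (no Vector API calls) of the loop it implies. The class name, the LANES constant and the helper names are hypothetical and only for illustration: each iteration compresses the selected lanes to the front, stores just that prefix, and advances the output offset by the true count.

```java
import java.util.Arrays;

public class CompressStoreLoopSketch {
    // Hypothetical vector length, chosen only for illustration.
    static final int LANES = 4;

    // Scalar stand-in for one iteration: move the lanes of v selected by m to
    // the front (the "compression shuffle" step), then write only the first
    // trueCount lanes (the "prefix mask" store). Returns the true count.
    static int compressStore(int[] v, boolean[] m, int[] out, int offset) {
        int[] cv = new int[LANES];
        int count = 0;
        for (int lane = 0; lane < LANES; lane++) {
            if (m[lane]) {
                cv[count++] = v[lane];
            }
        }
        // Lanes at or beyond count are not written, as with a masked store.
        for (int lane = 0; lane < count; lane++) {
            out[offset + lane] = cv[lane];
        }
        return count;
    }

    // Copies src[i] to out wherever keep[i] is true, one "vector" at a time.
    // Assumes src.length is a multiple of LANES and out is large enough.
    static int filter(int[] src, boolean[] keep, int[] out) {
        int offset = 0;
        for (int i = 0; i < src.length; i += LANES) {
            int[] v = Arrays.copyOfRange(src, i, i + LANES);
            boolean[] m = Arrays.copyOfRange(keep, i, i + LANES);
            // The loop's output index grows by the true count of the mask.
            offset += compressStore(v, m, out, offset);
        }
        return offset;
    }
}
```

Elements of the output array past the returned offset are left untouched, which is the property the prefix mask is there to preserve.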


The primitive I am searching for might be a way to create a shuffle from a mask.

Let’s say we could write:

  int[] a = ...
  int offset = ...
  IntVector v = ...
  VectorMask<Integer> m = ...

  // The new primitive: create a shuffle from the mask that partitions vector elements
  // according to the set and unset mask lane elements.
  VectorShuffle<Integer> s = m.toPartitioningShuffle();
  // Partition the elements
  IntVector cv = v.rearrange(s);

  // This method is likely not optimal, yet!
  // Another method could be added that is prefix(int length)
  VectorMask<Integer> pm = m.vectorSpecies().indexInRange(0, m.trueCount());

  // Use existing masked store
  cv.intoArray(a, offset, pm);
  // Increase offset by number of stores
  offset += m.trueCount();
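
As a rough scalar model of what such a primitive might compute, the sketch below derives the source-lane indexes for a partitioning shuffle from a mask, applies them the way rearrange does (result lane N takes the input lane named by index N), and builds the prefix mask. The stable ordering (set lanes first, then unset lanes, each in lane order) is my assumption, not a settled specification, and all class and method names are hypothetical.

```java
public class PartitioningShuffleSketch {
    // Source-lane indexes for the assumed partitioning order: all set lanes
    // first (in lane order), followed by all unset lanes (in lane order).
    static int[] partitioningIndexes(boolean[] m) {
        int[] idx = new int[m.length];
        int next = 0;
        for (int lane = 0; lane < m.length; lane++) {
            if (m[lane]) {
                idx[next++] = lane;
            }
        }
        for (int lane = 0; lane < m.length; lane++) {
            if (!m[lane]) {
                idx[next++] = lane;
            }
        }
        return idx;
    }

    // Scalar model of rearrange: result lane n takes the value of v[idx[n]].
    static int[] rearrange(int[] v, int[] idx) {
        int[] r = new int[v.length];
        for (int n = 0; n < v.length; n++) {
            r[n] = v[idx[n]];
        }
        return r;
    }

    // Prefix mask: the first trueCount lanes set, the remaining lanes unset.
    static boolean[] prefixMask(int lanes, int trueCount) {
        boolean[] pm = new boolean[lanes];
        for (int lane = 0; lane < trueCount; lane++) {
            pm[lane] = true;
        }
        return pm;
    }
}
```

For a four-lane mask with lanes 1 and 3 set, this model yields the indexes 1, 3, 0, 2, so the two selected elements land in lanes 0 and 1 and the prefix mask covers exactly those lanes.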

Is it possible for C2 to detect this kind of shuffle pattern and masking, and optimize it sufficiently? Experts please chime in!

I think this is worth exploring further. The more we can optimize the primitives, and then potentially patterns of those primitives, the more flexible we are and the more we can avoid adding specific functionality.

Paul.

> On Aug 27, 2021, at 2:54 AM, Joshua Zhu <jzhu at openjdk.java.net> wrote:
> 
> Hi,
> 
> I want to propose a new VectorAPI "Selective Store/Load" and share my
> implementation. Currently Alibaba's internal databases are in the
> process of applying VectorAPI and they have requirements on "Selective
> Store" for acceleration.
> 
> My proposed VectorAPI is declared as below [1]:
> 
>    int selectiveIntoArray($type$[] a, int offset, VectorMask<$Boxtype$> m);
> 
> The active elements (with their respective bit set in mask) are
> contiguously stored into the array "a". Assume N is the true count of
> mask, the elements starting from a[offset+N] till a[offset+laneCount]
> are left unchanged. The return value represents the number of elements
> stored into the array and "offset + return value" is the new offset of
> the next iteration.
> ![image](https://user-images.githubusercontent.com/70769035/131108509-3dcb61f3-e8d0-4b4e-9b49-a72c077aaba6.png)
> This API will be used in the following manner [2]:
> 
>    tld.conflict_cnt = 0;
>    for (int i = 0; i < ARRAY_LENGTH; i += INT_PREFERRED_SPECIES.length()) {
>      IntVector av = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input1, i);
>      IntVector bv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_input2, i);
>      IntVector cv = IntVector.fromArray(INT_PREFERRED_SPECIES, tld.int_index, i);
>      VectorMask<Integer> mask = av.compare(VectorOperators.NE, bv);
>      tld.conflict_cnt += cv.selectiveIntoArray(tld.conflict_array, tld.conflict_cnt, mask);
>    }
> 
> My patch includes the following changes:
>   * Selective Store VectorAPI for Long & Int
>   * Assembler: add x86 instruction "VPCOMPRESSD" and "VPCOMPRESSQ"
>   * Instruction selection: vselective_store; kmask_truecount (true count of kregister)
>   * Add node "StoreVectorSelective"
>   * Add a new parameter "is_selective" in inline_vector_mem_masked_operation()
>     in order to distinguish Masked version or Selective version
>   * jtreg cases
>   * JMH benchmark
> 
> TODO parts I will implement:
>   * Selective Store for other types
>   * Selective Load
>   * Some potential optimization. Such as: when mask is allTrue, SelectiveIntoArray() -> IntoArray()
> 
> Test:
>   * Passed VectorAPI jtreg cases.
>   * Result of JMH benchmark to evaluate API's performance in Alibaba's real scenario.
>       UseAVX=3; thread number = 8; conflict data percentage: 20% (that means 20% of mask bits are true)
>       http://cr.openjdk.java.net/~jzhu/8273057/jmh_benchmark_result.pdf
> 
> [1] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-13cc2d6ec18e487ddae05cda671bdb6bb7ffd42ff7bc51a2e00c8c5e622bd55dR4667
> [2] https://github.com/JoshuaZhuwj/panama-vector/commit/69623f7d6a1eae532576359328b96162d8e16837#diff-951d02bd72a931ac34bc85d1d4e656a14f8943e143fc9282b36b9c76c1893c0cR144
> [3] failed to inline (intrinsic) by https://github.com/openjdk/panama-vector/blob/60aa8ca6dc0b3f1a3ee517db167f9660012858cd/src/hotspot/cpu/x86/x86.ad#L1769
> 
> Best Regards,
> Joshua
> 
> -------------
> 
> Commit messages:
> - 8273057: [vector] New VectorAPI "SelectiveStore"
> 
> Changes: https://git.openjdk.java.net/panama-vector/pull/115/files
> Webrev: https://webrevs.openjdk.java.net/?repo=panama-vector&pr=115&range=00
>  Issue: https://bugs.openjdk.java.net/browse/JDK-8273057
>  Stats: 1432 lines in 90 files changed: 1423 ins; 0 del; 9 mod
>  Patch: https://git.openjdk.java.net/panama-vector/pull/115.diff
>  Fetch: git fetch https://git.openjdk.java.net/panama-vector pull/115/head:pull/115
> 
> PR: https://git.openjdk.java.net/panama-vector/pull/115


