[vectorIntrinsics+mask] RFR: 8273057: [vector] New VectorAPI "SelectiveStore"
John Rose
john.r.rose at oracle.com
Sat Sep 4 18:56:11 UTC 2021
P.S. Some googled references that seem useful to me:
https://en.wikipedia.org/wiki/Prefix_sum
https://www.cs.princeton.edu/courses/archive/fall21/cos326/lec/21-02-parallel-prefix-scan.pdf
https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf (from Connection Machine days, still relevant if you translate terms)
You can find your own easily, of course. I suppose there are plenty of GPU people who have rediscovered this stuff recently. Appel traces the basics back to 1977.
So here’s a basic tool for our toolkit: Watch out for segmented scans and reductions, even in disguise (say, as nested or grouped parallelism). Use them to turn brute-force iteration into log-N data parallel operations. (Will the hardware reward your rewrite of the algorithm? One may hope… Sometimes it does.)
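To make the log-N claim concrete, here is a minimal sketch of a segmented inclusive prefix sum, written as plain scalar Java rather than against the Vector API. The class and method names (`SegmentedScan`, `segmentedPrefixSum`) and the head-flag representation (a `true` flag marks the first element of each segment) are my own assumptions for illustration, not anything from the patch under review. The outer loop runs about log2(N) times; each inner loop has no dependencies between iterations, so it is the part one would hope to map onto lanes.

```java
import java.util.Arrays;

class SegmentedScan {
    // Segmented inclusive prefix sum in ceil(log2(n)) data-parallel steps
    // (Hillis-Steele style). flags[i] == true marks the start of a segment.
    // Combining rule on (value, flag) pairs, left then right:
    //   value = right.flag ? right.value : left.value + right.value
    //   flag  = left.flag | right.flag
    static int[] segmentedPrefixSum(int[] values, boolean[] flags) {
        int n = values.length;
        int[] v = values.clone();
        boolean[] f = flags.clone();
        for (int d = 1; d < n; d <<= 1) {
            // Each step reads only the previous step's arrays, so every
            // iteration of the inner loop is independent of the others.
            int[] nv = v.clone();
            boolean[] nf = f.clone();
            for (int i = d; i < n; i++) {
                if (!f[i]) {
                    nv[i] = v[i - d] + v[i]; // no segment start seen yet: keep accumulating
                }
                nf[i] = f[i - d] | f[i];     // remember that a segment start was crossed
            }
            v = nv;
            f = nf;
        }
        return v;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5, 6, 7, 8};
        // Segments: [1,2,3] [4,5] [6,7,8]
        boolean[] flags = {true, false, false, true, false, true, false, false};
        System.out.println(Arrays.toString(segmentedPrefixSum(values, flags)));
        // prints [1, 3, 6, 4, 9, 6, 13, 21]
    }
}
```

The brute-force version is a single loop that resets a running sum at each flag, which is N dependent steps. The version above does more total work (roughly N log N additions) in exchange for a dependency chain of only log N steps, which is exactly the trade the hardware may or may not reward.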