[vectorIntrinsics+mask] RFR: 8273057: [vector] New VectorAPI "SelectiveStore"

John Rose john.r.rose at oracle.com
Thu Sep 2 18:41:18 UTC 2021


On Sep 2, 2021, at 11:33 AM, John Rose <john.r.rose at oracle.com> wrote:

The double-width lanes of A are “packets” to be “routed”
to the “addresses” in the high half of the lane, and the
“payload” of each packet is the eventual index in C.
Butterfly network, aka binary radix sort, will do the job
in log time.
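
To make the log-time routing concrete, here is a minimal scalar sketch in Java. It is not the code under review and not necessarily the exact network meant above. It assumes the job is a left-packing "compress" of the lanes selected by a mask, and that each packet carries the leftward distance it still has to travel. Step k moves a packet by 2^k lanes if bit k of its distance is set, so ceil(log2(n)) steps suffice. The names LogStepCompress and route are hypothetical, and plain arrays stand in for vector lanes.

```java
import java.util.Arrays;

final class LogStepCompress {
    // For each destination lane, returns the source lane routed there,
    // or -1 if the destination lane ends up empty.
    static int[] route(boolean[] mask) {
        int n = mask.length;
        int[] src = new int[n];   // source lane of the packet at each position, -1 = empty
        int[] dist = new int[n];  // leftward distance the packet still has to travel
        int holes = 0;            // unselected lanes seen so far
        for (int i = 0; i < n; i++) {
            if (mask[i]) {
                src[i] = i;
                dist[i] = holes;  // a packet moves left past every hole before it
            } else {
                src[i] = -1;
                holes++;
            }
        }
        // One step per bit of the distance, lowest bit first.
        for (int step = 1; step < n; step <<= 1) {
            int[] nextSrc = new int[n];
            int[] nextDist = new int[n];
            Arrays.fill(nextSrc, -1);
            for (int p = 0; p < n; p++) {
                if (src[p] < 0) continue;
                int q = (dist[p] & step) != 0 ? p - step : p;
                if (nextSrc[q] >= 0) {
                    throw new IllegalStateException("collision at lane " + q);
                }
                nextSrc[q] = src[p];
                nextDist[q] = dist[p] & ~step;
            }
            src = nextSrc;
            dist = nextDist;
        }
        return src;
    }
}
```

For the mask {F,T,T,F,T,F,F,T} the selected lanes 1, 2, 4, 7 land in lanes 0 through 3. Processing the low bit first keeps the packets in their original order, which is why no two packets ever contend for the same lane.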

I forgot to point out that some hardware has min/max
operations which can perform the conditional swap
efficiently.  So log(VLENGTH) min/max swaps, plus
classic butterfly-like shuffles to bring the pairs into
proximity as needed, do the job as well.
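
As an illustration of the min/max conditional swap, here is a hedged scalar sketch in Java. It assumes each packet is a 64-bit lane with a non-negative destination address in the high 32 bits and the source lane index in the low 32 bits, so that ordering packets as plain integers orders them by address. For simplicity it substitutes a standard bitonic sorting network, which uses log2(n)*(log2(n)+1)/2 compare-exchange stages rather than the log(VLENGTH) count mentioned above. The class and method names are hypothetical, and the i ^ j partner lookup stands in for the butterfly-like shuffle.

```java
final class MinMaxRouting {
    // Pack an address (high half) and a payload (low half) into one double-width lane.
    // Assumes address >= 0 so that signed comparison orders packets by address.
    static long packet(int address, int payload) {
        return ((long) address << 32) | (payload & 0xFFFFFFFFL);
    }

    // Sort packets ascending using only min/max compare-exchanges.
    // The lane count must be a power of two.
    static void route(long[] lanes) {
        int n = lanes.length;
        if (Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("lane count must be a power of two");
        }
        for (int k = 2; k <= n; k <<= 1) {            // size of the bitonic runs being merged
            for (int j = k >> 1; j > 0; j >>= 1) {    // partner distance (the "shuffle")
                for (int i = 0; i < n; i++) {
                    int partner = i ^ j;
                    if (partner <= i) continue;       // handle each pair once
                    long lo = Math.min(lanes[i], lanes[partner]);
                    long hi = Math.max(lanes[i], lanes[partner]);
                    boolean ascending = (i & k) == 0;
                    lanes[i] = ascending ? lo : hi;
                    lanes[partner] = ascending ? hi : lo;
                }
            }
        }
    }

    // For each destination lane, the source lane whose element belongs there
    // under a left-packing compress; unselected lanes sink to the end.
    static int[] compressIndexes(boolean[] mask) {
        int n = mask.length;
        long[] lanes = new long[n];
        int rank = 0;
        for (int i = 0; i < n; i++) {
            int address = mask[i] ? rank++ : n;       // n = "past the end" address
            lanes[i] = packet(address, i);
        }
        route(lanes);
        int[] source = new int[n];
        for (int d = 0; d < n; d++) {
            source[d] = (int) lanes[d];               // low half is the payload
        }
        return source;
    }
}
```

With the mask {F,T,T,F,T,F,F,T}, compressIndexes yields the source order 1, 2, 4, 7 followed by the unselected lanes 0, 3, 5, 6. On hardware with vector min/max, each inner loop over i would correspond to one shuffle plus one min and one max over the whole vector.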

Also, these “packets” were, back in the day, literal
bit-serial packets on the Connection Machine, with
up to 16-bit addresses.  The microcode would worm
the data around simultaneously through a hypercube
network, getting the job done in log time.  The
weakness of the scheme is, of course, implementing
the binary N-cube network in 3-space.  Eventually
you run out of places to put the wires.



More information about the panama-dev mailing list