[vector] CharVector and unsigned bytes
John Rose
john.r.rose at oracle.com
Sat Dec 7 22:23:18 UTC 2019
On Dec 7, 2019, at 1:44 AM, Richard Startin <richard at openkappa.co.uk> wrote:
>
> "What specifically are you missing
> in order to use ShortVector as a substitute for CharVector?"
>
> Comparisons (vectorised) and eventual use as array indexes (after extraction from a vector). The motivation is unsigned 16-bit values in trie data structures, rather than a specific need for Java chars.
Thank you Richard; this is good information.
> It's possible to emulate these operations straightforwardly with shorts by masking with 0xFFFF and using a custom comparator instead of the < operator everywhere, but in my experience this is both error prone and harder to read than using comparison operators.
Agreed. Making a one-off operation (unsigned instead of signed) 10x harder to read and optimize certainly takes it off the table, for all but the most determined programmers.
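For concreteness, here is a minimal scalar sketch of the emulation Richard describes, in plain Java with no Vector API involved. The class and helper names are made up for illustration; only Short.compareUnsigned (Java 9 and later) is a real library method.

```java
class UnsignedShortEmulation {
    // Hypothetical helper: zero-extend a 16-bit value to int.
    // Equivalent to Short.toUnsignedInt(s).
    static int toUnsigned(short s) {
        return s & 0xFFFF;
    }

    // Hypothetical helper: unsigned "less than" on 16-bit values
    // that are stored in (signed) Java shorts.
    static boolean lessThanUnsigned(short a, short b) {
        return toUnsigned(a) < toUnsigned(b);
    }

    public static void main(String[] args) {
        short lo = (short) 0x7FFF; // 32767 when read as unsigned
        short hi = (short) 0x8000; // 32768 as unsigned, -32768 as signed

        // The plain operator compares the sign-extended ints.
        System.out.println(lo < hi);                           // prints false
        // The masked comparison gives the unsigned ordering.
        System.out.println(lessThanUnsigned(lo, hi));          // prints true
        // The library comparator agrees with the masked version.
        System.out.println(Short.compareUnsigned(lo, hi) < 0); // prints true
    }
}
```

Every comparison site has to remember to call the helper rather than use <, which is exactly the readability and reliability cost under discussion.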
(This is one motivation for me to have moved much of the API from bespoke methods into VectorOperations constants: We can more easily adjust the line between supported and unsupported operations, by adding or deleting constants rather than suites of methods or new vector types.)
> It's error-prone because the masks/comparators inevitably don't get used everywhere. In scalar code it is too easy for a short to promote to int and sign-extend without masking; without lots of tests for data at the 0x7FFF/0x8000 boundary, and perfect programmers/reviewers, this results in bugs. There may be more friction without automatic promotion, of course.
In the Vector API, promotion is never automatic, and there is (I think) an adequate set of conversions to handle unsigned as well as signed promotions, along with other forms of lane expansion and contraction.
Regarding comparisons: I think you have provided evidence that we should just add unsigned comparison operations (LT_UNSIGNED, etc.) to VectorOperations.
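Until such operators exist, one common way to get an unsigned comparison out of a signed one is to flip the sign bit of both operands first. The sketch below shows the idea lane by lane on plain short arrays standing in for vector lanes; it does not use the Vector API, and the class and method names are hypothetical.

```java
import java.util.Arrays;

class UnsignedLaneCompare {
    // Hypothetical helper: lane-wise unsigned a[i] < b[i] over 16-bit lanes.
    // Assumes a and b have the same length.
    static boolean[] lessThanUnsigned(short[] a, short[] b) {
        boolean[] mask = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            // Flipping the sign bit maps unsigned order onto signed order,
            // so an ordinary signed compare then gives the unsigned answer.
            short x = (short) (a[i] ^ 0x8000);
            short y = (short) (b[i] ^ 0x8000);
            mask[i] = x < y;
        }
        return mask;
    }

    public static void main(String[] args) {
        short[] a = {0, (short) 0x7FFF, (short) 0x8000, (short) 0xFFFF};
        short[] b = {1, (short) 0x8000, (short) 0x7FFF, (short) 0xFFFF};
        System.out.println(Arrays.toString(lessThanUnsigned(a, b)));
        // prints [true, true, false, false]
    }
}
```

A dedicated unsigned comparison constant would let the user state the intent directly and leave a trick like this (or a native unsigned compare, where the hardware has one) to the implementation.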
For table lookup (array indexes) I think it’s the case that today’s vector hardware prefers 32- and 64-bit array indexes, so there’s got to be a promotion from short to int somewhere. With the above-mentioned operations (already in the API) you can promote the lanes yourself as you like, with zero fill.
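As a scalar illustration of "promote with zero fill, then index", the sketch below widens 16-bit keys to int and performs a gather-style lookup. It uses plain arrays, not the Vector API; the class name, method names and table contents are made up for the example.

```java
import java.util.Arrays;

class ZeroFillGather {
    // Hypothetical helper: promote 16-bit lanes to int lanes with zero fill.
    static int[] zeroExtend(short[] lanes) {
        int[] out = new int[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = lanes[i] & 0xFFFF; // same as Short.toUnsignedInt(lanes[i])
        }
        return out;
    }

    // Hypothetical helper: scalar gather, out[i] = table[index[i]].
    static int[] gather(int[] table, int[] index) {
        int[] out = new int[index.length];
        for (int i = 0; i < index.length; i++) {
            out[i] = table[index[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        // Identity table covering the full unsigned 16-bit range.
        int[] table = new int[1 << 16];
        for (int i = 0; i < table.length; i++) {
            table[i] = i;
        }
        short[] keys = {0, (short) 0x7FFF, (short) 0x8000, (short) 0xFFFF};
        System.out.println(Arrays.toString(gather(table, zeroExtend(keys))));
        // prints [0, 32767, 32768, 65535]
    }
}
```

Without the zero fill, the last two keys would sign-extend to negative ints and the lookup would throw ArrayIndexOutOfBoundsException.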
Pulling a little more on this string, I suppose that if you have a vector of sub-word types (or float types), and you wish to use it as indexes, it would be nice if you could specify the lane conversion as a parameter, and let the API internally handle the messy break-up of scatter and gather operations into expansions and contractions, to get the desired outcome of an apparent memory access with a lane type other than 32- or 64-bit integers. This is just one of several “FIXME” items in the design of scatter/gather methods in today’s API. I expect this will improve as we gain insight into use cases.
— John