Value array element size represented as log2
David Simms
david.simms at oracle.com
Tue Oct 2 13:11:47 UTC 2018
Adds a few more instructions in decode, but might well be efficient
enough, and the memory savings aren't to be ignored, worth benchmarking,
filed a RFE:
https://bugs.openjdk.java.net/browse/JDK-8211389
Cheers
/D
On 02/10/18 11:24, John Rose wrote:
> 20 bytes would be represented as 5*4, where the 5 comes from the
> 1/3/5/7 selector,
> and 4 is the power of two.
>
> All odd sizes up to 7, all even sizes up to 16, all multiples of four
> up to 32 bytes,
> and all multiples of eight up to 64 bytes are representable in this
> scheme.
>
> The first several byte sizes not exactly representable are: 9(10),
> 11(12), 13(14), 15(16),
> 17,18,19(20), 21,22,23(24), 25,26,27(28), 29,30,31,(32). The
> parenthesized numbers
> are the next higher representable sizes. As you can see the error is
> kept small.
>
> The layout helper would have to be tweaked to allow the 1/3/5/7
> selector, and users
> of the layout helper which process value arrays would have to take it
> into account.
>
> A good encoding for the selector would be:
>
> enum sel1357 { sel1=0, sel3=1, sel5=2, sel7=3 };
>
> sel1357 encode_sel(int sel) {
> sel1357 sel_code = (sel1357)((sel - 1) >> 1);
> assert(sel == 1 + (sel_code << 1));
> return sel_code;
> }
> int decode_sel(sel1357 sel_code) {
> int sel = 1 + ((int)sel_code << 1);
> }
>
> The logic for choosing the best selector and shift could be:
>
> int decode_shift_and_sel(int size, sel1357& return_sel) {
> int best_err = size, best_sel = 0, best_shift = 0;
> for (int sel = 1; sel <= 7; sel += 2) {
> int shift = upper_log2((size + sel - 1) / sel);
> int err = (1 << shift) * sel;
> if (err < best_err) { best_err = err; best_sel = sel; best_shift =
> shift; }
> }
> return_sel = encode_sel(best_sel);
> return best_shift;
> }
>
> (There are surely bugs in this, but you get the idea. A dead-reckoning
> algorithm could do the job faster, without the loop, but it would be
> harder
> to read.)
>
>>
>> A long time ago we discussed that mul was too expensive compared to
>> shift, so that encoding is intentional (addressing uses shift)
>
> No mul is needed, ever. In most cases the JIT can see the array type and
> emit suitable shift/add sequences to produce 3 (1+2), 5 (1+4), or 7 (8-1).
>
> Even the interpreter, or polymorphic code, can produce suitable ALU
> operations that are competitive with mul, if only those four multipliers
> are possible.
>
> int mul_by_sel(intptr_t x, sel1357 sel) {
> switch (sel) {
> case sel1: return x;
> case sel3: return x + (x << 1);
> case sel5: return x + (x << 2);
> case sel7: return (x << 3) - x;
> }
> }
>
> Similar points can be made for a one-bit 1/3 selector, with more footprint
> and somewhat simpler decoding.
>
> int mul_by_sel(intptr_t x, sel13 sel) {
> return x + (-(int)sel & (x << 1));
> }
>
> In many cases, the value types will have true power of two size, so that
> the selector will be "sel1" (unit multiplier). If that's common
> enough, then
> a speculation might be in order:
>
> int mul_by_sel(intptr_t x, sel1357 sel) {
> if (sel == sel1) return x;
> int result = (x << (int)sel); // needs fixup for sel7
> if (sel == sel7) x = -x; // fixup
> result += x;
> return result;
> }
>
> int mul_by_sel(intptr_t x, sel13 sel) {
> return sel == sel1 ? x : x + (x << 1);
> }
>
>
More information about the valhalla-dev
mailing list