Value array element size represented as log2

Tue Oct 2 13:11:47 UTC 2018

Adds a few more instructions in decode, but might well be efficient 
enough, and the memory savings aren't to be ignored, worth benchmarking, 
filed a RFE:

https://bugs.openjdk.java.net/browse/JDK-8211389

Cheers
/D

On 02/10/18 11:24, John Rose wrote:
> 20 bytes would be represented as 5*4, where the 5 comes from the 
> 1/3/5/7 selector,
> and 4 is the power of two.
>
> All odd sizes up to 7, all even sizes up to 16, all multiples of four 
> up to 32 bytes,
> and all multiples of eight up to 64 bytes are representable in this 
> scheme.
>
> The first several byte sizes not exactly representable are:  9(10), 
> 11(12), 13(14), 15(16),
> 17,18,19(20), 21,22,23(24), 25,26,27(28), 29,30,31,(32).  The 
> parenthesized numbers
> are the next higher representable sizes.  As you can see the error is 
> kept small.
>
> The layout helper would have to be tweaked to allow the 1/3/5/7 
> selector, and users
> of the layout helper which process value arrays would have to take it 
> into account.
>
> A good encoding for the selector would be:
>
> enum sel1357 { sel1=0, sel3=1, sel5=2, sel7=3 };
>
> sel1357 encode_sel(int sel) {
>   sel1357 sel_code = (sel1357)((sel - 1) >> 1);
>   assert(sel == 1 + (sel_code << 1));
>   return sel_code;
> }
> int decode_sel(sel1357 sel_code) {
>   int sel = 1 + ((int)sel_code << 1);
> }
>
> The logic for choosing the best selector and shift could be:
>
> int decode_shift_and_sel(int size, sel1357& return_sel) {
>   int best_err = size, best_sel = 0, best_shift = 0;
>   for (int sel = 1; sel <= 7; sel += 2) {
>     int shift = upper_log2((size + sel - 1) / sel);
>     int err = (1 << shift) * sel;
>     if (err < best_err) { best_err = err; best_sel = sel; best_shift = 
> shift; }
>   }
>   return_sel = encode_sel(best_sel);
>   return best_shift;
> }
>
> (There are surely bugs in this, but you get the idea.  A dead-reckoning
> algorithm could do the job faster, without the loop, but it would be 
> harder
> to read.)
>
>>
>> A long time ago we discussed that mul was too expensive compared to 
>> shift, so that encoding is intentional (addressing uses shift)
>
> No mul is needed, ever.  In most cases the JIT can see the array type and
> emit suitable shift/add sequences to produce 3 (1+2), 5 (1+4), or 7 (8-1).
>
> Even the interpreter, or polymorphic code, can produce suitable ALU
> operations that are competitive with mul, if only those four multipliers
> are possible.
>
> int mul_by_sel(intptr_t x, sel1357 sel) {
>   switch (sel) {
>   case sel1: return x;
>   case sel3: return x + (x << 1);
>   case sel5: return x + (x << 2);
>   case sel7: return (x << 3) - x;
>   }
> }
>
> Similar points can be made for a one-bit 1/3 selector, with more footprint
> and somewhat simpler decoding.
>
> int mul_by_sel(intptr_t x, sel13 sel) {
>   return x + (-(int)sel & (x << 1));
> }
>
> In many cases, the value types will have true power of two size, so that
> the selector will be "sel1" (unit multiplier).  If that's common 
> enough, then
> a speculation might be in order:
>
> int mul_by_sel(intptr_t x, sel1357 sel) {
>   if (sel == sel1)  return x;
>   int result = (x << (int)sel);  // needs fixup for sel7
>   if (sel == sel7)  x = -x;  // fixup
>   result += x;
>   return result;
> }
>
> int mul_by_sel(intptr_t x, sel13 sel) {
>   return sel == sel1 ? x : x + (x << 1);
> }
>
>