RFR: 8283892: Compress and expand bits

Wed Apr 6 18:43:42 UTC 2022

On Tue, 5 Apr 2022 22:05:19 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

> Add support to compress bits and expand bits of `int` and `long` values, see Hacker's Delight (2nd edition), section 7.4.
> 
> Compressing or expanding bits of an `int` or `long` value can be composed to enable general permutations, and the "sheep and goats" operation (SAG) see Hacker's Delight (2nd edition), section 7.7. SAG can be used to perform a stable binary radix sort.
> 
> The compress and expand functionality maps efficiently to hardware instructions, such as `PEXT` and `PDEP` on x86 hardware. Thus the implementations can be very efficient on supporting hardware. Intrinsification will occur in a separate PR.
> 
> This [paper](https://arxiv.org/pdf/1706.00990.pdf) investigates the beneficial performance impact of the `PDEP` instruction, and by extension the `expand` method, when applied to the implementation of a bit-vector select operation in succinct data structures (for example `select(r)` returns the position of the `r`th 1).
> 
> Testing-wise the approach take is three fold:
> 1. Tests compared against simple implementations that are easy to read and verify against the JDK implementations (which later will also be made intrinsic). To compensate all tests are also run flipping the test methods and the methods under test.
> 2. Tests composed of compress and expand and vice versa.
> 3. Tests with known mask patterns, whose expected values are easily derived from the inputs.

Changes requested by jrose (Reviewer).

src/java.base/share/classes/java/lang/Integer.java line 1781:

> 1779:      * All the upper remaining bits of the compressed value are set
> 1780:      * to zero.
> 1781:      *

Following Alan's comment, I suggest the following examples:

  compress(0x9ABCDEF, 0x0F0F0FF) == 0xACEF
  compress(x, 1<<n) == (x>>n & 1)
  compress(x, -1<<n) == x >>> n
  compress(m, m) == (m==-1||m==0)? m : (1<<bitCount(m))-1
  compress(x, m) == compress(x & m, m)
  compress(expand(x, m), m) == x & compress(m, m)
  (compress(x, m) >>> n) & 1 == /*the bit of x corresponding to the nth set bit in m*/

…In two places.  Similarly, examples for expand:

  expand(0x9ABCDEF, 0x0F0F0FF) == 0xC0D0EF
  expand(x, 1<<n) == (x&1) << n
  expand(x, -1<<n) == x << n
  expand(-1, m) == m
  expand(x, m) == expand(x, m) & m
  expand(compress(x, m), m) == x & m
  expand(1<<n, m) == /*the nth set bit in m, as a mask in place; cf. highest/lowestOneBit*/

(Please double check these examples!)

test/jdk/java/lang/AbstractCompressExpandTest.java line 104:

> 102:     abstract long expectedExpand(long i, long mask);
> 103: 
> 104:     static int SIZE = 1024;

It feels like `SIZE` should be a constructor parameter, to make it easier to run ad hoc stress tests.

test/jdk/java/lang/AbstractCompressExpandTest.java line 145:

> 143: 
> 144:         int i = 0;
> 145:         for (int len = 0; len < 32; len++) {

Lengths should be `len = 1; len <= 32; len++`, generating fewer redundant zero-length masks and a full-length -1 mask.  The pos should be `pos = 0; pos <= 32 - len; pos++`, generating fully left-justified masks of the form `-1<<pos`.

Also, you should generate a zero-mask, just once.  I think initializing `int i = 1` will do that trick.

(Same comment in two places.)

-------------

PR: https://git.openjdk.java.net/jdk/pull/8115