Bounds Check Elimination with Fast-Range

Thu Dec 19 14:38:49 UTC 2019

One thing I’ve come to realize is that the bit-mixing step in fast-range does not require a high quality hash function. It just needs to distribute the bits so that the probability of falling into the buckets [0, 2^32), [2^32, 2 * 2^32), [2*2^32, 3*2^32), …, is even. This is why I’m so enthusiastic about fibonacci hashing for this operation, since it costs only a single multiply.

To prove it works, take a look at this sample. The integer gFactor is equal to 2^32 / phi, where phi is the Golden Ratio. Note that hash * gFactor is 32-bit multiplication, and we AND with 0xffffffffL for the unsigned multiply with int size. Set::add returns false if the element is present.

import java.util.HashSet;
import java.util.Set;

class Scratch {
    public static void main(String[] args) {
        int gFactor = -1640531527;
        int size = 1_000;
        int range = 4_000;

        int collisions = 0;
        Set<Integer> set = new HashSet<>();

        for (int hash = 0; hash < range; ++hash) {
            int reduced = (int) (((hash * gFactor) & 0xffffffffL) * size >>> 32);
            if (!set.add(reduced)) collisions++;
        }

        System.out.println("Optimal collisions: " + (range - size));
        System.out.println("Actual collisions: " + collisions);
    }
}

> On Dec 19, 2019, at 4:31 AM, Andrew Haley <aph at redhat.com> wrote:
> 
> On 12/3/19 6:06 AM, John Rose wrote:
>> But what I keep wishing for a good one- or two-cycle instruction
>> that will mix all input bits into all the output bits, so that any
>> change in one input bit is likely to cause cascading changes in
>> many output bits, perhaps even 50% on average.  A pair of AES
>> steps is a good example of this.
> 
> I've searched for something like this too. However, I
> experimented with two rounds of AES and I didn't get very good
> results. From what I remember, it took at least three or four
> rounds to get decent mixing, let alone full avalanche.
> 
> Also, the latency of AES instructions tends to be a few cycles
> and the latency of moving data from integer to vector registers
> is as many as five cycles. So I gave up.
> 
> This was a while ago, so probably badly remembered...
> 
> -- 
> Andrew Haley  (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>