Bounds Check Elimination with Fast-Range

Tue Nov 19 04:43:23 UTC 2019

Apologies; there were actually a few errors in my Universal Hashing javadoc (not the code); they’ve been corrected: https://gist.github.com/AugustNagro/4f2d70d261347e515efe0f87de9e8dc2

One thing that might be relevant to the bounds check elimination is that in Java fast-range will output in the range of [-tableSize / 2, tableSize / 2 - 1]. So then we need table[fr(hash) + tableSize/2]. However, tableSize / 2 will be a constant, so that division need only be done once.

Regarding Florian’s concerns: yes it is right that fast-range isn’t optimal in every case (and I never tried to claim that). If your tableSize is a power of 2, then just use xor/mask ala HashMap. But the benefit is when mapping to tables of arbitrary size, where those modulo intrinsics may not apply.

And here’s a tangent to thing to think about: is growing HashMap’s backing array by powers of 2 actually a good thing, when the HashMap gets large? What if you instead wanted to grow by powers of 1.5, or even grow probabilistically, based on the collision rate, allocation pressure, or other data? With fast-range you can do this if you want. And without the performance hit of %!

> On Nov 18, 2019, at 2:17 PM, Florian Weimer <fweimer at redhat.com> wrote:
> 
> * John Rose:
> 
>> On Nov 18, 2019, at 5:10 AM, Florian Weimer <fweimer at redhat.com> wrote:
>>> 
>>>> 
>>>> The idea is that, given (int) hash h and (int) size N, then ((long) h)
>>>> * N) >>> 32 is a good mapping.
>>> 
>>> I looked at this in the past weeks in a different context, and I don't
>>> think this would work because we have:
>> 
>> That technique appears to require either a well-conditioned hash code
>> (which is not the case with Integer.hashCode) or else a value of N that
>> performs extra mixing on h.  (So a very *non-*power-of-two value of N
>> would be better here, i.e., N with larger popcount.)
>> 
>> A little more mixing should help the problem Florian reports with a
>> badly conditioned h.  Given this:
>> 
>> int fr(int h) { return (int)(((long)h * N) >>> 32); }
>> int h = x.hashCode();
>> //int bucket = fr(h);  // weak if h is badly conditioned
>> 
>> then, assuming multiplication is cheap:
> 
> (Back-to-back multiplications probably are not.)
> 
>> int bucket = fr(h * M); // M = 0x2357BD or something
>> 
>> or maybe something fast and sloppy like:
>> 
>> int bucket = fr(h + (h << 8));
>> 
>> or even:
>> 
>> int bucket = fr(h) ^ (h & (N-1));
> 
> Does this really work?  I don't think so.
> 
> I think this kind of perturbation is quite expensive.  Arm's BITR should
> be helpful here.  But even though this operation is commonly needed and
> easily implemented in hardware, it's rarely found in CPUs.
> 
> Any scheme with another multiplication is probably not an improvement
> over the multiply-shift-multiply-subtract sequence to implement modulo
> for certain convenient bucket counts, and for that, we can look up
> extensive analysis. 8-)
> 
> Thanks,
> Florian