RFR: 8236641: Improve Set.of(...).iterator() warmup characteristics

Claes Redestad claes.redestad at oracle.com
Thu Jan 16 16:21:28 UTC 2020


Florian,

On 2020-01-16 11:20, Florian Weimer wrote:
> * Claes Redestad:
> 
>> we're doing plenty of iterations over Set.of() instances during
>> bootstrap, which makes these operations show up prominently in
>> startup profiles. This patch refactors a few hot methods to get a
>> measureable startup improvement without regressing on targeted
>> microbenchmarks.
>>
>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8236641
>> Webrev: http://cr.openjdk.java.net/~redestad/8236641/open.00/
>>
>> (The patch is baselined against 8236850)
> 
> Would it be possible to replace
> 
>    idx = SALT % elements.length;
> 
>    idx = (SALT % (table.length >> 1)) << 1;
> 
> with this?
> 
>    idx = (int) ((elements.length * (long) SALT) >>> 32);
> 
>    idx = (int) (((table.length >> 1) * (long) SALT) >>> 31);
> 
> A long multiplication followed by a shift should be faster than an int
> modulo.
> 
> See this thread for some background:
> 
> <https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-November/040011.html>

I must have missed this - very interesting!

I see there's been some discussion about using it more generally, with
well-founded concern that it will only work well if the hashCode is
well-conditioned. Still could be interesting as a means to speed up
SetN/MapN::probe:

  int idx = Math.floorMod(pe.hashCode(), elements.length);

As we probably need to up-mix the hashCode a bit, I did this quick test
guided by how well it manages to distributed various ranges of Integers:

             int h = pe.hashCode();
             // mix low bits into upper bits
             h ^= h << 3;
             h ^= h << 13;
             h ^= h << 27;
             h >>>= 1; // stay positive
             int idx = (int)(((long)h * elements.length) >> 31);

For the ranges of Integers I tested it distributes things (much) better.

And it speed up ImmutableColls.createSetOf by a factor ~1.6-1.7x
(uses constant Strings as payload, so hashCode overhead is downplayed).

While promising, any mixer is a trade-off between how well it mixes
(which is very hard to evaluate universally) and the performance hit it
incurs. A few shifts should remain faster than Math.floorMod while
hopefully improving distribution over the baseline in 99% of cases, but
this surely needs more investigation (above is most likely too naive).

Let's file a follow-up.

Regardless, your suggestion above ought to work great for SALT %
foo.length since SALT is just a pseudo-random number, which is likely to
have a good mix of high bits set, so we'll get a random number in the
[0..N-1] range for any table size N. (For MapN we can't simplify (x >>>
32) << 1 to x >> 31 like you did, though, since we need to mask out the
lowest bit to get an even index.). I don't see any significant
improvement in micros, but since we're no longer doing a division we can
remove the test for zero in the constructors, so it feels like a
cleanup.

I've cleaned up a bit (SALT can be made into a long to reduce casts
etc), and rolled it into the current patch:

http://cr.openjdk.java.net/~redestad/8236641/open.01/

Thanks!

/Claes


More information about the core-libs-dev mailing list