RFR: JDK-8031043 ClassValue's backing map should have a smaller initial size

Mon Jun 13 17:18:17 UTC 2016

Hi,

I explored various strategies to minimize worst-case lookup performance 
for MethodType keys in LinearProbeHashtable. One idea is from the 
"Hopscotch hashing" algorithm [1] which tries to optimize placement of 
keys by moving them around at each insertion or deletion. While a 
concurrent Hopscotch hashtable is possible, it requires additional 
"metadata" about buckets which complicates it and does not make it 
practical for implementing in Java until Java gets value types and 
arrays of them. The simplest idea until then is to optimize placement of 
keys when the table is rehashed. Normally when table is rehashed the old 
table is scanned and entries from it inserted into new table. To achieve 
similar effect to "Hopscotch hashing", the order in which keys are taken 
from the old table and inserted into new table is changed. Keys are 
ordered by increasing bucket index as it would be computed for the key 
in the new table. Inserting in this order minimizes the worst-case 
lookup performance. Doing this when rehashing and not at every insertion 
or deletion guarantees that at least half of keys are optimally placed.

Another strategy to minimize worst-case lookup performance is to use 
quadratic probe sequence instead of linear probe sequence. Normally, 
when looking up a key, slots in the table are probed in the following 
sequence (seq = 0, 1, 2 ...):

     index(seq) = (hashCode + seq) % tableLength

Quadratic probing uses the following probe sequence:

     index(seq) = (hashCode + seq * (seq + 1) / 2) % tableLength

Those two strategies can be combined. Here's a simulation of using those 
two strategies in an open-addressing hashtable:

http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_MethodType_probe_sequence.txt

Using those strategies does not affect the average length of probing 
sequence much (length of 0 means that the key was found at its home 
location, length of 1 means that one non-equal key was probed before 
finding the equal one, etc ...), but worst-case lookup performance is 
greatly impacted. Combining both strategies minimizes the worst-case 
lookup performance.

Benchmarking using those strategies reveals the average lookup 
performance is consistently better than using CHM:

http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_MethodType_bench.pdf

The last trick to make this happen is stolen from CHM. The method type's 
key is a WeakReference<MethodType> which caches the hashCode of 
MethodType. By using cached hashCode in the key's equals() 
implementation as a means of optimization, we achieve similar effect 
that CHM achieves when it caches key's hashCode(s) in its Entry objects.

Here's the source of above benchmark:

http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_MethodType_bench_src.tgz

3 variations of LinearProbeHashtable are compared with CHM:

     LinearProbeHashtable - the plain one from webrev.04.4
     LinearProbeHashtable1 - using sorting of keys when rehashing to 
optimize their placement
     LinearProbeHashtable2 - combines sorting of keys with quadratic 
probe sequence

I think LinearProbeHashtable2 could be used in MethodType interning 
without fear of degrading lookup performance.

Regards, Peter

[1] https://en.wikipedia.org/wiki/Hopscotch_hashing