RFR: JDK-8031043 ClassValue's backing map should have a smaller initial size
Peter Levart
peter.levart at gmail.com
Mon Jun 13 00:49:56 UTC 2016
Hi, Claes,
Thanks for starting to look into this.
On 06/12/2016 06:28 PM, Claes Redestad wrote:
> Hi,
>
> this seems quite promising.
>
> A bit too much to review in one go, but let me start with some musings
> while I digest it more fully:
>
> 49 /**
> 50 * A special key that is used to overwrite a key when the
> entry is removed.
> 51 */
> 52 private static final class Tombstone {}
> 53
> 54 private static final Tombstone TOMBSTONE = new Tombstone();
>
> Since you're only checking identity for TOMBSTONE, couldn't it simply
> be an Object?
Yes, could be an Object. I initially had statefull Tombstones, but now
it is just a special reference.
> Similarly, could the class Removed you introduced in ClassValue be
> replaced by a similar Object TOMBSTONE?
Removed instances in ClassValue are different. Two distinct instances of
Removed are not the same and by not being the same, ABA problem is
avoided if during computation of some value in first thread, a value is
computed by some other thread, installed and then removed before trying
to install the computed value of the first thread.
>
> Thanks!
In the meanwhile I simplified some methods of LinearProbeHashtable
(including get()) so that they don't have to spin-wait for a removing
thread to install a tombstone. I did that after realizing that I could
reverse writes in remove() method(s). remove() method(s) now 1st
overwrite the key with a tombstone, and after that they clear the value.
Here's a webrev with just that simplification + your comment applied:
http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.4/
Separately and not included in above webrev, I managed to improve
average lookup time for MethodType keys by optimizing the locations of
keys when the table is rehashed. This does not affect ClassValue lookup
performance since ClassValue keys are so perfect that there are no hash
collisions at all. But if LinearProbeHashtable is to be used for
interning MethodType(s) too, than this is good news. I plan to publish
the mentioned change separately with benchmark results when ready...
Regards, Peter
>
>
> /Claes
>
> Note: I'd argue that this might help with some footprint regressions
> in JDK 9 (due to increased use of MHs, which in turn use ClassValues)
> and if getting through review in a reasonable time frame I think this
> should be targetted for JDK 9 as a regression bug fix.
>
> On 2016-06-09 16:42, Peter Levart wrote:
>> Hi,
>>
>> The patch for this issue is a result of the following discussion on
>> mlvm-dev list:
>>
>> http://mail.openjdk.java.net/pipermail/mlvm-dev/2016-May/006602.html
>>
>> I propose the following improvement to the implementation of ClassValue
>> API:
>>
>> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.3/
>>
>>
>>
>> This is a re-implementation of ClassValue API that leverages a
>> ConcurrentMap for storage. ConcurrentHashMap was initially tested, but
>> using it regressed lookup performance for about ~10%. So a simple
>> ConcurrentMap implementation (LinearProbeHashtable) was created which
>> has smaller footprint, better CPU cache locality and consequently faster
>> lookup performance for the task at hand. The goal of this task was to
>> shrink the memory footprint of ClassValue API. To see the end effect
>> let's first look at the results of a benchmark that simulates a
>> re-deployment of an application in an app server where several apps are
>> deployed and each uses it's own set of classes and ClassValue(s):
>>
>> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/ClassValueExpungeBench.java
>>
>>
>>
>> Benchmark (classValuesPerPart)
>> (classesPerPart) (impl) (partitions) Mode Cnt Score Error Units
>> ClassValueExpungeBench.redeployPartition 8 1024
>> jdk9 16 ss 16 65.682 ± 1.485 ms/op
>> ClassValueExpungeBench.redeployPartition 8 4096
>> jdk9 16 ss 16 247.040 ± 7.684 ms/op
>> ClassValueExpungeBench.redeployPartition 64 1024
>> jdk9 16 ss 16 302.536 ± 27.750 ms/op
>> ClassValueExpungeBench.redeployPartition 64 4096
>> jdk9 16 ss 16 1174.002 ± 77.183 ms/op
>>
>> Benchmark (classValuesPerPart) (classesPerPart) (impl) (partitions)
>> Mode Cnt Score Error Units
>> ClassValueExpungeBench.redeployPartition 8 1024
>> pl04.3 16 ss 16 47.179 ± 1.436 ms/op
>> ClassValueExpungeBench.redeployPartition 8 4096
>> pl04.3 16 ss 16 163.067 ± 8.118 ms/op
>> ClassValueExpungeBench.redeployPartition 64 1024
>> pl04.3 16 ss 16 67.581 ± 1.718 ms/op
>> ClassValueExpungeBench.redeployPartition 64 4096
>> pl04.3 16 ss 16 240.458 ± 6.616 ms/op
>>
>> A by-product of this simulation is a heap dump histogram taken after the
>> last warmup iteration when all "applications" are deployed:
>>
>> top-10 classes heap dump for 64 ClassValue(s) x 4096 Class(es) x 16
>> partitions
>>
>> jdk9:
>>
>> num #instances #bytes class name (module)
>> -------------------------------------------------------
>> 1: 4194309 167772360 java.util.WeakHashMap$Entry
>> (java.base at 9-internal)
>> 2: 4195329 134250528 java.lang.ClassValue$Entry
>> (java.base at 9-internal)
>> 3: 65539 34603248 [Ljava.util.WeakHashMap$Entry;
>> (java.base at 9-internal)
>> 4: 65537 34603032 [Ljava.lang.ClassValue$Entry;
>> (java.base at 9-internal)
>> 5: 67301 7552448 java.lang.Class
>> (java.base at 9-internal)
>> 6: 65536 4194304 java.lang.ClassValue$ClassValueMap
>> (java.base at 9-internal)
>> 7: 65552 2097664 java.lang.ref.ReferenceQueue
>> (java.base at 9-internal)
>> 8: 67736 1676344 [Ljava.lang.Object;
>> (java.base at 9-internal)
>> 9: 66349 1116848 [I (java.base at 9-internal)
>> 10: 65554 1048864 java.lang.ref.ReferenceQueue$Lock
>> (java.base at 9-internal)
>> -------------------------------------------------------
>> 388915640 == 370 MiB
>>
>> pl04.3:
>>
>> num #instances #bytes class name (module)
>> -------------------------------------------------------
>> 1: 133274 69833848 [Ljava.lang.Object;
>> (java.base at 9-internal)
>> 2: 67300 7552376 java.lang.Class
>> (java.base at 9-internal)
>> 3: 65536 2097152 java.lang.ClassValue$ClassValueMap
>> (java.base at 9-internal)
>> 4: 65536 2097152
>> java.lang.ClassValue$ClassValueMap$WeakNode (java.base at 9-internal)
>> 5: 66349 1116848 [I (java.base at 9-internal)
>> 6: 65991 1055856 java.lang.Object
>> (java.base at 9-internal)
>> 7: 7434 696584 [B (java.base at 9-internal)
>> 8: 1551 299680 [Ljava.lang.Class;
>> (java.base at 9-internal)
>> 9: 2447 176184 java.lang.reflect.Field
>> (java.base at 9-internal)
>> 10: 7154 171696 java.lang.String
>> (java.base at 9-internal)
>> -------------------------------------------------------
>> 85097376 == 81 MiB
>>
>>
>> Footprint is reduced for more than 4 times. The main improvement of this
>> implementation is that to support M Class(es) x N ClassValue(s), the
>> only part of the data structure that has O(M*N) space footprint are the
>> backing array(s) of LinearProbeHashtable. No less than 3 and no more
>> than 6 Object slots are used per Class x ClassValue association (4 in
>> average). Other supporting parts of the data structure are either O(M),
>> O(N) or O(1).
>>
>> The lookup time has improved too with this patch (slightly):
>>
>> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/ClassValueBench.java
>>
>>
>>
>> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.3.bench_results.pdf
>>
>>
>>
>>
>> As to the LinearProbeHashtable, it has been updated from initial version
>> to implement the whole ConcurrentMap interface. Not much was needed for
>> that (~150 lines of heavy commented code) comparing to the
>> implementation in webrev.04.2
>> (http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.2/).
>>
>> The end effect was that I was able to test the correctness of the
>> implementation using MOAT tests. Other changes to LinearProbeHashtable
>> from webrev.04.2 (for those following the discussion on mlvm-dev) are:
>>
>> - the lookup logic that was repeated in each and every method was
>> factored-out into a lookup() private method.
>> - further benchmarking revealed that is was not beneficial to have
>> a 'skip' field in tombstone objects. Reading this field to optimize
>> skipping over a consecutive run of tombstones reduces cache-locality of
>> algorithm. On average it is faster to just skip one tombstone at a time
>> and use a singleton tombstone object to be able to compare just the
>> reference.
>> - The following spin loop was added to get() method (and similar to
>> some other methods) to preserve SC properties of the methods:
>>
>> public V get(Object key) {
>> Object[] tab = table;
>> int i = lookup(tab, key);
>> if (i >= 0) {
>> Object value = getVolatile(tab, i + 1);
>> if (value != null) {
>> return (V) value;
>> }
>> // possible race with removing an entry can cause oldValue
>> // to be observed cleared. If this happens, we know the key
>> // has/will be tombstoned just after that but we must wait
>> // for that to happen in order to preserve SC properties of
>> // this method; without this wait one could even observe
>> the
>> // following in one thread:
>> //
>> // table.get(k) == null && table.containsKey(k);
>> //
>> while (getVolatile(tab, i) != TOMBSTONE) {
>> Thread.onSpinWait();
>> }
>> }
>> return null;
>> }
>>
>> ...the write this spin-loop is waiting on is in the remove() method(s):
>>
>> tab[i + 1] = null; // clear the value...
>> tombstoned++;
>> putVolatile(tab, i, TOMBSTONE); // ...publish
>> tombstone
>>
>>
>> The lookup performance of LinearProbeHashtable was put to test comparing
>> it with ConcurrentHashMap in the following benchmark:
>>
>> http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_src.tgz
>>
>>
>> Results:
>>
>> http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_bench.pdf
>>
>>
>>
>>
>> In this benchmark, several variations are tested. The one from
>> webrev.04.2 with stateful tombstones is called LinearProbeHashTable and
>> the one with singleton stateless tombstone, presented in this RFR as
>> webrev.04.3 is called LinearProbeHashTable1. The benchmark shows that
>> even during concurrent removals and insertions, the successful lookup
>> performance (GetExistent) does not degrade much and is on-par with
>> ConcurrentHashMap. This benchmark also explores the suitability of using
>> LinearProbeHashTable for keys derived from MethodType.
>>
>> A note to reviewers: It is useless to compare the old implementation
>> with the new one side by side as they have not much in common (just
>> public API and javadocs). Although the net line count increases with
>> this patch, the complexity of ClassValue implmenetation is reduced,
>> because it is a layered implementation. ClassValue API is implemented in
>> terms of a well-known ConcurrentMap API and correctness of ConcurrentMap
>> implementation is verified by MOAT tests.
>>
>> Regards, Peter
>>
More information about the core-libs-dev
mailing list