RFR: JDK-8031043 ClassValue's backing map should have a smaller initial size
Claes Redestad
claes.redestad at oracle.com
Sun Jun 12 16:28:28 UTC 2016
Hi,
this seems quite promising.
A bit too much to review in one go, but let me start with some musings
while I digest it more fully:
49 /**
50 * A special key that is used to overwrite a key when the
entry is removed.
51 */
52 private static final class Tombstone {}
53
54 private static final Tombstone TOMBSTONE = new Tombstone();
Since you're only checking identity for TOMBSTONE, couldn't it simply be
an Object? Similarly, could the class Removed you introduced in
ClassValue be replaced by a similar Object TOMBSTONE?
Thanks!
/Claes
Note: I'd argue that this might help with some footprint regressions in
JDK 9 (due to increased use of MHs, which in turn use ClassValues) and
if getting through review in a reasonable time frame I think this should
be targetted for JDK 9 as a regression bug fix.
On 2016-06-09 16:42, Peter Levart wrote:
> Hi,
>
> The patch for this issue is a result of the following discussion on
> mlvm-dev list:
>
> http://mail.openjdk.java.net/pipermail/mlvm-dev/2016-May/006602.html
>
> I propose the following improvement to the implementation of ClassValue
> API:
>
> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.3/
>
>
> This is a re-implementation of ClassValue API that leverages a
> ConcurrentMap for storage. ConcurrentHashMap was initially tested, but
> using it regressed lookup performance for about ~10%. So a simple
> ConcurrentMap implementation (LinearProbeHashtable) was created which
> has smaller footprint, better CPU cache locality and consequently faster
> lookup performance for the task at hand. The goal of this task was to
> shrink the memory footprint of ClassValue API. To see the end effect
> let's first look at the results of a benchmark that simulates a
> re-deployment of an application in an app server where several apps are
> deployed and each uses it's own set of classes and ClassValue(s):
>
> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/ClassValueExpungeBench.java
>
>
> Benchmark (classValuesPerPart)
> (classesPerPart) (impl) (partitions) Mode Cnt Score Error Units
> ClassValueExpungeBench.redeployPartition 8 1024
> jdk9 16 ss 16 65.682 ± 1.485 ms/op
> ClassValueExpungeBench.redeployPartition 8 4096
> jdk9 16 ss 16 247.040 ± 7.684 ms/op
> ClassValueExpungeBench.redeployPartition 64 1024
> jdk9 16 ss 16 302.536 ± 27.750 ms/op
> ClassValueExpungeBench.redeployPartition 64 4096
> jdk9 16 ss 16 1174.002 ± 77.183 ms/op
>
> Benchmark (classValuesPerPart) (classesPerPart) (impl) (partitions)
> Mode Cnt Score Error Units
> ClassValueExpungeBench.redeployPartition 8 1024
> pl04.3 16 ss 16 47.179 ± 1.436 ms/op
> ClassValueExpungeBench.redeployPartition 8 4096
> pl04.3 16 ss 16 163.067 ± 8.118 ms/op
> ClassValueExpungeBench.redeployPartition 64 1024
> pl04.3 16 ss 16 67.581 ± 1.718 ms/op
> ClassValueExpungeBench.redeployPartition 64 4096
> pl04.3 16 ss 16 240.458 ± 6.616 ms/op
>
> A by-product of this simulation is a heap dump histogram taken after the
> last warmup iteration when all "applications" are deployed:
>
> top-10 classes heap dump for 64 ClassValue(s) x 4096 Class(es) x 16
> partitions
>
> jdk9:
>
> num #instances #bytes class name (module)
> -------------------------------------------------------
> 1: 4194309 167772360 java.util.WeakHashMap$Entry
> (java.base at 9-internal)
> 2: 4195329 134250528 java.lang.ClassValue$Entry
> (java.base at 9-internal)
> 3: 65539 34603248 [Ljava.util.WeakHashMap$Entry;
> (java.base at 9-internal)
> 4: 65537 34603032 [Ljava.lang.ClassValue$Entry;
> (java.base at 9-internal)
> 5: 67301 7552448 java.lang.Class (java.base at 9-internal)
> 6: 65536 4194304 java.lang.ClassValue$ClassValueMap
> (java.base at 9-internal)
> 7: 65552 2097664 java.lang.ref.ReferenceQueue
> (java.base at 9-internal)
> 8: 67736 1676344 [Ljava.lang.Object;
> (java.base at 9-internal)
> 9: 66349 1116848 [I (java.base at 9-internal)
> 10: 65554 1048864 java.lang.ref.ReferenceQueue$Lock
> (java.base at 9-internal)
> -------------------------------------------------------
> 388915640 == 370 MiB
>
> pl04.3:
>
> num #instances #bytes class name (module)
> -------------------------------------------------------
> 1: 133274 69833848 [Ljava.lang.Object;
> (java.base at 9-internal)
> 2: 67300 7552376 java.lang.Class (java.base at 9-internal)
> 3: 65536 2097152 java.lang.ClassValue$ClassValueMap
> (java.base at 9-internal)
> 4: 65536 2097152
> java.lang.ClassValue$ClassValueMap$WeakNode (java.base at 9-internal)
> 5: 66349 1116848 [I (java.base at 9-internal)
> 6: 65991 1055856 java.lang.Object
> (java.base at 9-internal)
> 7: 7434 696584 [B (java.base at 9-internal)
> 8: 1551 299680 [Ljava.lang.Class;
> (java.base at 9-internal)
> 9: 2447 176184 java.lang.reflect.Field
> (java.base at 9-internal)
> 10: 7154 171696 java.lang.String
> (java.base at 9-internal)
> -------------------------------------------------------
> 85097376 == 81 MiB
>
>
> Footprint is reduced for more than 4 times. The main improvement of this
> implementation is that to support M Class(es) x N ClassValue(s), the
> only part of the data structure that has O(M*N) space footprint are the
> backing array(s) of LinearProbeHashtable. No less than 3 and no more
> than 6 Object slots are used per Class x ClassValue association (4 in
> average). Other supporting parts of the data structure are either O(M),
> O(N) or O(1).
>
> The lookup time has improved too with this patch (slightly):
>
> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/ClassValueBench.java
>
>
> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.3.bench_results.pdf
>
>
>
> As to the LinearProbeHashtable, it has been updated from initial version
> to implement the whole ConcurrentMap interface. Not much was needed for
> that (~150 lines of heavy commented code) comparing to the
> implementation in webrev.04.2
> (http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.2/).
> The end effect was that I was able to test the correctness of the
> implementation using MOAT tests. Other changes to LinearProbeHashtable
> from webrev.04.2 (for those following the discussion on mlvm-dev) are:
>
> - the lookup logic that was repeated in each and every method was
> factored-out into a lookup() private method.
> - further benchmarking revealed that is was not beneficial to have
> a 'skip' field in tombstone objects. Reading this field to optimize
> skipping over a consecutive run of tombstones reduces cache-locality of
> algorithm. On average it is faster to just skip one tombstone at a time
> and use a singleton tombstone object to be able to compare just the
> reference.
> - The following spin loop was added to get() method (and similar to
> some other methods) to preserve SC properties of the methods:
>
> public V get(Object key) {
> Object[] tab = table;
> int i = lookup(tab, key);
> if (i >= 0) {
> Object value = getVolatile(tab, i + 1);
> if (value != null) {
> return (V) value;
> }
> // possible race with removing an entry can cause oldValue
> // to be observed cleared. If this happens, we know the key
> // has/will be tombstoned just after that but we must wait
> // for that to happen in order to preserve SC properties of
> // this method; without this wait one could even observe the
> // following in one thread:
> //
> // table.get(k) == null && table.containsKey(k);
> //
> while (getVolatile(tab, i) != TOMBSTONE) {
> Thread.onSpinWait();
> }
> }
> return null;
> }
>
> ...the write this spin-loop is waiting on is in the remove() method(s):
>
> tab[i + 1] = null; // clear the value...
> tombstoned++;
> putVolatile(tab, i, TOMBSTONE); // ...publish
> tombstone
>
>
> The lookup performance of LinearProbeHashtable was put to test comparing
> it with ConcurrentHashMap in the following benchmark:
>
> http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_src.tgz
>
> Results:
>
> http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_bench.pdf
>
>
>
> In this benchmark, several variations are tested. The one from
> webrev.04.2 with stateful tombstones is called LinearProbeHashTable and
> the one with singleton stateless tombstone, presented in this RFR as
> webrev.04.3 is called LinearProbeHashTable1. The benchmark shows that
> even during concurrent removals and insertions, the successful lookup
> performance (GetExistent) does not degrade much and is on-par with
> ConcurrentHashMap. This benchmark also explores the suitability of using
> LinearProbeHashTable for keys derived from MethodType.
>
> A note to reviewers: It is useless to compare the old implementation
> with the new one side by side as they have not much in common (just
> public API and javadocs). Although the net line count increases with
> this patch, the complexity of ClassValue implmenetation is reduced,
> because it is a layered implementation. ClassValue API is implemented in
> terms of a well-known ConcurrentMap API and correctness of ConcurrentMap
> implementation is verified by MOAT tests.
>
> Regards, Peter
>
More information about the core-libs-dev
mailing list