RFR: JDK-8031043 ClassValue's backing map should have a smaller initial size

Claes Redestad claes.redestad at oracle.com
Sun Jun 12 16:28:28 UTC 2016


Hi,

this seems quite promising.

A bit too much to review in one go, but let me start with some musings 
while I digest it more fully:

   49     /**
   50      * A special key that is used to overwrite a key when the 
entry is removed.
   51      */
   52     private static final class Tombstone {}
   53
   54     private static final Tombstone TOMBSTONE = new Tombstone();

Since you're only checking identity for TOMBSTONE, couldn't it simply be 
an Object? Similarly, could the class Removed you introduced in 
ClassValue be replaced by a similar Object TOMBSTONE?

Thanks!

/Claes

Note: I'd argue that this might help with some footprint regressions in 
JDK 9 (due to increased use of MHs, which in turn use ClassValues) and 
if getting through review in a reasonable time frame I think this should 
be targetted for JDK 9 as a regression bug fix.

On 2016-06-09 16:42, Peter Levart wrote:
> Hi,
>
> The patch for this issue is a result of the following discussion on
> mlvm-dev list:
>
> http://mail.openjdk.java.net/pipermail/mlvm-dev/2016-May/006602.html
>
> I propose the following improvement to the implementation of ClassValue
> API:
>
> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.3/
>
>
> This is a re-implementation of ClassValue API that leverages a
> ConcurrentMap for storage. ConcurrentHashMap was initially tested, but
> using it regressed lookup performance for about ~10%. So a simple
> ConcurrentMap implementation (LinearProbeHashtable) was created which
> has smaller footprint, better CPU cache locality and consequently faster
> lookup performance for the task at hand. The goal of this task was to
> shrink the memory footprint of ClassValue API. To see the end effect
> let's first look at the results of a benchmark that simulates a
> re-deployment of an application in an app server where several apps are
> deployed and each uses it's own set of classes and ClassValue(s):
>
> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/ClassValueExpungeBench.java
>
>
> Benchmark                                 (classValuesPerPart)
> (classesPerPart)  (impl)  (partitions)  Mode  Cnt     Score Error  Units
> ClassValueExpungeBench.redeployPartition 8              1024
> jdk9            16    ss   16    65.682 ± 1.485  ms/op
> ClassValueExpungeBench.redeployPartition 8              4096
> jdk9            16    ss   16   247.040 ± 7.684  ms/op
> ClassValueExpungeBench.redeployPartition 64              1024
> jdk9            16    ss   16   302.536 ± 27.750  ms/op
> ClassValueExpungeBench.redeployPartition 64              4096
> jdk9            16    ss   16  1174.002 ± 77.183  ms/op
>
> Benchmark (classValuesPerPart)  (classesPerPart)  (impl)  (partitions)
> Mode  Cnt    Score   Error  Units
> ClassValueExpungeBench.redeployPartition 8              1024
> pl04.3            16    ss   16   47.179 ± 1.436  ms/op
> ClassValueExpungeBench.redeployPartition 8              4096
> pl04.3            16    ss   16  163.067 ± 8.118  ms/op
> ClassValueExpungeBench.redeployPartition 64              1024
> pl04.3            16    ss   16   67.581 ± 1.718  ms/op
> ClassValueExpungeBench.redeployPartition 64              4096
> pl04.3            16    ss   16  240.458 ± 6.616  ms/op
>
> A by-product of this simulation is a heap dump histogram taken after the
> last warmup iteration when all "applications" are deployed:
>
> top-10 classes heap dump for 64 ClassValue(s) x 4096 Class(es) x 16
> partitions
>
> jdk9:
>
>   num     #instances         #bytes  class name (module)
> -------------------------------------------------------
>     1:       4194309      167772360 java.util.WeakHashMap$Entry
> (java.base at 9-internal)
>     2:       4195329      134250528 java.lang.ClassValue$Entry
> (java.base at 9-internal)
>     3:         65539       34603248 [Ljava.util.WeakHashMap$Entry;
> (java.base at 9-internal)
>     4:         65537       34603032 [Ljava.lang.ClassValue$Entry;
> (java.base at 9-internal)
>     5:         67301        7552448  java.lang.Class (java.base at 9-internal)
>     6:         65536        4194304 java.lang.ClassValue$ClassValueMap
> (java.base at 9-internal)
>     7:         65552        2097664 java.lang.ref.ReferenceQueue
> (java.base at 9-internal)
>     8:         67736        1676344  [Ljava.lang.Object;
> (java.base at 9-internal)
>     9:         66349        1116848  [I (java.base at 9-internal)
>    10:         65554        1048864 java.lang.ref.ReferenceQueue$Lock
> (java.base at 9-internal)
> -------------------------------------------------------
>                           388915640 == 370 MiB
>
> pl04.3:
>
>   num     #instances         #bytes  class name (module)
> -------------------------------------------------------
>     1:        133274       69833848  [Ljava.lang.Object;
> (java.base at 9-internal)
>     2:         67300        7552376  java.lang.Class (java.base at 9-internal)
>     3:         65536        2097152 java.lang.ClassValue$ClassValueMap
> (java.base at 9-internal)
>     4:         65536        2097152
> java.lang.ClassValue$ClassValueMap$WeakNode (java.base at 9-internal)
>     5:         66349        1116848  [I (java.base at 9-internal)
>     6:         65991        1055856  java.lang.Object
> (java.base at 9-internal)
>     7:          7434         696584  [B (java.base at 9-internal)
>     8:          1551         299680  [Ljava.lang.Class;
> (java.base at 9-internal)
>     9:          2447         176184  java.lang.reflect.Field
> (java.base at 9-internal)
>    10:          7154         171696  java.lang.String
> (java.base at 9-internal)
> -------------------------------------------------------
>                            85097376 == 81 MiB
>
>
> Footprint is reduced for more than 4 times. The main improvement of this
> implementation is that to support M Class(es) x N ClassValue(s), the
> only part of the data structure that has O(M*N) space footprint are the
> backing array(s) of LinearProbeHashtable. No less than 3 and no more
> than 6 Object slots are used per Class x ClassValue association (4 in
> average). Other supporting parts of the data structure are either O(M),
> O(N) or O(1).
>
> The lookup time has improved too with this patch (slightly):
>
> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/ClassValueBench.java
>
>
> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.3.bench_results.pdf
>
>
>
> As to the LinearProbeHashtable, it has been updated from initial version
> to implement the whole ConcurrentMap interface. Not much was needed for
> that (~150 lines of heavy commented code) comparing to the
> implementation in webrev.04.2
> (http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.2/).
> The end effect was that I was able to test the correctness of the
> implementation using MOAT tests. Other changes to LinearProbeHashtable
> from webrev.04.2 (for those following the discussion on mlvm-dev) are:
>
>      - the lookup logic that was repeated in each and every method was
> factored-out into a lookup() private method.
>      - further benchmarking revealed that is was not beneficial to have
> a 'skip' field in tombstone objects. Reading this field to optimize
> skipping over a consecutive run of tombstones reduces cache-locality of
> algorithm. On average it is faster to just skip one tombstone at a time
> and use a singleton tombstone object to be able to compare just the
> reference.
>      - The following spin loop was added to get() method (and similar to
> some other methods) to preserve SC properties of the methods:
>
>      public V get(Object key) {
>          Object[] tab = table;
>          int i = lookup(tab, key);
>          if (i >= 0) {
>              Object value = getVolatile(tab, i + 1);
>              if (value != null) {
>                  return (V) value;
>              }
>              // possible race with removing an entry can cause oldValue
>              // to be observed cleared. If this happens, we know the key
>              // has/will be tombstoned just after that but we must wait
>              // for that to happen in order to preserve SC properties of
>              // this method; without this wait one could even observe the
>              // following in one thread:
>              //
>              //      table.get(k) == null && table.containsKey(k);
>              //
>              while (getVolatile(tab, i) != TOMBSTONE) {
>                  Thread.onSpinWait();
>              }
>          }
>          return null;
>      }
>
> ...the write this spin-loop is waiting on is in the remove() method(s):
>
>                          tab[i + 1] = null; // clear the value...
>                          tombstoned++;
>                          putVolatile(tab, i, TOMBSTONE); // ...publish
> tombstone
>
>
> The lookup performance of LinearProbeHashtable was put to test comparing
> it with ConcurrentHashMap in the following benchmark:
>
> http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_src.tgz
>
> Results:
>
> http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_bench.pdf
>
>
>
> In this benchmark, several variations are tested. The one from
> webrev.04.2 with stateful tombstones is called LinearProbeHashTable and
> the one with singleton stateless tombstone, presented in this RFR as
> webrev.04.3 is called LinearProbeHashTable1. The benchmark shows that
> even during concurrent removals and insertions, the successful lookup
> performance (GetExistent) does not degrade much and is on-par with
> ConcurrentHashMap. This benchmark also explores the suitability of using
> LinearProbeHashTable for keys derived from MethodType.
>
> A note to reviewers: It is useless to compare the old implementation
> with the new one side by side as they have not much in common (just
> public API and javadocs). Although the net line count increases with
> this patch, the complexity of ClassValue implmenetation is reduced,
> because it is a layered implementation. ClassValue API is implemented in
> terms of a well-known ConcurrentMap API and correctness of ConcurrentMap
> implementation is verified by MOAT tests.
>
> Regards, Peter
>


More information about the core-libs-dev mailing list