[10?] RFR: 8193128: Reduce number of implementation classes returned by List/Set/Map.of()

Wed Dec 6 21:20:15 UTC 2017

Guava struggled with this as well with their immutable collections.  You
could look at their revision history (I haven't). Maybe they got rid of
SingletonImmutableList for the same reason?

On Wed, Dec 6, 2017 at 12:21 PM, Claes Redestad <claes.redestad at oracle.com>
wrote:

> Hi,
>
> please help review this patch to consolidate the number of implementation
> classes returned by the static collection factories:
>
> http://cr.openjdk.java.net/~redestad/8193128/open.00/
>
> I set out to explore options for addressing small inefficiencies we've
> been running into, the latest after replacing use of @Stable arrays in
> java.lang.invoke with List.of() (JDK-8184777):
>
> - List.indexOf deferred to the iterator in AbstractList, which check for
> concurrent modification
> - Warmup takes a bit longer due to having to warm up four different
> classes and associated methods
> - Slowdowns in certain code paths attributable to certain calls becoming
> megamorphic
>
> Microbenchmark: http://cr.openjdk.java.net/~re
> destad/scratch/less_immutables/ListMorphism.java
> http://cr.openjdk.java.net/~redestad/scratch/less_immutables
> /benchmarks.jar
>
> The benchmark explores how call-sites behave when performing some naive
> operations on a few different Lists.
>
> For every benchmark using List.of() there's a variant using ArrayList for
> comparison:
>
> Baseline:
>
> Benchmark                             Mode  Cnt    Score Error   Units
> ListMorphism.finalGetFromArrayList   thrpt   25   92.659 ±  3.058 ops/us
> ListMorphism.finalGetFromList        thrpt   25  335.245 ±  0.244 ops/us
> # 3.6x
> ListMorphism.finalSumSizesArrayList  thrpt   25  245.020 ±  0.106 ops/us
> ListMorphism.finalSumSizesList       thrpt   25  335.107 ±  0.439 ops/us
> # 1.4x
> ListMorphism.getFromArrayList        thrpt   25   70.343 ±  0.972 ops/us
> ListMorphism.getFromList             thrpt   25   37.121 ±  0.135 ops/us
> # 0.53x
> ListMorphism.getFromArrayList12      thrpt   25 109.890 ±  0.286  ops/us
> ListMorphism.getFromList12           thrpt   25  109.552 ± 6.972  ops/us
> # 1.0x
> ListMorphism.sumSizesArrayList       thrpt   25  131.467 ±  4.672 ops/us
> ListMorphism.sumSizesList            thrpt   25   57.877 ±  0.102 ops/us
> # 0.45x
> ListMorphism.sumSizesArrayList12     thrpt   25 208.652 ± 11.294  ops/us
> ListMorphism.sumSizesList12          thrpt   25  227.269 ± 0.961  ops/us
> # 1.1x
>
> The good: When dealing with List literals (the final* benchmarks),
> List.of() allows really nice speed-ups compared to ArrayList.
>
> The bad: When not used as a literal, List.of() implementations at
> call-sites can cause a substantial slowdown (megamorphic dispatch)
>
> The ugly:
>
> After some explorations[1], I narrowed in on the following experiment:
> http://cr.openjdk.java.net/~redestad/scratch/less_immutables/webrev/
>
> Basically: Merge List1 and List2 into a single class, List12, merge List0
> into ListN (List0 has a singleton instance, so might as well be an instance
> of ListN). Same for Set0,1,2,N. Map0 is merged into MapN.
>
> This strikes a balance between throughput, footprint and slightly better
> startup/warmup behavior.
>
> According to jol estimates[2] the size of List12 is the same as both List1
> and List2 in the current JDK implementation. Set12 is footprint neutral
> compared to Set2 on all platforms but lose a few bytes on 64-bit VMs
> compared to Set1.
>
> Benchmark                             Mode  Cnt    Score   Error Units
> ListMorphism.finalGetFromArrayList   thrpt   25   93.046 ± 0.569 ops/us
> ListMorphism.finalGetFromList        thrpt   25  335.280 ± 0.154 ops/us #
> 3.6x
> ListMorphism.finalSumSizesArrayList  thrpt   25  244.595 ± 1.085 ops/us
> ListMorphism.finalSumSizesList       thrpt   25  303.351 ± 0.182 ops/us #
> 1.2x
> ListMorphism.getFromArrayList        thrpt   25   70.631 ± 0.070 ops/us
> ListMorphism.getFromList             thrpt   25   73.258 ± 2.955 ops/us #
> 1.04x
> ListMorphism.getFromArrayList12      thrpt   25 109.921 ± 0.096  ops/us
> ListMorphism.getFromList12           thrpt   25  127.392 ± 0.088  ops/us
> # 1.16x
> ListMorphism.sumSizesArrayList       thrpt   25  131.393 ± 4.882 ops/us
> ListMorphism.sumSizesList            thrpt   25  107.686 ± 5.286 ops/us #
> 0.82x
> ListMorphism.sumSizesArrayList12     thrpt   25  212.350 ± 0.134  ops/us
> ListMorphism.sumSizesList12          thrpt   25  198.778 ± 0.479  ops/us
> # 0.94x
>
> The experiment has a flag to change number of specialized List/Set/Map
> classes (-Djdk.ImmutableCollections.specializations=0|1|2, default=2).
>
> 1 specialization (List1 + ListN, Set1 + SetN) is more or less the same as
> 2, some ~1-2% improvements, mainly in sumSizes micros.
>
> 0 specializations (List-, Set, MapN only) achieves a small increase in
> throughput on some micros by ensuring callsites stay monomorphic, but it's
> not very substantial by my measures (~5%, but mostly in sumSizes micros).
>
> Keeping the footprint more or less the same, while evening out a few rough
> edges and improving startup and static footprint seems like the overall
> best option. An alternative would be to keep the experimental flag, but I
> don't think a 5% gain on micros warrants the extra maintenance burden and
> testing that entails.
>
> The proposed patch is more or less this experiment with 2 specializations,
> but having removed the flag and code movement needed to support it removed
> (along with a fix in the writeReplace methods of List12/Set12)
>
> Thanks!
>
> /Claes
>
> [1] Older experiments:
> http://cr.openjdk.java.net/~redestad/immutable/list12N.0/
>  -- simply merge L0 into LN (still have L1, L2 and LN) - nothing really
> changed, though
>
> http://cr.openjdk.java.net/~redestad/immutable/list1N.0/
>  -- L0 merged into LN, drop L2. Substantial performance boost on micros.
> Footprint drop for 2-element lists.
>
> http://cr.openjdk.java.net/~redestad/immutable/listNdouble.0/
>  -- L0+L1+L2+LN merged into one implementation. Same footprint with a
> single class, but loses a lot on throughput in micros.
>
> http://cr.openjdk.java.net/~redestad/immutable/listNsingle.0/
>  -- L0+L1+LN merged, drop L2. Simplification of the previous. Like the
> list1N.0 experiment in footprint, but a loss in throughput on all measures.
>
> No approach seemed a win across the board here: we either had to accept a
> footprint reduction, or see throughput suffer dramatically.
>
> [2] http://openjdk.java.net/projects/code-tools/jol/
>