Proposal: optimization of Map.copyOf and Collectors.toUnmodifiableMap
Roger Riggs
Roger.Riggs at Oracle.com
Mon Jun 18 14:25:37 UTC 2018
Hi Stuart,
In regard to new SharedSecrets interfaces, one option is to move shared
(but private) implementation classes to a jdk.internal.xx package (not
exported). This only works well if they are not tightly coupled to other
package-private classes.
SpinedBuffer might be a good candidate; I have some IO cases in mind
that could benefit from the allocation/reallocation savings
(ByteArrayOutputStream, for one).
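
As a rough illustration of the allocation/reallocation savings (a sketch
of a chunked accumulator only, not the actual java.util.stream.SpinedBuffer
API):

import java.util.ArrayList;
import java.util.List;

// Grows by adding chunks, so already-written bytes are never copied on
// growth, unlike ByteArrayOutputStream's Arrays.copyOf resizing.
class ChunkedByteBuffer {
    private static final int CHUNK = 4096;
    private final List<byte[]> full = new ArrayList<>();
    private byte[] current = new byte[CHUNK];
    private int pos;        // write position within the current chunk
    private int completed;  // total bytes held in the full chunks

    void write(int b) {
        if (pos == current.length) {   // current chunk is full: start a new one
            full.add(current);
            completed += current.length;
            current = new byte[CHUNK];
            pos = 0;
        }
        current[pos++] = (byte) b;
    }

    byte[] toByteArray() {             // the single copy happens once, at the end
        byte[] out = new byte[completed + pos];
        int off = 0;
        for (byte[] c : full) {
            System.arraycopy(c, 0, out, off, c.length);
            off += c.length;
        }
        System.arraycopy(current, 0, out, off, pos);
        return out;
    }
}
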
Regards, Roger
On 6/15/2018 5:22 PM, Stuart Marks wrote:
> Hi Peter,
>
> Finally getting back to this.
>
> I think there's clearly room for improvement in the creation of the
> unmodifiable collections. Right now the creation paths (mostly) use
> only public APIs, which can result in unnecessary allocation and
> copying. Certainly Map.copyOf does this, but there are other cases
> as well.
>
> For copying from an unknown map, there are a couple of approaches:
>
> * accumulate keys and values in a single ArrayList, which can be
> presized when the size is known but will still grow if needed; then
> copy the elements from a subrange of ArrayList's internal array
> (similar to your MapN(Object[], len) overload; see the sketch after
> this list)
>
> * accumulate keys and values into a SpinedBuffer, which doesn't
> require copying to grow and so is preferable if for some reason we
> can't pre-size accurately; and then copy the elements out of it
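>
> To make the first approach concrete, a minimal sketch (the
> MapN(Object[], len) overload and direct access to ArrayList's backing
> array are assumptions here, not existing API):
>
> // (assumes java.util imports; written as if inside java.util)
> static Object[] accumulateKeysAndValues(Map<?, ?> map) {
>     // pre-size when the reported size is trustworthy; the list grows otherwise
>     ArrayList<Object> kv = new ArrayList<>(map.size() * 2);
>     map.forEach((k, v) -> {
>         kv.add(Objects.requireNonNull(k));
>         kv.add(Objects.requireNonNull(v));
>     });
>     // toArray() is shown for simplicity; the optimized path would read a
>     // subrange of ArrayList's internal array directly and hand it to the
>     // assumed MapN(Object[] kv, int len) overload.
>     return kv.toArray();
> }
>
> The SpinedBuffer variant would only swap the accumulator; the final
> copy out of it stays the same.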
>
> The Collectors.toUnmodifiableMap implementations are known to create
> HashMap instances, so they could pass the HashMap directly to a
> private MapN constructor that in turn could talk directly to HashMap
> to get the keys and values. This avoids allocation of a full-sized
> buffer and one copy.
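>
> To illustrate the shape of that handoff, a self-contained toy (names
> invented; the real MapN constructor and probe logic differ): it builds
> a linear-probing key/value table straight from the HashMap, with no
> Entry[] or interleaved buffer in between.
>
> // (assumes import java.util.HashMap;)
> final class DirectFromHashMap<K, V> {
>     final Object[] table;   // k0, v0, k1, v1, ... open-addressed
>
>     DirectFromHashMap(HashMap<K, V> source) {
>         table = new Object[source.size() * 4];  // ~2x expansion of the pair count
>         source.forEach(this::put);              // reads keys/values in place
>     }
>
>     private void put(K k, V v) {
>         // keys from a HashMap are already distinct, so no duplicate check here
>         int idx = Math.floorMod(k.hashCode(), table.length >> 1) << 1;
>         while (table[idx] != null) {            // linear probe to a free slot
>             idx = (idx + 2) % table.length;
>         }
>         table[idx] = k;
>         table[idx + 1] = v;
>     }
> }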
>
> Note that these techniques involve creating new interfaces, some of
> which cross package boundaries. It's a bit of an irritant to have to
> plumb new paths that go through SharedSecrets, but it seems likely to
> be worthwhile if we can avoid bulk allocation and copying steps.
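>
> For reference, that plumbing has roughly this shape (all names below
> are invented for illustration; the real hooks live in the JDK's
> SharedSecrets class under jdk.internal):
>
> // The package that owns the private state registers an accessor once
> // (from a static initializer); consumers in other packages fetch it
> // instead of needing a public API across the package boundary.
> public final class DemoSecrets {
>     /** Hypothetical hook exposing ArrayList's backing array without a copy. */
>     public interface ArrayListAccess {
>         Object[] backingArray(java.util.ArrayList<?> list);
>     }
>
>     private static volatile ArrayListAccess arrayListAccess;
>
>     public static void setArrayListAccess(ArrayListAccess access) {
>         arrayListAccess = access;   // called once by the owning package
>     }
>
>     public static ArrayListAccess getArrayListAccess() {
>         return arrayListAccess;     // called at the use site
>     }
> }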
>
> Given these, it doesn't seem to me that the BiBuffer approach helps
> very much. I think there are many other avenues that would be
> worthwhile to explore, and that possibly can provide bigger savings.
>
> s'marks
>
>
>
>
>
> On 6/11/18 3:57 AM, Peter Levart wrote:
>> Hi,
>>
>> Those two methods were added in JDK 10 and they are not very well
>> optimized. Map.copyOf(Map map) first dumps the source map into an
>> array of Map.Entry objects (map.entrySet().toArray(new Entry[0])),
>> which typically creates new Map.Entry instances; it then passes the
>> array to the Map.ofEntries(Map.Entry[] entries) factory method, which
>> iterates the array and constructs a key/value-interleaved array from
>> it, which is in turn passed to the ImmutableCollections.MapN
>> constructor, which builds a linear-probing hash table from it. So each
>> key and value is copied three times, and several temporary objects are
>> created in the process.
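>>
>> For reference, that path looks roughly like this (a simplified sketch
>> based on the description above; the actual JDK 10 source differs in
>> details):
>>
>> @SuppressWarnings("unchecked")
>> static <K, V> Map<K, V> copyOf(Map<? extends K, ? extends V> map) {
>>     // copy #1: typically allocates a fresh Map.Entry object per mapping
>>     Map.Entry<K, V>[] entries = map.entrySet().toArray(new Map.Entry[0]);
>>     // copy #2: ofEntries() builds a key/value interleaved Object[] from
>>     // the entries; copy #3: the MapN constructor rehashes that array
>>     // into its linear-probing table
>>     return Map.ofEntries(entries);
>> }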
>>
>> One copying step, as well as the construction of the temporary
>> Map.Entry objects, can be eliminated:
>>
>> http://cr.openjdk.java.net/~plevart/jdk-dev/UnmodifiableMap_copyOf/webrev.01/
>>
>>
>> Collecting stream(s) with Collectors.toUnmodifiableMap() first
>> collects the key/value pairs into a HashMap and then performs the
>> equivalent of Map.copyOf(hashMap) at the end, so using Map.copyOf()
>> directly benefits this collection operation too.
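>>
>> On the collector side, the change has roughly this shape (a sketch
>> reconstructed from the description; the actual patch in the webrev may
>> differ):
>>
>> // (assumes imports from java.util, java.util.function and java.util.stream)
>> static <T, K, U> Collector<T, ?, Map<K, U>> toUnmodifiableMapSketch(
>>         Function<? super T, ? extends K> keyMapper,
>>         Function<? super T, ? extends U> valueMapper) {
>>     // collect into a HashMap exactly as before, but let the finisher
>>     // delegate to Map.copyOf, so it picks up any improvement made there
>>     return Collectors.collectingAndThen(
>>             Collectors.toMap(keyMapper, valueMapper,
>>                              (u1, u2) -> { throw new IllegalStateException("duplicate key"); },
>>                              HashMap::new),
>>             Map::copyOf);
>> }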
>>
>> The following benchmark:
>>
>> http://cr.openjdk.java.net/~plevart/jdk-dev/UnmodifiableMap_copyOf/UnmodifiableMapBench.java
>>
>>
>>
>> shows up to a 30% improvement in the copyOf operation with this patch
>> applied:
>>
>> Original:
>>
>> Benchmark                               (size)  Mode  Cnt      Score     Error  Units
>> UnmodifiableMapBench.copyOf                 10  avgt   10    403.633 ±   2.640  ns/op
>> UnmodifiableMapBench.copyOf                100  avgt   10   3489.623 ±  44.590  ns/op
>> UnmodifiableMapBench.copyOf               1000  avgt   10  40030.572 ± 277.075  ns/op
>> UnmodifiableMapBench.toUnmodifiableMap      10  avgt   10    831.221 ±   3.816  ns/op
>> UnmodifiableMapBench.toUnmodifiableMap     100  avgt   10   9783.519 ±  43.097  ns/op
>> UnmodifiableMapBench.toUnmodifiableMap    1000  avgt   10  96524.536 ± 670.818  ns/op
>>
>> Patched:
>>
>> Benchmark                               (size)  Mode  Cnt      Score      Error  Units
>> UnmodifiableMapBench.copyOf                 10  avgt   10    264.172 ±    1.882  ns/op
>> UnmodifiableMapBench.copyOf                100  avgt   10   2318.974 ±   15.877  ns/op
>> UnmodifiableMapBench.copyOf               1000  avgt   10  29291.782 ± 3139.737  ns/op
>> UnmodifiableMapBench.toUnmodifiableMap      10  avgt   10    771.221 ±   65.432  ns/op
>> UnmodifiableMapBench.toUnmodifiableMap     100  avgt   10   9275.016 ±  725.722  ns/op
>> UnmodifiableMapBench.toUnmodifiableMap    1000  avgt   10  82204.342 ±  851.741  ns/op
>>
>>
>> Garbage production is also reduced, since no temporary Map.Entry
>> objects are constructed:
>>
>> Original:
>>
>> Benchmark                                                   (size)  Mode  Cnt      Score   Error  Units
>> UnmodifiableMapBench.copyOf:·gc.alloc.rate.norm                 10  avgt   10    416.001 ± 0.002  B/op
>> UnmodifiableMapBench.copyOf:·gc.alloc.rate.norm                100  avgt   10   2936.005 ± 0.019  B/op
>> UnmodifiableMapBench.copyOf:·gc.alloc.rate.norm               1000  avgt   10  28136.059 ± 0.199  B/op
>> UnmodifiableMapBench.toUnmodifiableMap:·gc.alloc.rate.norm      10  avgt   10   1368.001 ± 0.004  B/op
>> UnmodifiableMapBench.toUnmodifiableMap:·gc.alloc.rate.norm     100  avgt   10  10208.139 ± 0.045  B/op
>> UnmodifiableMapBench.toUnmodifiableMap:·gc.alloc.rate.norm    1000  avgt   10  93025.923 ± 0.573  B/op
>>
>> Patched:
>>
>> Benchmark                                                   (size)  Mode  Cnt      Score   Error  Units
>> UnmodifiableMapBench.copyOf:·gc.alloc.rate.norm                 10  avgt   10    304.000 ± 0.001  B/op
>> UnmodifiableMapBench.copyOf:·gc.alloc.rate.norm                100  avgt   10   2464.004 ± 0.012  B/op
>> UnmodifiableMapBench.copyOf:·gc.alloc.rate.norm               1000  avgt   10  24064.040 ± 0.137  B/op
>> UnmodifiableMapBench.toUnmodifiableMap:·gc.alloc.rate.norm      10  avgt   10   1256.001 ± 0.003  B/op
>> UnmodifiableMapBench.toUnmodifiableMap:·gc.alloc.rate.norm     100  avgt   10   9720.153 ± 0.055  B/op
>> UnmodifiableMapBench.toUnmodifiableMap:·gc.alloc.rate.norm    1000  avgt   10  88905.688 ± 0.574  B/op
>>
>>
>> So what do you think? Is this an improvement?
>>
>> Regards, Peter
>>