RFR: 8324573: HashMap::putAll should resize to sum of both map sizes [v3]

Sat Jan 27 09:12:34 UTC 2024

On Thu, 25 Jan 2024 00:29:40 GMT, Joshua Cao <duke at openjdk.org> wrote:

>> This change mirrors what we did for ConcurrentHashMap in https://github.com/openjdk/jdk/pull/17116. When we add all entries from one map to anther, we should resize that map to the size of the sum of both maps.
>> 
>> I used the command below to run the benchmarks. I set a high heap to reduce garbage collection noise.
>> 
>> java -Xms25G -jar benchmarks.jar -p size=100000 -p addSize=100000 -gc true org.openjdk.bench.java.util.HashMapBench
>> 
>> 
>> Before change
>> 
>> 
>> Benchmark            (addSize)        (mapType)  (size)  Mode  Cnt   Score   Error  Units
>> HashMapBench.putAll     100000         HASH_MAP  100000  avgt    4  22.927 ± 3.170  ms/op
>> HashMapBench.putAll     100000  LINKED_HASH_MAP  100000  avgt    4  25.198 ± 2.189  ms/op
>> 
>> 
>> After change
>> 
>> 
>> Benchmark            (addSize)        (mapType)  (size)  Mode  Cnt   Score   Error  Units
>> HashMapBench.putAll     100000         HASH_MAP  100000  avgt    4  16.780 ± 0.526  ms/op
>> HashMapBench.putAll     100000  LINKED_HASH_MAP  100000  avgt    4  19.721 ± 0.349  ms/op
>> 
>> 
>> We see about average time improvements of 26% in HashMap and 20% in LinkedHashMap.
>> 
>> ---
>> 
>> In the worse case, we may have two maps with identical keys. In this case, we would aggressively resize when we do not need to. I'm also adding an additional `putAllSameKeys` benchmark.
>> 
>> Before change:
>> 
>> 
>> Benchmark                                       (addSize)        (mapType)  (size)  Mode  Cnt        Score   Error   Units
>> HashMapBench.putAllSameKeys                        100000         HASH_MAP  100000  avgt             6.956           ms/op
>> HashMapBench.putAllSameKeys:gc.alloc.rate          100000         HASH_MAP  100000  avgt          1091.383          MB/sec
>> HashMapBench.putAllSameKeys:gc.alloc.rate.norm     100000         HASH_MAP  100000  avgt       7968871.917            B/op
>> HashMapBench.putAllSameKeys:gc.count               100000         HASH_MAP  100000  avgt               ≈ 0          counts
>> HashMapBench.putAllSameKeys                        100000  LINKED_HASH_MAP  100000  avgt             8.417           ms/op
>> HashMapBench.putAllSameKeys:gc.alloc.rate          100000  LINKED_HASH_MAP  100000  avgt           992.543          MB/sec
>> HashMapBench.putAllSameKeys:gc.alloc.rate.norm     100000  LINKED_HASH_MAP  100000  avgt       8768892.941            B/op
>> HashMapBench.putAllSameKeys:gc.count               100000  LINKED_HASH_MAP  100000  avgt               ≈ 0          counts
>> 
>> 
>> Af...
>
> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Use max of both sizes and other maps size in case of overflow

src/java.base/share/classes/java/util/HashMap.java line 503:

> 501:      */
> 502:     final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
> 503:         int s = Math.max(size() + m.size(), m.size());

If we decide this approach is best, shouldn't we do the same for ConcurrentHashMap?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17544#discussion_r1468425767