RFR: 8324573: HashMap::putAll should resize to sum of both map sizes [v4]
Volker Simonis
simonis at openjdk.org
Fri Feb 9 14:08:57 UTC 2024
On Fri, 2 Feb 2024 17:38:13 GMT, Joshua Cao <duke at openjdk.org> wrote:
>> This change mirrors what we did for ConcurrentHashMap in https://github.com/openjdk/jdk/pull/17116. When we add all entries from one map to anther, we should resize that map to the size of the sum of both maps.
>>
>> I used the command below to run the benchmarks. I set a high heap to reduce garbage collection noise.
>>
>> java -Xms25G -jar benchmarks.jar -p size=100000 -p addSize=100000 -gc true org.openjdk.bench.java.util.HashMapBench
>>
>>
>> Before change
>>
>>
>> Benchmark (addSize) (mapType) (size) Mode Cnt Score Error Units
>> HashMapBench.putAll 100000 HASH_MAP 100000 avgt 4 22.927 ± 3.170 ms/op
>> HashMapBench.putAll 100000 LINKED_HASH_MAP 100000 avgt 4 25.198 ± 2.189 ms/op
>>
>>
>> After change
>>
>>
>> Benchmark (addSize) (mapType) (size) Mode Cnt Score Error Units
>> HashMapBench.putAll 100000 HASH_MAP 100000 avgt 4 16.780 ± 0.526 ms/op
>> HashMapBench.putAll 100000 LINKED_HASH_MAP 100000 avgt 4 19.721 ± 0.349 ms/op
>>
>>
>> We see about average time improvements of 26% in HashMap and 20% in LinkedHashMap.
>>
>> ---
>>
>> In the worse case, we may have two maps with identical keys. In this case, we would aggressively resize when we do not need to. I'm also adding an additional `putAllSameKeys` benchmark.
>>
>> Before change:
>>
>>
>> Benchmark (addSize) (mapType) (size) Mode Cnt Score Error Units
>> HashMapBench.putAllSameKeys 100000 HASH_MAP 100000 avgt 6.956 ms/op
>> HashMapBench.putAllSameKeys:gc.alloc.rate 100000 HASH_MAP 100000 avgt 1091.383 MB/sec
>> HashMapBench.putAllSameKeys:gc.alloc.rate.norm 100000 HASH_MAP 100000 avgt 7968871.917 B/op
>> HashMapBench.putAllSameKeys:gc.count 100000 HASH_MAP 100000 avgt ≈ 0 counts
>> HashMapBench.putAllSameKeys 100000 LINKED_HASH_MAP 100000 avgt 8.417 ms/op
>> HashMapBench.putAllSameKeys:gc.alloc.rate 100000 LINKED_HASH_MAP 100000 avgt 992.543 MB/sec
>> HashMapBench.putAllSameKeys:gc.alloc.rate.norm 100000 LINKED_HASH_MAP 100000 avgt 8768892.941 B/op
>> HashMapBench.putAllSameKeys:gc.count 100000 LINKED_HASH_MAP 100000 avgt ≈ 0 counts
>>
>>
>> Af...
>
> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision:
>
> extract msize variable
I agree that a comment would be helpful but I'm afraid that it will not be of much help if its buried in the code. So instead maybe better add an `@implNote` to the `putAll()` method which describes the current, conservative resizing heuristic and give the advice that users can presize the destination map accordingly if they want to avoid resizing when merging in other maps.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17544#issuecomment-1935992895
More information about the core-libs-dev
mailing list