RFR: 8324573: HashMap::putAll should resize to sum of both map sizes [v4]

Fri Feb 9 06:38:02 UTC 2024

On Thu, 8 Feb 2024 17:01:15 GMT, Joshua Cao <duke at openjdk.org> wrote:

>> I think we should step back from benchmarks a bit and examine the various tradeoffs occurring here. Certainly we can speed things up by pre-resizing the map to be larger. However, this can result in a map that is too large for the number of mappings it contains, in the case where this map and the argument map have keys in common. In other words, it might waste space. We really have little idea of how frequently this occurs. It's easy to imagine scenarios where there is no commonality (which this change will speed up) and also where there is 100% commonality (where this change will result in wasted space). What's the right tradeoff?
>> 
>> I've linked some older bugs to the bug report for some historical perspective. In particular, see [this comment](https://bugs.openjdk.org/browse/JDK-4710319?focusedId=12240692&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-12240692) from Josh Bloch on JDK-4710319:
>> 
>>> The conservatism of the resizing heuristic for putAll is intentional. It can cause at most one extra resizing, and can result in substantial footprint savings. This too should be documented in the code.
>> 
>> The comment he added to putAll() for this change is still visible [here](https://github.com/openjdk/jdk/blob/e1b3c5b5ba5cfb8243d760e99887bbe1015a9d69/jdk/src/share/classes/java/util/HashMap.java#L1271), but unfortunately it was removed by a later refactoring. The comment is:
>> 
>> 
>>         /*
>>          * Expand the map if the map if the number of mappings to be added
>>          * is greater than or equal to threshold.  This is conservative; the
>>          * obvious condition is (m.size() + size) >= threshold, but this
>>          * condition could result in a map with twice the appropriate capacity,
>>          * if the keys to be added overlap with the keys already in this map.
>>          * By using the conservative calculation, we subject ourself
>>          * to at most one extra resize.
>>          */
>> 
>> 
>> Note that this comment addresses fairly directly the essence of the change being discussed here. I think it's still applicable; the current code in putMapEntries() compares `m.size()` to `threshold` in deciding whether to resize immediately. We're not constrained by a 20-year-old comment, though. We can change the resizing policy if we have good reason to do so. 
>> 
>> However, I think the concern about space wastage is still relevant, even after 20 years of increased memory capacity and decreased memory cost. Mem...
>
> @stuart-marks Thanks for reviewing. I think there is some history that is hard to find right now. I'd suggest the following:
> 
> 1. We don't need to close this PR. We can rename it to "Add comments for HashMap::putAll conservative expanding"
> 2. Undo the aggressive expanding in ConcurrentHashMap::putAll in a separate PR
> 
> @simonis what are your thoughts?

@caojoshua Sure, no need to close this right away or anything. My main point is to open up a different line of discussion and see where it leads us.

If we were to decide not to change HashMap as discussed here, I don't think it _necessarily_ follows that the ConcurrentHashMap::putAll change (https://github.com/openjdk/jdk/pull/17116, [JDK-8322149](https://bugs.openjdk.org/browse/JDK-8322149)) should be reverted. Maybe it should be, or maybe it shouldn't. There might be different circumstances around CHM. First, @DougLea said that the change was OK. :-) Second, although CHM allows updates to proceed concurrently with resizes, it may be that resizing CHM has additional overhead that is helpful to avoid. Third, users of CHM may have different expectations than users of HashMap; they might prefer higher throughput to lower memory usage, for example. I haven't looked at the CHM code in detail though. I'd caution against jumping to the conclusion that the other fix needs to be reverted.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17544#issuecomment-1935405832