RFR: 8324573: HashMap::putAll should resize to sum of both map sizes [v4]
Joshua Cao
duke at openjdk.org
Thu Feb 8 17:04:05 UTC 2024
On Wed, 7 Feb 2024 20:50:57 GMT, Stuart Marks <smarks at openjdk.org> wrote:
>> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision:
>>
>> extract msize variable
>
> I think we should step back from benchmarks a bit and examine the various tradeoffs occurring here. Certainly we can speed things up by pre-resizing the map to be larger. However, this can result in a map that is too large for the number of mappings it contains, in the case where this map and the argument map have keys in common. In other words, it might waste space. We really have little idea of how frequently this occurs. It's easy to imagine scenarios where there is no commonality (which this change will speed up) and also where there is 100% commonality (where this change will result in wasted space). What's the right tradeoff?
>
> I've linked some older bugs to the bug report for some historical perspective. In particular, see [this comment](https://bugs.openjdk.org/browse/JDK-4710319?focusedId=12240692&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-12240692) from Josh Bloch on JDK-4710319:
>
>> The conservatism of the resizing heuristic for putAll is intentional. It can cause at most one extra resizing, and can result in substantial footprint savings. This too should be documented in the code.
>
> The comment he added to putAll() for this change is still visible [here](https://github.com/openjdk/jdk/blob/e1b3c5b5ba5cfb8243d760e99887bbe1015a9d69/jdk/src/share/classes/java/util/HashMap.java#L1271), but unfortunately it was removed by a later refactoring. The comment is:
>
>
> /*
> * Expand the map if the map if the number of mappings to be added
> * is greater than or equal to threshold. This is conservative; the
> * obvious condition is (m.size() + size) >= threshold, but this
> * condition could result in a map with twice the appropriate capacity,
> * if the keys to be added overlap with the keys already in this map.
> * By using the conservative calculation, we subject ourself
> * to at most one extra resize.
> */
>
>
> Note that this comment addresses fairly directly the essence of the change being discussed here. I think it's still applicable; the current code in putMapEntries() compares `m.size()` to `threshold` in deciding whether to resize immediately. We're not constrained by a 20-year-old comment, though. We can change the resizing policy if we have good reason to do so.
>
> However, I think the concern about space wastage is still relevant, even after 20 years of increased memory capacity and decreased memory cost. Memory is cheap but not free. Larger memory cons...
@stuart-marks Thanks for reviewing. I think there is some history that is hard to find right now. I'd suggest the following:
1. We don't need to close this PR. We can rename it to "Add comments for HashMap::putAll conservative expanding"
2. Undo the aggressive expanding in ConcurrentHashMap::putAll in a separate PR
@simonis what are your thoughts?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17544#issuecomment-1934554352
More information about the core-libs-dev
mailing list