RFR: 8324573: HashMap::putAll should resize to sum of both map sizes

Wed Jan 24 20:57:26 UTC 2024

On Wed, 24 Jan 2024 19:06:49 GMT, Volker Simonis <simonis at openjdk.org> wrote:

> The current benchmark and the change don't really cover the case where many keys exist in _both_ maps. Could you add a benchmark for that?

I added a benchmark that assumes the worse case. Please see the top post. Yes, this change is not always beneficial. I'd like to think the original benchmark, which uses random keys, is closer to real life scenarios. There may be some duplicate keys, but most keys are unique. But this is just my intuition, I do not have data to back this up.

One solution would be to loop through both maps and count number of unique keys. My guess is that this would mostly add unnecessary computations. We can compromise with something like `size() / 2 + m.size()`. My guess is the current aggressive approach is the best most of the time, but I don't have strong opinions and am happy to take suggestions.

> Also `int s = size() + m.size();` can overflow now, leading to different behavior. It might make sense to just cap the value?

Is it necessary? We can go with @simonis 's suggestion. I think if we move forward with this, it would make sense to change all instances of `size` to long type, which would then suggest we should change the `Map::size()` api to return a long. The reason I am not concerned about overflow is that the Map API implies that we should only care about maps with at most 2^32 items.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17544#issuecomment-1908899013