HashMap.putAll can cause redundant space waste
Stuart Marks
stuart.marks at oracle.com
Mon Feb 7 20:06:48 UTC 2022
Hi,
I'm not sure where you ended up in this succession of messages. I do think there are
some things going on that are worthy of discussion and possibly fixing. Let me try
to break them down.
1) With the default load factor of 0.75, it's possible to have 12 entries in a map
whose table length is 16. This works as expected if the entries are added
individually using the put() method. However, adding 12 entries into a map via the
copy constructor or via putAll() results in a table length of 32. That seems wrong.
Well, if not exactly wrong, it's suboptimal, in that it uses more space than necessary.
2) The root cause of the problem is with expressions such as these:
(int) (expected / 0.75f + 1.0f)
or
(int) (expected / 0.75f) + 1
(where "expected" is the expected number of entries). They attempt to round the
division result up to the nearest int by adding 1 or 1.0f. This results in a value
that's one too large if the result of the division exactly equals an integer. So,
when "expected" is 12, this results in 17 instead of 16. This in turn is inflated to
the next power-of-two, which is 32.
3) This was discussed previously in a different context. [1] I agree that it would
be preferable for the expression to be something like this:
(int) Math.ceil(expected / 0.75)
[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/066258.html
This uses doubles instead of floats. But I think this is a fairly low-frequency
operation, so I don't think using doubles is a problem. Thus I don't think we need
to add a Math.ceil(float) overload.
**
So what should be done about this? Possibly, the computations in HashMap and related
classes should be adjusted to use Math.ceil() instead. Some tests use the suboptimal
calculation as well; those might need to be adjusted also.
There are a variety of places around the JDK that use a similar expression in order
to pre-size HashMaps and HashSets. We could go around and fix all them, but it might
be worth considering adding an API that creates a HashMap (or LinkedHashMap) that is
pre-sized for the expected number of elements. Guava has such an API,
Maps.newHashMapWithExpectedSize [2]. But note that it tries (but does not guarantee)
to size the map such that it can accommodate the expected number of elements without
resizing, but it doesn't promise that the map isn't oversized.
[2]
https://guava.dev/releases/31.0.1-jre/api/docs/com/google/common/collect/Maps.html#newHashMapWithExpectedSize(int)
(Also note that the ConcurrentHashMap(int) constructor takes the expected number of
elements, not the initial table length like HashMap(int). CHM does what probably
most users want, but it's confusing that its behavior differs from HashMap in this
regard.)
If we don't add a Java SE utility like "newHashMapWithExpectedSize", fallbacks would
be to introduce an internal JDK utility method that is called from various places,
or just to update the individual call sites to use the more precise calculation.
s'marks
On 2/4/22 10:48 AM, Xeno Amess wrote:
> wait, things get interesting now.
> We do have such a test named ConstantDirectoryOptimalCapacity to make sure
> this does not happen, but my tests still fill under the idea debugger.
> Is there any specialist for help? I would dig in but it is a little
> complicated.
> [image: image.png]
>
> Xeno Amess <xenoamess at gmail.com> 于2022年2月5日周六 02:39写道:
>
>> Sorry for my mistake.
>> After a more throughout dig in, this thing has already fixed several years
>> ago, at year 2015.
>> Issue close.
>>
>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 21:40写道:
>>
>>> Besides, is it worthy to develop a public float Math.ceil(float) function?
>>>
>>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 21:39写道:
>>>
>>>> patch applied.
>>>>
>>>> Index: src/java.base/share/classes/java/lang/Module.java
>>>> IDEA additional info:
>>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>>>> <+>UTF-8
>>>> ===================================================================
>>>> diff --git a/src/java.base/share/classes/java/lang/Module.java
>>>> b/src/java.base/share/classes/java/lang/Module.java
>>>> --- a/src/java.base/share/classes/java/lang/Module.java (revision
>>>> 3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
>>>> +++ b/src/java.base/share/classes/java/lang/Module.java (revision
>>>> deeba25d15398fea8bc971ac915e348878b2c27a)
>>>> @@ -1133,7 +1133,7 @@
>>>> boolean isBootLayer = (ModuleLayer.boot() == null);
>>>>
>>>> int numModules = cf.modules().size();
>>>> - int cap = (int)(numModules / 0.75f + 1.0f);
>>>> + int cap = (int)Math.ceil(numModules / 0.75f);
>>>> Map<String, Module> nameToModule = new HashMap<>(cap);
>>>>
>>>> // to avoid repeated lookups and reduce iteration overhead, we
>>>> create
>>>> Index: src/java.base/share/classes/java/util/HashMap.java
>>>> IDEA additional info:
>>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>>>> <+>UTF-8
>>>> ===================================================================
>>>> diff --git a/src/java.base/share/classes/java/util/HashMap.java
>>>> b/src/java.base/share/classes/java/util/HashMap.java
>>>> --- a/src/java.base/share/classes/java/util/HashMap.java (revision
>>>> 3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
>>>> +++ b/src/java.base/share/classes/java/util/HashMap.java (revision
>>>> deeba25d15398fea8bc971ac915e348878b2c27a)
>>>> @@ -495,9 +495,9 @@
>>>> int s = m.size();
>>>> if (s > 0) {
>>>> if (table == null) { // pre-size
>>>> - float ft = ((float)s / loadFactor) + 1.0F;
>>>> - int t = ((ft < (float)MAXIMUM_CAPACITY) ?
>>>> - (int)ft : MAXIMUM_CAPACITY);
>>>> + double dt = Math.ceil(s / loadFactor);
>>>> + int t = ((dt < (double)MAXIMUM_CAPACITY) ?
>>>> + (int)dt : MAXIMUM_CAPACITY);
>>>> if (t > threshold)
>>>> threshold = tableSizeFor(t);
>>>> } else {
>>>> @@ -1527,12 +1527,12 @@
>>>> } else if (mappings == 0) {
>>>> // use defaults
>>>> } else if (mappings > 0) {
>>>> - float fc = (float)mappings / lf + 1.0f;
>>>> - int cap = ((fc < DEFAULT_INITIAL_CAPACITY) ?
>>>> + double dc = Math.ceil(mappings / lf);
>>>> + int cap = ((dc < DEFAULT_INITIAL_CAPACITY) ?
>>>> DEFAULT_INITIAL_CAPACITY :
>>>> - (fc >= MAXIMUM_CAPACITY) ?
>>>> + (dc >= MAXIMUM_CAPACITY) ?
>>>> MAXIMUM_CAPACITY :
>>>> - tableSizeFor((int)fc));
>>>> + tableSizeFor((int)dc));
>>>> float ft = (float)cap * lf;
>>>> threshold = ((cap < MAXIMUM_CAPACITY && ft <
>>>> MAXIMUM_CAPACITY) ?
>>>> (int)ft : Integer.MAX_VALUE);
>>>> Index: src/java.base/share/classes/java/util/WeakHashMap.java
>>>> IDEA additional info:
>>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>>>> <+>UTF-8
>>>> ===================================================================
>>>> diff --git a/src/java.base/share/classes/java/util/WeakHashMap.java
>>>> b/src/java.base/share/classes/java/util/WeakHashMap.java
>>>> --- a/src/java.base/share/classes/java/util/WeakHashMap.java (revision
>>>> 3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
>>>> +++ b/src/java.base/share/classes/java/util/WeakHashMap.java (revision
>>>> deeba25d15398fea8bc971ac915e348878b2c27a)
>>>> @@ -251,7 +251,7 @@
>>>> * @since 1.3
>>>> */
>>>> public WeakHashMap(Map<? extends K, ? extends V> m) {
>>>> - this(Math.max((int) ((float)m.size() / DEFAULT_LOAD_FACTOR +
>>>> 1.0F),
>>>> + this(Math.max((int) Math.ceil(m.size() / DEFAULT_LOAD_FACTOR),
>>>> DEFAULT_INITIAL_CAPACITY),
>>>> DEFAULT_LOAD_FACTOR);
>>>> putAll(m);
>>>>
>>>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 18:45写道:
>>>>
>>>>> also find some other places have same problem.
>>>>> If some of your already-in people aggree, I would create a pr, but
>>>>> according to the rules seems I should wait for now.
>>>>>
>>>>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 18:42写道:
>>>>>
>>>>>> import java.lang.reflect.Array;
>>>>>> import java.lang.reflect.Field;
>>>>>> import java.util.HashMap;
>>>>>> import java.util.Map;
>>>>>>
>>>>>> public class TestMap {
>>>>>>
>>>>>> public static void main(String[] args) throws NoSuchFieldException, IllegalAccessException {
>>>>>> HashMap<Object, Object> a = new HashMap<>();
>>>>>> fill12(a);
>>>>>> HashMap<Object, Object> b = new HashMap<>(12);
>>>>>> fill12(b);
>>>>>> HashMap<Object, Object> c = new HashMap<>(a);
>>>>>> HashMap<Object, Object> d = new HashMap<>();
>>>>>> d.putAll(a);
>>>>>> System.out.println("a : " + getArrayLength(a));
>>>>>> System.out.println("b : " + getArrayLength(b));
>>>>>> System.out.println("c : " + getArrayLength(c));
>>>>>> System.out.println("d : " + getArrayLength(d));
>>>>>> }
>>>>>>
>>>>>> public static void fill12(Map<Object, Object> map) {
>>>>>> for (int i = 0; i < 12; i++) {
>>>>>> map.put(i, i);
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> public static int getArrayLength(Map<Object, Object> map) throws NoSuchFieldException, IllegalAccessException {
>>>>>> Field field = HashMap.class.getDeclaredField("table");
>>>>>> field.setAccessible(true);
>>>>>> Object table = field.get(map);
>>>>>> return Array.getLength(table);
>>>>>> }
>>>>>>
>>>>>> }
>>>>>>
>>>>>> run this and we get the output:
>>>>>>
>>>>>> a : 16
>>>>>> b : 16
>>>>>> c : 32
>>>>>> d : 32
>>>>>>
>>>>>> So I go see the codes.
>>>>>>
>>>>>> /**
>>>>>> * Implements Map.putAll and Map constructor.
>>>>>> *
>>>>>> * @param m the map
>>>>>> * @param evict false when initially constructing this map, else
>>>>>> * true (relayed to method afterNodeInsertion).
>>>>>> */
>>>>>> final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
>>>>>> int s = m.size();
>>>>>> if (s > 0) {
>>>>>> if (table == null) { // pre-size
>>>>>> float ft = ((float)s / loadFactor) + 1.0F;
>>>>>> int t = ((ft < (float)MAXIMUM_CAPACITY) ?
>>>>>> (int)ft : MAXIMUM_CAPACITY);
>>>>>> if (t > threshold)
>>>>>> threshold = tableSizeFor(t);
>>>>>> } else {
>>>>>> // Because of linked-list bucket constraints, we cannot
>>>>>> // expand all at once, but can reduce total resize
>>>>>> // effort by repeated doubling now vs later
>>>>>> while (s > threshold && table.length < MAXIMUM_CAPACITY)
>>>>>> resize();
>>>>>> }
>>>>>>
>>>>>> for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
>>>>>> K key = e.getKey();
>>>>>> V value = e.getValue();
>>>>>> putVal(hash(key), key, value, false, evict);
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> yep I do think *((float)s / loadFactor) + 1.0F* here is wrong.
>>>>>>
>>>>>> It should be *Math.ceil((float)s / loadFactor)*
>>>>>>
>>>>>> So I wish to generate a pull request.
>>>>>>
>>>>>> Anyone interested?
>>>>>>
>>>>>>
More information about the core-libs-dev
mailing list