HashMap.putAll can cause redundant space waste

Xeno Amess xenoamess at gmail.com
Thu Feb 10 18:14:26 UTC 2022


bug reported as 9072610
pr open at https://github.com/openjdk/jdk/pull/7431

I investigated most of the usages.
They just give a size, and get a capacity, even not change the 0.75
So maybe we can use some int calculation to replace the 0.75, thus replace
Math.ceil for such situations.

The only problem is that need adding some public api to HashMap or
Collections.
Yep, sounds like a better version of guava.newHashMap.

Xeno Amess <xenoamess at gmail.com> 于2022年2月9日周三 09:51写道:

> I agree that it would
>> be preferable for the expression to be something like this:
>
>      (int) Math.ceil(expected / 0.75)
>
>
> Agree.
>
> If we don't add a Java SE utility like "newHashMapWithExpectedSize",
>> fallbacks would
>> be to introduce an internal JDK utility method that is called from
>> various places,
>> or just to update the individual call sites to use the more precise
>> calculation.
>
>
> Both seem acceptable.
>
> So what should I do next?
>
> Stuart Marks <stuart.marks at oracle.com> 于2022年2月8日周二 04:06写道:
>
>> Hi,
>>
>> I'm not sure where you ended up in this succession of messages. I do
>> think there are
>> some things going on that are worthy of discussion and possibly fixing.
>> Let me try
>> to break them down.
>>
>> 1) With the default load factor of 0.75, it's possible to have 12 entries
>> in a map
>> whose table length is 16. This works as expected if the entries are added
>> individually using the put() method. However, adding 12 entries into a
>> map via the
>> copy constructor or via putAll() results in a table length of 32. That
>> seems wrong.
>> Well, if not exactly wrong, it's suboptimal, in that it uses more space
>> than necessary.
>>
>> 2) The root cause of the problem is with expressions such as these:
>>
>>      (int) (expected / 0.75f + 1.0f)
>> or
>>      (int) (expected / 0.75f) + 1
>>
>> (where "expected" is the expected number of entries). They attempt to
>> round the
>> division result up to the nearest int by adding 1 or 1.0f. This results
>> in a value
>> that's one too large if the result of the division exactly equals an
>> integer. So,
>> when "expected" is 12, this results in 17 instead of 16. This in turn is
>> inflated to
>> the next power-of-two, which is 32.
>>
>> 3) This was discussed previously in a different context. [1] I agree that
>> it would
>> be preferable for the expression to be something like this:
>>
>>      (int) Math.ceil(expected / 0.75)
>>
>> [1]
>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/066258.html
>>
>> This uses doubles instead of floats. But I think this is a fairly
>> low-frequency
>> operation, so I don't think using doubles is a problem. Thus I don't
>> think we need
>> to add a Math.ceil(float) overload.
>>
>> **
>>
>> So what should be done about this? Possibly, the computations in HashMap
>> and related
>> classes should be adjusted to use Math.ceil() instead. Some tests use the
>> suboptimal
>> calculation as well; those might need to be adjusted also.
>>
>> There are a variety of places around the JDK that use a similar
>> expression in order
>> to pre-size HashMaps and HashSets. We could go around and fix all them,
>> but it might
>> be worth considering adding an API that creates a HashMap (or
>> LinkedHashMap) that is
>> pre-sized for the expected number of elements. Guava has such an API,
>> Maps.newHashMapWithExpectedSize [2]. But note that it tries (but does not
>> guarantee)
>> to size the map such that it can accommodate the expected number of
>> elements without
>> resizing, but it doesn't promise that the map isn't oversized.
>>
>> [2]
>>
>> https://guava.dev/releases/31.0.1-jre/api/docs/com/google/common/collect/Maps.html#newHashMapWithExpectedSize(int)
>>
>> (Also note that the ConcurrentHashMap(int) constructor takes the expected
>> number of
>> elements, not the initial table length like HashMap(int). CHM does what
>> probably
>> most users want, but it's confusing that its behavior differs from
>> HashMap in this
>> regard.)
>>
>> If we don't add a Java SE utility like "newHashMapWithExpectedSize",
>> fallbacks would
>> be to introduce an internal JDK utility method that is called from
>> various places,
>> or just to update the individual call sites to use the more precise
>> calculation.
>>
>> s'marks
>>
>>
>> On 2/4/22 10:48 AM, Xeno Amess wrote:
>> > wait, things get interesting now.
>> > We do have such a test named ConstantDirectoryOptimalCapacity to make
>> sure
>> > this does not happen, but my tests still fill under the idea debugger.
>> > Is there any specialist for help? I would dig in but it is a little
>> > complicated.
>> > [image: image.png]
>> >
>> > Xeno Amess <xenoamess at gmail.com> 于2022年2月5日周六 02:39写道:
>> >
>> >> Sorry for my mistake.
>> >> After a more throughout dig in, this thing has already fixed several
>> years
>> >> ago, at year 2015.
>> >> Issue close.
>> >>
>> >> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 21:40写道:
>> >>
>> >>> Besides, is it worthy to develop a public float Math.ceil(float)
>> function?
>> >>>
>> >>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 21:39写道:
>> >>>
>> >>>> patch applied.
>> >>>>
>> >>>> Index: src/java.base/share/classes/java/lang/Module.java
>> >>>> IDEA additional info:
>> >>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>> >>>> <+>UTF-8
>> >>>> ===================================================================
>> >>>> diff --git a/src/java.base/share/classes/java/lang/Module.java
>> >>>> b/src/java.base/share/classes/java/lang/Module.java
>> >>>> --- a/src/java.base/share/classes/java/lang/Module.java (revision
>> >>>> 3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
>> >>>> +++ b/src/java.base/share/classes/java/lang/Module.java (revision
>> >>>> deeba25d15398fea8bc971ac915e348878b2c27a)
>> >>>> @@ -1133,7 +1133,7 @@
>> >>>>           boolean isBootLayer = (ModuleLayer.boot() == null);
>> >>>>
>> >>>>           int numModules = cf.modules().size();
>> >>>> -        int cap = (int)(numModules / 0.75f + 1.0f);
>> >>>> +        int cap = (int)Math.ceil(numModules / 0.75f);
>> >>>>           Map<String, Module> nameToModule = new HashMap<>(cap);
>> >>>>
>> >>>>           // to avoid repeated lookups and reduce iteration
>> overhead, we
>> >>>> create
>> >>>> Index: src/java.base/share/classes/java/util/HashMap.java
>> >>>> IDEA additional info:
>> >>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>> >>>> <+>UTF-8
>> >>>> ===================================================================
>> >>>> diff --git a/src/java.base/share/classes/java/util/HashMap.java
>> >>>> b/src/java.base/share/classes/java/util/HashMap.java
>> >>>> --- a/src/java.base/share/classes/java/util/HashMap.java (revision
>> >>>> 3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
>> >>>> +++ b/src/java.base/share/classes/java/util/HashMap.java (revision
>> >>>> deeba25d15398fea8bc971ac915e348878b2c27a)
>> >>>> @@ -495,9 +495,9 @@
>> >>>>           int s = m.size();
>> >>>>           if (s > 0) {
>> >>>>               if (table == null) { // pre-size
>> >>>> -                float ft = ((float)s / loadFactor) + 1.0F;
>> >>>> -                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
>> >>>> -                         (int)ft : MAXIMUM_CAPACITY);
>> >>>> +                double dt = Math.ceil(s / loadFactor);
>> >>>> +                int t = ((dt < (double)MAXIMUM_CAPACITY) ?
>> >>>> +                         (int)dt : MAXIMUM_CAPACITY);
>> >>>>                   if (t > threshold)
>> >>>>                       threshold = tableSizeFor(t);
>> >>>>               } else {
>> >>>> @@ -1527,12 +1527,12 @@
>> >>>>           } else if (mappings == 0) {
>> >>>>               // use defaults
>> >>>>           } else if (mappings > 0) {
>> >>>> -            float fc = (float)mappings / lf + 1.0f;
>> >>>> -            int cap = ((fc < DEFAULT_INITIAL_CAPACITY) ?
>> >>>> +            double dc = Math.ceil(mappings / lf);
>> >>>> +            int cap = ((dc < DEFAULT_INITIAL_CAPACITY) ?
>> >>>>                          DEFAULT_INITIAL_CAPACITY :
>> >>>> -                       (fc >= MAXIMUM_CAPACITY) ?
>> >>>> +                       (dc >= MAXIMUM_CAPACITY) ?
>> >>>>                          MAXIMUM_CAPACITY :
>> >>>> -                       tableSizeFor((int)fc));
>> >>>> +                       tableSizeFor((int)dc));
>> >>>>               float ft = (float)cap * lf;
>> >>>>               threshold = ((cap < MAXIMUM_CAPACITY && ft <
>> >>>> MAXIMUM_CAPACITY) ?
>> >>>>                            (int)ft : Integer.MAX_VALUE);
>> >>>> Index: src/java.base/share/classes/java/util/WeakHashMap.java
>> >>>> IDEA additional info:
>> >>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
>> >>>> <+>UTF-8
>> >>>> ===================================================================
>> >>>> diff --git a/src/java.base/share/classes/java/util/WeakHashMap.java
>> >>>> b/src/java.base/share/classes/java/util/WeakHashMap.java
>> >>>> --- a/src/java.base/share/classes/java/util/WeakHashMap.java
>> (revision
>> >>>> 3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
>> >>>> +++ b/src/java.base/share/classes/java/util/WeakHashMap.java
>> (revision
>> >>>> deeba25d15398fea8bc971ac915e348878b2c27a)
>> >>>> @@ -251,7 +251,7 @@
>> >>>>        * @since   1.3
>> >>>>        */
>> >>>>       public WeakHashMap(Map<? extends K, ? extends V> m) {
>> >>>> -        this(Math.max((int) ((float)m.size() / DEFAULT_LOAD_FACTOR +
>> >>>> 1.0F),
>> >>>> +        this(Math.max((int) Math.ceil(m.size() /
>> DEFAULT_LOAD_FACTOR),
>> >>>>                   DEFAULT_INITIAL_CAPACITY),
>> >>>>                DEFAULT_LOAD_FACTOR);
>> >>>>           putAll(m);
>> >>>>
>> >>>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 18:45写道:
>> >>>>
>> >>>>> also find some other places have same problem.
>> >>>>> If some of your already-in people aggree, I would create a pr, but
>> >>>>> according to the rules seems I should wait for now.
>> >>>>>
>> >>>>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 18:42写道:
>> >>>>>
>> >>>>>> import java.lang.reflect.Array;
>> >>>>>> import java.lang.reflect.Field;
>> >>>>>> import java.util.HashMap;
>> >>>>>> import java.util.Map;
>> >>>>>>
>> >>>>>> public class TestMap {
>> >>>>>>
>> >>>>>>      public static void main(String[] args) throws
>> NoSuchFieldException, IllegalAccessException {
>> >>>>>>          HashMap<Object, Object> a = new HashMap<>();
>> >>>>>>          fill12(a);
>> >>>>>>          HashMap<Object, Object> b = new HashMap<>(12);
>> >>>>>>          fill12(b);
>> >>>>>>          HashMap<Object, Object> c = new HashMap<>(a);
>> >>>>>>          HashMap<Object, Object> d = new HashMap<>();
>> >>>>>>          d.putAll(a);
>> >>>>>>          System.out.println("a : " + getArrayLength(a));
>> >>>>>>          System.out.println("b : " + getArrayLength(b));
>> >>>>>>          System.out.println("c : " + getArrayLength(c));
>> >>>>>>          System.out.println("d : " + getArrayLength(d));
>> >>>>>>      }
>> >>>>>>
>> >>>>>>      public static void fill12(Map<Object, Object> map) {
>> >>>>>>          for (int i = 0; i < 12; i++) {
>> >>>>>>              map.put(i, i);
>> >>>>>>          }
>> >>>>>>      }
>> >>>>>>
>> >>>>>>      public static int getArrayLength(Map<Object, Object> map)
>> throws NoSuchFieldException, IllegalAccessException {
>> >>>>>>          Field field = HashMap.class.getDeclaredField("table");
>> >>>>>>          field.setAccessible(true);
>> >>>>>>          Object table = field.get(map);
>> >>>>>>          return Array.getLength(table);
>> >>>>>>      }
>> >>>>>>
>> >>>>>> }
>> >>>>>>
>> >>>>>> run this and we get the output:
>> >>>>>>
>> >>>>>> a : 16
>> >>>>>> b : 16
>> >>>>>> c : 32
>> >>>>>> d : 32
>> >>>>>>
>> >>>>>> So I go see the codes.
>> >>>>>>
>> >>>>>> /**
>> >>>>>>   * Implements Map.putAll and Map constructor.
>> >>>>>>   *
>> >>>>>>   * @param m the map
>> >>>>>>   * @param evict false when initially constructing this map, else
>> >>>>>>   * true (relayed to method afterNodeInsertion).
>> >>>>>>   */
>> >>>>>> final void putMapEntries(Map<? extends K, ? extends V> m, boolean
>> evict) {
>> >>>>>>      int s = m.size();
>> >>>>>>      if (s > 0) {
>> >>>>>>          if (table == null) { // pre-size
>> >>>>>>              float ft = ((float)s / loadFactor) + 1.0F;
>> >>>>>>              int t = ((ft < (float)MAXIMUM_CAPACITY) ?
>> >>>>>>                       (int)ft : MAXIMUM_CAPACITY);
>> >>>>>>              if (t > threshold)
>> >>>>>>                  threshold = tableSizeFor(t);
>> >>>>>>          } else {
>> >>>>>>              // Because of linked-list bucket constraints, we
>> cannot
>> >>>>>>              // expand all at once, but can reduce total resize
>> >>>>>>              // effort by repeated doubling now vs later
>> >>>>>>              while (s > threshold && table.length <
>> MAXIMUM_CAPACITY)
>> >>>>>>                  resize();
>> >>>>>>          }
>> >>>>>>
>> >>>>>>          for (Map.Entry<? extends K, ? extends V> e :
>> m.entrySet()) {
>> >>>>>>              K key = e.getKey();
>> >>>>>>              V value = e.getValue();
>> >>>>>>              putVal(hash(key), key, value, false, evict);
>> >>>>>>          }
>> >>>>>>      }
>> >>>>>> }
>> >>>>>>
>> >>>>>> yep I do think *((float)s / loadFactor) + 1.0F* here is wrong.
>> >>>>>>
>> >>>>>> It should be *Math.ceil((float)s / loadFactor)*
>> >>>>>>
>> >>>>>> So I wish to generate a pull request.
>> >>>>>>
>> >>>>>> Anyone interested?
>> >>>>>>
>> >>>>>>
>>
>


More information about the core-libs-dev mailing list