HashMap.putAll can cause redundant space waste

Xeno Amess xenoamess at gmail.com
Wed Feb 9 01:51:57 UTC 2022


>
> I agree that it would
> be preferable for the expression to be something like this:

     (int) Math.ceil(expected / 0.75)


Agree.

If we don't add a Java SE utility like "newHashMapWithExpectedSize",
> fallbacks would
> be to introduce an internal JDK utility method that is called from various
> places,
> or just to update the individual call sites to use the more precise
> calculation.


Both seem acceptable.

So what should I do next?

Stuart Marks <stuart.marks at oracle.com> 于2022年2月8日周二 04:06写道:

> Hi,
>
> I'm not sure where you ended up in this succession of messages. I do think
> there are
> some things going on that are worthy of discussion and possibly fixing.
> Let me try
> to break them down.
>
> 1) With the default load factor of 0.75, it's possible to have 12 entries
> in a map
> whose table length is 16. This works as expected if the entries are added
> individually using the put() method. However, adding 12 entries into a map
> via the
> copy constructor or via putAll() results in a table length of 32. That
> seems wrong.
> Well, if not exactly wrong, it's suboptimal, in that it uses more space
> than necessary.
>
> 2) The root cause of the problem is with expressions such as these:
>
>      (int) (expected / 0.75f + 1.0f)
> or
>      (int) (expected / 0.75f) + 1
>
> (where "expected" is the expected number of entries). They attempt to
> round the
> division result up to the nearest int by adding 1 or 1.0f. This results in
> a value
> that's one too large if the result of the division exactly equals an
> integer. So,
> when "expected" is 12, this results in 17 instead of 16. This in turn is
> inflated to
> the next power-of-two, which is 32.
>
> 3) This was discussed previously in a different context. [1] I agree that
> it would
> be preferable for the expression to be something like this:
>
>      (int) Math.ceil(expected / 0.75)
>
> [1]
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/066258.html
>
> This uses doubles instead of floats. But I think this is a fairly
> low-frequency
> operation, so I don't think using doubles is a problem. Thus I don't think
> we need
> to add a Math.ceil(float) overload.
>
> **
>
> So what should be done about this? Possibly, the computations in HashMap
> and related
> classes should be adjusted to use Math.ceil() instead. Some tests use the
> suboptimal
> calculation as well; those might need to be adjusted also.
>
> There are a variety of places around the JDK that use a similar expression
> in order
> to pre-size HashMaps and HashSets. We could go around and fix all them,
> but it might
> be worth considering adding an API that creates a HashMap (or
> LinkedHashMap) that is
> pre-sized for the expected number of elements. Guava has such an API,
> Maps.newHashMapWithExpectedSize [2]. But note that it tries (but does not
> guarantee)
> to size the map such that it can accommodate the expected number of
> elements without
> resizing, but it doesn't promise that the map isn't oversized.
>
> [2]
>
> https://guava.dev/releases/31.0.1-jre/api/docs/com/google/common/collect/Maps.html#newHashMapWithExpectedSize(int)
>
> (Also note that the ConcurrentHashMap(int) constructor takes the expected
> number of
> elements, not the initial table length like HashMap(int). CHM does what
> probably
> most users want, but it's confusing that its behavior differs from HashMap
> in this
> regard.)
>
> If we don't add a Java SE utility like "newHashMapWithExpectedSize",
> fallbacks would
> be to introduce an internal JDK utility method that is called from various
> places,
> or just to update the individual call sites to use the more precise
> calculation.
>
> s'marks
>
>
> On 2/4/22 10:48 AM, Xeno Amess wrote:
> > wait, things get interesting now.
> > We do have such a test named ConstantDirectoryOptimalCapacity to make
> sure
> > this does not happen, but my tests still fill under the idea debugger.
> > Is there any specialist for help? I would dig in but it is a little
> > complicated.
> > [image: image.png]
> >
> > Xeno Amess <xenoamess at gmail.com> 于2022年2月5日周六 02:39写道:
> >
> >> Sorry for my mistake.
> >> After a more throughout dig in, this thing has already fixed several
> years
> >> ago, at year 2015.
> >> Issue close.
> >>
> >> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 21:40写道:
> >>
> >>> Besides, is it worthy to develop a public float Math.ceil(float)
> function?
> >>>
> >>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 21:39写道:
> >>>
> >>>> patch applied.
> >>>>
> >>>> Index: src/java.base/share/classes/java/lang/Module.java
> >>>> IDEA additional info:
> >>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> >>>> <+>UTF-8
> >>>> ===================================================================
> >>>> diff --git a/src/java.base/share/classes/java/lang/Module.java
> >>>> b/src/java.base/share/classes/java/lang/Module.java
> >>>> --- a/src/java.base/share/classes/java/lang/Module.java (revision
> >>>> 3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
> >>>> +++ b/src/java.base/share/classes/java/lang/Module.java (revision
> >>>> deeba25d15398fea8bc971ac915e348878b2c27a)
> >>>> @@ -1133,7 +1133,7 @@
> >>>>           boolean isBootLayer = (ModuleLayer.boot() == null);
> >>>>
> >>>>           int numModules = cf.modules().size();
> >>>> -        int cap = (int)(numModules / 0.75f + 1.0f);
> >>>> +        int cap = (int)Math.ceil(numModules / 0.75f);
> >>>>           Map<String, Module> nameToModule = new HashMap<>(cap);
> >>>>
> >>>>           // to avoid repeated lookups and reduce iteration overhead,
> we
> >>>> create
> >>>> Index: src/java.base/share/classes/java/util/HashMap.java
> >>>> IDEA additional info:
> >>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> >>>> <+>UTF-8
> >>>> ===================================================================
> >>>> diff --git a/src/java.base/share/classes/java/util/HashMap.java
> >>>> b/src/java.base/share/classes/java/util/HashMap.java
> >>>> --- a/src/java.base/share/classes/java/util/HashMap.java (revision
> >>>> 3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
> >>>> +++ b/src/java.base/share/classes/java/util/HashMap.java (revision
> >>>> deeba25d15398fea8bc971ac915e348878b2c27a)
> >>>> @@ -495,9 +495,9 @@
> >>>>           int s = m.size();
> >>>>           if (s > 0) {
> >>>>               if (table == null) { // pre-size
> >>>> -                float ft = ((float)s / loadFactor) + 1.0F;
> >>>> -                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
> >>>> -                         (int)ft : MAXIMUM_CAPACITY);
> >>>> +                double dt = Math.ceil(s / loadFactor);
> >>>> +                int t = ((dt < (double)MAXIMUM_CAPACITY) ?
> >>>> +                         (int)dt : MAXIMUM_CAPACITY);
> >>>>                   if (t > threshold)
> >>>>                       threshold = tableSizeFor(t);
> >>>>               } else {
> >>>> @@ -1527,12 +1527,12 @@
> >>>>           } else if (mappings == 0) {
> >>>>               // use defaults
> >>>>           } else if (mappings > 0) {
> >>>> -            float fc = (float)mappings / lf + 1.0f;
> >>>> -            int cap = ((fc < DEFAULT_INITIAL_CAPACITY) ?
> >>>> +            double dc = Math.ceil(mappings / lf);
> >>>> +            int cap = ((dc < DEFAULT_INITIAL_CAPACITY) ?
> >>>>                          DEFAULT_INITIAL_CAPACITY :
> >>>> -                       (fc >= MAXIMUM_CAPACITY) ?
> >>>> +                       (dc >= MAXIMUM_CAPACITY) ?
> >>>>                          MAXIMUM_CAPACITY :
> >>>> -                       tableSizeFor((int)fc));
> >>>> +                       tableSizeFor((int)dc));
> >>>>               float ft = (float)cap * lf;
> >>>>               threshold = ((cap < MAXIMUM_CAPACITY && ft <
> >>>> MAXIMUM_CAPACITY) ?
> >>>>                            (int)ft : Integer.MAX_VALUE);
> >>>> Index: src/java.base/share/classes/java/util/WeakHashMap.java
> >>>> IDEA additional info:
> >>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> >>>> <+>UTF-8
> >>>> ===================================================================
> >>>> diff --git a/src/java.base/share/classes/java/util/WeakHashMap.java
> >>>> b/src/java.base/share/classes/java/util/WeakHashMap.java
> >>>> --- a/src/java.base/share/classes/java/util/WeakHashMap.java (revision
> >>>> 3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
> >>>> +++ b/src/java.base/share/classes/java/util/WeakHashMap.java (revision
> >>>> deeba25d15398fea8bc971ac915e348878b2c27a)
> >>>> @@ -251,7 +251,7 @@
> >>>>        * @since   1.3
> >>>>        */
> >>>>       public WeakHashMap(Map<? extends K, ? extends V> m) {
> >>>> -        this(Math.max((int) ((float)m.size() / DEFAULT_LOAD_FACTOR +
> >>>> 1.0F),
> >>>> +        this(Math.max((int) Math.ceil(m.size() /
> DEFAULT_LOAD_FACTOR),
> >>>>                   DEFAULT_INITIAL_CAPACITY),
> >>>>                DEFAULT_LOAD_FACTOR);
> >>>>           putAll(m);
> >>>>
> >>>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 18:45写道:
> >>>>
> >>>>> also find some other places have same problem.
> >>>>> If some of your already-in people aggree, I would create a pr, but
> >>>>> according to the rules seems I should wait for now.
> >>>>>
> >>>>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 18:42写道:
> >>>>>
> >>>>>> import java.lang.reflect.Array;
> >>>>>> import java.lang.reflect.Field;
> >>>>>> import java.util.HashMap;
> >>>>>> import java.util.Map;
> >>>>>>
> >>>>>> public class TestMap {
> >>>>>>
> >>>>>>      public static void main(String[] args) throws
> NoSuchFieldException, IllegalAccessException {
> >>>>>>          HashMap<Object, Object> a = new HashMap<>();
> >>>>>>          fill12(a);
> >>>>>>          HashMap<Object, Object> b = new HashMap<>(12);
> >>>>>>          fill12(b);
> >>>>>>          HashMap<Object, Object> c = new HashMap<>(a);
> >>>>>>          HashMap<Object, Object> d = new HashMap<>();
> >>>>>>          d.putAll(a);
> >>>>>>          System.out.println("a : " + getArrayLength(a));
> >>>>>>          System.out.println("b : " + getArrayLength(b));
> >>>>>>          System.out.println("c : " + getArrayLength(c));
> >>>>>>          System.out.println("d : " + getArrayLength(d));
> >>>>>>      }
> >>>>>>
> >>>>>>      public static void fill12(Map<Object, Object> map) {
> >>>>>>          for (int i = 0; i < 12; i++) {
> >>>>>>              map.put(i, i);
> >>>>>>          }
> >>>>>>      }
> >>>>>>
> >>>>>>      public static int getArrayLength(Map<Object, Object> map)
> throws NoSuchFieldException, IllegalAccessException {
> >>>>>>          Field field = HashMap.class.getDeclaredField("table");
> >>>>>>          field.setAccessible(true);
> >>>>>>          Object table = field.get(map);
> >>>>>>          return Array.getLength(table);
> >>>>>>      }
> >>>>>>
> >>>>>> }
> >>>>>>
> >>>>>> run this and we get the output:
> >>>>>>
> >>>>>> a : 16
> >>>>>> b : 16
> >>>>>> c : 32
> >>>>>> d : 32
> >>>>>>
> >>>>>> So I go see the codes.
> >>>>>>
> >>>>>> /**
> >>>>>>   * Implements Map.putAll and Map constructor.
> >>>>>>   *
> >>>>>>   * @param m the map
> >>>>>>   * @param evict false when initially constructing this map, else
> >>>>>>   * true (relayed to method afterNodeInsertion).
> >>>>>>   */
> >>>>>> final void putMapEntries(Map<? extends K, ? extends V> m, boolean
> evict) {
> >>>>>>      int s = m.size();
> >>>>>>      if (s > 0) {
> >>>>>>          if (table == null) { // pre-size
> >>>>>>              float ft = ((float)s / loadFactor) + 1.0F;
> >>>>>>              int t = ((ft < (float)MAXIMUM_CAPACITY) ?
> >>>>>>                       (int)ft : MAXIMUM_CAPACITY);
> >>>>>>              if (t > threshold)
> >>>>>>                  threshold = tableSizeFor(t);
> >>>>>>          } else {
> >>>>>>              // Because of linked-list bucket constraints, we cannot
> >>>>>>              // expand all at once, but can reduce total resize
> >>>>>>              // effort by repeated doubling now vs later
> >>>>>>              while (s > threshold && table.length <
> MAXIMUM_CAPACITY)
> >>>>>>                  resize();
> >>>>>>          }
> >>>>>>
> >>>>>>          for (Map.Entry<? extends K, ? extends V> e : m.entrySet())
> {
> >>>>>>              K key = e.getKey();
> >>>>>>              V value = e.getValue();
> >>>>>>              putVal(hash(key), key, value, false, evict);
> >>>>>>          }
> >>>>>>      }
> >>>>>> }
> >>>>>>
> >>>>>> yep I do think *((float)s / loadFactor) + 1.0F* here is wrong.
> >>>>>>
> >>>>>> It should be *Math.ceil((float)s / loadFactor)*
> >>>>>>
> >>>>>> So I wish to generate a pull request.
> >>>>>>
> >>>>>> Anyone interested?
> >>>>>>
> >>>>>>
>


More information about the core-libs-dev mailing list