HashMap.putAll can cause redundant space waste
Xeno Amess
xenoamess at gmail.com
Wed Feb 9 01:51:57 UTC 2022
>
> I agree that it would
> be preferable for the expression to be something like this:
(int) Math.ceil(expected / 0.75)
Agree.
If we don't add a Java SE utility like "newHashMapWithExpectedSize",
> fallbacks would
> be to introduce an internal JDK utility method that is called from various
> places,
> or just to update the individual call sites to use the more precise
> calculation.
Both seem acceptable.
So what should I do next?
Stuart Marks <stuart.marks at oracle.com> 于2022年2月8日周二 04:06写道:
> Hi,
>
> I'm not sure where you ended up in this succession of messages. I do think
> there are
> some things going on that are worthy of discussion and possibly fixing.
> Let me try
> to break them down.
>
> 1) With the default load factor of 0.75, it's possible to have 12 entries
> in a map
> whose table length is 16. This works as expected if the entries are added
> individually using the put() method. However, adding 12 entries into a map
> via the
> copy constructor or via putAll() results in a table length of 32. That
> seems wrong.
> Well, if not exactly wrong, it's suboptimal, in that it uses more space
> than necessary.
>
> 2) The root cause of the problem is with expressions such as these:
>
> (int) (expected / 0.75f + 1.0f)
> or
> (int) (expected / 0.75f) + 1
>
> (where "expected" is the expected number of entries). They attempt to
> round the
> division result up to the nearest int by adding 1 or 1.0f. This results in
> a value
> that's one too large if the result of the division exactly equals an
> integer. So,
> when "expected" is 12, this results in 17 instead of 16. This in turn is
> inflated to
> the next power-of-two, which is 32.
>
> 3) This was discussed previously in a different context. [1] I agree that
> it would
> be preferable for the expression to be something like this:
>
> (int) Math.ceil(expected / 0.75)
>
> [1]
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/066258.html
>
> This uses doubles instead of floats. But I think this is a fairly
> low-frequency
> operation, so I don't think using doubles is a problem. Thus I don't think
> we need
> to add a Math.ceil(float) overload.
>
> **
>
> So what should be done about this? Possibly, the computations in HashMap
> and related
> classes should be adjusted to use Math.ceil() instead. Some tests use the
> suboptimal
> calculation as well; those might need to be adjusted also.
>
> There are a variety of places around the JDK that use a similar expression
> in order
> to pre-size HashMaps and HashSets. We could go around and fix all them,
> but it might
> be worth considering adding an API that creates a HashMap (or
> LinkedHashMap) that is
> pre-sized for the expected number of elements. Guava has such an API,
> Maps.newHashMapWithExpectedSize [2]. But note that it tries (but does not
> guarantee)
> to size the map such that it can accommodate the expected number of
> elements without
> resizing, but it doesn't promise that the map isn't oversized.
>
> [2]
>
> https://guava.dev/releases/31.0.1-jre/api/docs/com/google/common/collect/Maps.html#newHashMapWithExpectedSize(int)
>
> (Also note that the ConcurrentHashMap(int) constructor takes the expected
> number of
> elements, not the initial table length like HashMap(int). CHM does what
> probably
> most users want, but it's confusing that its behavior differs from HashMap
> in this
> regard.)
>
> If we don't add a Java SE utility like "newHashMapWithExpectedSize",
> fallbacks would
> be to introduce an internal JDK utility method that is called from various
> places,
> or just to update the individual call sites to use the more precise
> calculation.
>
> s'marks
>
>
> On 2/4/22 10:48 AM, Xeno Amess wrote:
> > wait, things get interesting now.
> > We do have such a test named ConstantDirectoryOptimalCapacity to make
> sure
> > this does not happen, but my tests still fill under the idea debugger.
> > Is there any specialist for help? I would dig in but it is a little
> > complicated.
> > [image: image.png]
> >
> > Xeno Amess <xenoamess at gmail.com> 于2022年2月5日周六 02:39写道:
> >
> >> Sorry for my mistake.
> >> After a more throughout dig in, this thing has already fixed several
> years
> >> ago, at year 2015.
> >> Issue close.
> >>
> >> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 21:40写道:
> >>
> >>> Besides, is it worthy to develop a public float Math.ceil(float)
> function?
> >>>
> >>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 21:39写道:
> >>>
> >>>> patch applied.
> >>>>
> >>>> Index: src/java.base/share/classes/java/lang/Module.java
> >>>> IDEA additional info:
> >>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> >>>> <+>UTF-8
> >>>> ===================================================================
> >>>> diff --git a/src/java.base/share/classes/java/lang/Module.java
> >>>> b/src/java.base/share/classes/java/lang/Module.java
> >>>> --- a/src/java.base/share/classes/java/lang/Module.java (revision
> >>>> 3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
> >>>> +++ b/src/java.base/share/classes/java/lang/Module.java (revision
> >>>> deeba25d15398fea8bc971ac915e348878b2c27a)
> >>>> @@ -1133,7 +1133,7 @@
> >>>> boolean isBootLayer = (ModuleLayer.boot() == null);
> >>>>
> >>>> int numModules = cf.modules().size();
> >>>> - int cap = (int)(numModules / 0.75f + 1.0f);
> >>>> + int cap = (int)Math.ceil(numModules / 0.75f);
> >>>> Map<String, Module> nameToModule = new HashMap<>(cap);
> >>>>
> >>>> // to avoid repeated lookups and reduce iteration overhead,
> we
> >>>> create
> >>>> Index: src/java.base/share/classes/java/util/HashMap.java
> >>>> IDEA additional info:
> >>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> >>>> <+>UTF-8
> >>>> ===================================================================
> >>>> diff --git a/src/java.base/share/classes/java/util/HashMap.java
> >>>> b/src/java.base/share/classes/java/util/HashMap.java
> >>>> --- a/src/java.base/share/classes/java/util/HashMap.java (revision
> >>>> 3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
> >>>> +++ b/src/java.base/share/classes/java/util/HashMap.java (revision
> >>>> deeba25d15398fea8bc971ac915e348878b2c27a)
> >>>> @@ -495,9 +495,9 @@
> >>>> int s = m.size();
> >>>> if (s > 0) {
> >>>> if (table == null) { // pre-size
> >>>> - float ft = ((float)s / loadFactor) + 1.0F;
> >>>> - int t = ((ft < (float)MAXIMUM_CAPACITY) ?
> >>>> - (int)ft : MAXIMUM_CAPACITY);
> >>>> + double dt = Math.ceil(s / loadFactor);
> >>>> + int t = ((dt < (double)MAXIMUM_CAPACITY) ?
> >>>> + (int)dt : MAXIMUM_CAPACITY);
> >>>> if (t > threshold)
> >>>> threshold = tableSizeFor(t);
> >>>> } else {
> >>>> @@ -1527,12 +1527,12 @@
> >>>> } else if (mappings == 0) {
> >>>> // use defaults
> >>>> } else if (mappings > 0) {
> >>>> - float fc = (float)mappings / lf + 1.0f;
> >>>> - int cap = ((fc < DEFAULT_INITIAL_CAPACITY) ?
> >>>> + double dc = Math.ceil(mappings / lf);
> >>>> + int cap = ((dc < DEFAULT_INITIAL_CAPACITY) ?
> >>>> DEFAULT_INITIAL_CAPACITY :
> >>>> - (fc >= MAXIMUM_CAPACITY) ?
> >>>> + (dc >= MAXIMUM_CAPACITY) ?
> >>>> MAXIMUM_CAPACITY :
> >>>> - tableSizeFor((int)fc));
> >>>> + tableSizeFor((int)dc));
> >>>> float ft = (float)cap * lf;
> >>>> threshold = ((cap < MAXIMUM_CAPACITY && ft <
> >>>> MAXIMUM_CAPACITY) ?
> >>>> (int)ft : Integer.MAX_VALUE);
> >>>> Index: src/java.base/share/classes/java/util/WeakHashMap.java
> >>>> IDEA additional info:
> >>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> >>>> <+>UTF-8
> >>>> ===================================================================
> >>>> diff --git a/src/java.base/share/classes/java/util/WeakHashMap.java
> >>>> b/src/java.base/share/classes/java/util/WeakHashMap.java
> >>>> --- a/src/java.base/share/classes/java/util/WeakHashMap.java (revision
> >>>> 3d926dd66ef6551e91a4ebbbc59dcff58f5ede5a)
> >>>> +++ b/src/java.base/share/classes/java/util/WeakHashMap.java (revision
> >>>> deeba25d15398fea8bc971ac915e348878b2c27a)
> >>>> @@ -251,7 +251,7 @@
> >>>> * @since 1.3
> >>>> */
> >>>> public WeakHashMap(Map<? extends K, ? extends V> m) {
> >>>> - this(Math.max((int) ((float)m.size() / DEFAULT_LOAD_FACTOR +
> >>>> 1.0F),
> >>>> + this(Math.max((int) Math.ceil(m.size() /
> DEFAULT_LOAD_FACTOR),
> >>>> DEFAULT_INITIAL_CAPACITY),
> >>>> DEFAULT_LOAD_FACTOR);
> >>>> putAll(m);
> >>>>
> >>>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 18:45写道:
> >>>>
> >>>>> also find some other places have same problem.
> >>>>> If some of your already-in people aggree, I would create a pr, but
> >>>>> according to the rules seems I should wait for now.
> >>>>>
> >>>>> Xeno Amess <xenoamess at gmail.com> 于2022年2月4日周五 18:42写道:
> >>>>>
> >>>>>> import java.lang.reflect.Array;
> >>>>>> import java.lang.reflect.Field;
> >>>>>> import java.util.HashMap;
> >>>>>> import java.util.Map;
> >>>>>>
> >>>>>> public class TestMap {
> >>>>>>
> >>>>>> public static void main(String[] args) throws
> NoSuchFieldException, IllegalAccessException {
> >>>>>> HashMap<Object, Object> a = new HashMap<>();
> >>>>>> fill12(a);
> >>>>>> HashMap<Object, Object> b = new HashMap<>(12);
> >>>>>> fill12(b);
> >>>>>> HashMap<Object, Object> c = new HashMap<>(a);
> >>>>>> HashMap<Object, Object> d = new HashMap<>();
> >>>>>> d.putAll(a);
> >>>>>> System.out.println("a : " + getArrayLength(a));
> >>>>>> System.out.println("b : " + getArrayLength(b));
> >>>>>> System.out.println("c : " + getArrayLength(c));
> >>>>>> System.out.println("d : " + getArrayLength(d));
> >>>>>> }
> >>>>>>
> >>>>>> public static void fill12(Map<Object, Object> map) {
> >>>>>> for (int i = 0; i < 12; i++) {
> >>>>>> map.put(i, i);
> >>>>>> }
> >>>>>> }
> >>>>>>
> >>>>>> public static int getArrayLength(Map<Object, Object> map)
> throws NoSuchFieldException, IllegalAccessException {
> >>>>>> Field field = HashMap.class.getDeclaredField("table");
> >>>>>> field.setAccessible(true);
> >>>>>> Object table = field.get(map);
> >>>>>> return Array.getLength(table);
> >>>>>> }
> >>>>>>
> >>>>>> }
> >>>>>>
> >>>>>> run this and we get the output:
> >>>>>>
> >>>>>> a : 16
> >>>>>> b : 16
> >>>>>> c : 32
> >>>>>> d : 32
> >>>>>>
> >>>>>> So I go see the codes.
> >>>>>>
> >>>>>> /**
> >>>>>> * Implements Map.putAll and Map constructor.
> >>>>>> *
> >>>>>> * @param m the map
> >>>>>> * @param evict false when initially constructing this map, else
> >>>>>> * true (relayed to method afterNodeInsertion).
> >>>>>> */
> >>>>>> final void putMapEntries(Map<? extends K, ? extends V> m, boolean
> evict) {
> >>>>>> int s = m.size();
> >>>>>> if (s > 0) {
> >>>>>> if (table == null) { // pre-size
> >>>>>> float ft = ((float)s / loadFactor) + 1.0F;
> >>>>>> int t = ((ft < (float)MAXIMUM_CAPACITY) ?
> >>>>>> (int)ft : MAXIMUM_CAPACITY);
> >>>>>> if (t > threshold)
> >>>>>> threshold = tableSizeFor(t);
> >>>>>> } else {
> >>>>>> // Because of linked-list bucket constraints, we cannot
> >>>>>> // expand all at once, but can reduce total resize
> >>>>>> // effort by repeated doubling now vs later
> >>>>>> while (s > threshold && table.length <
> MAXIMUM_CAPACITY)
> >>>>>> resize();
> >>>>>> }
> >>>>>>
> >>>>>> for (Map.Entry<? extends K, ? extends V> e : m.entrySet())
> {
> >>>>>> K key = e.getKey();
> >>>>>> V value = e.getValue();
> >>>>>> putVal(hash(key), key, value, false, evict);
> >>>>>> }
> >>>>>> }
> >>>>>> }
> >>>>>>
> >>>>>> yep I do think *((float)s / loadFactor) + 1.0F* here is wrong.
> >>>>>>
> >>>>>> It should be *Math.ceil((float)s / loadFactor)*
> >>>>>>
> >>>>>> So I wish to generate a pull request.
> >>>>>>
> >>>>>> Anyone interested?
> >>>>>>
> >>>>>>
>
More information about the core-libs-dev
mailing list