HashMap sizing was a terrible design mistake, but too much software depends on it, e.g. https://google.github.io/guava/releases/23.0/api/docs/com/google/common/coll... On Tue, Dec 11, 2018 at 1:26 AM Claes Redestad <claes.redestad@oracle.com> wrote:
Hi Peter,
On 2018-12-11 10:02, Peter Levart wrote:
Hi Claes,
Haven't checked all changes yet, although it looks like a straightforward swap of Properties for HashMap for intermediary result, but I noticed the following in SystemProps:
265 var cmdProps = new HashMap<String, String>((vmProps.length / 2) + Raw.FIXED_LENGTH);
The HashMap(int) constructor is different from Properties(int) in that for the former, the argument represents a lower bound on the initial size of the table (it is just rounded up to the nearest power of 2). The threshold at which the table is resized is calculated as (initialCapacity rounded up to the nearest power of 2) * loadFactor. The default load factor is 0.75, which means that in the worst case the table will be resized after 3/4 * initialCapacity elements have been inserted into it. In order to guarantee that the table is never resized you have to pass (size * 4 + 2) / 3 to the HashMap constructor, where size is the number of elements added...
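A small sketch of the arithmetic above. The rounding and threshold mirror what HashMap does internally with the default load factor; the helper names (tableSizeFor, threshold, capacityFor) are mine for illustration, not JDK API:

```java
// Sketch of HashMap's capacity/threshold arithmetic (default load factor).
public class HashMapSizing {
    static final float LOAD_FACTOR = 0.75f;

    // Round up to the next power of two, as HashMap does with initialCapacity.
    static int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : n + 1;
    }

    // Element count at which the table is resized.
    static int threshold(int initialCapacity) {
        return (int) (tableSizeFor(initialCapacity) * LOAD_FACTOR);
    }

    // Capacity to request so that `size` insertions never trigger a resize.
    static int capacityFor(int size) {
        return (size * 4 + 2) / 3;
    }

    public static void main(String[] args) {
        // Passing the element count directly is not enough:
        System.out.println(threshold(100));              // 96 < 100 -> resize
        // Adjusting for the load factor avoids the resize:
        System.out.println(threshold(capacityFor(100))); // 192 >= 100, no resize
    }
}
```

For 100 elements, new HashMap<>(100) gives a 128-slot table that resizes at the 97th insertion, while new HashMap<>(134) gives a 256-slot table with a threshold of 192.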
I hope I'm not misleading you; I just think this is how HashMap has behaved from the beginning. Peeking at the HashMap code (line 693), it seems that is exactly what it does:
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
          (int)ft : Integer.MAX_VALUE);
newCap above holds the initialCapacity constructor parameter rounded up to the nearest power of 2; newThr is the threshold at which the resize occurs.
The Properties(int) constructor behaves differently: it passes the parameter directly to the underlying ConcurrentHashMap, whose constructor documents:
@param initialCapacity The implementation performs internal sizing to accommodate this many elements.
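The practical difference between the two contracts can be sketched as follows. With Properties(int) (backed by ConcurrentHashMap) you can pass the expected element count directly, while with HashMap(int) you must inflate it by the load factor to avoid a resize; the counts and keys here are illustrative, not from the patch:

```java
import java.util.HashMap;
import java.util.Properties;

public class SizingContract {
    public static void main(String[] args) {
        int expected = 100;

        // Properties(int) forwards to ConcurrentHashMap, which performs
        // internal sizing to accommodate `expected` elements directly.
        Properties props = new Properties(expected);

        // HashMap(int) treats the argument as table capacity, so the
        // resize threshold is capacity * 0.75; inflate the argument
        // to guarantee no resize for `expected` insertions.
        HashMap<String, String> map = new HashMap<>((expected * 4 + 2) / 3);

        for (int i = 0; i < expected; i++) {
            props.setProperty("key" + i, "value" + i);
            map.put("key" + i, "value" + i);
        }
        System.out.println(props.size() + " " + map.size()); // 100 100
    }
}
```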
Fun story: I noticed this unfortunate difference when I was working on this very patch and brought it up in the team. I think most agree the CHM behavior is the more intuitive one and would have loved to consolidate on it, but the message I've gotten is that it's probably too late to fix: whichever way you go, you get lots of little subtle changes to sizes that may lead to incompatibilities in applications/tests that inadvertently depend on the iteration order of HashMap etc.
That said: Raw typically contains quite a few empty/null values that will never be put in the map; enough so that, for the applications I've looked at, the initialCapacity is already a fair bit higher than it needs to be to avoid resizing. Thus it made little sense to take the maximum possible size and adjust it up even further by factoring in the default load factor.
(Unless you have a *lot* of properties coming in via command line, but I think we should optimize for the common cases...)
Thanks!
/Claes
Regards, Peter
On 12/10/18 10:17 PM, Claes Redestad wrote:
Hi,
by inverting the order in which the internal property maps are created, we avoid some classloading and get a slightly more efficient code execution profile in System.initPhase1.
Webrev: http://cr.openjdk.java.net/~redestad/8215159/jdk.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8215159
This reduces bytecode executed in initPhase1 from ~48k to ~36k. Not much by any measure, but minimizing System.initPhase1 is important since certain parts of the VM (JIT etc.) are blocked from initializing until it's done.
Thanks!
/Claes