RFR: 8281631: HashMap copy constructor and putAll can over-allocate table [v25]

Stuart Marks smarks at openjdk.java.net
Fri Mar 4 20:09:07 UTC 2022


On Fri, 4 Mar 2022 17:30:48 GMT, XenoAmess <duke at openjdk.java.net> wrote:

>> 8281631: HashMap copy constructor and putAll can over-allocate table
>
> XenoAmess has updated the pull request incrementally with one additional commit since the last revision:
> 
>   refine test

> I see the WhiteBoxResizeTest code. If you want to add a class that does not extend HashMap to it, it needs very extensive changes, nearly a rewrite. Though if you really want this, I would be willing to have a try.

> Done, but it cannot pass that test, because WeakHashMap has no mechanism to avoid creating `table[]` when empty and only create it when the first element is inserted, whereas HashMap does.

Yes, I'd like for the WhiteBoxResizeTest to continue to test (most of) what it's currently testing, in addition to new tests for the cases you're fixing here. The commonality here is that they're all hash tables using separate chaining, they use power-of-two table sizes, and they have a load factor that governs when the table is resized.

The alternative is to have another test somewhere else that tests things that are very similar and that might or might not overlap in some ways. In addition, if we end up with a really comprehensive set of tests here in WhiteBoxResizeTest, it might reduce the need for the Enum/ConstantDirectoryOptimalCapacity.java test and the stuff in the test library that it uses. (But again, that cleanup should be separate from this.)

That said, I agree, WhiteBoxResizeTest will need some rewriting. There are two main areas that need some attention: infrastructure to get internals like the `table` field, and handling of test cases themselves.

(Also, I agree, let's not add lazy initialization to WeakHashMap as part of this PR. This can be handled by refactoring the test cases; see below.)

For getting at the HashMap internals, the test has kind of an internal API `table()` that just gets the contents of the `table` field. Clearly this needs to be adjusted to deal with WeakHashMap vs. HashMap. This can be done in a variety of ways, e.g., by reflection, or by looking up a method handle from a preloaded map of classes to method handles. I'd encourage you to try the latter, as it's more easily extensible in the future should we decide to add Hashtable or ConcurrentHashMap here.
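
For instance, here is one possible shape for that infrastructure (a sketch only, not the actual test code; the names are made up, and it assumes `java.util` is opened to the test, e.g. via the jtreg `@modules` tag or an `--add-opens` flag):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.reflect.Field;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.WeakHashMap;

    // Sketch only: per-class getters for the private 'table' field.
    // Assumes java.base/java.util is opened to the test module.
    class TableAccess {

        // One getter per class that declares its own 'table' field.
        static final Map<Class<?>, MethodHandle> TABLE_GETTERS = Map.of(
                HashMap.class, tableGetter(HashMap.class),
                WeakHashMap.class, tableGetter(WeakHashMap.class));

        static MethodHandle tableGetter(Class<?> clazz) {
            try {
                Field f = clazz.getDeclaredField("table");
                f.setAccessible(true);
                return MethodHandles.lookup().unreflectGetter(f);
            } catch (ReflectiveOperationException e) {
                throw new AssertionError(e);
            }
        }

        // Returns the backing table array, or null if not yet allocated.
        // LinkedHashMap inherits HashMap's field, so it uses the HashMap getter.
        static Object[] table(Map<?, ?> map) throws Throwable {
            Class<?> key = (map instanceof HashMap) ? HashMap.class : map.getClass();
            return (Object[]) TABLE_GETTERS.get(key).invoke(map);
        }

        // Capacity is simply the current table length (0 if not yet allocated).
        static int capacity(Map<?, ?> map) throws Throwable {
            Object[] table = table(map);
            return (table == null) ? 0 : table.length;
        }
    }

Keying the lookup by class keeps `table()` a one-liner at the call sites, and adding Hashtable or ConcurrentHashMap later would just mean registering one more getter.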

Then, the test cases need to be pulled apart. For example, there is this:

    void capacityTestDefaultConstructor(HashMap<Integer, Integer> map) {
        assertNull(table(map));

        map.put(1, 1);
        assertEquals(capacity(map), 16); // default initial capacity

        map.putAll(IntStream.range(0, 64).boxed().collect(toMap(i -> i, i -> i)));
        assertEquals(capacity(map), 128);
    }

This actually tests three things: 1) the table is lazily allocated, 2) the default capacity is 16, and 3) using putAll to populate the map with 64 elements results in a table size of 128. This should really be broken into three separate test methods. Once they're separated, the lazy allocation test should be called only for HashMap and LinkedHashMap, not for WeakHashMap.
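
The split might look roughly like this (just a sketch, reusing the test's existing `table()` and `capacity()` helpers; the expected capacities are the same ones from the current test):

    // sketch only; method names are illustrative
    void lazyAllocationTest(HashMap<Integer, Integer> map) {
        // applies only to HashMap and LinkedHashMap: no table until the first put
        assertNull(table(map));
    }

    void defaultCapacityTest(Map<Integer, Integer> map) {
        map.put(1, 1);
        assertEquals(capacity(map), 16); // default initial capacity
    }

    void putAllGrowthTest(Map<Integer, Integer> map) {
        // 64 mappings added via putAll to a default-constructed map
        map.putAll(IntStream.range(0, 64).boxed().collect(toMap(i -> i, i -> i)));
        assertEquals(capacity(map), 128);
    }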

Then there is this:

    @Test
    public void capacityTestInitialCapacity() {
        int initialCapacity = rnd.nextInt(2, 128);
        List<Supplier<HashMap<Integer, Integer>>> suppliers = List.of(
            () -> new HashMap<>(initialCapacity),
            () -> new HashMap<>(initialCapacity, 0.75f),
            () -> new LinkedHashMap<>(initialCapacity),
            () -> new LinkedHashMap<>(initialCapacity, 0.75f));

        for (Supplier<HashMap<Integer, Integer>> supplier : suppliers) {
            HashMap<Integer, Integer> map = supplier.get();
            assertNull(table(map));

            map.put(1, 1);
            assertEquals(capacity(map), tableSizeFor(initialCapacity));
        }
    }

This too needs to be pulled apart, as it's also testing different cases. The lazy-allocation cases should be moved to new cases of the lazy allocation test created above. What remains are cases that test whether the actual table size is right for the requested capacity. This is uncomfortably close to a test that checks the system-under-test against itself. Also, it's not clear to me what value the random capacities add here. It seems better to use a small list of edge cases to make sure that the table length resulting from a particular capacity request matches what we expect. For example:

    requested  expected
    15         16
    16         16
    17         32

(I think I got those numbers right.)
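
A sketch of how that could look with TestNG's data provider (names are made up, and it reuses the test's existing `capacity()` helper; the expected values are the ones from the table above):

    @DataProvider(name = "requestedCapacities")
    public Object[][] requestedCapacities() {
        return new Object[][] {
            { 15, 16 },
            { 16, 16 },
            { 17, 32 },
        };
    }

    @Test(dataProvider = "requestedCapacities")
    public void initialCapacityTest(int requested, int expected) {
        HashMap<Integer, Integer> map = new HashMap<>(requested);
        map.put(1, 1); // force table allocation
        assertEquals(capacity(map), expected);
    }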

Once these test methods have been straightened out, it should be clearer how to add the test cases we're interested in. Specifically,

1. putting N mappings into a map using `put()` in a loop results in a table of the expected length;
2. putting N mappings into a map using `putAll(mapWithNMappings)` results in a table of the expected length;
3. creating a map with N mappings using the copy constructor results in a table of the expected length.
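
A rough sketch of how those three cases might look for HashMap (the same shape would apply to LinkedHashMap and WeakHashMap; `N` and `expectedCapacity()` are hypothetical placeholders for the mapping count and the table length we expect for it):

    // sketch only; N and expectedCapacity(n) are hypothetical placeholders
    @Test
    public void putLoopCapacityTest() {
        HashMap<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < N; i++) {
            map.put(i, i);
        }
        assertEquals(capacity(map), expectedCapacity(N));
    }

    @Test
    public void putAllCapacityTest() {
        Map<Integer, Integer> source =
            IntStream.range(0, N).boxed().collect(toMap(i -> i, i -> i));
        HashMap<Integer, Integer> map = new HashMap<>();
        map.putAll(source);
        assertEquals(capacity(map), expectedCapacity(N));
    }

    @Test
    public void copyConstructorCapacityTest() {
        Map<Integer, Integer> source =
            IntStream.range(0, N).boxed().collect(toMap(i -> i, i -> i));
        HashMap<Integer, Integer> map = new HashMap<>(source);
        assertEquals(capacity(map), expectedCapacity(N));
    }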

This is a lot to do all at once. It might be reasonable to work incrementally, for example, first working on the MethodHandle infrastructure to deal with different Map classes. I can provide you with some code snippets if it's helpful.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7431

