RFR: 8281631: HashMap copy constructor and putAll can over-allocate table [v27]

Thu Mar 10 16:00:48 UTC 2022

On Thu, 10 Mar 2022 01:11:03 GMT, Stuart Marks <smarks at openjdk.org> wrote:

> Sorry, the test changes look like they're heading in the wrong direction. I tried to provide some hints for what I was looking for in my previous comments. At this point, I felt it would have been too time-consuming to provide a bunch of review comments to try to get your proposed tests closer to what I was looking for, so instead I've extended/rewritten the existing `WhiteBoxResizeTest` file:
> 
> https://github.com/stuart-marks/jdk/blob/JDK-8281631-smarks/test/jdk/java/util/HashMap/WhiteBoxResizeTest.java
> 
> I hope you're ok with this.
> 
> Here are some issues I have with the multiple-file approach:
> 
> * The separate classes tends to obscure things, as the separate files are actually tightly coupled; infrastructure (reflection stuff) and test case setup (Util, TestSuite) are separated from the test, making it hard to see what's going on.
> * Each test method still does too much work. In general, test methods should do one thing in three phases: setup, perform action, assert results -- the [GivenWhenThen](https://martinfowler.com/bliki/GivenWhenThen.html) pattern. (Yes, I know a lot of tests currently in the regression suite violate this, but I think it's reasonable for new fine-grained combo tests like this one to be held to this standard.) Several of the test methods here do setup, perform an op, make assertions, perform another op, make assertions, perform another op, make more assertions, etc. The reason this is bad is that it's hard to separate out special cases. Consider `WhiteBoxHashMapInitialCapacityTest`. Since this test method tests both lazy allocation and initial capacity allocation, it can't handle the different behavior of `WeakHashMap`.
> * The need for separate classes is possibly driven by JUnit 4, where the test parameterization mechanism seems to only be available on the granularity of an entire class. (I'm not a JUnit expert, so I might have missed something.) This makes it difficult to structure parameterized tests on a per-method basis, as can be done in TestNG.
> 
> The existing `WhiteBoxResizeTest` is TestNG, so it seems sensible to extend that, and add several parameterized test methods. I believe the new `WhiteBoxResizeTest` covers all the existing cases as well as the new ones in your tests, plus some additional ones that we've uncovered during this discussion. It covers a reasonable set of cases in the following areas:
> 
> * combo of various constructors plus populating the map in different ways all get the same expected table size
> * lazy allocation of table
> * correct default allocation of table
> * verification of fix for `WeakHashMap` premature resizing
> * verification of fix using double instead of float calculation for certain large sizes
> 
> Here are some notes about what I did. Unfortunately all that's here is the end result. I tried a bunch of experiments and different alternatives, which aren't present here, but I'll try to document some of the processes I went through to get to this result.
> 
> * Rearranged the infrastructure to accommodate `WeakHashMap`. The `table()` and `capacity()` methods now work for all map classes under test. Really, there are only three classes, and two (`HashMap` and `LinkedHashMap`) have the table in the `HashMap` class, so a simple conditional will work. If additional classes are added, the conditional might turn into a case expression.
> * Simplified VarHandle setup. The old code had to load some private classes in order to get the exact types to provide to `findVarHandle()`. It's easier to find the field using reflection and then to "unreflect" it into a VarHandle.
> * Rearranged test methods into pairs of data provider and data-provider-consuming test methods, possibly with some helper methods, demarcated into different sections in the test class. Using TestNG data providers is a bit clunky with `Object[][]` or `Iterator<Object[]>` but it ends up working reasonably well.
> * The `tableSizeFor()` test is also now a data provider. It probably would have been ok to leave it as a single method with a bunch of asserts, but using a data provider enables using a tabular form for the test data, which I think makes it easier to manage the test cases.
> * Most test data is tabular. It's fairly straightforward but for some providers it's bit a repetitive to add more test cases. However, I didn't go the next step of compressing out the repetition, because doing so would have made it difficult to exclude special cases. For example, the `WeakHashMap` growth policy for `putAll` differs from the others.
> * Of note is that some of the larger sizes/capacities are tested using a "fake" map, which reports some size but whose `entrySet` iterator returns only one entry. This works for some of the implementations, so only cases for those are included.
> * I used helper methods called from the data provider in order to establish a target type for lambdas. This reduces the clutter in the table data.
> * It's much easier to parameterize constructors using `Supplier` lambdas as method arguments instead of as fields in a helper class. Using a `record` might be helpful here though.
> * If a parameterized test fails, TestNG will print out the arguments obtained from the data provider that caused the test method to fail. The string form of a lambda isn't useful. Providing a string label is a technique (hack) to identify the failing test case. It's a bit cryptic but it's quite effective.
> 
> There is a certain amount of brittleness in these tests such that they're prone to failure if the internals of one of the implementations changes. These are whitebox tests after all. But the new organization should facilitate rearrangement of the test cases and test methods if one of the implementations diverges. In addition, if changing one of the implementations causes test failures here, it will point to other implementations that might also need to be changed.
> 
> The new test is comprehensive enough that we might be able to remove the `Enum/ConstantDirectoryOptimalCapacity` test and the related test that uses those utilities to verify map table sizes. However, that evaluation should be done separately.
> 
> I suggest you merge this file into this PR in place of the other test files. Once you get this done I can run everything through our internal test system and sponsor this PR for integration.

Thanks for your help.

A quick glance: that is really big differences between the designs, but I do accept that sometimes there be no absolute right and wrong in designs, and consistency to the group is important.

So I will have a look at your version of this test and it functionally correct and can cover our newly found cases, I'm glad to change the tests to your version.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7431