RFR: 8311188: Simplify and modernize equals and hashCode in java.text [v4]

Wed Jul 12 22:29:13 UTC 2023

On Fri, 7 Jul 2023 00:39:20 GMT, Pavel Rappo <prappo at openjdk.org> wrote:

>> Thanks @rose00 for the writeup and @pavelrappo for asking pertinent followup questions.
>> 
>> For me the issue here is that there is a bunch of lore about avoiding `Objects::equals` and it's embodied in comments like this:
>> 
>>> NB: Do not replace with Object.equals until JDK-8015417 is resolved. 
>> 
>> These comments are almost exactly ten years old, and we can't seem to find any evidence showing that a slowdown occurred if `Objects::equals` were used. The comments are a "dead hand" at this point. Is there a way to demonstrate whether there is or is not any difference when using `Objects::equals`?
>> 
>> As a side note, I observe that the `eq` method in Collections.java and AbstractMap.java does this:
>> 
>>     return o1 == null ? o2 == null : o1.equals(o2);
>> 
>> whereas `Objects::equals` will test `o1 == o2` and skip the `equals()` call if the references are non-null and equals. This might confound the comparison a bit.
>
>> For me the issue here is that there is a bunch of lore about avoiding `Objects::equals` and it's embodied in comments like this:
>> 
>> > NB: Do not replace with Object.equals until JDK-8015417 is resolved.
>> 
>> These comments are almost exactly ten years old, and we can't seem to find any evidence showing that a slowdown occurred if `Objects::equals` were used. The comments are a "dead hand" at this point. Is there a way to demonstrate whether there is or is not any difference when using `Objects::equals`?
> 
> Agree on all of the above; this is "rumor control".

@pavelrappo

> Let's further test that model by checking its predictions.
> 
> …Which means that in this case, we can reasonably refactor that `equals` as follows:
>    ```
>     public boolean equals(Object o) {
>         return o instanceof Map.Entry<?, ?> e
>                 && Objects.equals(key, e.getKey())
>                 && Objects.equals(value, e.getValue());
>     }
>    ```

Yes, I believe that is so.  Discounting (probably polluted) profile effects, it is no better and no worse to have your own helper `eq` or use the one in `Objects`.

>      
>    Similar reasoning could be applied to that same class' `hashCode`, to refactor it from this:
>    ```
>     public int hashCode() {
>         return (key   == null ? 0 :   key.hashCode()) ^
>                (value == null ? 0 : value.hashCode());
>     }
>    ```
>      
>    to this:
>    ```
>     public int hashCode() {
>         return (Objects.hashCode(key)) ^
>                (Objects.hashCode(value));
>     }
>    ```

Correct.

>    Unlike the previous example, however, where we trade one inlining for another, here we might lose the first layer of inlining. But assuming that `Objects.hashCode` is relatively small and additionally might get annotated with `@ForceInline` as a result of this discussion later on, it will be inlined.

Correct.

> 2. Consider an opposite example…
>      
>    Would it be reasonable to change `hashCode` like this?
>    ```
>     public int hashCode() {
>         int value = m << 5;
>         value += Objects.hashCode(rp);
>         // no need to involve ks here since ks and rp
>         // should be equivalent.
>         return value;
>     }
>    ```

Yes, it would be, because `rp` has a statically observable type, `BigInteger`.  And that observation will flow across the call to the helper method.

>    Firstly, we lose inlining.

I don’t get that; I’d rather say “we assume the helper is inlined”.  Or are you talking about the hoped-for inlining of `recv.hashCode()`?  That works as long as a good-enough type of `recv` (`rp` or the formal to the helper) is observable.  In this case `BigInteger` is good.  If this were generic code and `rp` were `Object` then that would be not so good.

> Secondly, the fact that `rp` is `BigInteger` is no longer immediately available to the actual hash code computation. Assuming that `java.util.Objects.hashCode` will be inlined at runtime, and given enough invocations, it will be profiled and subjected to class hierarchy analysis (CHA), which in the absence of `BigInteger.hashCode` overrides will hopefully determine that the call could be further inlined.

This is mostly correct, except that we rely on the static type of `rp`, smuggled across the inlining barrier.  There is no need for profiling.  Indeed, it is rare for a profiled type to be used for CHA; that is mainly for statically observed types.

@stuart-marks

> These comments are almost exactly ten years old, and we can't seem to find any evidence showing that a slowdown occurred if `Objects::equals` were used. The comments are a “dead hand” at this point. Is there a way to demonstrate whether there is or is not any difference when using `Objects::equals`?

(Ooh: The Mortmain Strikes Back!  I guess I scared somebody a decade ago.  Better get this sorted…)

I think we should mark these helpers `@ForceInline` and then take what winnings we can get from them.  That means either refactoring all candidate sites in the JDK to use the helpers, or refactoring all but those which are risky.

We need a risk assessment in either case, because even if we refactor everything we will have to be ready to fix performance outages (if significant) by either recoding or adding a touch more profiling into the JVM.  Just a touch; it can be focused on just these helpers, if they are made intrinsics.

(Definitions: The “receiver” is the reference that will get the call to the `Object` method.  As it happens it is always the lead argument to the helper.  A “usefully restrictive” type is one for which the eventual virtual method call will reach exactly one method.  Every `final` class is a usefully restrictive type, almost every interface is not usefully restrictive.)

Here are less risky use cases:

 - The receiver in the use case is statically typed to a usefully restrictive type
 - The dynamic type of the receiver is variable and can take on many values.

In the first case, inlining the helper will provide the JIT all the information it needs to devirtualize and inline the method it calls.

In the second case, we give up hope that the JIT will inline the method, whether or not we perform the refactorization.

The more risky use cases are all others:

 - The dynamic type of the receiver takes only one or two values, even if its type is not usefully restrictive.

This can happen with generic code, but mainly in microbenchmarks, which are artificially small programs.  At scale most generic code is multi-purpose, which means the values that flow through them have many dynamic types.  So I think refactoring inside widely-used generic code is safe, not risky.  Beware making policy here only with microbenchmarks.

In risky cases like this, the JVM’s profiling machinery captures useful information on the receiver at the use site, information that would be absent after refactoring the virtual call inside the helper method.  This can and should be fixed by targeted profiling on the arguments of exactly those helpers.

I think it is hard to find the risky cases.

(We can generalize this later as well, and HotSpot already has a flag for that.  I think setting `TypeProfileLevel` to 112 instead of its current default 111 would fix the helpers, and many other problems as well.  But it would have an unpredictable effect on startup and footprint.  Targeting it to just our special helpers is likely to have no adverse effects.)

The above discussion applies to both `equals` and `hashCode`.  For `equals` there is a secondary effect from type observations on the second argument, but it only matters if we are already winning and inlining the virtual method.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14752#discussion_r1261783207