RFR: 8370914: C2: Reimplement Type::join [v7]

Wed Jan 7 16:26:58 UTC 2026

On Wed, 7 Jan 2026 09:56:26 GMT, Quan Anh Mai <qamai at openjdk.org> wrote:

>> Hi,
>> 
>> Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection.
>> 
>> However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`).
>> 
>> In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand.
>> 
>> This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierachy. There are a lot of benefits of this:
>> 
>> - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think a `Type` as a set of possible values.
>> - It tightens the invariants of the classes in the hierachy. Instead of having 5 possible `ptr()` value when a `TypeInstPtr` participating in a meet/join operation, there are only 3 left (`AnyNull` is non-sensical and `TopPTR` must be an `AnyPtr`). This, in turns, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about.
>> 
>> This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these p...
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits:
> 
>  - copyright year
>  - Merge branch 'master' into typejoin
>  - sort order
>  - Merge branch 'master' into typejoin
>  - Merge branch 'master' into typejoin
>  - Move dual to ASSERT only
>  - Keep old version for verification
>  - whitespace
>  - Reimplement Type::join

Good improvement!

It's really much simpler, easy to reason about... If we have problems with operations, it's probably not by lack of dual.

I'm not super familiar with the abstraction of some derived clases of `TypePtr` (like `TypeMetadataPtr`), but I didn't see anything shocking. I do have a few little comments.

I can think of only one type-related thing that would be even better: swapping names of top, bottom, join and meet to match the terminology of the rest of the world. Seriously, that would be nice...

src/hotspot/share/opto/memnode.cpp line 2019:

> 2017:   if (is_mismatched_access()) {
> 2018:     return _type;
> 2019:   }

How is that related to the reimplementation of join?

src/hotspot/share/opto/rangeinference.cpp line 700:

> 698:     return int_type_union(t1, t2);
> 699:   } else {
> 700:     return CT::make_or_top(TypeIntPrototype<S, U>{{MAX2(t1->_lo, t2->_lo), MIN2(t1->_hi, t2->_hi)},

Nothing dramatic, but why is this branch implemented directly, why the other branch is calling another function `int_type_union`. This function seems to be used only here and in testing. If I understand correctly, this branch is meant to go away when the `_is_dual` is removed, but it is (almost) the same as in `int_type_xjoin` (which makes sense).

I'm not very decided what would be the best, but I'm slightly annoyed they look different. Should we have a `int_type_intersection`? But I don't see what it would do that `int_type_xjoin` doesn't do. Or simply, why do we need `int_type_union`? Once this else-branch removed, wouldn't `int_type_xmeet` simply be a call to `int_type_union`, and so maybe we could avoid this extra-step?

I'm fine if the answer is "it's part of the future clean up work", but then, I think we could have a tracking ticket and a TODO comment.

src/hotspot/share/opto/rangeinference.cpp line 710:

> 708: 
> 709: template <class CT>
> 710: const Type* TypeIntHelper::int_type_xjoin(const CT* t1, const CT* t2) {

Unlike `int_type_union`, I couldn't find test for `int_type_xjoin`. have I missed something? Is that expected? I agree it's not the the most scary function, but... why not! And it would at least be there for skeleton when adding other abstract domains that might be less obvious.

src/hotspot/share/opto/subnode.cpp line 2014:

> 2012:   const Type* in_type = phase->type(in1);
> 2013:   if ((in_type->isa_int() && in_type->is_int()->_lo >= 0) ||
> 2014:       (in_type->isa_long() && in_type->is_long()->_lo >= 0)) {

I'm a bit annoyed here because we look quite a lot inside the details of the implementation of the types. I'm not fan of the previous situation either, as at most one side of the `||` would make sense.

For instance, if we want to make use of bitwise information for this, we could look at the highest bit (that is a sign bit in two complement). That is just an example, we could find other tricks to conclude that an abstract int is non-negative that may not involve ranges (or not only ranges), and I find unfortunate to look inside the type, rather than the type telling. I think the usual approach would be to check that the guard/intersection (depending on the formalism) with the negative numbers is empty (or with non-negative numbers is the same), but we quickly have again the problem that `in_type` can be either int or long and a lot of ways to write that would need to split cases. Maybe an approximation, would be to have `TypeInt::is_non_negative` and alike not to overengineer guarding with arbitrary expression, but still limit the scope of who looks into the implementation?

src/hotspot/share/opto/type.cpp line 1029:

> 1027:     tty->print("t1 meets t2 = "); mt1->dump(); tty->cr();
> 1028:     tty->print("t2 meets t1 = "); mt2->dump(); tty->cr();
> 1029:     fatal("meet not commutative");

I see it was like that before, but I think it's discouraged to have many tty->print to avoid interleaved output of concurrent prints. The preferred solution is to use a stringstream rather than a lock. Also, do we need a flush before the fatal? I've seen some (other) prints that are cut before the end on assert failures.

Same under.

src/hotspot/share/opto/type.cpp line 1515:

> 1513:       return this;
> 1514:     case FloatCon:
> 1515:       assert(jint_cast(_f) != jint_cast(t->getf()), "Equivalent instances should not appear here");

Because of the

  if (t1 == t2) {
    return t1;
  }

in the binary `Type::xmeet`, right? (similar under).

src/hotspot/share/opto/type.cpp line 2401:

> 2399: 
> 2400: //------------------------------meet-------------------------------------------
> 2401: // Compute the MEET of two types.  It returns a new Type object.

Should we clean up these comments? I think we can have it on the base method, but maybe not that useful on each override? Also, it's not a great comment: it doesn't bring much more compared to the signature, it speaks abotu two types, but one of them is `this` and so implicit which is somewhat confusing (it would make more sense on `const Type* Type::xmeet(const Type* t1, const Type* t2)`). The comments says it returns a new Type object, which is ambiguous: these methods can return `this` or `t`, which are not new objects, for most definitions I'd give to this. I think it means that it doesn't mutate `this` or `t`, but actually return the result, but that is clear from the `const`s.

Some of these comments have a header line of dashes where it's written `xmeet` and some (like this one) only `meet`.

Overall, I think these comments bring nothing, or possibly confusion, and it would be a good opportunity to get rid of them.

-------------

PR Review: https://git.openjdk.org/jdk/pull/28051#pullrequestreview-3635235449
PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2668746050
PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2668821963
PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2668829347
PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2668881848
PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2668967484
PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2669031908
PR Review Comment: https://git.openjdk.org/jdk/pull/28051#discussion_r2669077305