<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
Let's try to separate the various things going on, and then we can
see if there are attractive fictions we want to spin. First, let's
talk about kinds of polymorphism. Cardelli and Wegner's "On
understanding types, data abstraction, and polymorphism" (1985)
divides polymorphism into a hierarchy (though these distinctions
predate this paper): <br>
<br>
<font face="monospace">Polymorphism<br>
Universal<br>
Parametric<br>
Inclusion<br>
Ad Hoc<br>
Overloading<br>
Coercion<br>
</font><br>
Inclusion polymorphism is subtyping; the value set of String is
included in the value set of Object. <br>
Coercion polymorphism is conversions; we can use a `short` where an
`int` is called for, because we can summon an `int` with the same
value as the `short` at will. <br>
Overloading refers to the fact that we can declare `f(int)` and
`f(long)` so that at the use site, `f` appears to take multiple
types. <br>
(Pattern matching fits into ad-hoc polymorphism, but in Java it gets
funneled through the other forms first. Union types are another
form of ad-hoc polymorphism.) <br>
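<br>
To keep the last three straight, here is a minimal sketch in
today's Java (the class and method names are just for
illustration): <br>
<br>
<font face="monospace">class Forms {<br>
&nbsp;&nbsp;static void f(int x) { System.out.println("f(int)"); } // overload #1<br>
&nbsp;&nbsp;static void f(long x) { System.out.println("f(long)"); } // overload #2<br>
<br>
&nbsp;&nbsp;public static void main(String[] args) {<br>
&nbsp;&nbsp;&nbsp;&nbsp;Object o = "hello"; // inclusion: String's value set is included in Object's<br>
&nbsp;&nbsp;&nbsp;&nbsp;short s = 1;<br>
&nbsp;&nbsp;&nbsp;&nbsp;int i = s; // coercion: widening summons an int with the same value<br>
&nbsp;&nbsp;&nbsp;&nbsp;f(1); // overloading: resolves to f(int)<br>
&nbsp;&nbsp;&nbsp;&nbsp;f(1L); // overloading: resolves to f(long)<br>
&nbsp;&nbsp;}<br>
}</font><br>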
<br>
The special behavior of `null` could be explained by multiple paths:
<br>
- Subtyping, with the Null type as a bottom type for all reference
types, or<br>
- Ad-hoc, where it is understood that `null` is in the value set of
every reference type, and treating an unbounded T as an (infinite)
union type.<br>
<br>
I think the latter path for explaining null is more useful in
general, and it is probably closer to what the JLS actually says -
this is how interfaces "inherit" members like Object::equals. (I
think it also offers a more useful way to talk about non-nullable
reference types, but I'll come back to that.)<br>
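<br>
Both halves of that claim are visible in today's Java; a small
sketch (the Shape interface is invented for illustration): <br>
<br>
<font face="monospace">interface Shape { double area(); }<br>
<br>
class NullDemo {<br>
&nbsp;&nbsp;static String describe(Shape s) {<br>
&nbsp;&nbsp;&nbsp;&nbsp;Shape nothing = null; // null inhabits Shape's value set, like every reference type's<br>
&nbsp;&nbsp;&nbsp;&nbsp;// Shape never declares equals(), but it is a member of every type in<br>
&nbsp;&nbsp;&nbsp;&nbsp;// the (infinite) union of Shape implementations, so this compiles:<br>
&nbsp;&nbsp;&nbsp;&nbsp;return s.equals(nothing) ? "impossible" : s.toString();<br>
&nbsp;&nbsp;}<br>
}</font><br>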
<br>
Java exhibits all of these forms of polymorphism. Parametric and
inclusion are on prominent display, but the others are there too,
and coercion is actually quite relevant, both in general and to the
point you are making, which is about how the user thinks about what
it means to instantiate a generic class with `int`. I think we will
need all of these tools to get to "everything is an Object" (which I
think we agree is a desirable unification.) <br>
<br>
A `String` is an Object through inclusion. An `int` is an Object
through coercion; if we have an `int` and we need an `Object`, we
can box the `int` to `Integer`. Today we do this only for
assignment, but going forward we will do this in other contexts,
such as member access (e.g., `1.toString()`), equality, array
covariance, and serialization. We heal the multiple rifts through a
combination of subtyping and coercion. <br>
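<br>
Today that coercion surfaces as boxing in assignment and invocation
contexts; a minimal sketch of the current state (BoxDemo is just an
illustrative name, and `1.toString()` itself is the future spelling,
not legal today): <br>
<br>
<font face="monospace">class BoxDemo {<br>
&nbsp;&nbsp;public static void main(String[] args) {<br>
&nbsp;&nbsp;&nbsp;&nbsp;Object o = 42; // boxing: the int is coerced to an Integer<br>
&nbsp;&nbsp;&nbsp;&nbsp;String s = ((Integer) 1).toString(); // today's spelling of 1.toString()<br>
&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(o + " " + s);<br>
&nbsp;&nbsp;}<br>
}</font><br>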
<br>
So, in the world of universal type variables, what is a T? I claim
it is a union over the set of all types that conform to T's bound.
Today this includes only reference types, but once we extend bounds
conformance to admit types that are convertible to T's bound, this
includes value types as well. <br>
<br>
This union offers a rational explanation for why we can say
`t.toString()` -- because `toString()` is a member of every type in
the union (when enumerating the members of a union type, you take the
_intersection_ of the members of all the types in the union). We
leave it to the implementation as to how to actually dispatch
`toString()`, which will be different depending on whether we
specialize `Foo<T>` or not. It also offers a rational
explanation of why `T` has `null` in its value set today -- and why
we are going to adjust this to generate unchecked warnings tomorrow
-- because now we'll be intersecting in some types that don't have
null. The same is true for `synchronized` -- which has nothing to
do with reference vs value, but with identity -- and again, we're
now adding new types to the union that don't have a previously
universal property. <br>
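<br>
A sketch of how both of these surface in an erased generic class
today (Box is just an illustrative name): <br>
<br>
<font face="monospace">class Box<T> {<br>
&nbsp;&nbsp;private final T value;<br>
&nbsp;&nbsp;Box(T value) { this.value = value; }<br>
<br>
&nbsp;&nbsp;String render() {<br>
&nbsp;&nbsp;&nbsp;&nbsp;// OK: toString() is in the intersection of members of every type T ranges over<br>
&nbsp;&nbsp;&nbsp;&nbsp;return value.toString();<br>
&nbsp;&nbsp;}<br>
<br>
&nbsp;&nbsp;void touchUnderLock() {<br>
&nbsp;&nbsp;&nbsp;&nbsp;// Compiles today because every type in the union has identity;<br>
&nbsp;&nbsp;&nbsp;&nbsp;// value types joining the union tomorrow won't<br>
&nbsp;&nbsp;&nbsp;&nbsp;synchronized (value) { }<br>
&nbsp;&nbsp;}<br>
}</font><br>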
<br>
The union model is based on the "stand in" model -- T can stand for
some unknown type, so you can at most do things on a T that you can
do on *all* the unknown types. (Even when we get to specialized
generics, we might still not allow all such operations, such as `new
T[n]`; the union offers an upper bound on what is sensible, but
languages can be more restrictive.) <br>
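<br>
For example, the array-creation restriction and today's unchecked
workaround (Buffer is just an illustrative name): <br>
<br>
<font face="monospace">class Buffer<T> {<br>
&nbsp;&nbsp;// T[] elements = new T[10]; // rejected today: generic array creation<br>
&nbsp;&nbsp;@SuppressWarnings("unchecked")<br>
&nbsp;&nbsp;T[] elements = (T[]) new Object[10]; // the usual erased workaround<br>
}</font><br>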
<br>
The best way I've found to think about types like `String!` in Java
is as _refinement types_. (See Liquid Haskell
(<a class="moz-txt-link-freetext" href="https://ucsd-progsys.github.io/liquidhaskell-tutorial/">https://ucsd-progsys.github.io/liquidhaskell-tutorial/</a>), or Clojure
Spec (<a class="moz-txt-link-freetext" href="https://clojure.org/guides/spec">https://clojure.org/guides/spec</a>)). A refinement type takes a
type and a _predicate_ which refines its value set, such as "even
integer", and can contain arbitrary predicative logic. The compiler
then attempts to prove the desired properties (easier in functional
languages). In other words, the type `String!` takes as its base
the reference type `String`, along with a predicate `s -> s !=
null`. Taking away the null doesn't change the reference-ness of
it, it just restricts the value set.<br>
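<br>
Java has no refinement types, but the shape of the idea -- a base
type plus a predicate over its value set -- can be approximated at
runtime. A hypothetical sketch (the Refined class and its names are
invented for illustration, not a proposal): <br>
<br>
<font face="monospace">import java.util.function.Predicate;<br>
<br>
final class Refined<T> {<br>
&nbsp;&nbsp;private final T value;<br>
&nbsp;&nbsp;private Refined(T value) { this.value = value; }<br>
<br>
&nbsp;&nbsp;static <T> Refined<T> of(T value, Predicate<? super T> predicate) {<br>
&nbsp;&nbsp;&nbsp;&nbsp;if (!predicate.test(value))<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;throw new IllegalArgumentException("value outside the refined set");<br>
&nbsp;&nbsp;&nbsp;&nbsp;return new Refined<>(value);<br>
&nbsp;&nbsp;}<br>
<br>
&nbsp;&nbsp;T get() { return value; }<br>
}<br>
<br>
// "String!" ~ base String, refined by the predicate s -> s != null:<br>
// Refined<String> s = Refined.of(input, x -> x != null);</font><br>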
<br>
Interestingly, the languages that have the most direct claim to
modifiers like `!` and `?` treat them as _cardinalities_, such as X#
and to a lesser degree XSL. In X#, where "everything is a
sequence", cardinality modifiers are: refinement types! They
constrain the length of the sequence (imagine a refinement type on
List which said "size() > 3".) <br>
<br>
We're clearly never going to plunk for arbitrary predicative logic
in our type system, or the theorem provers that come with it, but
ad-hoc predicates like "not null", "has identity" and "is reference"
are already swimming under the surface of the type system we have,
and we'll see more like this when we get to specialization (where we
will model specialized instantiations as refinements rather than
substitution.) <br>
<br>
<br>
OK, with that as background, let's dive into your mail. <br>
<br>
<blockquote type="cite" cite="mid:CAGKkBkuGgaxoNOJyT-iZRiKptUbk1qntbXEVLQ0EjjzrhKKx-A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div>
<div>I'm sure the theoretic argument is fine as far as it
goes, but it's not much help for the end user. My issue is
with the user model we present to the world; what "useful
fictions" are we securing for them, that enable them to
read and write code with confidence?</div>
</div>
</div>
</div>
</blockquote>
<br>
One locus of potential fiction is what we mean by "is" in
"everything is an Object". If a T is an Object, do we really just
mean "things that are subtypes of Object", or do we mean "things
that can be bounded by Object" (which includes value types via
conversion/coercion, rather than via subtyping.) I think ultimately
the latter is more helpful, because when someone says
`ArrayList<long>`, what they really want is an ArrayList that
is backed by a long[], with all the non-nullability, flatness, and
tearability that long already has. `ArrayList<T>` can be
thought of as something that "has Ts" in it; if we are substituting in
T=long, we will want all the properties of long because that allows
for greater compositionality of semantics. <br>
<br>
<blockquote type="cite" cite="mid:CAGKkBkuGgaxoNOJyT-iZRiKptUbk1qntbXEVLQ0EjjzrhKKx-A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><b>Some "T always a reference type" advantages:</b>
<div><br>
</div>
<div>* With subtype polymorphism, the user enjoys a solid
understanding that "reference types are polymorphic, value
types are monomorphic". As I'd put it: you can never have
a value (say as a field) without statically knowing its
exact type, because its exact type governs the shape and
interpretation of the bits actually making up the value.
Don't know the exact type --> you need a reference. But
parametric polymorphism (thanks for laying out these terms
in the JEP draft, Dan) feels very similar! I'd expect the
user to consult the same intuitions we just drilled into
them about subtype polymorphism. It would be nice if the
same simple rule held there too.<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
I think this tries to flip around "reference types are polymorphic"
into "polymorphic types are references." T is polymorphic, users
will get that without trouble. But does it have to be inclusion
polymorphism? I think it is an ad-hoc union between coercion (value
types) and inclusion (reference types). <br>
<br>
If we push towards the fiction of "they're all reference types",
then Foo<long> really means Foo<Long>, with all the
nullability and tearability differences between long and Long. <br>
<br>
<blockquote type="cite" cite="mid:CAGKkBkuGgaxoNOJyT-iZRiKptUbk1qntbXEVLQ0EjjzrhKKx-A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div>
<div>
<div>
<div>* When my class gets used as `MyClass<int>`,
I would get to reason like so:<br>
</div>
</div>
</div>
<div> * When that code runs on some JVM that doesn't do
specialization yet, then my class gets used directly, so
those `int`s are really `Integer`s; of course they are,
because T is a reference type. (I expect I can't tear a
value this way.)</div>
</div>
</div>
</div>
</blockquote>
<br>
I would say it differently: in this world, `long` *erases to*
`Object`, just as `String` does. When we erase `String` to
`Object`, the erased code inherits properties of Object that String
doesn't have, such as the chance for heap pollution. Similarly,
when we erase `long` to `Object`, we pick up some of these
properties too, including the additional chance of null pollution,
as well as some atomicity we didn't ask for. But that's because of
the erasure, not for any
intrinsic property of type variables. And the compiler will try to
claw back some of that nullability with unchecked warnings anyway,
just as we try to claw back some of the vectors for heap pollution.
The nullity of T is the same erasure-driven pollution we already
know and tolerate. <br>
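<br>
For comparison, a sketch of the erasure-driven heap pollution we
already know and tolerate with reference types (PollutionDemo is
just an illustrative name): <br>
<br>
<font face="monospace">import java.util.ArrayList;<br>
import java.util.List;<br>
<br>
class PollutionDemo {<br>
&nbsp;&nbsp;static void pollute(List raw) { // raw type: the element type is erased away<br>
&nbsp;&nbsp;&nbsp;&nbsp;raw.add("not an Integer"); // unchecked warning, but runs fine<br>
&nbsp;&nbsp;}<br>
<br>
&nbsp;&nbsp;public static void main(String[] args) {<br>
&nbsp;&nbsp;&nbsp;&nbsp;List<Integer> ints = new ArrayList<>();<br>
&nbsp;&nbsp;&nbsp;&nbsp;pollute(ints); // the heap pollution happens here<br>
&nbsp;&nbsp;&nbsp;&nbsp;Integer i = ints.get(0); // ClassCastException, far from the cause<br>
&nbsp;&nbsp;}<br>
}</font><br>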
<br>
<blockquote type="cite" cite="mid:CAGKkBkuGgaxoNOJyT-iZRiKptUbk1qntbXEVLQ0EjjzrhKKx-A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div>
<div> * When that code runs on some JVM that has
specialization, then different "species" of my class are
being forked off from my template, each one physically <i>replacing</i> T
with some value type. So <i>those</i> are value types, but
once again T is still a reference type. (And here I do
expect tearing risk, for non-atomic types.)</div>
</div>
</div>
</div>
</blockquote>
<br>
When I specialize `Foo<long>`, any T-valued fields or arrays
or method parameters really are long, with all the characteristics
of long. Treating them as references (which have properties long
doesn't have) seems more confusing. "Placeholder, which collapses
to its instantiation" feels more natural here?<br>
<br>
<blockquote type="cite" cite="mid:CAGKkBkuGgaxoNOJyT-iZRiKptUbk1qntbXEVLQ0EjjzrhKKx-A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div>
<div>* If Java were ever to have non-nullable reference
types, I suspect it might immediately expose this whole
type variable issue as having been, at its essence, never
really about ref-vs-val in the first place. What it's
really about is that there used to be one value in the
union of every Object type's value set, and now there
isn't anymore.</div>
</div>
</div>
</div>
</blockquote>
<br>
Agree -- it was always about the union of types / intersection of
properties of those types. Null used to be in that intersection,
but now things have gotten more complicated -- and doesn't this argue
against the reference interpretation, and towards the
placeholder/union interpretation? <br>
<br>
<blockquote type="cite" cite="mid:CAGKkBkuGgaxoNOJyT-iZRiKptUbk1qntbXEVLQ0EjjzrhKKx-A@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div>
<div>* The best way a user can prepare their generic class
for becoming "universal" in the future is to adopt
aftermarket nullness analysis (such as I'm working on
standardizing the semantics for in JSpecify). They'll mark
type parameters like `V extends @Nullable Object`, and
methods like `Map.get` will return `@Nullable V`. That
will shake out any obstacles up front. Then once V becomes
a UTP, they'd just change that `V` to `V.ref`, and they
could presumably drop the `@Nullable` too because `.ref`
implies it (why else would it be used?). So the language
feature you're introducing for ref-vs-val universality is
immediately doing double duty, capturing nullness
information for reference types too.</div>
<div><br>
</div>
<div>This would probably mean rethinking the `T.ref` syntax
to something that more closely evokes "T or null" (the
fact this would, for an <int> species, have to box
to `Integer` in the process seems intuitive enough).</div>
</div>
</div>
</div>
</blockquote>
<br>
Open to finding a better way to spell "T or null"; I think the path
to this involves having this conversation converge :)<br>
<br>
<br>
</body>
</html>