The storage hint model

Thu Jul 21 13:29:03 UTC 2022

----- Original Message -----
> From: "Maurizio Cimadamore" <maurizio.cimadamore at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>, "Brian Goetz" <brian.goetz at oracle.com>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Thursday, July 21, 2022 1:29:47 PM
> Subject: Re: The storage hint model

> Hi Remi,
> I've been thinking along similar lines in the past few weeks.

I think we are all hunting on similar grounds :)

I think you are mixing two different ideas: one is .val propagation and the other is the flat vs box model for generics.
It's my fault, because in my previous mail I mixed the two as well.

The .val propagation.
My aha! moment was discovering that we do not need .val propagation. In the parametric VM, the instantiation of specialized generics is done at runtime inside bootstrap methods (triggered by opcode linkage resolution or method descriptor type restriction). Combined with the idea that at runtime there is only one class corresponding to a value class, this means the runtime can check whether a type argument is a value class or not. So with C a value class, an ArrayList<C> is enough to trigger the specialization.

The type propagation we all want is done by the parametric VM.
With your example,

  class Sub<X> extends Foo<X> { ... }

If X is C at runtime, then the X of Foo is also a C because the VM propagates the type arguments.
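
To make the propagation concrete, here is a minimal emulation in today's Java. The current JVM erases X, so the sketch carries the type argument by hand as a Class token; in the parametric VM this would be done by the VM itself. The names typeArg, typeArgOfFoo and TokenPropagationDemo are hypothetical, and C is only a stand-in for a value class.

```java
// Emulation only: the type argument is passed explicitly as a Class token,
// standing in for what the parametric VM would propagate implicitly.
class Foo<X> {
    private final Class<X> typeArg;   // stand-in for the VM-managed type argument

    Foo(Class<X> typeArg) {
        this.typeArg = typeArg;
    }

    Class<X> typeArgOfFoo() {
        return typeArg;
    }
}

class Sub<X> extends Foo<X> {
    Sub(Class<X> typeArg) {
        super(typeArg);               // propagation: the X of Sub becomes the X of Foo
    }
}

// Stand-in for a value class (plain final class so it compiles on any recent JDK).
final class C {
    final int x;

    C(int x) {
        this.x = x;
    }
}

class TokenPropagationDemo {
    public static void main(String[] args) {
        Foo<C> foo = new Sub<>(C.class);
        System.out.println(foo.typeArgOfFoo() == C.class);  // prints "true"
    }
}
```

The point is only that nothing in the client code says .val: creating a Sub with C as type argument is enough for Foo to see C too.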

The .flat model for value class
We have known for quite some time that we can describe the behavior of value classes using a storage hint model. The appeal of such models is that they are independent of the type system: such hints are not propagated with the types. Unlike the previous models, I propose to add such hints not only on field types and array types but also on parameter types, so the VM has enough information at class preparation time and at JIT time to generate specialized assembly code.

The flat vs box model for generics
My mistake was not to have discussed whether we want a .flat or a .box model for generics. As with value classes, specialization of a type parameter can be opt-in (using .flat) or opt-out (using .box). Brian, John and you seem to prefer the box model, where a storage hint has to be used only when an L-type is required.

As an example, here is ArrayList using the .box model.

public class ArrayList<E> {
  private E.box[] array;   // can be flat or box, so declared as box
  private int size;

  public ArrayList() {
    array = new E[16];   // flat by default !
  }

  public boolean add(E.box element) {  // E is not flat
    if (element == null && !array.getClass().isNullable()) {
      var newArray = new E.box[array.length];  // need to store null, use a nullable array
      System.arraycopy(array, 0, newArray, 0, array.length);
      array = newArray;
    }
    if (array.length == size) {
      array = Arrays.copyOf(array, size * 2);
    }
    array[size++] = element;
    return true;
  }
}

For me, choosing between .flat and .box is a separate decision from choosing between a storage hint model and a type-based model.
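
The "start flat, switch to a nullable array on the first null" behavior of add() in the ArrayList above can be emulated in today's Java. This is only a sketch under assumptions: int[] plays the role of the flat E array and Integer[] the role of the E.box array, and the class name UnflatteningIntList and its isFlat() method are hypothetical, not part of any proposal.

```java
import java.util.Arrays;

// Emulation only: int[] stands in for a flat array, Integer[] for a nullable (boxed) one.
class UnflatteningIntList {
    private int[] flat = new int[16];   // flat by default
    private Integer[] boxed;            // non-null once a null has been added
    private int size;

    boolean isFlat() {
        return boxed == null;
    }

    boolean add(Integer element) {
        if (element == null && boxed == null) {
            // need to store null: copy the elements into a nullable array
            boxed = new Integer[flat.length];
            for (int i = 0; i < size; i++) {
                boxed[i] = flat[i];
            }
            flat = null;
        }
        if (boxed != null) {
            if (boxed.length == size) {
                boxed = Arrays.copyOf(boxed, size * 2);
            }
            boxed[size++] = element;
        } else {
            if (flat.length == size) {
                flat = Arrays.copyOf(flat, size * 2);
            }
            flat[size++] = element;     // unboxing is safe, element is non-null here
        }
        return true;
    }

    Integer get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // both branches are Integer, so a stored null is returned as null, not unboxed
        return boxed != null ? boxed[index] : Integer.valueOf(flat[index]);
    }

    int size() {
        return size;
    }
}
```

A list that never sees a null stays on the flat representation for its whole life; the cost of the copy is paid once, by the first add(null).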

> 
> I think that, as with every approach, there are pros and cons to what
> you propose. In a way the difference between type-based and the
> storage-based approaches remind me of the distinction between
> homogeneous and heterogeneous generic reification translation strategies
> (for more details, please refer to the good read in [1]).

I'm not sure I follow you on this one: homogeneous vs heterogeneous is more of a parametric VM discussion. The current plan is homogeneous bytecode, made heterogeneous when specialization is requested by bootstrap methods, whichever model we choose.

> 
> In the storage-based model, the user of a generic class doesn't know if a
> type-variable will be used 20 levels down the stack; this calls, I
> think, for some sort of type-passing approach, where the generic type
> information is made available when the class is created, but not
> necessarily acted upon by the JVM. That is, an object whose static type
> is `Foo<Point>` is just an object whose type is `Foo` which has its
> "type-token" saved somewhere (in my thesis [2] I did that with an
> indirection in the oop - an approach that is sometimes referred to as
> near/far classes). You need to pass this info around everywhere because
> you don't know who's gonna use this information (e.g. in your strategy,
> which class is going to use some T.flat). Granted, in the model you
> propose it would be possible to see if a generic class uses T.flat at
> all, and, if it doesn't, maybe no type token is required - but that's an
> orthogonal optimization. As there's only one Foo (albeit used w/ or w/o
> side type information), it is a bit easier to deal with pathologically
> polymorphic cases such as wildcards, or to deal with use cases where
> type information is either missing, or not fit for purpose (think javac
> inferring a grotesque non-denotable type in a generic method call). One
> last point: in the storage-based model, clients do not have to opt in to
> get a version of `Foo<Point>` that exhibits some flatness features. It's
> up to the owner of `Foo` to decide whether to use `.flat` inside it or
> not. This can be seen as a pro or a con: on the one hand, there's no
> need to rewrite client code to take advantage of specialization (good!)
> - on the other hand, it is impossible for a client to make sure that
> existing code keeps behaving like it did in the past (bad!).

Yes, that's the main difference: the client has less control. You are only free to not upgrade when a generic class is recompiled to use specialized generics (apart from using raw types, which is a special kind of ugly). But at the same time, if you are using a library, you are trusting its maintainers to do a proper job when upgrading.

> 
> Conversely, in the type-driven approach, simply "uttering" a specialized
> type like `Foo<Point.val>` brings a new runtime type into existence,
> possibly with a different layout. In this world it's easier to see where
> the type information is flowing into (as Brian pointed out), as that's
> part of the type signature. Also, since `Foo<Point.val>` is its own
> little class (or species), you get a place where to store type-static
> metadata for free. For instance, the type parameter `Point.val` might be
> represented as a static field of type `Class<?>` inside the
> `Foo<Point.val>` species. Overall, a type-driven approach seems to fit
> better with the physics of the VMs we have, given that different
> parameterization can be given different runtime types, thus avoiding
> some of the profile pollutions that are otherwise hard to address when
> using a storage-based approach (something similar has been discussed for
> Scala miniboxing, see [3]). That said, in this model, dealing with
> absence of type information can be tricky, as shown in [4]. As noted
> above, clients here need an explicit opt-in into specialization to take
> advantage of it. Creating `Foo<Point>` is one thing, creating
> `Foo<Point.val>` is another, and clients can decide if they are ok with
> the costs associated with specialization.

I believe both models propagate enough information because the specialization occurs at runtime.

> 
> Finally, as Brian pointed out, under the storage-based translation, in
> order for things to work when type information is missing, you have to
> assume that T.flat doesn't really mean "flat all the time", but only
> "flat if you can". That is, if there's some side-channel available, then
> read T's true form from there, otherwise just take T's erasure and use
> that. That said, this problem is not entirely new in this approach.
> Consider:
> 
> ```
> class Foo<X> {
>    X x;
> }
> 
> class Sub<X> extends Foo<X> { ... }
> ```
> 
> Under the type-driven approach, if I create `Sub<Point.val>`, I'd expect
> that species to have a super-species `Foo<Point.val>` (which means `x`
> will have sharp type `Point.val`). But if I create `Sub<String>`, then
> the super-species is just erased Foo, and the type of `x` is simply
> Object. So, the "flat if you can behavior" is there even in the
> type-driven approach (e.g. `extends Foo<X>` doesn't mean the same thing
> in all cases), perhaps more in disguise.

yes, the propagation is the same but the client has less control of the specialization.

> 
> Overall, I don't think either model is "clearly" better than the other -
> they have different trade-offs which might work better in some contexts
> and worse in others. What we pick depends primarily, I think, on whether
> we see specialization as a conscious, opt-in decision performed by the
> user, or if we see specialization more as something happening "under the
> hood" (or, put in better terms, under control of library developers).

yes !

> While the latter sounds attractive, some figments of the specialized
> generic type system unfortunately will result in seams (e.g. new
> NullPointerExceptions) which are _visible_ to clients. So encapsulating
> specialization choices is not something that can be achieved 100%, and I
> think that is what some of us might feel uncomfortable about.

I think this one is more an artifact of .flat vs .box.
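
A small emulation in today's Java shows the kind of seam in question. It is only a sketch: int stands in for flat (non-nullable) storage and Integer for boxed storage, and FlatCell, BoxCell and NullSeamDemo are hypothetical names.

```java
// Emulation only: int stands in for flat storage, Integer for boxed storage.
class FlatCell {
    private int value;

    void set(Integer v) {
        value = v;        // unboxing a null throws NullPointerException at the store
    }

    int get() {
        return value;
    }
}

class BoxCell {
    private Integer value;

    void set(Integer v) {
        value = v;        // null is stored without complaint
    }

    Integer get() {
        return value;
    }
}

class NullSeamDemo {
    public static void main(String[] args) {
        new BoxCell().set(null);              // accepted
        try {
            new FlatCell().set(null);         // rejected when the field is written
        } catch (NullPointerException e) {
            System.out.println("NPE at the flat store");
        }
    }
}
```

The same client call behaves differently depending only on how the class author declared the storage, which is why the new NPE is tied to the .flat vs .box choice.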

> 
> Maurizio

Rémi

> 
> 
> [1] -
> https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.8658&rep=rep1&type=pdf
> [2] - http://amsdottorato.unibo.it/2476/
> [3] -
> https://www.semanticscholar.org/paper/Compile-Time-Type-Driven-Data-Representation-in-Ureche/df5831814318ff11d189c4de0485745603fb7afe
> [4] - http://cr.openjdk.java.net/~jrose/values/parametric-vm.html
> 
> On 20/07/2022 21:05, forax at univ-mlv.fr wrote:
>> ----- Original Message -----
>>> From: "Brian Goetz" <brian.goetz at oracle.com>
>>> To: "Remi Forax" <forax at univ-mlv.fr>
>>> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
>>> Sent: Wednesday, July 20, 2022 7:34:04 PM
>>> Subject: Re: The storage hint model
>>>> Yes, i know, we have already discussed several models like that. But i think, it's
>>>> a good idea to re-examine those because i believe they are more attractive
>>>> today.
>>> Indeed, this has come up several times.  It is attractive to think of flattening
>>> entirely as a ’storage class’, and fair to reexamine it (this also came up in
>>> an internal discussion recently) but I think in the end this still will be a
>>> choice that we regret.
>>>
>>>> The main issue with the .val model is that it presents two *types* to the user
>>>> while what we really want is mostly to flatten the storage and have a precise
>>>> method calling convention.
>>>> Those two goals are not equal, the first is far more important than the second,
>>>> to the point where the coding guideline proposed by Brian is to use .ref for
>>>> the parameters and .val for the fields and arrays.
>>> FTR, the motivation for the guideline here is “use .val where it makes the
>>> most difference.”  There’s nothing *wrong* with using val types on the stack,
>>> you just don’t get the enormous payback you do with heap variables.  But I can
>>> imagine — especially in a specialized-generics world — that there is value to
>>> using .val in APIs as well, because it carries the semantic “not null”
>>> information as well as the flattening hint.
>> T.flat carries the same semantics, the difference is that you have to explicitly
>> use T.flat where you want the flattening in the generic code.
>>
>> class Container<T> {
>>    private T.flat value;  // here
>>
>>    public void set(T.flat value) {  // but also here
>>      this.value = value;
>>    }
>>
>>    public T.flat get() {  // and here too
>>      return value;
>>    }
>> }
>>
>> so yes it makes the generic code more cumbersome to write but it also makes
>> generic classes easier to use because the writer of the generics decide what
>> can be flattened (or not) and not the user of the generics.
>>
>>>> We still need .val and .ref to be able to specialize generics, right ? No, i
>>>> don't think so, we technically do not have to pass a .val as type argument to
>>>> be able to specialize a generic class, we just need to pass a type argument
>>>> that can be flattened if it's possible.
>>> Here’s where I disagree.  If field declaration and array creation expressions
>>> were the only places you needed to say .val, I’d be much more sympathetic to
>>> the container-properties model.  But in a world with specialized generics, we
>>> want to flow the types throughout, not only to field layout, but flowing the
>>> non-null constraint to the JIT, etc.  The `T.flat` approach will feel like a
>>> hack, because it is, and as an unbonus, people will forget almost all the time
>>> because having to select a storage class for an abstractly typed variable will
>>> feel unnatural.
>> People will forget T.flat as much as they will forget C.flat (C.val if you
>> prefer), that's true, but that's the price to pay to be safe by default, in both
>> cases.
>> If you want to "fix" the potential missing T.flat, it's the same fix as with a
>> potential missing C.flat, have a way to declare a value class flat by default
>> at declaration site. But that's a separate discussion.
>>
>>> When I say ArrayList<Foo.val>, I want the properties of
>>> Foo.val to flow to *all* the places where a T is being moved around.
>> Maybe you want or maybe you don't, here is an interesting implementation of
>> ArrayList
>>
>> public class ArrayList<E> {
>>    private E[] array;
>>    private int size;
>>
>>    public ArrayList() {
>>      array = new E.flat[16];   // ahah, flat by default !
>>    }
>>
>>    public boolean add(E element) {  // E is not flat
>>      if (element == null && !array.getClass().isNullable()) {
>>        var newArray = new E[array.length];  // need to store null, use a nullable array
>>        System.arraycopy(array, 0, newArray, 0, array.length);
>>        array = newArray;
>>      }
>>      if (array.length == size) {
>>        array = Arrays.copyOf(array, size * 2);
>>      }
>>      array[size++] = element;
>>      return true;
>>    }
>> }
>>
>> It starts with a flat array and if a null element is added, it "unflattens" itself.
>> This implementation is interesting because once recompiled with the new
>> generics, a new ArrayList<Integer>() will use a flattened array by default.
>>
>> I've no idea about the performance of such kind of implementations, but using
>> T.flat gives better control on what is flattenable or not in the implementation.
>>
>>> (This scheme rests on a clever but implicit assumption: that `T.flat` really
>>> means “as flat as T can be”, which for a ref, is “not at all.”  It's clever, but
>>> for this reason `T.flat` is kind of a misnomer.)
>> If it's a value class, T.flat can still flatten the value if the size is <= 128
>> bits but yes, T.flat means as flat as T can be.
>>
>>>> we can write instead
>>>>   value class C {
>>>>     // ...
>>>>   }
>>>>
>>>>   class Container<T> {
>>>>     private T.flat value;
>>> Yeah, this is where you lose me.  When you’re writing a generic class like
>>> ArrayList<T>, you’re abstracted from the details of heap layout, and it seems
>>> overwhelmingly likely you’d forget to say T.flat somewhere.  It also feels very
>>> “nonparametric”, because we’ve created a second, ad-hoc channel through which
>>> information flows, and that channel is “bumpier”.  But it's worse than that,
>>> because there’s less type information in the program, and therefore the VM has
>>> to make more conservative assumptions about nullity.
>> This has been true with the previous proposed storage hint models, but unlike
>> those, this model allows parameters to be declared as T.flat.
>> I think it is the missing piece so the VM has enough information by propagating
>> the T.flat so it does not need to make conservative assumptions.
>>
>>> I get what you are trying to accomplish; the ref/val distinction feels like it
>>> is almost something we can get rid of.  But I think swapping it for a storage
>>> class model is worse, because it is asking users to think about low-level
>>> details in more places, rather than using types and having the information flow
>>> with the types.
>> In more places inside the generic code, in fewer places inside the user code.
>> It's a trade i'm happy to make.
>>
>>> And as you point out, it means there are more possible ways
>>> nulls can get deeper into the system before NPEing.
>> yes, it can be as late as reaching a putField but it's because as a class writer
>> you have more control.
>> For example with List.of() which never allows null, delaying the NPE may provide
>> better error messages, a requireNonNull may be better than having an NPE at the
>> callsite like List.<C.val>of(null) will do.
>>
>> Rémi