The storage hint model

forax at univ-mlv.fr
Wed Jul 20 20:05:34 UTC 2022


----- Original Message -----
> From: "Brian Goetz" <brian.goetz at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Wednesday, July 20, 2022 7:34:04 PM
> Subject: Re: The storage hint model

>> Yes, I know, we have already discussed several models like that. But I think
>> it's a good idea to re-examine them because I believe they are more attractive
>> today.
> 
> Indeed, this has come up several times.  It is attractive to think of flattening
> entirely as a ‘storage class’, and fair to reexamine it (this also came up in
> an internal discussion recently) but I think in the end this still will be a
> choice that we regret.
> 
>> The main issue with the .val model is that it presents two *types* to the user
>> while what we really want is mostly to flatten the storage and have a precise
>> method calling convention.
>> Those two goals are not equal; the first is far more important than the second,
>> to the point where the coding guideline proposed by Brian is to use .ref for
>> the parameters and .val for the fields and arrays.
> 
> FTR, the motivation for the guideline here is “use .val where it makes the
> most difference.”  There’s nothing *wrong* with using val types on the stack,
> you just don’t get the enormous payback you do with heap variables.  But I can
> imagine — especially in a specialized-generics world — that there is value to
> using .val in APIs as well, because it carries the semantic “not null”
> information as well as the flattening hint.

T.flat carries the same semantics; the difference is that, in generic code, you have to use T.flat explicitly wherever you want flattening.

class Container<T> {
  private T.flat value;  // here

  public void set(T.flat value) {  // but also here
    this.value = value;
  }

  public T.flat get() {  // and here too
    return value;
  }
}

So yes, it makes generic code more cumbersome to write, but it also makes generic classes easier to use, because the writer of the generic class decides what can be flattened (or not), not its users.
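Today's Java cannot flatten a type variable, but the null-restriction half of T.flat can be approximated to show who carries the burden. In this minimal sketch (the constructor and the use of Objects.requireNonNull are illustrative assumptions, not part of the proposal), the generic class writer decides which positions reject null, while a user simply writes Container<String>:

```java
import java.util.Objects;

// Approximation of Container<T> with T.flat positions in today's Java.
// Only the "never null" part of T.flat is modeled; nothing is flattened.
final class Container<T> {
    private T value;   // would be "T.flat value" in the proposal

    // Assumption: an initial value is required so that get() never sees null.
    Container(T initial) {
        this.value = Objects.requireNonNull(initial, "initial");
    }

    void set(T value) {   // would be "set(T.flat value)"
        this.value = Objects.requireNonNull(value, "value");
    }

    T get() {             // would return "T.flat"; never null here
        return value;
    }
}
```

With this design, container.set(null) fails inside set() rather than at the call site, which is the "later NPE" trade-off discussed further down.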

> 
>> We still need .val and .ref to be able to specialize generics, right? No, I
>> don't think so: technically, we do not have to pass a .val as a type argument
>> to specialize a generic class; we just need to pass a type argument that can
>> be flattened if possible.
> 
> Here’s where I disagree.  If field declaration and array creation expressions
> were the only places you needed to say .val, I’d be much more sympathetic to
> the container-properties model.  But in a world with specialized generics, we
> want to flow the types throughout, not only to field layout, but flowing the
> non-null constraint to the JIT, etc.  The `T.flat` approach will feel like a
> hack, because it is, and as an unbonus, people will forget almost all the time
> because having to select a storage class for an abstractly typed variable will
> feel unnatural.  

It's true that people will forget T.flat as often as they forget C.flat (C.val if you prefer), but that's the price to pay for being safe by default, in both cases.
If you want to "fix" a potentially missing T.flat, the fix is the same as for a potentially missing C.flat: have a way to declare a value class as flat by default at its declaration site. But that's a separate discussion.

> When I say ArrayList<Foo.val>, I want the properties of
> Foo.val to flow to *all* the places where a T is being moved around.

Maybe you do, or maybe you don't. Here is an interesting implementation of ArrayList:

import java.util.Arrays;

public class ArrayList<E> {
  private E[] array;
  private int size;

  public ArrayList() {
    array = new E.flat[16];   // ahah, flat by default!
  }

  public boolean add(E element) {  // E is not flat
    // isNullable() is a hypothetical reflective query: can this array store null?
    if (element == null && !array.getClass().isNullable()) {
      var newArray = new E[array.length];  // need to store null, use a nullable array
      System.arraycopy(array, 0, newArray, 0, array.length);
      array = newArray;
    }
    if (array.length == size) {
      array = Arrays.copyOf(array, size * 2);  // grow (assumed to keep the array flat or nullable as it was)
    }
    array[size++] = element;
    return true;
  }
}

It starts with a flat array, and if a null element is added, it "unflattens" itself.
This implementation is interesting because, once recompiled with the new generics, a new ArrayList<Integer>() will use a flattened array by default.

I have no idea how this kind of implementation would perform, but using T.flat gives better control over what is flattenable in the implementation.
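To get a feel for the "start flat, unflatten on the first null" strategy without Valhalla, it can be emulated in current Java for one concrete type. This is a minimal sketch under loud assumptions: the class name IntegerList is hypothetical, int[] stands in for a flat E.flat[] array, and Integer[] stands in for the nullable E[] fallback. It says nothing about how a real specialized ArrayList would perform.

```java
import java.util.Arrays;

// Hypothetical IntegerList: int[] plays the role of the flat array,
// Integer[] the role of the nullable array used after the first null.
final class IntegerList {
    private int[] flat = new int[16];   // flat storage, cannot hold null
    private Integer[] boxed;            // nullable storage, created on first null
    private int size;

    public boolean add(Integer element) {
        if (element == null && boxed == null) {
            // First null: copy the flat elements into a nullable array.
            boxed = new Integer[flat.length];
            for (int i = 0; i < size; i++) {
                boxed[i] = flat[i];
            }
            flat = null;
        }
        if (boxed != null) {
            if (size == boxed.length) {
                boxed = Arrays.copyOf(boxed, size * 2);
            }
            boxed[size++] = element;
        } else {
            if (size == flat.length) {
                flat = Arrays.copyOf(flat, size * 2);
            }
            flat[size++] = element;   // unboxing is safe: element is non-null here
        }
        return true;
    }

    public Integer get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException(index);
        }
        return boxed != null ? boxed[index] : Integer.valueOf(flat[index]);
    }

    public boolean isFlat() {
        return boxed == null;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        IntegerList list = new IntegerList();
        for (int i = 0; i < 20; i++) {
            list.add(i);
        }
        System.out.println(list.isFlat());                      // true
        list.add(null);
        System.out.println(list.isFlat());                      // false
        System.out.println(list.get(19) + " " + list.get(20));  // 19 null
    }
}
```

The user of this class never mentions flatness; the class writer alone decides when to switch representations, which is the control argued for above.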

> 
> (This scheme rests on a clever but implicit assumption: that `T.flat` really
> means “as flat as T can be”, which for a ref, is “not at all.”  It's clever, but
> for this reason `T.flat` is kind of a misnomer.).

For a value class, T.flat can still flatten the value if its size is <= 128 bits, but yes, T.flat means "as flat as T can be".

> 
>> we can write instead
>>  value class C {
>>    // ...
>>  }
>> 
>>  class Container<T> {
>>    private T.flat value;
> 
> Yeah, this is where you lose me.  When you’re writing a generic class like
> ArrayList<T>, you’re abstracted from the details of heap layout, and it seems
> overwhelmingly likely you’d forget to say T.flat somewhere.  It also feels very
> “nonparametric”, because we’ve created a second, ad-hoc channel through which
> information flows, and that channel is “bumpier”.  But it's worse than that,
> because there’s less type information in the program, and therefore the VM has
> to make more conservative assumptions about nullity.

This was true of the previously proposed storage hint models, but unlike those, this model allows parameters to be declared as T.flat.
I think that is the missing piece: by propagating T.flat, the VM has enough information that it does not need to make conservative assumptions.

> 
> I get what you are trying to accomplish; the ref/val distinction feels like it
> is almost something we can get rid of.  But I think swapping it for a storage
> class model is worse, because it is asking users to think about low-level
> details in more places, rather than using types and having the information flow
> with the types.  

In more places inside generic code, but in fewer places inside user code. It's a trade I'm happy to make.

> And as you point out, it means there are more possible ways
> nulls can get deeper into the system before NPEing.

Yes, it can be as late as reaching a putfield, but that's because, as a class writer, you have more control.
For example, with List.of(), which never allows null, delaying the NPE may produce better error messages: an explicit requireNonNull inside the method can be more informative than the NPE at the call site that List.<C.val>of(null) would raise.
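The error-message point can be illustrated in current Java with a hypothetical factory, listOfNonNull (not a JDK method), that, like List.of(), rejects null, but performs the check itself and reports which argument was null. This is a sketch of the idea, not how List.of() is implemented:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical factory: the library writer controls where the null check
// happens and what the resulting NPE says.
final class NonNullLists {
    @SafeVarargs
    static <E> List<E> listOfNonNull(E... elements) {
        List<E> result = new ArrayList<>(elements.length);
        for (int i = 0; i < elements.length; i++) {
            final int index = i;   // effectively final copy for the lambda
            result.add(Objects.requireNonNull(elements[i],
                () -> "element at index " + index + " is null"));
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        try {
            listOfNonNull("a", null, "c");
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());  // element at index 1 is null
        }
    }
}
```

The Supplier overload of Objects.requireNonNull only builds the message when the check actually fails, so the descriptive message costs nothing on the happy path.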

Rémi

