value type hygiene

Tue May 15 01:53:31 UTC 2018

Hi John,

The answers below might depend on experimentation, but what would you propose the behavior should be for the following code, assuming we have no specialized generics and ArrayList has not yet been modified to cope better?

value class Point { … }

class VW {
  public static void main(String[] s) {
    List<Point> l = new ArrayList<>();
    l.add(Point.default);
    l.add(Point.default); // assuming this works :-)

    Point[] p = new Point[10]; // Flattened array is created
    l.toArray(p); // What should happen here?  
  }
}

(I know toArray is value-hostile and maybe should be deprecated or changed, but I find it a useful example to think about, as it may be indicative of legacy code in general.)

Should the call to l.toArray link? If so, then I presume some form of array store exception will be thrown when ArrayList attempts to store null into the flattened array at index 2?

Or:

  Point[] p = l.toArray(new Point[2]);

Is a flattened array (the argument) returned, assuming System.arraycopy works?

Or:

  Point[] p = l.toArray(new Point[1]);

Is a non-flattened array returned, since Arrays.copyOf operates reflectively on the argument’s class and not on additional runtime properties?

What about:

  Object[] o = l.toArray();

Is a non-flattened array returned, containing elements that are instances of boxed Point?
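
For reference, here is a minimal sketch of what today's ArrayList does in each of the four cases. It uses an ordinary reference class as a stand-in for Point (the nested Point class and the ToArrayCases name are assumptions for illustration), so it shows only the legacy behavior these questions start from, not anything about flattened arrays.

```java
import java.util.ArrayList;
import java.util.List;

public class ToArrayCases {
    // Ordinary reference class standing in for the value class Point (assumption).
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Builds a two-element list, mirroring the example above.
    static List<Point> twoPoints() {
        List<Point> l = new ArrayList<>();
        l.add(new Point(0, 0));
        l.add(new Point(0, 0));
        return l;
    }

    public static void main(String[] args) {
        List<Point> l = twoPoints();

        // Case 1: the argument is longer than the list. The same array is
        // returned, and null is stored at index size() == 2.
        Point[] big = new Point[10];
        big[2] = new Point(1, 1);
        System.out.println(l.toArray(big) == big);      // true
        System.out.println(big[2] == null);             // true

        // Case 2: the argument has exactly the right length and is returned.
        Point[] exact = new Point[2];
        System.out.println(l.toArray(exact) == exact);  // true

        // Case 3: the argument is too short. A new array is allocated from
        // the argument's runtime class, and the argument is left alone.
        Point[] small = new Point[1];
        Point[] copy = l.toArray(small);
        System.out.println(copy != small);                    // true
        System.out.println(copy.getClass() == Point[].class); // true

        // Case 4: no argument. A plain Object[] is returned.
        System.out.println(l.toArray().getClass() == Object[].class); // true
    }
}
```

Case 1 is the one where a null store happens unconditionally, which is why it is the interesting case for a flattened array.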

Paul.

> On May 14, 2018, at 4:02 PM, John Rose <john.r.rose at oracle.com> wrote:
> 
> On May 11, 2018, at 7:39 AM, Frederic Parain <frederic.parain at oracle.com> wrote:
>> 
>> John,
>> 
>> I have a question about the semantics within legacy class files (class
>> files lacking a ValueTypes attribute). Your document clarifies the semantics
>> for fields as follows:
>> 
>> "Meanwhile, C is allowed to putfield and getfield null all day long into its own fields (and fields of other benighted legacy classes that it may be friends with). Thus, the getfield and putfield instructions link to slightly different behavior, not only based on the format of the field, but also based on “who’s asking”. Code in C is allowed to witness nulls in its Q fields, but code in A (upgraded) is not allowed to see them, even though it’s the same getfield to the same symbolic reference. Happily, fields are not shared widely across uncoordinated classfiles, so this is a corner case mainly for testers to worry about.”
>> 
>> But what about arrays? If I follow the same logic that “old code needs to
>> be left undisturbed if possible”, if a legacy class C creates an array of Q,
>> without knowing that Q is now a value type, C would expect to be allowed
>> to write and read null from this array, as it does from its own fields. Is it a
>> correct assumption?
> 
> Yes, I avoided this question in the write-up.  To apply the same move
> as fields, we could try to say that arrays of type V[] created by a legacy
> class C do not reject nulls, while arrays of type V[] created by normal
> classes (that recognize V as value types) are created as flattened.
> 
> But the analogy between fields and array elements doesn't work in this
> case.  While a class C can only define fields in itself, by creating arrays
> it is working with a common global type.  Think of V[] as a global type,
> and you'll see that it needs a global definition of what is flattened and
> what is nullable.  I think we will get away with migrating types and
> declaring that legacy classes that use their arrays will fail.  The mode
> of failure needs engineering via experiment.  We could go so far as
> to reject legacy classes that use anewarray to build arrays of value
> type, without putting those types on the ValueTypes list.
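
As an analogy only (this is existing Java behavior, not the proposed mechanism), array store checks already work this way: the check belongs to the array object, not to the class whose code performs the store. The sketch below uses String[] as a stand-in; the class and method names are assumptions for illustration.

```java
public class GlobalArrayType {
    // Attempts the store and reports whether the array itself rejected it.
    // This method knows nothing about the array's real element type.
    static boolean storeRejected(Object[] array, Object value) {
        try {
            array[0] = value;
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object[] strings = new String[1];
        Object[] objects = new Object[1];

        // Same code, same static type, different outcome per array object.
        System.out.println(storeRejected(strings, 42));   // true
        System.out.println(storeRejected(objects, 42));   // false

        // Today every reference array accepts null; a flattened V[] would not.
        System.out.println(storeRejected(strings, null)); // false
    }
}
```

The point of the analogy is that whoever holds the array reference gets the array's rules, which is why a per-creator definition of nullability is hard to apply to V[].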
> 
> This means that if there is a current class C out there that is creating
> arrays of type Optional[] or type LocalDate[], then if one of those types
> is migrated to a value type, then C becomes a legacy class, and it will
> probably fail to operate correctly.  OTOH, since those classes use
> factories to create non-null values of type Optional or LocalDate, such
> a legacy class is likely to refrain from using nulls.  I think it's possible
> but not likely that the author of a legacy class will make some clever
> use of nulls, storing them into an array of upgraded type V.
> 
> In the end, some legacy code will not port forward without recompilation
> and even recoding.  Let's do what we can to make it easier to diagnose
> and upgrade such code, as long as it doesn't hurt the basic requirement
> of making values flattenable.  The idea of making fields nullable seems
> a reasonable low-cost compromise, but making elements nullable a
> much higher cost.
> 
> Any need for a boxy or nullable array is more easily served by an explicit
> reference array, of type Object[] or ValueRef<VT>[].  Overloading that behavior
> into V[] is asking for long-term trouble with performance surprises.  Erased
> Object or interface arrays will fill this gap just as well as a first-class nullable
> VT.BOX[], with few exceptions.  I think those exceptions are manageable by
> other means than complicating (un-flattening) the basic data types of the VM.
> 
>> This would mean that the JVM would have to make the distinction between
>> an array of nullable elements, and an array of non-nullable elements.
> 
> We could try this, but let's prove that it's worth the trouble before pulling
> on that string.  I'm proposing Object[] and ValueRef<V>[] as workaround
> types.
> 
>> Which
>> could be a good thing if we want to catch leaking of arrays with potentially
>> null elements from old code to new code, instead of waiting for new code
>> to access a null element to throw an exception.
> 
> Why not try to catch the problem when the array is created?  Have the
> anewarray instruction do a cross check (like CLCs) between the base type
> of the array and local ValueTypes.
> 
>> On the other hand, the lazy
>> check solution allows arrays of non-nullable elements with zero null elements
>> to work fine with new code.
> 
> So, we have discussed the alternative of adding extra polymorphism to
> all value array types:  Some arrays are flat and reject nulls, while others
> are boxy and accept nulls.  But here again I want to push back against
> inventing a parallel set of boxy implementations, because it's a long term
> systemic cost for a short term marginal gain.
> 
> Besides, some library classes don't use native anewarray but use
> jlr.Array.newInstance to make arrays.  Do we make that guy caller-sensitive
> so he can tell which kind of array to make?  I think this is a long string to
> pull on.  It's easier to define something as "clearly in error" (see above)
> than to try to fix it on the fly, because you probably have to fix more and
> more stuff, and keep track of the fixes.  Like I say, long term cost for
> marginal migration improvements.
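
To make the jlr.Array.newInstance point concrete, here is a small sketch of the kind of library helper in question, assuming a hypothetical ReflectiveArrays class. The reflective call receives only a component Class and a length, so nothing in its signature identifies whether a legacy or an upgraded class asked for the array.

```java
import java.lang.reflect.Array;

public class ReflectiveArrays {
    // Generic helper of the kind found in libraries. It sees only a Class
    // token and a length. The cast is valid for reference component types
    // only; a primitive component such as int.class would fail the cast.
    static Object[] make(Class<?> component, int length) {
        return (Object[]) Array.newInstance(component, length);
    }

    public static void main(String[] args) {
        Object[] a = make(String.class, 3);

        // The runtime array type is derived from the Class token alone.
        System.out.println(a.getClass() == String[].class); // true

        // Elements start out as null, as for any reference array today.
        System.out.println(a[0] == null);                   // true
    }
}
```

Choosing between a flat and a boxy array here would need either an extra argument or caller sensitivity, which is the string being described as long to pull on.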
> 
> 
>> From an implementation point of view, the JVM already has to make the
>> distinction between flattened and not flattened arrays, so there’s a logic
>> in place to detect some internal constraints of arrays, but the nullable/
>> non-nullable element semantics would require one additional bit.
> 
> We *can* do this, but we shouldn't because (a) it's a long string to pull
> on a user model that is ultimately disappointing, and (b) it means that
> even optimized code, twenty years from now, will have to deal with
> this extra polymorphism.
> 
> — John
>