Valhalla EG minutes Feb 14, 2018

John Rose john.r.rose at oracle.com
Tue Feb 27 03:59:10 UTC 2018


On Feb 20, 2018, at 7:52 AM, Karen Kinnear <karen.kinnear at oracle.com> wrote:
> 
> attendees: Tobi, Mr Simms, Dan H, Dan S, Frederic, Remi, Karen
> 
> ...
> III. Value Types
> 
> Latest LWorld Value Types proposal: http://cr.openjdk.java.net/~acorn/LWorldValueTypesFeb13.pdf
> Latest rough draft JVMS: http://cr.openjdk.java.net/~fparain/L-world/L-World-JVMS-4b.pdf
> 
> Feedback/Q&A:
> 
> 1. creation of a new value type - Remi
>    - why not vnew ? why default/withfield/withfield/withfield?
>    - transformations - e.g. Byteman - easier if arguments are on the stack
> 
> Frederic: First proposal had a factory bytecode, returning a single fully constructed value type
>  rejected: concern: cost of pushing all arguments, method signature and attribute to how signature maps to fields

Yep.  This is really a FAQ.  I'll take a shot at it.

"Why not just do vnew"?  Because vnew would be a complicated construct
requiring, for each vnew instruction, a detailed list of fields and values which
amount to a whole record type.  This would be resolved in the constant pool
(again, for each vnew instruction).  The constant pool would have to support
a new variadic constant type, as a primitive constant type to supply the needs
of vnew.  This is beyond the complexity of any other constant pool type to date.

The data required to correctly link is analogous to (but more complex than)
the data driving enum-switch or strings-in-switch code generation.
Like those features, it is best implemented by a metafactory, not a single VM
instruction.

Also, we have an independent need for one-field update to values (withfield)
to express "wither" methods and similar shapes.  But vdefault + withfield*
covers the same functionality as vnew.  And it reuses CONSTANT_Fieldref.

Also, Java constructors for value types translate naturally into vdefault +
withfield*.  They do not translate naturally into vnew.  Some Java
constructors translate naturally into vnew, but only those trivial ones
which perform no logic other than blank final field assignment.  Most
Java constructors are not so trivial, and we do not intend to limit the
expressiveness of value type constructors in such a way.
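As a concrete (invented) illustration, here is a constructor whose body runs
logic before assigning its fields; a plain final-field class stands in for a
value class, since value-class source syntax does not exist yet.  This shape
fits vdefault + withfield* directly, but cannot be folded into a single
record-literal construct like vnew:

```java
// Sketch only: `Range` is an invented example, and a plain final-field
// class stands in for a value class (no value-class syntax exists yet).
final class Range {
    final int lo, hi;

    Range(int lo, int hi) {
        // Control flow before field assignment: this cannot be expressed
        // as a single record-literal vnew, but translates directly to
        // vdefault followed by two withfields.
        if (lo > hi) { int t = lo; lo = hi; hi = t; }
        this.lo = lo;   // would become: withfield lo
        this.hi = hi;   // would become: withfield hi
    }
}
```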

One objection to use of withfield is that it is hard to implement a series
of withfield instructions in the interpreter, without creating many
intermediate versions of a variable, each in its own buffer.  Of course
the JIT has no such problem, since it knows the liveness of each
intermediate value, and can immediately reuse storage known to be
unique to a particular value, to create the next version after a withfield.
The problem in the interpreter can also be fixed, e.g., by appropriate
use of approximate liveness tracking, such as reference counts.
Such a local implementation issue must not be allowed to overturn
the more basic design points noted above, which affect all
classfiles and all compilers.

Specifically:  If a value is loaded (using aload or vdefault) to TOS, it
may well have a reference count of 2 or more.  But the first withfield
will rebuffer it to unaliased storage.  A chain of further withfields can
just modify it in place.  This pattern (of provably unaliased value
buffers) can be detected on the fly by mechanical reference counts,
or else by a lightweight pre-pass run at class load time, recoding
each subsequent withfield after the first as "patching_withfield".
Classfiles would not be allowed to mention patching_withfield,
because it is grossly unsafe, but the interpreter would safely
run those instructions where the class loader had determined
that the operation is safe.  There are lots of ways to skin this cat.
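As a toy model of that interpreter strategy (all names here are invented),
a buffer that may be aliased is copied by the first withfield, after which
subsequent withfields can safely patch the unaliased copy in place:

```java
// Toy model (invented names) of copy-on-first-withfield in an interpreter
// that tracks buffer aliasing with reference counts.
final class ValueBuffer {
    long[] fields;
    int refCount = 1;

    ValueBuffer(long[] fields) { this.fields = fields; }

    // withfield: returns a buffer holding the updated value.
    ValueBuffer withField(int index, long newValue) {
        ValueBuffer target = this;
        if (refCount > 1) {                  // aliased: rebuffer to unaliased storage
            target = new ValueBuffer(fields.clone());
            refCount--;
        }
        target.fields[index] = newValue;     // the "patching_withfield" case: in place
        return target;
    }
}
```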

> Dan S: declared fields do not have an inherent ordering, so e.g. attribute to identify order
>   - expected usage: factory method in the value class itself
> 
> Dan: also want withfield exposed at the language level to allow tweaking one thing

Yes, this is very important.

> Karen: would be helpful to have a single way to create a value type or an object to allow more shared code
>   - model is to move all toward a factory mechanism

For object classes this single way is new + <init> + putfield*.  Plus reflective versions.
For value classes this single way is <make> + vdefault + withfield*.  Plus reflective.

The two cases are in close correspondence in order to allow constructors to
be translated compatibly for both object classes and value classes.
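At the source level the two sequences can be modeled like this (class names
invented, with plain classes standing in for value classes): an identity
class is created once and its fields set in place, while a value-style class
starts from a default instance and each "with" step yields a new version:

```java
// Invented examples.  MutablePoint mirrors new + <init> + putfield*;
// ValuePoint mirrors vdefault + withfield*.
final class MutablePoint {
    int x, y;
    MutablePoint set(int x, int y) { this.x = x; this.y = y; return this; }
}

final class ValuePoint {
    final int x, y;
    static final ValuePoint DEFAULT = new ValuePoint(0, 0);  // plays the role of vdefault
    ValuePoint(int x, int y) { this.x = x; this.y = y; }
    ValuePoint withX(int x) { return new ValuePoint(x, y); } // plays the role of withfield
    ValuePoint withY(int y) { return new ValuePoint(x, y); }
}
```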

If we go to factory mechanisms it will be very frustrating (as with MVT) to
program the constructor translation strategy.  Remember that constructors
usually intermix field sets with method calls (including on 'this') and control
flow.  You can't fit that into a factory or metafactory.  You have to use a sequence
of elemental operations expressed by bytecodes.

> Frederic:
>   - inside factory - it is not the same bytecodes for value type and object type creation
>   - note: withfield returns a new value type - it does not have the same stack behavior as putfield

There are superficial differences, but the similarities outweigh them.

The initial binding of 'this' in an object constructor is the result of the 'new'
instruction passed as the receiver argument to invokespecial <init>,
and stored in local number zero.   The initial binding of 'this' in a value
constructor is produced locally, using 'vdefault', and stored (if necessary)
in a local which does not correspond to any of the incoming arguments.
(In fact it could be stored at the root of the stack, in many but not all cases.)

Those two sequences are different, but in the rest of the constructor body,
'this' is uniformly available, perhaps in a partially uninitialized state.
It's true: The object might have blank final fields containing their default
values because they have not been putfield-ed to. The value might have
fields containing their default values because they have not been withfield-ed
to.  There is no difference to the programmer:  Getfield in either case
will produce the default value.  The rules of definite unassignment in
the JLS make it hard to see this, of course, but it's there in both cases.

(Probably the JLS rules, as written today, are enough to ensure that
*no* value type can contort itself to observe an uninitialized field,
but this is not a necessary point.  The JVM can see a default field
value, if it wants, for both objects and values.)

After entry, the constructor body runs control flow and method calls,
mixed with assignments (one per field along any path, per JLS rules)
to the fields of the new instance (either object or value).  At the end
of the constructor, when it returns normally, the <init> method returns
void, leaving the passed-in object in the required state.  The factory
method returns the new value, in the required state.  The way I
see it, these differences are at the surface, not in the basic
semantics of constructors for the two kinds of classes.

A final observation:  putfield and withfield have different stack
behaviors, in that putfield doesn't return a result (just keeps
hammering the same object over and over) while withfield
returns a new version of the value.  Again, this is a surface
difference, because the translation of a value class constructor
must simply pop the new version of the value and store it
in the local variable (mentioned above) which was initially
populated with a "vdefault".  In some cases, a translator
may be able to keep the value on the JVM stack through
a series of peephole optimizations, but this point doesn't
affect the semantic parity between the two classes of
constructors.

(Another note to us implementors:  It would be somewhat
reasonable, though uglier, if withfield took its first operand
from a local rather than from the stack, and updated that
local in place.  The iinc instruction is a pre-existing example
of this.  There would be two benefits to using such an
in-place withfield:  First, there would be no need in a
constructor or wither to push the local containing this
on the stack and then pop the new version back off, for
a small reduction in bytecode size.  Second, there might
be less duplication of buffers in an interpreter which used
reference counts to track buffer usage.  But these small
advantages do not seem to me to outweigh the relative
cleanliness of the current design of withfield.)

> Dan H: factory proposal is better than defaultvalue/withfield
>    - less throwing away extra created value types for the interpreter

I hate to disappoint Dan, but that would be the tail wagging
the dog; see above.

> 
> 3. withfield handling
> Remi: why withfield?
> Frederic: goal is to allow loop iteration with low cost 
> Remi: why restrict to within the value class itself?
> 
> Karen: concern: this creates a new value type, think of it as CopyOnWrite, it does NOT go through final 
> and update an existing value type. So this is heavyweight
> 
> Remi: could we have the language decide restrictions on its usage rather than the JVMS?

That's the current scheme:  We keep withfield private even if the field
is public.  This allows class writers to decide independently (a) how visible
to make fields for *reading*, and (b) how much trust they give clients
to *create* new values with arbitrary field settings.  If a value type
represents a checked capability, it must not be possible for external
users to forge arbitrary new capabilities, in an unchecked manner.
But public withfield would do this, or else force API designers always
to hide fields behind accessors.

Same point even if the value type isn't a capability, but just asserts
its right to validate and/or normalize field values.  Raw withfield
would subvert that.

A future version of withfield might allow a class to open up "raw"
withfield access.  But it is more likely that we will create a way for
a class to open up a more "cooked" version of such access, such
as some hook for "dumping" the state of a value and "reassembling"
it from an altered state.  In fact, that's just getfield* + constructor,
in many cases, so the API points are already there.  Crucially,
the class writer has control over validation, normalization, permission
checks, etc., in the reconstruction of the new value state.
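A hypothetical sketch of that "dump and reassemble" shape in today's Java
(class and fields invented): the wither reads the current fields and
reinvokes the constructor, so the class's own validation runs on every
reconstruction, and nothing like a raw public withfield is ever exposed:

```java
// Invented example: a plain immutable class modeling a checked value type.
final class Permit {
    final int level;
    final String holder;

    Permit(int level, String holder) {
        if (level < 0 || level > 3)              // validation the class controls
            throw new IllegalArgumentException("bad level: " + level);
        this.level = level;
        this.holder = holder;
    }

    // "getfield* + constructor": the only way clients get an altered copy.
    Permit withLevel(int newLevel) {
        return new Permit(newLevel, this.holder);
    }
}
```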

> 
> Dan S: future - if we want a general purpose withfield - we may want to put that in with extended
> field access controls - e.g. separate read vs. write. At that time you could use withfield if the field were
> accessible. 
>  - e.g. with Records - may expose readability, not writability

Yes.  This is possible, even with object classes.  I like to name this
feature "sealed fields", since the sealed field is "usable but not
redefinable", where "usable" = readable and "redefinable" = writable.
(By analogy with sealed interfaces.)  Value fields are sealed
by default for reasons given above, but could be unsealed.
Object fields are sealed if final, but unsealed if non-final.
An intermediate state might make sense.

But:  When I work out use cases for these intermediate states,
I don't see anything promising yet.  So I think we can stick with
what we have now and make a note to reevaluate later.

> Frederic: concern about confusing people - withfield with an immutable object
> 
> Dan S: language could make this clearer that this is not an assignment, but is a “new”
> 
> Opinions?


Yes, we need a new syntax at the source code level to make it clear
that (a) an old value instance is being operated on, but (b) a new
version of that instance is the result of the operation.  It seems
promising to me to allow something like a constructor body (with
field values in scope under their own names) to pull this off.

But I don't know what this looks like, except maybe in the
easier case of a "named reconstructor" within a class:

__ByValue class Rational {
  public final long num, den;
  public Rational(long num, long den) {
    this.num = num; this.den = den;
    assert(den != 0);
  }
  public __Reconstructor neg() {
    // constructor rules here, except fields appear mutable
    num *= -1;  // aload L0; dup; getfield num; ldc2_w -1; lmul; withfield num; astore L0
    return;  // aload L0; areturn
  }
} 

The rule is, inside a constructor you can assign to your fields.
That's the rule already, of course, but in a reconstructor you can
do the same things, *plus* refer to previous field values.
(And perhaps 'this', although there's some doubt about which
version should apply:  Either the original or the current state.)
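Until such syntax exists, a today's-Java approximation of neg() (a plain
final-field class in place of the hypothetical __ByValue and __Reconstructor
forms above) funnels the new version through the constructor, just as the
withfield sequence in the bytecode comments would:

```java
// Approximation only: plain immutable class in place of __ByValue syntax.
final class Rational {
    final long num, den;

    Rational(long num, long den) {
        if (den == 0) throw new ArithmeticException("zero denominator");
        this.num = num;
        this.den = den;
    }

    // Stand-in for the proposed reconstructor: build the new version
    // through the constructor, as the withfield sequence would.
    Rational neg() {
        return new Rational(-num, den);
    }
}
```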

> 
> 4. arrays
> We need a new bytecode to create a flattenable/non-nullable array
> existing bytecodes do not create flattenable arrays with the new model of container marking flattenable
> rather than by type

Whoa.

I haven't yet seen a strong reason to do per-container flattenability.
I'd rather not do this unless there is a strong reason.

And even if we need this, there's no reason to burn a new bytecode;
it can be a reflective call, such as java.lang.reflect.Array.newNullableInstance.
(Yes, I think the existing bytecode should make the correct flatness.)

>
> 5. Arrays and nullability
> 
> Question: can you pass a VT[] where an Object[] is expected?
> Yes you can pass the argument, and sub typing works.
> 
> Frederic:  If you have an Object[], if you have non-flattenable values then elements are nullable, if you have flattenable values, then elements are not nullable

Yep.  We are eating the cost of buffering up flat elements inside
aaload, even though that requires a data-dependent check.

The costs of this can be reduced in the JIT, using profiling.

But if we allow flatness to be a new randomly changing bit
on instances, the profiling will be less effective (until we
profile that bit, perhaps, or perhaps not).

The JVM works very hard to feed type analysis to the JIT.
Let's think twice before we make flatness *not* a property
of types.

(Note to self:  This argument applies somewhat to frozen
arrays also.  But there reusing the same types seems to be
a forced move.  Non-flat arrays of flat types are *not* a forced
move.)

> 5. Generics and nullability
> 
> Dan S: With generics, value types will work as is.
>   In future, if we were to change a field to be non-nullable, then we could get NullPointerExceptions
> Karen: if we were to change a field to be non-nullable, then if we wanted to we could support a different layout,
> and that would require specialization if the field were non-nullable depending on the parameter type.
> 
> This is a current open challenge - how to handle migration to non-nullable fields and arrays

We are working through cases on this.  It looks like the issue, often, is how
tolerant to be about "polluting nulls" in code which is not fully type-correct.
(It is not type-correct when old classfiles co-exist with new ones.)
The decision points are at places like getfield, putfield, invoke, return,
aastore, checkcast.  When do we allow a polluting null to pass by,
and when do we throw NPE?  The basic right move, I think, is to
throw NPE as soon as possible, as a service to the JIT, and also
to the user who wants to know something is fishy in the code.

But legacy classfiles must never throw NPE, since they can't know
any better.  This implies that a number of our bytecodes must be
sensitive to the classfile version, and throw NPE only on recompiled
code.  This behavioral divergence is not (IMO) enough to warrant
a new slew of bytecodes, but it is a tight fit to support both old and
new behaviors.  Maybe we want a bytecode *prefix* that means
"allow polluting nulls", and treat all relevant bytecodes in *old*
code as if that prefix were present.
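The version-sensitive decision point might be modeled like this (class and
method names invented; a boolean stands in for the classfile-version check):

```java
// Toy model (invented names) of a null-pollution decision point such as
// getfield/invoke/aastore: new-version code throws NPE eagerly, while
// legacy-version code lets the polluting null pass, as it can't know better.
final class NullBoundary {
    static Object check(Object incoming, boolean legacyClassfile) {
        if (incoming == null && !legacyClassfile) {
            throw new NullPointerException("polluting null at value-type boundary");
        }
        return incoming;   // legacy behavior: the null flows onward
    }
}
```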

> Note that in future we might want non-nullable identity objects as well as value types.

Yes.  In which case the prefix might mean "invert treatment of polluting
nulls".  Or maybe the prefix comes with a flag bit to explicitly select the
NPE behavior vs. the pass-through behavior.

We could also handle statically mandated null checks with a new
nullcheck bytecode, or even just invokestatic Objects.requireNonNull.
That doesn't feel right to me, since the right behavior should also
look relatively simple in bytecodes, compared to the wrong behavior.

> To help migration, Brian would like us to find a way so that javac would detect a mismatch in expectations of nullability,
> so we catch them at compile time.

Ooh, that's a good idea.  When javac resolves use of an API in
a JAR that isn't up to Valhalla classfile version, it could check to
see if the API signature mentions return types which are statically
known to be value types.  (And in other covariant positions, maybe.)
That's an example of a classfile that could introduce polluting nulls,
and merits a warning.

Thanks for pushing all this forward!
— John

