LWorld1 initial phase proposal

Thu Jan 4 02:40:04 UTC 2018

(here's some of the "more later")

On Dec 13, 2017, at 11:58 AM, Karen Kinnear <karen.kinnear at oracle.com> wrote:
> 
> …
> V. Expected Behaviors:
> 
> …
> 
> 2. Java level APIs
>   isValue

There are a few places where we can ask whether something is a value:

1. Given a U-type(*) variable x, is that variable's current value an instance of a value class?  System.isValue(x)
2. Given a class mirror, is it a value class?  Class.isValue().
3. Given a reflected field, was it marked as (virtually) flattened, aka ACC_VALUE?  Field.isValue().

(In L-world, a U-type appears to the JVM as an L-type, often an actual
L-descriptor like "Lpkg.Foo;".  The term U-type refers to a new way of
using L-type values.  Likewise, R-types also appear to the JVM as
L-types, but the term R-type refers to the legacy usage of legacy L-types.
A Q-type enables a new way of passing or storing operands by value
instead of by reference.  In L-world, there are mainly just L-types and
L-descriptors.  If we need actual Q-descriptors in the JVM, it is only
because sometimes there is no better way to encode the directive to
use a Q-type, that is pass or store an operand only by value.)

>   isFlattened (for a reflection Field or array)

Not sure what this is.  Same as #3 above?  Do we need, at the virtual
level (not the physical level), two separate concepts, value and flattened?
Or is this a distinction that applies only to the physical (sub-virtual) level?

There is a low-level query we can make, which is to ask what the JVM
has decided about physically (not virtually) flattening a field or a whole
class.  I would prefer to place that query out of reach, so that folks can't
shoot themselves in the foot.  It's a moral equivalent of @ForceInline,
something that is very easy to misuse, and of little value in portable
code.

>   isElementValue (for an array - not the same as is the array a Value)

Yes, although the correct term is "component", not "element".
Cf. getComponentType.

field : array element :: Field.getType : Class.getComponentType
field : array element :: Field.isValue : Class.isComponentValue

We are touching on a weakness of reflection, where we want
to use jl.Class to refer to types, but it isn't quite rich enough.
Brian calls a hypothetically enriched Class a "crass".  Such
a thing can report not only the "proper" class associated
with a type, but also its modifier (Q/R/U) or its template
parameters.

Given such a thing we can factor all the ancillary type queries
onto the common type:

interface CrassType {
  Class<?> getProperClass();
  boolean isValue();  // Q-type
  boolean isObject();  // R-type
  boolean isUniversal();  // U-type
  //boolean isTemplate();
  //List<CrassType> getTemplateParameters();
}
class Class implements … CrassType { … }

The various in-context queries then turn into single crass-queries:

CrassType Class.getComponentCrassType();
CrassType Field.getCrassType();
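One way to picture the factoring is a variant of the interface above in which the Q/R/U modifier is a single enum value backing the three boolean queries. This is a hypothetical sketch throughout: Kind, SimpleCrassType and CrassQueries are illustrative names, not JDK API. Because today's class files record no per-field or per-array kind, the caller supplies the kind where a real Class.getComponentCrassType or Field.getCrassType would read it from metadata.

```java
import java.lang.reflect.Field;

// Hypothetical: the type modifier as an enum (Q = value, R = reference, U = universal).
enum Kind { Q, R, U }

// Variant of the CrassType sketch: one kind() query backs the three booleans.
interface CrassType {
    Class<?> getProperClass();
    Kind kind();
    default boolean isValue()     { return kind() == Kind.Q; }  // Q-type
    default boolean isObject()    { return kind() == Kind.R; }  // R-type
    default boolean isUniversal() { return kind() == Kind.U; }  // U-type
}

// Hypothetical carrier: a proper class plus its modifier.
record SimpleCrassType(Class<?> properClass, Kind kind) implements CrassType {
    public Class<?> getProperClass() { return properClass; }
}

// Stand-ins for Class.getComponentCrassType() and Field.getCrassType().
// The kind is passed in because current metadata does not record it.
final class CrassQueries {
    private CrassQueries() {}

    static CrassType componentCrassType(Class<?> arrayClass, Kind componentKind) {
        Class<?> component = arrayClass.getComponentType();
        if (component == null) {
            throw new IllegalArgumentException(arrayClass + " is not an array class");
        }
        return new SimpleCrassType(component, componentKind);
    }

    static CrassType fieldCrassType(Field field, Kind fieldKind) {
        return new SimpleCrassType(field.getType(), fieldKind);
    }
}
```

With this shape, Field.isValue and Class.isComponentValue both collapse into isValue() on the returned object, and template parameters could later be added to the same carrier.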

>   ucmp - substitutability check (not overridable)

Given the cost of this, I prefer to phrase it as a library call instead
of a bytecode instruction:

System.isSubstitutableValue

and then something like:

System.getSubstitutableHashCode

These would be available to upgrade the various kinds of
identity-sensitive collection types we have today, to wean
them off of acmp and System.identityHashCode.
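To make the two calls concrete, here is a minimal library-level sketch. It assumes records as stand-ins for value classes (today's JDK has none) and implements substitutability as the field-wise comparison described under acmp option c below: raw-bit comparison for floating point, recursion into nested values, and plain identity for everything else. The method names follow the proposal; Point, Line and Substitutability are hypothetical, and nothing here is existing API.

```java
import java.lang.reflect.RecordComponent;

// Hypothetical stand-ins for value classes (records, since the JDK has none).
record Point(double x, double y) {}
record Line(Point from, Point to) {}

// Sketch of System.isSubstitutableValue / System.getSubstitutableHashCode as
// plain library code. Records are treated as values, everything else by identity.
final class Substitutability {
    private Substitutability() {}

    static boolean isSubstitutableValue(Object a, Object b) {
        if (a == b) return true;
        if (a == null || b == null || a.getClass() != b.getClass()) return false;
        if (!a.getClass().isRecord()) return false;  // identity objects: only == counts
        for (RecordComponent rc : a.getClass().getRecordComponents()) {
            if (!componentEquals(rc.getType(), read(rc, a), read(rc, b))) return false;
        }
        return true;
    }

    static int getSubstitutableHashCode(Object o) {
        if (o == null) return 0;
        if (!o.getClass().isRecord()) return System.identityHashCode(o);
        int h = o.getClass().hashCode();
        for (RecordComponent rc : o.getClass().getRecordComponents()) {
            h = 31 * h + componentHash(rc.getType(), read(rc, o));
        }
        return h;
    }

    // Field-wise comparison; floating point compares raw bits, so +0.0 and
    // -0.0 differ and identical NaN bit patterns match.
    private static boolean componentEquals(Class<?> t, Object x, Object y) {
        if (t == double.class)
            return Double.doubleToRawLongBits((Double) x) == Double.doubleToRawLongBits((Double) y);
        if (t == float.class)
            return Float.floatToRawIntBits((Float) x) == Float.floatToRawIntBits((Float) y);
        if (t.isPrimitive()) return x.equals(y);  // other boxed primitives
        return isSubstitutableValue(x, y);        // recurse into nested values
    }

    // Hashes the same bits that componentEquals compares, keeping the two consistent.
    private static int componentHash(Class<?> t, Object v) {
        if (t == double.class) return Long.hashCode(Double.doubleToRawLongBits((Double) v));
        if (t == float.class) return Float.floatToRawIntBits((Float) v);
        if (t.isPrimitive()) return v.hashCode();
        return getSubstitutableHashCode(v);
    }

    private static Object read(RecordComponent rc, Object target) {
        try {
            return rc.getAccessor().invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An identity-sensitive collection would call these where it now uses == and System.identityHashCode. The recursion and reflection are exactly the cost that argues for a library call rather than a bytecode.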

> 3. Support beyond MVT
>    Allow Value Types in static fields (not flattened)

Yep.  (My comments above apply post-MVT!)

>    Method and method invocation support
>    Interface support: default methods - must handle both L-types and Q-types
> 
> 4. LWorld bytecodes vs. JVMS 9
> Key Challenge: can we apply the same bytecodes to QTypes and RTypes? can they check dynamically without loss of performance?
>   special handling:
>      if_acmpeq/if_acmpne: false/true if either is a Qtype.

I.e., the JVM detects a "NaN-like" condition:  Both operands are identical,
*but* both are also values (the same value).  In that case, we return
false for acmpeq.

> They should fall back to .equals

"They" = "User code".

It's important to say here that user code should fall back to Object.equals
or a similar method (such as System.isSubstitutableValue) if two copies
of the same value should be treated as identical.

When the fall-back is Object.equals, we have encountered LIFE as we
know it—the Legacy Idiom For Equality—which looks like "x==y||x.equals(y)".
There are several known forms of LIFE, because of details like null processing.

The JIT should optimize these bytecodes by detecting signs of LIFE
(ooh, I've got a million of these) and transforming to cheaper machine code.

For example, remove the value-detection part of the acmp semantics,
and refrain from calling Object.equals in the case of a value being
compared to itself.

This transformation requires that we trust that Object.equals fulfills
its contract, which requires a little more thought.  It's probably enough
to say "if the JVM can prove that two values are copies of each other,
it may short-circuit calls to Object.equals on them to true".  This basically
enshrines a promise of the Object.equals API into an optional optimization
in the JVM itself, which I think would be harmless, and it would enable
us to make the above transformation with a clear conscience.
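For reference, here are the usual forms of LIFE; they differ only in null handling, and Objects.equals is the JDK's own spelling of the null-tolerant form. Under the proposed LWorld1 acmp (false whenever either operand is a value), the == fast path never succeeds for values, so these stay correct only through the equals fallback. That shape is what a JIT would look for. The class and method names are illustrative.

```java
import java.util.Objects;

// Common forms of LIFE (the Legacy Idiom For Equality): each tests == first
// as a fast path, then falls back to equals.
final class Life {
    private Life() {}

    // Form 1: the caller guarantees x != null.
    static boolean form1(Object x, Object y) {
        return x == y || x.equals(y);
    }

    // Form 2: null-tolerant, spelled out by hand.
    static boolean form2(Object x, Object y) {
        return x == y || (x != null && x.equals(y));
    }

    // Form 3: the library spelling of form 2.
    static boolean form3(Object x, Object y) {
        return Objects.equals(x, y);
    }
}
```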

>      ifnull/ifnonnull: as if acmp vs. Null: false/true if QType

Yep.  That's cheap, as long as no value is encoded by a null buffer
pointer.

>   needs dynamic different handling:
>      aaload/aastore: handle LType or QType dynamically

This is worth experimenting with; maybe it's cheap enough.
Something in the object's header (or one hop away) can say
whether the array is flattened or not, and the interpreter or
JIT can branch to the appropriate code.
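A toy model of that branch, assuming a single header bit per array and a hypothetical two-int value class IntPair. ModelArray is not how a JVM lays out arrays; it only shows aaload and aastore dispatching on the flag, with the flattened path materializing a value from inline storage.

```java
import java.util.Objects;

// Hypothetical two-int value class.
record IntPair(int a, int b) {}

// Toy array with one "header" bit choosing flattened vs. reference storage.
final class ModelArray {
    private final boolean flattened;  // the header bit
    private final Object[] refs;      // used when not flattened
    private final int[] inline;       // used when flattened: two ints per element

    ModelArray(int length, boolean flattened) {
        this.flattened = flattened;
        this.refs = flattened ? null : new Object[length];
        this.inline = flattened ? new int[2 * length] : null;
    }

    // aaload: test the bit, then load a reference or materialize a value.
    Object aaload(int i) {
        if (flattened) {
            return new IntPair(inline[2 * i], inline[2 * i + 1]);
        }
        return refs[i];
    }

    // aastore: a flattened array accepts only non-null IntPair values
    // (NPE on null, ClassCastException on any other type).
    void aastore(int i, Object v) {
        if (flattened) {
            IntPair p = (IntPair) Objects.requireNonNull(v);
            inline[2 * i] = p.a();
            inline[2 * i + 1] = p.b();
        } else {
            refs[i] = v;
        }
    }
}
```

Note that an unwritten element of a flattened array reads back as IntPair(0, 0), the default value, rather than null.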

>      aload/astore: handle LType or QType dynamically

We have found that the cost here is small, since the dynamic
check is only needed on return.

With Loom, we might need a more complicated check on yield.
That's when a number of frames are temporarily unwound from
a thread, to be remounted later on some other thread.  For that
case, anything buffered in the thread needs to be rebuffered
either on the fiber or in the heap.  I prefer the heap for starters,
since I am trying to resist all forms of mission-creep for fibers.
But there's probably some way to eventually use the same
fiber-local storage that the fiber uses to store the stack frames.

>      areturn: handle LType or QType dynamically

That's where the cost for aload/astore goes.

>   exception if wrong kind: ICCE
>      putfield: QType exception: ICCE
>      monitorenter/exit: exception for QType: ICCE
>      new: exception for QType: ICCE (expects uninitialized state)

Good.

>      aconst_null: exception for QType: ICCE

This bytecode doesn't take a type parameter, so it can't object
to a Q-type.  I think you mean:

     null value: exception when passed as QType: ICCE

There may be a role for NPE to play here.  For example, we
could have a u2q instruction which checks types *and* checks
for nullness, through CCE *or* NPE.
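As a library method, the suggested u2q check might look like this sketch. ValueConversions and u2q are hypothetical names; the point is the order of checks: nullness first (NPE), then type (CCE), because Class.cast on its own lets null through. The same ordering fits the remark further down that a checkcast to a value type should throw NPE on null.

```java
import java.util.Objects;

// Hypothetical library form of a u2q check: U-type operand in, Q-type out.
final class ValueConversions {
    private ValueConversions() {}

    static <T> T u2q(Object x, Class<T> valueClass) {
        // Null check first: Class.cast would silently accept null.
        Objects.requireNonNull(x, "null cannot be passed as a Q-type");  // NPE
        return valueClass.cast(x);                                        // CCE on wrong type
    }
}
```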

>      vdefault: exception for LType: ICCE

Or just return null.  When we come to templated generics we'll
have to reconsider this.

> 
>      withfield: exception for LType: ICCE

Yes.

> 
>   unchanged or already implemented (in MVT) or should fall out:
>      getfield: handle LType or QType dynamically (already implemented)

     putfield: handle LType or QType dynamically (already implemented)

>      newarray/etc.: handle LType or QType dynamically (already implemented)
>      athrow: always LType (subtype of Error)  - unchanged
>      invoke*: handle LType or QType dynamically (should fall out)
>      checkcast/instanceof: should fall out

A checkcast to a value type should throw NPE if presented with a null.

>      ldc: should fall out
> 
> V. Implementation use of explicit QType
> 1. Field descriptors
>   Goal: not require verifier or class file parser to load all fields.
>      ICCE if misclaimed, at first runtime mismatch (kind constraints)
>      To allow flattening, want field and arrays to explicitly use QTypes at language level

There's a fundamental choice here between denoting Q-types with
descriptors or modifier bits.  Let's explore using modifier bits as much
as possible.  Or do you want to put that off for later?

For flattenable fields, it's a choice of Q-descriptor vs. ACC_VALUE modifier.

> 2. Array descriptors: propose - yes
>      Remi: not needed - at array creation you know the element type

I like Remi's suggestion, because I'm trying to get us to a uniform non-use of Q-descriptors.

For flattenable arrays, it's a choice of Q-descriptor or dynamic "flattened" modifier on array.

Not sure what the following is arguing for, but I'll jump in anyway!

>      Proposal: uniformity
>        - confusing to explain inconsistency

Agree.  Why do arrays need an extra flattening decoration?  Isn't this just
one more thing that I might get wrong or might change after javac-time?

>        - javac already knows the information and has done the work, why slow down?

It knows this, but only at compile time.  Classes can change after javac runs.

>        - safety - kind constraints - could be checked by verifier

If we do kind constraints, we should do them thoroughly, not "just for arrays".

This suggests a bytecode to impose kind-constraints:  kindcast [Q|R]
And one to test them:  iskind [Q|R].
The two could be coded with one code point, I suppose.
Or, overload them onto checkcast and instanceof,
with funny CP entries, Utf8["Q"], etc.

Hmm; just make those be API points, for now.  The JIT can intrinsify them.

It's not clear which bytecodes would benefit from kind-constraints, though.
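If iskind and kindcast start life as API points, a sketch might look like the following. ValueMarker is borrowed from Frederic's marker-interface suggestion purely as a way to tell Q from R in plain Java; ObjKind, KindOps and Money are hypothetical names. Treating null as an R-value only, and failing with ICCE, follows the quoted proposal ("null value: exception when passed as QType: ICCE").

```java
// Assumed marker interface (after Frederic's ValueMarker idea), used here
// only to tell Q-values from R-values in plain Java.
interface ValueMarker {}

// Hypothetical stand-in for a value class.
record Money(long cents) implements ValueMarker {}

enum ObjKind { Q, R }

// Hypothetical API points for iskind / kindcast.
final class KindOps {
    private KindOps() {}

    // iskind: does x have the requested kind? null counts only as an R-value.
    static boolean isKind(Object x, ObjKind k) {
        if (x == null) return k == ObjKind.R;
        boolean isValue = x instanceof ValueMarker;
        return isValue == (k == ObjKind.Q);
    }

    // kindcast: pass x through unchanged, or fail with ICCE on a kind mismatch.
    static Object kindCast(Object x, ObjKind k) {
        if (!isKind(x, k)) {
            throw new IncompatibleClassChangeError("expected kind " + k + " but got " + x);
        }
        return x;
    }
}
```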

> 3. Method descriptors : propose - no
>    1) No indication of receiver type
>    2) method descriptor parameters/return type - all use LTypes
>    Propose: do NOT support QTypes in Method Descriptors
>     challenge: descriptor mismatches based on migration
>         - for client inheritance/overriding, caller/callee matching

Yes, good!  (Pushing away those Q-descriptors, pushing, pushing, … gone!)

> Resolved Questions for LWorld1
> 1. Q: support other superclasses?
>    A: QType has no subclasses
>    A. for now - QType has only jlO as superclass, may be extended in future
>       (see if any that would break any optimizations)

I wish we wouldn't perpetuate this use of jl.O; it doesn't work for either
interfaces or value types.

Suggestion:  Consider allowing or requiring NULL (CP ref #0) as superclass
of value types.  And also of at least some interfaces.

(And an interface with jlO as super is re-interpreted as having I$Object
as one of its extends??)

> 
> 2. Q:acmp behavior options:
>    a) failing: return false <- propose for try 1

(agree)

>    b) throw exception
>    c) field-equality using ucmp as "substitutable" - field-wise comparison
>      general bit equality including floating point
>      may need to recurse on values buffered

c is too expensive; let the user ask for it.

b is also expensive in that it's harder to optimize in the presence of LIFE.

and (my pet idea, just for the record):
   d) indeterminate (when given two identical values)

Both a and d can be optimized in the presence of LIFE.
d is marginally easier to optimize, while a is more deterministic.

BTW, Guy Steele notes that a is easy to justify by claiming that
acmp really is a reference comparison, and that when either operand
is a value, it is converted to a new reference by some sort of boxing.
Since a new reference always compares distinct from a pre-existing
one, then you get the described behavior:  If either operand is a value,
you get a false reference comparison.

In the same vein, d can be justified by claiming that the boxing happens
as above, *except* the JVM is allowed (but not required) to produce equivalent
references for values it can prove to be equivalent.
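Option a, read Steele's way, fits in a few lines. This sketch is hypothetical: it treats records as stand-ins for values, and Acmp and Pt are illustrative names. ifnull falls out as acmp against null, matching the null handling in the answer quoted below.

```java
// Hypothetical stand-in for a value class.
record Pt(int x, int y) {}

// Sketch of acmp option a. Records stand in for Q-values.
final class Acmp {
    private Acmp() {}

    static boolean isValue(Object x) {
        return x != null && x.getClass().isRecord();
    }

    // if_acmpeq: a value is "boxed" into a fresh reference, which equals nothing.
    static boolean acmpEq(Object x, Object y) {
        if (isValue(x) || isValue(y)) return false;
        return x == y;
    }

    // if_acmpne: the exact negation.
    static boolean acmpNe(Object x, Object y) {
        return !acmpEq(x, y);
    }

    // ifnull: as if acmp vs. null, so a value is never null.
    static boolean isNull(Object x) {
        return acmpEq(x, null);
    }
}
```

Option d would change only the first branch of acmpEq: when both operands are values, the JVM could return false or, if it can prove the two values equivalent, true.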

>    A: LWorld1: if >= one operand is a QType: if_acmpeq -> false, if_acmpne -> true
>    A: null handling: as if acmp vs. NULL
>       if operand is a QType: ifnull -> false, ifnonnull->true
> 
> 3. Q: Do we need to know if an LType is an old L-Type or a new LType?
>    A: be on the lookout - we have not yet identified any cases
>       - note: If we do, we have ClassFileVersion
> 
> 4. Q: Any issues with argument passing/argument return handling if we have
>       runtime mismatches of kind?
>    A: If we have kind constraints then we should not get runtime mismatches
>    A: advanced error handling - check implications if verification is skipped

I think we need to do some more work on this one.  A classic example is
a descriptor LFoo; which refers to a class Foo that is never loaded.  Can
you work with methods that mention Foo as an argument or return type?
Sure you can:  Just pass null always.

Now suppose that Foo is a value-class, and it gets loaded an hour after
the JIT has processed all the hot code including methods mentioning Foo.
Now what do I do with all the nulls floating around in my system?
And am I forbidden to pass Foo operands by value?

> 
> 5. Q: Do we need a new carrier type?
>    A: TBD - so far requirement not identified.

The premise behind L-world is that the L-carrier type is flexible enough
to be extended to a U-type (and carry Q-values as well as R-values).

> 6. Q: What does it mean for LObject to be more like an interface?
>    A: TBD (TODO - ask Dan?)
>       A1. disallow adding fields
>       A2. superclass of references and value types (with modified methods)

(See above.)
> 
> 7. Q: Do we need a fast way to distinguish an LType from a QType? in the JVM itself?
>    A. Initial prototype can work with instanceKlass vs. valueKlass
>       Later - look at potential optimizations

I think it's pretty clear that we need fast dynamic tests for all of the following:

1. a frame-local Q-value, for areturn (pass up to caller)
2. a thread-local Q-value, for aastore and putfield (convert to heap-buffer)
3. a Q-value (vs. an R-value), for acmp (short-circuit to false)

The third one should also be a user-callable API (like isValue, above).

> 8. Q: What does updated Object.hashcode do?
>     - field equality based hashcode
>     - assume cache in header optimization ?
>   A: Dan: call hashcode or identity hashcode (throw) - performance tradeoff

The default is to call System.identityHashCode (or an internal equivalent).
We can continue with that.  And System.identityHashCode should probably
throw, although it could also return the substitutability hash code.
The performance should be OK since identityHashCode is a slow-path call,
and we know we will have a fast test for Q-ness (because of acmp).

> 9. Q: Do we need a fast way for Java to determine ValueType (isValue call?)
>     A: Frederic proposed: e.g. give all value types a common super interface
>         e.g. ValueMarker
>   - verifier or class file format checking at class loading
>     - ensure that this can't be a superinterface if not a value type
>     - this is probably temporary, but useful

Simpler to have a value-bit on the class metadata block,
and then do isValue() := this.$class_metadata.isValue_bit.
(That's pseudo-code, of course.  No real code on this alias.)
Much faster than an interface check.
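A toy model of the value-bit approach, assuming a flags word in the class metadata. ClassMetadata, ModelObject and the bit position are hypothetical, not HotSpot's layout. The test is one load and one mask, whereas an interface subtype check may have to search the class's list of implemented interfaces.

```java
// Hypothetical class-metadata block with a flags word.
final class ClassMetadata {
    static final int VALUE_BIT = 1 << 8;  // assumed bit position
    private final String name;
    private final int flags;

    ClassMetadata(String name, int flags) {
        this.name = name;
        this.flags = flags;
    }

    // isValue() := this.$class_metadata.isValue_bit
    boolean isValue() {
        return (flags & VALUE_BIT) != 0;
    }

    String name() { return name; }
}

// Each modeled object points at its class metadata, as an object header does.
record ModelObject(ClassMetadata klass) {
    boolean isValue() { return klass.isValue(); }
}
```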

> 10. Q: Migration QType->LType support?
>   Customers will try this
>   A: Need to ensure we catch failures
>   challenges:
>     instance creation:
>       value type - must have a private constructor - so new will fail IAE
>         - except for nestmates (dynamically added which are not same compilation)
>         - except Reflection.setAccessible
> 
> 11. Q: Circularity handling for Field types?
>     A: today we get StackOverflowError - hotspot folks propose we leave that alone
>        to avoid complexity and costly testing

OK for a prototype, but we need a proper ClassCircularityError eventually.
We should repurpose the mechanism we use for today's ClassCircularityError
to detect the additional kinds of circularity; they are all of the same kind.

Principle stolen from C++:  Having a flattened field is much like having a
superclass, and vice versa.
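Applying that principle, layout can reuse superclass-style cycle detection: laying out a class first requires the layout of every flattened field's class, and meeting a class that is still in progress is a circularity. The class model here (a map of flattened field types plus primitive byte counts) and the name LayoutModel are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class model: each class lists the classes of its flattened
// fields, plus a byte count for its own primitive fields.
final class LayoutModel {
    private final Map<String, List<String>> flattenedFields;
    private final Map<String, Integer> primitiveBytes;
    private final Map<String, Integer> sizes = new HashMap<>();
    private final Set<String> inProgress = new HashSet<>();

    LayoutModel(Map<String, List<String>> flattenedFields,
                Map<String, Integer> primitiveBytes) {
        this.flattenedFields = flattenedFields;
        this.primitiveBytes = primitiveBytes;
    }

    // Like resolving a superclass: a flattened field's class must be laid out
    // first, so meeting a class that is still in progress means a cycle.
    int sizeOf(String cls) {
        Integer done = sizes.get(cls);
        if (done != null) return done;
        if (!inProgress.add(cls)) {
            throw new ClassCircularityError(cls);
        }
        try {
            int size = primitiveBytes.getOrDefault(cls, 0);
            for (String fieldClass : flattenedFields.getOrDefault(cls, List.of())) {
                size += sizeOf(fieldClass);  // inline the field's whole layout
            }
            sizes.put(cls, size);
            return size;
        } finally {
            inProgress.remove(cls);
        }
    }
}
```

Detecting the cycle this way replaces the StackOverflowError mentioned in the question with a proper ClassCircularityError.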

> 12. Q: Class.forName() and internal loaded class cache
>     A: NO L vs. Q type naming here, there is no ambiguity
> 
> Unresolved Questions for LWorld1
> 1. Q: Reflection requirements/plans?

For now, do the messy contextual isValue query points.

And ask Brian for help designing a CrassType API,
anticipating more messy contextual query points
when templates arrive.

> 
> 2. Q. What can the verifier check, what do we want to check later to avoid early class loading?
>    A. Expect to create kind constraints, if we have bytecodes that require a value type
>       which would be checked when we load that value type and throw ICCE

It's not clear to me where we need kind constraints:  They seem like
a solution in search of a problem.  But maybe I'm just missing the point.

>    A. Verifier can still check primitive vs. LType - leave those checks in

(Yes; those are distinct "carrier types".  The Q- and R-types are not
distinct in that way, since they are carried by a common L-type in L-world.)

>    A. note: for loaded classes such as supertypes, value type fields or isAssignable checks, some classes are already loaded, so may not need a kind constraint
>    Q: Do we also need to do this for references - e.g. "new" bytecode?
>    Q: Is there a concern that by loading a class that uses the wrong bytecodes for someone else's kind - that could prevent a class from successfully loading?

Yep.

> ===================
> Early experiment:
> Add to JDK 11 (not MVT specific)
>  add dynamic checks for ValueBasedClasses
>     follow LWorld dynamic restrictions
>  add checks for ifacmp_eq/ne NOT followed by a call to .equals?
>  (todo: find Dan's corpus search results email)
>  todo: consider using JFR? This will be open in jdk11
> 
> Experimental steps:
> 1. new repo - remove MVT parts, just include -XX:+EnableValhalla tests (already split)
> 
> 2. LangTools:
> 2.1 test bytecode generator
>     - generate LWorld bytecodes (from simp
> 
> 2.2 Javac:
>   __ByValue for class declaration (for simplified JVMS bytecode changes document)
>      request:
>        super as java.lang.Object
>        generate LWorld bytecodes
>        LWorld restrictions checking
>          - e.g. allow superinterfaces , allow value types in static fields
> 
> 2.2 new static utility class for new bytecodes
>    isValue
>    isElementValue (for array)
>    isFlattened (for field or array)
>    ucmp
> 
> 2.3 java.lang.Object methods
>    Using isValue - rewrite
> 
> 2.4 real JVMS LWorld1 proposal - Dan
> 
> 3. Runtime
> a. write up simplified JVMS bytecode changes document (not verifier yet)
> b. interpreter bytecode changes - see list above
> c. migrate VVT tests to LVT1
> d. verifier -
>    propose JVMS verifier changes
>    propose kind constraint handling - to not add any additional
>      class loading
>      - note: value type fields, supertypes and isAssignable checks already perform loading
> e. method handle support
> 
> 4. JIT:
> a. bytecodes
> b. migrate VVT tests to LVT1
> c. adaptor generation
> d. optimizations
> 
> 
> 5. Core libraries:
> a. methodhandle support - e.g. changes to LambdaForms
>    - (simplified from MVT with no __Value)
>    - no boxing
> b. Reflection/new reflection?
>    TODO - what are the requirements here?
> 
> 6. Migration testing:
> Try VBC -> value type -- run tests and see what breaks

(This is a very good work-list.  I can't think of anything else to add to it at the moment.)