Moving from VVT to the L-world value types (LWVT)

Wed Jan 24 00:22:31 UTC 2018

On Jan 23, 2018, at 1:25 PM, Frederic Parain <frederic.parain at oracle.com> wrote:
> 
> Hi John,
> 
> thank you for the detailed feedback.

You are welcome; I'm really glad about how well this work is going.

> The Q-descriptor is not a fundamental part of the proposal, it is just an unsatisfying
> way for class files to express their expectations regarding types they think are value
> class types (to differentiate them from object class types). Q-descriptors provide this
> information but have drawbacks like the signature matching issue.

Good; I agree.
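To make the signature matching issue concrete: under the "L-ified signature" idea from the earlier proposal, every method lookup compares the requested descriptor against the callee's true descriptor and, failing that, against a copy in which each Q…; element is rewritten as L…;. The following is a minimal sketch of that comparison. `lify`, `match` and the class `Point` are hypothetical names for illustration, not HotSpot code.

```java
// Sketch of the "L-ified signature" matching idea; lify() and match()
// are hypothetical helpers, not HotSpot code.
public class LifiedSignatureSketch {

    // Rewrites every Q...; class type in a descriptor into L...; form.
    // Primitive, array and parenthesis characters are copied unchanged;
    // class names are skipped as a unit, so letters inside them are never
    // mistaken for type tags.
    static String lify(String descriptor) {
        StringBuilder out = new StringBuilder(descriptor.length());
        int i = 0;
        while (i < descriptor.length()) {
            char c = descriptor.charAt(i);
            if (c == 'L' || c == 'Q') {
                int end = descriptor.indexOf(';', i);
                if (end < 0) {
                    throw new IllegalArgumentException("malformed descriptor: " + descriptor);
                }
                out.append('L').append(descriptor, i + 1, end + 1);
                i = end + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    enum Match { EXACT, LEGACY_CALLER, NONE }

    // Compares a looked-up descriptor against the callee's true descriptor
    // and, failing that, against its L-ified form.
    static Match match(String trueDescriptor, String lookedUp) {
        if (trueDescriptor.equals(lookedUp)) {
            return Match.EXACT;
        }
        if (lify(trueDescriptor).equals(lookedUp)) {
            return Match.LEGACY_CALLER; // legacy code calling migrated code
        }
        return Match.NONE;
    }

    public static void main(String[] args) {
        String migrated = "(QPoint;I)QPoint;";
        System.out.println(lify(migrated));                        // (LPoint;I)LPoint;
        System.out.println(match(migrated, "(LPoint;I)LPoint;"));  // LEGACY_CALLER
    }
}
```

The second comparison, and whatever extra work the LEGACY_CALLER case triggers, is the cost that disappears if Q-descriptors are retired and only L-descriptors remain.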

The most important place we were using Q-descriptors was field declarations.
Have you tried checking an ACC_VALUE_TYPE bit in value-typed fields?
The semantics could be as follows:
  A. if the bit is set, load the field descriptor class
      A.1 if the field descriptor class is a value class, DTRT
      A.2 if the field descriptor class is something else,
       ignore the ACC_VALUE_TYPE bit (no error)
  B. if the bit is clear, do not load the field descriptor class; allocate a reference and initialize it with null
      B.1 when storing the reference, test for a buffered value, and store a heap buffer as needed

The net effect of the above semantics is that (1) if everybody agrees on the ACC_VALUE_TYPE
bits, then we get the semantics we want.  But also (2) the ACC_VALUE_TYPE bit
becomes (in other cases) an indication to "flatten if possible", and its absence
means "don't flatten".  That would seem to be not only a good binary compatibility
story, but also a useful knob for language implementors.  Of course, the JVM
also has a vote:  It could choose to flatten or not, internally, and nobody would
know.
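The field rules A, A.1, A.2 and B can be modeled as a small decision function. This is a sketch under stated assumptions: `ClassInfo`, `Layout` and `layoutFor` are hypothetical stand-ins for JVM internals, the loader is passed in as a plain function, and the A.1 "DTRT" case is modeled as flattening even though the JVM keeps its own vote on that.

```java
import java.util.function.Function;

// Models the proposed ACC_VALUE_TYPE field semantics (cases A, A.1, A.2, B).
// ClassInfo and layoutFor() are hypothetical stand-ins for JVM internals.
public class ValueFieldBitSketch {

    record ClassInfo(String name, boolean isValueClass) {}

    enum Layout { FLATTENED, NULLABLE_REFERENCE }

    static Layout layoutFor(boolean accValueType, String descriptorClass,
                            Function<String, ClassInfo> loader) {
        if (!accValueType) {
            // B: do not load the descriptor class; the field is a plain
            // reference initialized to null. (B.1, heap-buffering a buffered
            // value on store, is not modeled here.)
            return Layout.NULLABLE_REFERENCE;
        }
        // A: the bit is set, so the descriptor class is loaded.
        ClassInfo k = loader.apply(descriptorClass);
        if (k.isValueClass()) {
            // A.1: "do the right thing"; this sketch chooses to flatten,
            // though the JVM remains free not to.
            return Layout.FLATTENED;
        }
        // A.2: not a value class, so the bit is silently ignored (no error).
        return Layout.NULLABLE_REFERENCE;
    }
}
```

Note that no combination of bit and loaded class produces an error, which is what gives the bit its "flatten if possible" reading.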

> Remi’s proposal is appealing because it avoids the signature matching issue.
> An attribute is not the most convenient data structure for the JVM, but we can
> record the information elsewhere in our meta-data. However, it seems more
> brittle because the attribute can easily be omitted, unless we make it mandatory
> after a given class file version, with a slightly different syntax where all
> classes named in the class files have to be listed, so it can be verified. For
> older class file formats, the attribute would be absent and all classes are assumed
> to be object classes.

If we use the semantics suggested above for fields, do we have any
need for Remi's attribute?  If not, let's start with the field modifier bit.

> We had two brainstorming sessions, yesterday and this morning, trying to figure
> out what would be the consequences of having only L-descriptors, with class
> files having different assumptions regarding the real nature of a type (object class
> or value class), either in the case of VBC migration or simply because of separate
> compilation.

Yes, these issues have to be resolved, and I think we have some good
options on the table.

> Some issues are related to the calling/returning conventions for the
> JIT compiled code.

I don't think the JIT's calling sequences need to affect the classfile design.
The JIT usually operates with full global information about types, and
on the other hand, if global information is sometimes missing, the JIT
routinely backs off to a safe assumption.  In this case, an unknown
type can be passed (suboptimally) as L-Object in L-world.
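The back-off rule can be stated as a tiny function: scalarize only when the JIT positively knows the parameter type is a loaded value class, and otherwise pass a heap pointer. This is a hypothetical sketch; `TypeKnowledge`, `Passing` and `passingFor` are illustrative names and do not describe HotSpot's actual calling-convention code.

```java
// Sketch of the "back off to a safe assumption" rule for JIT calling
// sequences. TypeKnowledge and passingFor() are hypothetical; they do not
// describe HotSpot's actual calling convention code.
public class CallingConventionSketch {

    enum Passing { SCALARIZED, HEAP_REFERENCE }

    // What the JIT knows about a declared parameter type when compiling.
    record TypeKnowledge(String name, boolean loaded, boolean isValueClass) {}

    static Passing passingFor(TypeKnowledge t) {
        if (t.loaded() && t.isValueClass()) {
            // Global information says value class: its fields may be passed
            // in registers.
            return Passing.SCALARIZED;
        }
        // Not yet loaded, or an object class: pass an L-Object heap pointer,
        // which is always correct, if slower.
        return Passing.HEAP_REFERENCE;
    }
}
```

Because the fallback is always correct, missing information costs performance, not correctness, and so does not need to be encoded in the classfile.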

> Some other issues are related to the class loader constraints,
> and the fact that a class with the wrong assumption regarding the nature of a class
> might prevent the real class from being loaded.

IMO, the best way to deal with CLCs is not to make new kinds of them.

> The case where a class expects
> a Value Based Class (object class type) and the class is in fact a migrated value
> class seems to be OK.

Awesome!

> The case where a class expects a value class, but the
> class loader loads an object class seems much more problematic to us.

Could you list the problems?  Some of them can be dealt with as
noted above by backing off to L-Object (which is a heap pointer).

> Regarding the migration of value based classes, trying to prevent null references
> from leaking into migrated code seems to be a step too far.

I am so glad to hear this.  I fully agree.

> We reviewed the issue with
> Karen this morning, and it doesn’t seem too dangerous to only check for null
> when the reference is stored in a field or array expecting an instance of a value
> class.

Excellent; let's prototype that and see how it feels.
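The agreed policy (tolerate null in locals and on the expression stack, throw an NPE only when it is stored into a field or array element that expects a value class) can be sketched in plain Java. Strings stand in for value instances here, and `ValueTypedSlot` and `legacyLookup` are hypothetical names used only for illustration.

```java
import java.util.Objects;

// Sketch of "tolerate nulls in locals, check on stores".
// ValueTypedSlot and legacyLookup() are hypothetical.
public class NullOnStoreSketch {

    // Stands in for a field or array element declared with a value class type.
    static final class ValueTypedSlot {
        private Object contents;

        ValueTypedSlot(Object initial) {
            this.contents = Objects.requireNonNull(initial);
        }

        // The null check the proposal adds to field and array stores.
        void store(Object v) {
            contents = Objects.requireNonNull(v, "null stored into value-typed slot");
        }

        Object load() {
            return contents;
        }
    }

    // Legacy code compiled when Point was a value-based class may return null.
    static Object legacyLookup(boolean found) {
        return found ? "Point(1, 2)" : null;
    }

    public static void main(String[] args) {
        Object local = legacyLookup(false); // null in a local variable: tolerated
        Object copy = local;                // copied to another local: still tolerated
        ValueTypedSlot slot = new ValueTypedSlot("Point(0, 0)");
        try {
            slot.store(copy);               // first place that "knows" null is illegal
        } catch (NullPointerException e) {
            System.out.println("NPE at store: " + e.getMessage());
        }
        System.out.println(slot.load());    // still the previous value
    }
}
```

The polluting null travels freely until it reaches a store that knows the type is a value class, which is where the NPE surfaces.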

— John
> 
> Thank you,
> 
> Fred
> 
> 
>> On Jan 19, 2018, at 23:22, John Rose <john.r.rose at oracle.com> wrote:
>> 
>> On Jan 16, 2018, at 12:56 PM, Frederic Parain <frederic.parain at oracle.com> wrote:
>>> 
>>> Here’s an attempt to bootstrap the L-world exploration, where java.lang.Object
>>> is the top type of all value classes (as discussed during the November meetings
>>> in Burlington).
>> 
>> This is excellent work, Frederic; thank you.  I'm really hopeful that we
>> are on the right track.
>> 
>>> ...
>>> Here’s a quick summary of the changes with some consequences on the HotSpot code:
>>> - all v-bytecodes are removed except vdefault and vwithfield
>> 
>> At some point we may want to strip the v-prefix from those survivors.  No hurry.
>> 
>>> - all bytecodes operating on an object receiver are updated to support values as well,
>>>  except putfield and new
>> 
>> Yep.
>> 
>>> - single carrier type for both instances of object classes and instances of value classes
>>> - this carrier type maps to the T_OBJECT BasicType
>>> - T_VALUETYPE still exists but its usage is limited (same purpose as T_ARRAY)
>> 
>> T_ARRAY can be a confusing source of bugs.  I've always wondered if it was worth it.
>> 
>>> - qtos TosState is removed
>>> - JNI: the jobject type can be used to carry either a reference to an object or an
>>>         array or a value. The type jvaluetype, sub-type of jobject, is used when only
>>>         a value class instance is expected
>>> - Q…; remains the way to encode value classes in signature (fields and methods)
>> 
>> I'd like to move towards an ACC_VALUE bit on both fields and classes.
>> Again, no hurry, but (as in my previous message) I'd like to retire Q-descriptors.
>> 
>>> - In the constant pool, the CONSTANT_Class_info entry type is used to store a
>>> symbolic reference to either an object class or a value class
>>> - the ;Q escape sequence is not used anymore in value class names
>>> 
>>> 
>>> One important point of this exercise is to ensure that the migration of Value Based Classes
>>> into Value Classes is possible, and doable with a reasonable complexity and costs. In addition
>>> to the JVMS update (and consistent with the JVMS modifications), here’s a set of proposals
>>> on how to deal with the VBC migration. 
>> 
>> I'm glad you are doing this analysis, not only because VBC migration is
>> a wonderful goal, but also because I think the same analysis is necessary
>> just to manage separate recompilation, even if we never decided to
>> migrate a single class.
>> 
>> In short, I see you are leaning hard on Q-descriptors, but I don't think
>> you are getting enough value out of them, and they cause serious
>> problems.  More comments below… 
>> 
>>> 
>>> Migration of Value Based Classes into Value Classes:
>>> - challenges:
>>>    - signature mismatch
>> 
>> Goes away when/if we retire Q-descriptors!
>> 
>>>    - null
>> 
>> Can be dealt with by assuming non-null and throwing dynamic NPEs
>> as needed where Q types are in play.  Also, we tolerate "polluting nulls"
>> along paths where the Q/R distinction is not available, even if (at some
>> point later on) we realize that it was a Q all along.  Eventually, the
>> polluting null will cause an NPE.
>> 
>> (In my view, the NPE should happen later than one might prefer if it were
>> a true coding error rather than a recompilation artifact.  Catching polluting
>> nulls early in the presence of recompilation requires too many heroics.)
>> 
>>>    - change in behavior
>> 
>> Yes, that's the tricky part.
>> 
>>> - proposal for signature mismatch:
>>>     - with LWVT, value class types in signatures are using the Q…; format
>>>     - legacy code is using signature with L…; format (because VBC are object classes)
>>>     - methods will have two signatures:
>>>       - true signature, which could include Q…; elements 
>>>       - a L-ified signature where all Q…; elements are re-written with the L…; format
>>>       - method lookup still works by signature string comparisons
>>>       - the signature of the method being looked up will be compared against both the
>>>         true and the L-ified signatures, if the looked up signature matches the L-ified
>>>         signature but not the true signature, it means a situation where legacy code
>>>         is trying to invoke migrated code has been detected, and additional work might
>>>         be required for the invocation (actions to be taken have to be defined)
>>>      - signature mismatch can also occur for fields; this is still being investigated, and the
>>>        proposal will be updated as soon as we have a solution ready to be published
>> 
>> This sort of thing is, for me, a rich argument against keeping Q-descriptors.
>> 
>>> - proposal for null references leaking to migrated code
>>>    - having a null reference for a Value Based Class variable or field is valid in legacy code
>>>      but it becomes invalid when the Value Based Class has been migrated to a Value Class
>>>    - trying to prevent all references with a value class type from getting a null value would be very
>>>      expensive (it would require looking at the stackmap for each assignment to a local variable)
>> 
>> Yes.  We have to tolerate polluting nulls where the Q/R distinction is unavailable.
>> 
>>>   -  the proposed solution is to allow null references for local variable and expression stack slots,
>>>      but forbid them for fields or array elements (bytecodes operating on fields and array have to
>>>      be updated to throw an NPE whenever a null reference is provided instead of a value class
>>>      instance)
>> 
>> Yes, I think this is on the right track.  On paths where a Q-type is needed
>> we do a null check.  That's the Java way.
>> 
>>>   - null references are likely to be an issue for JIT optimizations like passing values in registers
>>>     when a method is invoked. The proposed solution is to only allow null references for value classes
>>>     in legacy code, by detecting them and blocking them when leaking to migrated code. The
>>>     detection can be done at invocation time, when a mismatch between the signature expected
>>>     by the caller and the real signature of the callee is detected (see signature mismatch proposal above)
>> 
>> At some point, a polluting null might reach code that "knows" there is a Q type
>> (and may even "know" that it goes in an xmm register).  That's the point where
>> an NPE should be thrown.  In some cases, a deopt might be appropriate, to
>> correctly order the NPE by executing interpreter code.
>> 
>> Note that this combination of techniques does not require Q-descriptors.  The lack
>> of Q-descriptors doesn't totally destroy the Q/R distinction; it just means you
>> have to execute a little further before you get to code which "knows" that
>> the null is illegal.
>> 
>>>  - the null reference should also be detected and blocked when it is used as a return value and the
>>>    type of the value to be returned is a value class type 
>> 
>> Doing this requires (a) Q-descriptors in method returns, (b) Remi's
>> ValueTypes table, or (c) toleration of nulls in the interpreter.  (The JIT
>> doesn't have to tolerate nulls:  It can deopt if it hits a surprise null,
>> or perhaps throw an early NPE.)  So, I am arguing for (c).
>> 
>>> In addition to the JVMS update, here’s a chart trying to summarize the new checks that will have to
>>> be added to existing bytecodes when moving the v-bytecodes semantics into a* bytecodes. The categories
>>> in the chart are not very precise, but we can use it as a starting point for our discussions. The chart
>>> can also help define which experiments could be done to estimate the costs of the different additional
>>> checks that need to be added to existing bytecodes.
>> 
>> The chart is really helpful, thanks.  More comments later.
>> 
>> Onward!
>> 
>> — John
>> 
>> 
>