Moving from VVT to the L-world value types (LWVT)

Wed Jan 24 00:05:50 UTC 2018

Updated JVMS document with a few fixes and the Q-descriptors
removed (this removal changed only 3 lines!):

http://cr.openjdk.java.net/~fparain/L-world/L-World-JVMS-2.pdf

No attribute to list value classes has been added yet, so there’s
currently some issues for the verification.

Fred

> On Jan 23, 2018, at 16:25, Frederic Parain <frederic.parain at oracle.com> wrote:
> 
> Hi John,
> 
> thank you for the detailed feedback.
> 
> The Q-descriptor is not a fundamental part of the proposal, it is just an unsatisfying
> way for class files to express their expectations regarding types they think are value
> class types (to differentiate them from object class types). Q-descriptors provide this
> information but have drawbacks like the signature matching issue.
> 
> Remi’s proposal is appealing because it avoids the signature matching issue.
> An attribute is not the most convenient data structure for the JVM, but we can
> record the information elsewhere in our meta-data. However, it seems more
> brittle because the attribute can easily omitted, unless we make it mandatory
> after a given class file format number, with a slightly different syntax where all
> classes named in the class files have to be listed, so it can be verified. For
> older class file format, the attribute would be absent and all classes are assumed
> to be object classes.
> 
> We had two brainstorming sessions. yesterday and this morning, trying to figure
> out what would be the consequences of having only L-descriptors, with class
> files having different assumptions regarding the real nature of a type (object class
> or value class), either in the case of VBC migration or simply because of separate
> compilation. Some issues are related to the calling/returning conventions for the
> JIT compiled code. Some others issues are related to the class loader constraints,
> and the fact that a class with the wrong assumption regarding the nature of a class
> might prevent the real class from being loaded. The case where a class expects
> a Value Based Class (object class type) and the class is in fact a migrated value
> class seems to be OK. The case where a class expects a value class, but the
> class loader loads an object class seems much more problematic to us.
> 
> Regarding the migration of value based classes, trying to prevent null references
> from leaking into migrated code seems to be a step to far. We reviewed the issue with
> Karen this morning, and it doesn’t seems too dangerous to only check for null
> when the reference is stored in a field or array expecting an instance of a value
> class.
> 
> Thank you,
> 
> Fred
> 
> 
>> On Jan 19, 2018, at 23:22, John Rose <john.r.rose at oracle.com> wrote:
>> 
>> On Jan 16, 2018, at 12:56 PM, Frederic Parain <frederic.parain at oracle.com> wrote:
>>> 
>>> Here’s an attempt to bootstrap the L-world exploration, where java.lang.Object
>>> is the top type of all value classes (as discussed during the November meetings
>>> in Burlington).
>> 
>> This is excellent work, Frederic; thank you.  I'm really hopeful that we
>> are on the right track.
>> 
>>> ...
>>> Here’s a quick summary of the changes with some consequences on the HotSpot code:
>>> - all v-bytecodes are removed except vdefault and vwithfield
>> 
>> At some point we may want to strip the v-prefix from those survivors.  No hurry.
>> 
>>> - all bytecodes operating on an object receiver are updated to support values as well,
>>>  except putfield and new
>> 
>> Yep.
>> 
>>> - single carrier type for both instances of object classes and instances of value classes
>>> - this carrier type maps to the T_OBJECT BasicType
>>> - T_VALUETYPE still exists but its usage is limited (same purpose as T_ARRAY)
>> 
>> T_ARRAY can be a confusing source of bugs.  I've always wondered if it was worth it.
>> 
>>> - qtos TosState is removed
>>> - JNI: the jobject type can be used to carry either a reference to an object or an
>>>         array or a value. The type jvaluetype, sub-type of jobject, is used when only
>>>         a value class instance is expected
>>> - Q…; remains the way to encode value classes in signature (fields and methods)
>> 
>> I'd like to move towards an ACC_VALUE bit on both fields and classes.
>> Again, no hurry, but (as in my previous message) I'd like to retire Q-descriptors.
>> 
>>> - In the constant pool, the CONSTANT_CLASS_info entry type is used to store a
>>> symbolic reference to either an object class or a value class
>>> - the ;Q escape sequence is not used anymore in value class names
>>> 
>>> 
>>> One important point of this exercise is to ensure that the migration of Value Based Classes
>>> into Value Classes is possible, and doable with a reasonable complexity and costs. In addition
>>> to the JVMS update (and consistent with the JVMS modifications), here’s a set of proposals
>>> on how to deal with the VBC migration. 
>> 
>> I'm glad you are doing this analysis, not only because VBC migration is
>> a wonderful goal, but also because I think the same analysis is necessary
>> just to manage separate recompilation, even if we never decided to
>> migrate a single class.
>> 
>> In short, I see you are leaning hard on Q-descriptors, but I don't think
>> you are getting enough value out of them, and they cause serious
>> problems.  More comments below… 
>> 
>>> 
>>> Migration of Value Based Classes into Value Classes:
>>> - challenges:
>>>    - signature mismatch
>> 
>> Goes away when/if we retire Q-descriptors!
>> 
>>>    - null
>> 
>> Can be dealt with by assuming non-null and throwing dynamic NPEs
>> as needed where Q types are in play.  Also, we tolerate "polluting nulls"
>> along paths where the Q/R distinction is not available, even if (at some
>> point later on) we realize that it was a Q all along.  Eventually, the
>> polluting null will cause an NPE.
>> 
>> (In my view, the NPE should happen later than one might prefer if it were
>> a true coding error rather than a recompilation artifact.  Catching polluting
>> nulls early in the presence of recompilation requires too many heroics.)
>> 
>>>    - change in behavior
>> 
>> Yes, that's the tricky part.
>> 
>>> - proposal for signature mismatch:
>>>     - with LWVT, value class types in signatures are using the Q…; format
>>>     - legacy code is using signature with L…; format (because VBC are object classes)
>>>     - methods will have two signatures:
>>>       - true signature, which could include Q…; elements 
>>>       - a L-ified signature where all Q…; elements are re-written with the L…; format
>>>       - method lookup still works by signature string comparisons
>>>       - the signature of the method being looked up will compared against both the
>>>         true and the L-ified signatures, if the looked up signature matches the L-ified
>>>         signature but not the true signature, it means a situation where legacy code
>>>         is trying to invoke migrated code has been detected, and additional work might
>>>         be required for the invocation (actions to be taken have to be defined)
>>>      - signature mismatch can also occur for fields, this is still being investigating, the
>>>        proposal will be updated as soon as we have a solution ready to be published
>> 
>> This sort of thing is, for me, a rich argument against keeping Q-descriptors.
>> 
>>> - proposal for null references leaking to migrated code
>>>    - having a null reference for a Value Based Class variable or field is valid in legacy code
>>>      but it becomes invalid when the Value Based Class has been migrated to a Value Class
>>>    - trying to prevent all references with a value class type to get a null value would be very
>>>      expensive (it would require to look at the stackmap for each assignment to a local variable)
>> 
>> Yes.  We have to tolerate polluting nulls where the Q/R distinction is unavailable.
>> 
>>>   -  the proposed solution is to allow null references for local variable and expression stack slots,
>>>      but forbid them for fields or array elements (bytecodes operating on fields and array have to
>>>      be updated to throw a NPE whenever a null reference is provided instead of a value class
>>>      instance)
>> 
>> Yes, I think this is on the right track.  On paths where a Q-type is needed
>> we do a null check.  That's the Java way.
>> 
>>>   - null references are likely to be an issue for JIT optimizations like passing values in registers
>>>     when a method is invoked. The proposed solution is to only allow null references for value classes
>>>     in legacy code, by detecting them and blocking them when leaking to migrated code. The
>>>     detection can be done at invocation time, when a mismatch between the signature expected
>>>    by the caller and the real signature of the callee is detected (see signature mismatch proposal above)
>> 
>> At some point, a polluting null might reach code that "knows" there is a Q type
>> (and may even "know" that it goes in an xmm register).  That's the point where
>> an NPE should be thrown.  In some cases, a deopt might be appropriate, to
>> correctly order the NPE by executing interpreter code.
>> 
>> Note that this combination of techniques does not Q-descriptors.  The lack
>> of Q-descriptors doesn't totally destroy the Q/R distinction; it just means you
>> have to execute a little further before you get to code which "knows" that
>> the null is illegal.
>> 
>>>  - the null reference should also be detected and blocked when it is used as a return value and the
>>>    type of the value to be returned is a value class type 
>> 
>> Doing this requires (a) Q-descriptors in method returns, (b) Remi's
>> ValueTypes table, or (c) toleration of nulls in the interpreter.  (The JIT
>> doesn't have to tolerate nulls:  It can deopt if it hits a surprise null,
>> or perhaps throw an early NPE.)  So, I am arguing for (c).
>> 
>>> In addition to the JVMS update, here’s a chart trying to summarize the new checks that will have to
>>> be added to existing bytecode when moving the vbytecodes semantic in to a* bytecodes. The categories
>>> in the chart are not very precise, but we can use it as a starting point for our discussions. The chart
>>> can also help defining which experiments could be done to estimate the costs of the different additional
>>> checks needed to be added to existing bytecodes.
>> 
>> The chart is really helpful, thanks.  More comments later.
>> 
>> Onward!
>> 
>> — John
>> 
>> 
>