minimal value types proposal

Thu Sep 1 00:14:57 UTC 2016

On Aug 30, 2016, at 10:56 AM, Dan Smith <daniel.smith at oracle.com> wrote:
> 
>> On Aug 29, 2016, at 6:04 PM, John Rose <john.r.rose at oracle.com> wrote:
>> ...
>> For the user, a boxed representation is needed for basic debuggability.  What does println or JVMTI do unless there's a box?
> 
> One option is that JVMTI knows about value types, as it does primitives, and provides a printout of the fields. Or maybe ValueTypeSupport has a debugString operation that does this.

That's where we have to get to, but (and this is one of my key points
so I'll keep repeating it) we can delay filling in stuff like that if we lean
on the tools we have, which work with objects not values.

As long as we keep those inelegant value-based POJOs (VaBaJOs)
as proxies for values, most of our toolchain just works, under the
illusion that those POJOs are the real thing.  Eventually we will
want something better, maybe even very different, at which point
the tools will need to shift.

(Serendipity:  "va abajo" means "go down" in Spanish.)

I am advocating a very low road here with VaBaJOs
as proxies for real values.  But this move is known to work,
short term.  It is a key tactic in our successful Vector work.

> Or we use a naming convention -- value types are expected to provide 'toString', and/or 'box'. (I don't mind the boxes themselves, just the automatic aspect of them.)

Yes, this is a good idea.  If you want to put a method
on the value, you can define a suitably named static method
on the POJO, and the MH lookup runtime will bind through
to it.  But note that binding through to POJO non-static
methods is perfectly reasonable too.  We know we can
make the temporary box go away in optimized code.
Basically, you are proposing a naming convention, which
is good, but there is already a pretty good one in place.
And, if the non-static POJO methods are non-private, you
get an extra alignment effect between the POJO API
and the underlying value API.  Is this too magic?  I think
it is useful!
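
A minimal sketch of that convention using today's method handles.
ComplexBox and debugString are hypothetical names invented for this
illustration; they are not part of the proposal.  The sketch only
shows that a static taking the value as its leading argument and a
non-static POJO method resolve, through the existing Lookup.findStatic
and Lookup.findVirtual, to handles of the same method type.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical value-based POJO; the names are illustrative only.
final class ComplexBox {
    final double re, im;
    ComplexBox(double re, double im) { this.re = re; this.im = im; }

    // Convention-named static: the value is an explicit leading argument.
    static String debugString(ComplexBox c) {
        return "(" + c.re + ", " + c.im + ")";
    }

    // Ordinary non-static method on the POJO.
    @Override public String toString() { return debugString(this); }
}

final class NamingConventionDemo {
    // Returns { static result, virtual result, static type, virtual type }.
    static String[] run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle viaStatic = lookup.findStatic(
                ComplexBox.class, "debugString",
                MethodType.methodType(String.class, ComplexBox.class));
            MethodHandle viaVirtual = lookup.findVirtual(
                ComplexBox.class, "toString",
                MethodType.methodType(String.class));
            ComplexBox c = new ComplexBox(1.0, 2.0);
            return new String[] {
                (String) viaStatic.invoke(c),
                (String) viaVirtual.invoke(c),
                viaStatic.type().toString(),
                viaVirtual.type().toString()
            };
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        for (String s : run()) System.out.println(s);
    }
}
```

Either handle can be given to a client that expects a one-argument
operation on the value, so the choice of static versus non-static
can stay an implementation detail of the POJO.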

I am noticing that several folks are bothered by the boxes.
Which means I need to issue a disclaimer about them,
as follows:

This proposal does *not* attempt to fix a design for boxes,
although it *does* exhibit what (I thought, until a few days
ago) folks would regard as a reasonable model, that
of an L/Q duality of types derived from one source type.
(We have talked about this many times in many venues
since 2013!)  In any case, we could end up with
L/Q duality, or more "type-opaque" boxes (perhaps
exposing *only* P-interfaces including P-Object), or
perhaps user-written helper boxes like java.lang.Integer
(to which I always say "yuck not that again"), or even
no boxes at all, except perhaps some mysterious
universal monad-shaped value-holder.  Of all those
options, I currently prefer the L/Q duality design, *but*
none of the above options are precluded by the minimal
design as proposed.  The minimal design makes it
certain that some parts will be discarded and superseded
in the end.

There are two reasons for proposing this encoding
of value types as VaBaJOs.  1. The tool chain doesn't
interoperate with anything else (unless we do much
worse stuff with primitives or arrays).  2. The JVM
can be hacked to perform automatic transforms on
class loading to extract the value part from the POJO
part, and create both aspects internally, and doing
this from a single class file makes certain (proof by
construction) that the parts will be aligned with each
other.  #2 is a parsimony argument, but it's more than
just esthetic.

>> I do like the idea of requiring the user to set up both classes manually, at first.  It has the advantage of making very clear (all too clear) the distinction between the Q-type and the L-type:  No source code defines both; the Val guy would (presumably) disable its L-type so people could not use it.
> 
> Yes, that's what I have in mind.

The advantage would have to be strong enough to override
frictions resulting from loss of alignment between the parts.

Here's an intermediate proposal, which I think is better,
because it keeps alignment:  Have a single source file.
When the JVM loads a marked class file, it extracts the
"data structure" part (just the fields) and reassembles it
into a value type with no methods, just fields.

The fields are wrapped in a primitive value struct with no
methods.  That struct can be referred to by Q-types
(including bytecode descriptors but not class owners),
method handles, and vload/vstore/vreturn.  The MH
runtime can spin

To avoid confusion, we could also add distinct entries
to the JVM's type name dictionary:  The POJO is called
"Foo" and the VT is called "Foo$Value", as if from
the following nested definition:

  @DeriveValueType class DoubleComplex {
     //final double re, im;  // moved to Value
     static __ByValue class Value {
       final double re, im;
     }
     final Value value;
     double realPart() { return re; }  // getfield re => $value.re
  }

(An actual syntax like that can be added later, and users can
experiment further with diverging box types from value types,
even putting them in separate files, if they must.)

Inside methods like DoubleComplex.realPart, the field
reference to "re" resolves to the nested reference value.re.
In other words, the VM's link resolver forgives the original
bytecode for referring to the non-existent "re" and silently
substitutes an ad hoc nested reference.  This can be done
also by bytecode rewriting but I think the fastest way to
prototype is with an extra path in the JVM's resolver.
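
As a rough picture of the result, here is how the class might look
if the same transform were written out by hand in ordinary Java.
This is only an emulation under stated assumptions: the name
DoubleComplexAfter is hypothetical, the nested Value is a plain
class standing in for the derived value type, and its constructor
exists only so the example can run.

```java
// Hand-written picture of the class after the load-time transform.
final class DoubleComplexAfter {
    // Stand-in for the derived value type: fields only, no behavior.
    static final class Value {
        final double re, im;
        Value(double re, double im) { this.re = re; this.im = im; }
    }

    // The POJO keeps a single field holding the value part.
    final Value value;

    DoubleComplexAfter(double re, double im) {
        this.value = new Value(re, im);
    }

    // The source said "return re;"; the resolver would redirect
    // that field reference to value.re, as written out here.
    double realPart() { return value.re; }

    double imaginaryPart() { return value.im; }
}
```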

In the language of descriptors, the name LDoubleComplex;
would continue to refer to the POJO.  (It has to, or else
none of our tool chain will work on these things.)  What
is the Q-type?  There are options here too, but the most
natural thing to do, if we want to separate the Q-type
(IMO temporarily) from the L-type, is to make the value
type descriptor QDoubleComplex$Value;.

This has the advantage that, as we evolve javac to know
more about Q-types, we can make an early experiment
with using DoubleComplex.Value (the nested type) as a
way of uttering the Q-type from Java code.

Somebody is surely thinking, "Wait, blockhead, that's all
backwards!  The value type comes first, and the box type,
if it exists at all, is some part or derivative of the value!
The Q-type should be DoubleComplex and the L-type,
if any, should be DoubleComplex.Box."  (With a dash of,
"These box-lovers are really obtuse", and "So hard to get
good help these days.")  At least, that's pretty much what
I would be thinking, given (only) the overall vision of
"codes like a class, works like an int", etc.

My excuse (and it's a good one) for putting the box type
first and the value type second is that's how our tool chain
operates today.  After we bootstrap ourselves for a while,
we will discard our prototypes, and recode them in the
natural values-first style.  In other words, reasons 1 & 2
above.

Someone might ask, "what about methods on the value
type"?  They all get left behind on the POJO in the above
load-time translation.  I think that's a good thing; the
minimal proposal exposes value-type methods, and
even fields, only through method handles.  That means
that we can experiment with various models for binding
methods to value types, simply by changing the MH
Lookup runtime.  This is far easier than changing the
JVM interpreter or link resolver, so we can do more
iterations to find the sweet spot.
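
For a feel of what "only through method handles" means for a field,
here is a small sketch against the existing API.  PointBox is a
hypothetical POJO standing in for a value-capable class; Q-types do
not exist in today's Lookup, so the handle below works on the boxed
form.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

// Hypothetical value-capable POJO, used only for illustration.
final class PointBox {
    final int x, y;
    PointBox(int x, int y) { this.x = x; this.y = y; }
}

final class FieldHandleDemo {
    // Reads field x through a method handle instead of a getfield
    // written directly in the client's code.
    static int readX(PointBox p) {
        try {
            MethodHandle getX = MethodHandles.lookup()
                .findGetter(PointBox.class, "x", int.class);
            // The handle's type is (PointBox)int, matching this call.
            return (int) getX.invokeExact(p);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Because the client only ever sees the handle, the runtime behind
findGetter is free to change how the field is actually reached.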

The POJO carries all the payload except the layout of
its data fields.  XXXX

I hope this makes clear how the linkages between the
value and its POJO, the Q-type and the L-type, are
crucial to a prototype, but also subject to deep revision.

Because it's all going to be revised, I am willing to break
name symmetry (L/Q duality) if that's the clearest
and simplest way to get "hello world" from values.
I expect to re-establish the duality in the final design,
unless we end up with something even better.

>>> 2) Instance methods also add tons of complexity.
>> 
>> I disagree; I think the incremental complexity is comparable to trying to do everything with statics, which is why I'm recommending this in the minimal model.
>> 
>> The only invocation paths for instance methods (and instance fields) on Q-types are through method handles.  Method handles treat all arguments (including 'this') symmetrically, so any effort applied to have them work on Q-types *at all* will apply to 'this' parameters for Q-types.
> 
> It's a given that you have statics (see your Int128 declaration, for example). If instance methods are practically free after that, fine. But if not, there's no particular reason to support them. We don't have polymorphism (for Q types, anyway -- I'm assuming no automatic boxes, per (1)), and it's just as easy -- easier, in fact -- for a client to do "invokestatic Val.m(QVal;)I" as it is to do "invokedynamic [vinvoke ...]".

That's true, if we can settle on a stable enough translation strategy
that we can map virtual-appearing APIs to static methods.  The indy
trick would let us put off some of that.  Why put it off?  Well, static
methods are already standard API points, and binding them from
fluent (x.f()) calls as well will come as a surprise, because of the alias.

(This is a little like my other proposal, for fluent invocation of static methods:
   interface List<T> { static List<T> freeze(List<T> this); … }
In both cases, you have a static method that you want to invoke using
instance-method syntax.)

But, in any case, static methods (defined on the POJO!) will play a
large role in shaping the behavior of the value type.  In the minimal
proposal, the "Lookup.findVirtual" thing is really just an approximation
to the eventual syntax of value type behavioral methods.  I.e., we
don't define bytecodes, but give a strong hint of what they will look
like by defining the corresponding MH API in the given shape.

If we decide that value types don't need the static-virtual distinction
at all, then we can translate to statics only, and make various other
adjustments and simplifications.  That's not a question to answer here,
and the minimal proposal gives the beginnings of virtual methods
(without requiring classfile support for them) as a first step towards
validating that user model.  ("Codes like a class" would seem to
imply we have both kinds of methods, but maybe we will find out
otherwise?  That's not a question to settle now, of course.)

>> Perhaps you are objecting to the inefficiency of operating on 'this' in the boxed L-type form, when the operation starts as a MH-based invocation of a Q-type?  That's only a startup transient; there are several tactics we can use to remove it.  For example, box elision (already in the JITs, though not value-friendly yet) would remove boxing overheads without requiring any manual recoding at all.
> 
> Yeah, this is one of the big things that jumps out at me. Our end goal is to define instance methods with a Q-typed 'this'.

Exactly.  The minimal proposal is a somewhat indirect simulation of that goal.
But it is efficient enough for benchmarks.

> Having an intermediate step of instance methods with an L-typed 'this' doesn't seem productive. Yeah, there's some engineering we may want to do anyway to get default methods with L-typed 'this' to be efficient, but I'd prefer to keep that engineering off of the critical path. Write your bytecode with a Q-typed 'this' (or static, with no 'this' at all), and we don't have to hope that the JIT will optimize.

Two answers:  1. It's already pretty efficient.  2. We can easily do what you propose,
if necessary.  To do #2 we change javac and the MH Lookup runtime, not the JVM.

>> Convenience and migration cannot be driven to zero; that optimizes for "minimal" at the expense of "viable".  To preserve viability, there are at least a few really basic conventions, like Object.toString, that would have to be re-encoded using such statics.
> 
> As I commented above, I'm not opposed to naming conventions that ensure a 'toString' or 'box' method exists. And if you're invoking Object.toString, you're first going to have to box, anyway. It's just as easy to do "invokestatic Val.box(QVal;)LObject;" as it is to do "invokedynamic [asType ...]".

Answered above.  (When you say that it makes me hope we can get List.freeze in the same way!)

>> Re-building virtuals (at least some of them) on top of statics has its own cost, in wasted motion and confusion.
> 
> I'd like to understand this better. You're talking about the confusion involved with training people to invoke static methods, only to tell them later that they can use instance methods, too?

Yes.  Removing fluent calls is not a deal-breaker, but it will feel like
a cut, and too many such cuts will make the prototype programming
model too difficult even for super-users.  There's also the question of
wiring up Object methods and (probably) Comparable.compareTo.
As well as other interfaces that programmers may invent to help them
wrangle these things.  (No interfaces?  That's another cut…)

>> We can and should work towards real Q-typed 'this'.  The simplest way is what I'm proposing with the method handle hack.  In addition, I suggest experimentally modifying javac to emit two copies of non-static methods in value-capable classes, one with the standard bytecodes, and one as a static (with mangled name) which takes a Q-typed 'this' in local 0.  Then teach the method handle resolver to find these guys and bind them, in preference to the boxed-this dance.  Users can get on with their business, unaware of all of this.
> 
> javac doesn't generate value classes at all (at least in the first cut). That aside, yes, any scheme in which a Q-typed 'this' is expressed directly is an improvement in my book.

If we use the above model (L-DoubleComplex, Q-DC$Value) then javac will
indirectly generate value classes, but with no 'this' (because no methods).

Hmm… I suppose given the above model we could work towards hollowing
out the API surface of the POJO.  It would have only private static methods
taking a leading Q-type argument.  The MH.Lookup binder will happily attach
those methods to Q-types as if they are "virtuals".
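
One way such a binder could behave, sketched with the existing
Lookup API.  Everything named here is an assumption made for the
example: the ValueBinder helper, the "$static" mangling, and the
Vec2Box class.  The statics are package-private rather than private
so that a plain lookup can see them; binding private statics would
need a lookup with private access to the POJO.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical POJO with one mangled static twin.
final class Vec2Box {
    final double x, y;
    Vec2Box(double x, double y) { this.x = x; this.y = y; }

    // Boxed-'this' instance methods.
    double lengthSquared() { return x * x + y * y; }
    double sum() { return x + y; }

    // Static twin taking the receiver as its leading argument.
    static double lengthSquared$static(Vec2Box self) {
        return self.x * self.x + self.y * self.y;
    }
}

// Hypothetical binder: prefer the static twin, else fall back to
// the ordinary instance method.
final class ValueBinder {
    static final String SUFFIX = "$static";

    static MethodHandle bind(MethodHandles.Lookup lookup, Class<?> pojo,
                             String name, MethodType type)
            throws ReflectiveOperationException {
        try {
            return lookup.findStatic(pojo, name + SUFFIX,
                                     type.insertParameterTypes(0, pojo));
        } catch (NoSuchMethodException e) {
            return lookup.findVirtual(pojo, name, type);
        }
    }
}

final class ValueBinderDemo {
    // Returns { bound name 1, bound name 2, result 1, result 2 }.
    static String[] run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType toDouble = MethodType.methodType(double.class);
            MethodHandle lenSq =
                ValueBinder.bind(lookup, Vec2Box.class, "lengthSquared", toDouble);
            MethodHandle sum =
                ValueBinder.bind(lookup, Vec2Box.class, "sum", toDouble);
            Vec2Box v = new Vec2Box(3.0, 4.0);
            return new String[] {
                lookup.revealDirect(lenSq).getName(),
                lookup.revealDirect(sum).getName(),
                String.valueOf((double) lenSq.invokeExact(v)),
                String.valueOf((double) sum.invokeExact(v))
            };
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Both handles come back with the same shape, (Vec2Box)double, so
callers cannot tell which path was taken.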

Next step:  Have javac compile "__ByValue class Foo { … }" without any
mention of the POJO at the source level, and emit the POJO only as a carrier.
(The VaBaJO really goes down underwater at that point.)  Eventually, after
exploring user models a while, have a proper load-format that doesn't rely
on the POJO at all.

(Spoiler alert:  I want to keep the POJO API but derive the POJO from
the primary value-type source, the value class, as a sort of type projection,
analogous to the species projections we have talked about with specialization.)

> 
>>> 3) The minimal feature set for basic operations -- field getters, default value, withers, comparison, arrays -- is a class (e.g., ValueTypeSupport) with bootstrap methods that can be called via invokedynamic. No need to touch MethodHandles.Lookup, etc.
>> 
>> I don't think the cost of touching MH.Lookup is great, especially given that the MH runtime will have to be able to work with Q-types more or less pervasively.  I agree that all the extended lookup functionality could be placed on a new class (alongside findWither etc.), but I don't see any benefit to doing that.
> 
> Okay, cool. I suppose my main discomfort is that, if we embed behavior in existing APIs, it's easier to overlook that change later and forget to put the proper design & specification effort into it. Things get baked in just because they're already there. But if we can avoid that problem, great.

There is at least a major review step when we check in the Lookup runtime changes.
At that point (if all else fails) we can carefully note where existing APIs are getting
experimental changes.  And, if our consciences bother us at that point, we can
separate out the experimental APIs, rather than making them experimental modes
of standard APIs.  For now, I'd rather not settle that; I'd rather do whatever is expedient.

> 
>>> More generally, why so much attention given to reflection? Sure, you need class objects to represent all the JVM's types. But member lookup? Fields, Methods, Constructors? These do not seem necessary.
>> 
>> Because method handles are where the functionality comes from; you need basic reflection in order to mention the method handles you want. Bytecode spinning is not enough, since that would require us to invent a full bytecode set and implement it.  The MH runtime is more malleable than the JVM's interpreter, so we are starting with MHs.  Hence the need for MHs.
> 
> I'm unfairly lumping two things together.
> 
> java.lang.reflect: We need java.lang.Class objects that represent value types. Beyond that, I don't see any point in touching this API. Eventually, sure. Not in the first cut. (For example: Class.getMethod can behave just like it does for primitives, throwing NSME. Or just operating on statics. And we certainly don't need to put any effort into a special-purpose Constructor.newInstance method.)

I agree.  The identity of the Class object is needed.  Its behavior will be…
something… but will change.  I wonder if we can booby-trap the Class
objects somehow with IAE to prevent people from extracting data from
them and then relying on that data?  Or is this just another experimental
mode of a standard API?  Again, I prefer the expedient answer!

If we do the transform proposed above in this message, then the Q-Class
will be DC.Value.class and will have only fields (and maybe, barely, a
useless constructor), and the L-class will be DC.class with no fields
except a mysterious one called "value".  Equal parts useful and misleading.

> java.lang.invoke: Accept Q-typed values as inputs/outputs? Yes. Most of the rest of your proposed changes make sense to me, subject to the discussion above (maybe no box/unbox conversions; maybe no instance methods). I wouldn't bother with findConstructor on a Q type.

If we distinguish DC from DC.Value, it would make sense to hold off
on box/unbox auto-conversions in jlr.  The cost of this is you have
(inevitably) *two* kinds of boxes running around in jlr.  But maybe
that's an OK place to put some confusion, since it's little-used.

(Does this prove the POJO is the Wrong Thing?  Maybe the humble
little boxed DC.Value guy is the true form of the box?  That will require
lots of experimentation to decide.)

If that's the wrong answer for jlr, we can fix it without JVM modifications,
since the jlr box/unbox logic is embodied primarily in the JDK code
that spins reflective access methods.  (It's also in the JVM for
bootstrapping purposes, but we don't have to lean on that.)

Pretty much the same comments apply to MH-based reflection,
except the MH-based reflection should never surface a boxed
DC.Value, since it does not require boxing (as jlr does).

>>> If I squint, I can kind of see how the idea is that somebody might want to write reflective code to operate on values, since they don't have language support.
>> 
>>> If this is the use case, I think a better use of resources would be to surface Q types in the language.
>> 
>> Yes, surface them, but don't require a full set of bytecodes to operate on them.  That's the slow way to do it.
> 
> Sure, absolutely. I definitely buy into the idea that we need enough support in java.lang.invoke to provide library-defined operations, rather than having to introduce new opcodes.

Good.  This will allow us to pretend that "indy" is our source
of an infinite supply of experimental bytecodes.  When we
decide what we really want, we can hardwire them, and
go indy-free.
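
A small sketch of the indy-as-bytecode idea, under loud assumptions:
the ValueOpsBootstrap class, the "getfield:<name>" naming scheme, and
Int128Box are all invented for this example.  Java source cannot emit
invokedynamic against a custom bootstrap method, so the demo calls
the bootstrap method by hand and invokes the resulting call site.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical POJO standing in for a value type.
final class Int128Box {
    final long hi, lo;
    Int128Box(long hi, long lo) { this.hi = hi; this.lo = lo; }
}

// Hypothetical library of "experimental bytecodes".
final class ValueOpsBootstrap {
    // Standard bootstrap shape; name encodes the operation,
    // for example "getfield:lo".
    public static CallSite bootstrap(MethodHandles.Lookup caller,
                                     String name, MethodType type)
            throws ReflectiveOperationException {
        String[] parts = name.split(":", 2);
        if (parts.length == 2 && parts[0].equals("getfield")) {
            Class<?> owner = type.parameterType(0);
            MethodHandle getter =
                caller.findGetter(owner, parts[1], type.returnType());
            return new ConstantCallSite(getter.asType(type));
        }
        throw new NoSuchMethodException("unknown experimental op: " + name);
    }
}

final class IndySketchDemo {
    static long run() {
        try {
            CallSite site = ValueOpsBootstrap.bootstrap(
                MethodHandles.lookup(), "getfield:lo",
                MethodType.methodType(long.class, Int128Box.class));
            return (long) site.dynamicInvoker()
                              .invokeExact(new Int128Box(1L, 42L));
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Adding another experimental operation means adding another branch in
the bootstrap method, with no change to the interpreter.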

>>> I don't think it's necessary to support Q types as the receivers of CONSTANT_Fieldrefs and CONSTANT_Methodrefs. The receiver can be a vanilla CONSTANT_Class, and the client (in this case, the 'vgetfield' API point) can figure out what to do with the resolved reference.
>> 
>> Yes, that's one way to go.  But representing Q-types as java.lang.Class objects will be a sunk cost, so passing the L/Q distinction through existing data flows (on "overloaded" API points) is a reasonable design pattern, for a prototype.
> 
> My thinking is that, for example, I'd rather not touch method resolution at all. Maybe that saves us some work. (And as I've thought about the ultimate bytecode design, I'm leaning that way as the ultimate solution, meaning maybe less work to undo things later.)

By "not touch" you mean "use existing rules verbatim"?
That would be a pretty clear sign we were on the right track.

>> I also think (in this case) the Lookup API will, in the long term, look something like the current sketch; there won't be a separate Lookup.findValueGetter any more than there is a separate Lookup.findInterfaceVirtual.
> 
> Yeah, that makes sense. The inputs to these methods are Class objects, so you'll have the extra type information you need.

Yep.  API reuse there would be another sign we met the goal of
"codes like a class".  (Except constructors and setters turn into
factories and withers, which have subtly different types.)
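
To make the type difference concrete, here is a sketch comparing a
setter handle with a wither handle.  MutablePoint, ValuePoint, of,
and withX are hypothetical names; the wither is an ordinary
hand-written method found with findVirtual, since the findWither
discussed earlier is only a proposed API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Ordinary mutable object: constructor plus a settable field.
final class MutablePoint {
    int x;
    MutablePoint(int x) { this.x = x; }
}

// Hypothetical value-style class: factory and wither instead.
final class ValuePoint {
    final int x;
    private ValuePoint(int x) { this.x = x; }
    static ValuePoint of(int x) { return new ValuePoint(x); }
    ValuePoint withX(int newX) { return new ValuePoint(newX); }
}

final class WitherTypesDemo {
    // Returns { setter type, wither type }.
    static MethodType[] types() {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            // Setter: (MutablePoint,int)void, mutates in place.
            MethodHandle setter =
                l.findSetter(MutablePoint.class, "x", int.class);
            // Wither: (ValuePoint,int)ValuePoint, returns a new value.
            MethodHandle wither = l.findVirtual(ValuePoint.class, "withX",
                MethodType.methodType(ValuePoint.class, int.class));
            return new MethodType[] { setter.type(), wither.type() };
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // Returns { original x, updated x } to show the original is untouched.
    static int[] witherLeavesOriginalAlone() {
        ValuePoint a = ValuePoint.of(1);
        ValuePoint b = a.withX(2);
        return new int[] { a.x, b.x };
    }
}
```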

Thank you for the comments!  Let me know what you think of
the "two-name, one-file" option sketched above.  I hope it meets
your objections.

— John