minimal value types proposal

Tue Aug 30 00:04:18 UTC 2016

On Aug 29, 2016, at 4:17 PM, Dan Smith <daniel.smith at oracle.com> wrote:
> 
> Some high-level feedback from me:
> 
> I think the idea is reasonable. In other circles, we might call this a "milestone". Should we define a first milestone that we're willing to commit to strongly, with some sort of distribution channel (something better than build-your-own-JDK) and some level of support commitment to users who want to get their hands dirty? Sure, absolutely.
> 
> There are some design decisions that surprise/confuse me. Basically, this is me saying "YAGNI" over and over again:
> 
> 1) Automatic boxing adds tons of complexity, and I don't see the benefit. The feature eliminates boilerplate and supports migration, but I'm not looking for either of those in a minimal first step. We're talking about a handful of value types, which can easily be defined like this:
> 
> class Val {
>    public final int i;
>    public final int j;
>    public static ValBox box(Val x) { return new ValBox(x); }
>    public static Val unbox(ValBox b) { return b.x; }
> }
> 
> class ValBox implements Foo {
>    public final Val x;
>    public ValBox(Val x) { this.x = x; }
> }
> 
> Get rid of boxes, and you can get rid of interfaces, default methods, automatic conversions, constructors, …

It's worth thinking about, and Brian has encouraged me to think about it also.

Boxes (and the other stuff you mention) are so useful that removing them may well cause more trouble than supporting them up front.  Inside the JVM, we need a boxed representation for some data flows (unless we make all data flows radically value-safe up front).  For the user, a boxed representation is needed for basic debuggability.  What does println or JVMTI do unless there's a box?

I do like the idea of requiring the user to set up both classes manually, at first.  It has the advantage of making very clear (all too clear) the distinction between the Q-type and the L-type:  No source code defines both; the Val guy would (presumably) disable its L-type so people could not use it.  (Maybe the JVM would use that for an internal box:  But see where that string leads!)  Maybe that's the way to go, if the JVM implementation of the single-source-class solution turns out to be difficult.
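
To make the two-class arrangement concrete, here is a rough sketch in today's Java. It is only an approximation: both classes are ordinary reference classes, the constructors and the toString are additions for the sake of a runnable example, and nothing here disables the L-type of Val the way a real value-capable class presumably would.

```java
// Sketch only: plain Java classes standing in for a hand-written value/box pair.
// The constructors and toString are illustrative, not part of any proposed API.
final class Val {
    public final int i;
    public final int j;
    Val(int i, int j) { this.i = i; this.j = j; }
    public static ValBox box(Val x) { return new ValBox(x); }
    public static Val unbox(ValBox b) { return b.x; }
}

final class ValBox {
    public final Val x;
    ValBox(Val x) { this.x = x; }
    // The box is the thing println, a debugger, or JVMTI would get to look at.
    @Override public String toString() { return "Val[i=" + x.i + ", j=" + x.j + "]"; }
}
```

The point of the sketch is that the box is the only place where Object conventions such as toString have an obvious home.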

> 2) Instance methods also add tons of complexity.

I disagree; I think the incremental complexity is comparable to trying to do everything with statics, which is why I'm recommending this in the minimal model.

The only invocation path for instance methods (and instance fields) on Q-types is through method handles.  Method handles treat all arguments (including 'this') symmetrically, so any effort applied to have them work on Q-types *at all* will apply to 'this' parameters for Q-types.
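
The symmetry is visible in the existing java.lang.invoke API.  A small sketch using an ordinary L-type (String), since Q-types are not available: the handle for an instance method simply has the receiver as its leading parameter.  The class name ReceiverSymmetry is made up for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class ReceiverSymmetry {
    // Handle for the instance method String.length(); its type is (String)int,
    // i.e. the receiver appears as an ordinary leading parameter.
    static MethodHandle lengthHandle() {
        try {
            return MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Calls the handle; 'this' is passed like any other argument.
    static int length(String s) {
        try {
            return (int) lengthHandle().invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```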

Perhaps you are objecting to the inefficiency of operating on 'this' in the boxed L-type form, when the operation starts as a MH-based invocation of a Q-type?  That's only a startup transient; there are several tactics we can use to remove it.  For example, box elision (already in the JITs, though not value-friendly yet) would remove boxing overheads without requiring any manual recoding at all.

> Again, they only exist for convenience and migration. If static methods can operate on value types, that's all you need. No longer necessary to deal with bytecode written to operate on an L-typed 'this' and somehow re-interpret it for a Q-typed 'this'. No longer necessary to deal with Object methods (because no operation supports invoking them).

Convenience and migration cannot be driven to zero; that optimizes for "minimal" at the expense of "viable".  To preserve viability, there are at least a few really basic conventions, like Object.toString, that would have to be re-encoded using such statics.  Re-building virtuals (at least some of them) on top of statics has its own cost, in wasted motion and confusion.

> (If we really do want instance methods, I suggest making 'this' Q-typed to begin with, not diverting resources into figuring out how to make L-typed instance methods efficient.)

Making L-typed instance methods efficient is a sunk cost; it's something the JITs are already good at.

We can and should work towards real Q-typed 'this'.  The simplest way is what I'm proposing with the method handle hack.  In addition, I suggest experimentally modifying javac to emit two copies of non-static methods in value-capable classes, one with the standard bytecodes, and one as a static (with mangled name) which takes a Q-typed 'this' in local 0.  Then teach the method handle resolver to find these guys and bind them, in preference to the boxed-this dance.  Users can get on with their business, unaware of all of this.
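
A rough sketch of the two-copies idea, written by hand in plain Java.  Everything here is an assumption for illustration: the "$value" suffix is an invented mangling, Point stands in for a value-capable class, and TwinResolver is a hypothetical stand-in for whatever the method handle resolver would actually do.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleInfo;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    // Ordinary instance methods, as javac emits them today.
    int sum() { return x + y; }
    int diff() { return x - y; }
    // Hypothetical generated twin of sum(): same body, 'this' passed in local 0.
    static int sum$value(Point self) { return self.x + self.y; }
}

final class TwinResolver {
    // Prefer the static twin if present; otherwise fall back to the instance method.
    static MethodHandle resolve(Class<?> c, String name, MethodType type) {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        try {
            return lookup.findStatic(c, name + "$value", type.insertParameterTypes(0, c));
        } catch (NoSuchMethodException | IllegalAccessException noTwin) {
            try {
                return lookup.findVirtual(c, name, type);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // True if resolution bound the static twin rather than the instance method.
    static boolean isStaticTwin(String name) {
        MethodHandle mh = resolve(Point.class, name, MethodType.methodType(int.class));
        return MethodHandles.lookup().revealDirect(mh).getReferenceKind()
                == MethodHandleInfo.REF_invokeStatic;
    }

    // Either way the caller sees a handle of type (Point)int.
    static int callInt(String name, Point p) {
        try {
            return (int) resolve(Point.class, name, MethodType.methodType(int.class)).invoke(p);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Both resolution paths yield a handle of the same type, which is why callers would not need to know which copy was bound.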

> 3) The minimal feature set for basic operations -- field getters, default value, withers, comparison, arrays -- is a class (e.g., ValueTypeSupport) with bootstrap methods that can be called via invokedynamic. No need to touch MethodHandles.Lookup, etc.

I don't think the cost of touching MH.Lookup is great, especially given that the MH runtime will have to be able to work with Q-types more or less pervasively.  I agree that all the extended lookup functionality could be placed on a new class (alongside findWither etc.), but I don't see any benefit to doing that.  Given that we are touching the MH runtime, it's better to put the new stuff in one place.  The new class will probably just be a wormhole back in to java.lang.invoke, to call non-public API points (which will eventually be public).

> More generally, why so much attention given to reflection? Sure, you need class objects to represent all the JVM's types. But member lookup? Fields, Methods, Constructors? These do not seem necessary.

Because method handles are where the functionality comes from; you need basic reflection in order to mention the method handles you want.  Bytecode spinning is not enough, since that would require us to invent a full bytecode set and implement it.  The MH runtime is more malleable than the JVM's interpreter, so we are starting with MHs.  Hence the need for MHs.

> If I squint, I can kind of see how the idea is that somebody might want to write reflective code to operate on values, since they don't have language support.

And they don't have bytecode support either.  The javac runtime (indy BSMs for vgetfield, etc.) will have to do some of this stuff too.
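
For a sense of what such a BSM might look like, here is a minimal sketch using only existing API.  The names ValueTypeSupportSketch, vgetfield, and Pair are placeholders, Pair is an ordinary class rather than a Q-type, and since Java source cannot emit invokedynamic the linkage step is simulated by calling the bootstrap method directly.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class Pair {
    final int first;
    Pair(int first) { this.first = first; }
}

final class ValueTypeSupportSketch {
    // Standard bootstrap-method shape: (Lookup, String, MethodType) -> CallSite.
    // The call site name is taken as the field name; the site type is (Holder)FieldType.
    public static CallSite vgetfield(MethodHandles.Lookup caller, String fieldName, MethodType type)
            throws ReflectiveOperationException {
        Class<?> holder = type.parameterType(0);
        MethodHandle getter = caller.findGetter(holder, fieldName, type.returnType());
        return new ConstantCallSite(getter.asType(type));
    }

    // Simulates linking and executing one invokedynamic site by hand.
    static int readFirst(Pair p) {
        try {
            CallSite site = vgetfield(MethodHandles.lookup(), "first",
                    MethodType.methodType(int.class, Pair.class));
            return (int) site.dynamicInvoker().invoke(p);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Note that the BSM body is just a Lookup call, which is the sense in which the reflective lookup API is needed even by compiled code.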

> But almost everything has to be boxed when using these libraries, which means if you care about performance (which is why you're using this prototype), you're going to be spinning bytecode to do your low-level operations.

Not completely.  The bytecode will use MHs or indy to do low-level stuff.

> If this is the use case, I think a better use of resources would be to surface Q types in the language.

Yes, surface them, but don't require a full set of bytecodes to operate on them.  That's the slow way to do it.

> 4) I don't love hacking CONSTANT_Class to encode new types, but I can probably live with it. My preference is to design it the "right" way -- however we envision these ultimately being expressed -- rather than this intermediate step in which everybody learns to interpret some new syntax, only to turn around and deprecate that syntax a little while later. (I realize it's probably easier to change string formats than it is to add a new constant pool form.)

Yes, that's why we are starting this way.  CONSTANT_Class CP entries get overloaded; a bunch of other legacy API points get overloaded.  It's an expedient when the data flows in and out of the APIs can be augmented more easily than we can invent new API points.  But (as the shady-values document says several times) the final API is likely to be different, and in particular to make the new distinctions in more principled ways.

> I don't think it's necessary to support Q types as the receivers of CONSTANT_Fieldrefs and CONSTANT_Methodrefs. The receiver can be a vanilla CONSTANT_Class, and the client (in this case, the 'vgetfield' API point) can figure out what to do with the resolved reference.

Yes, that's one way to go.  But representing Q-types as java.lang.Class objects will be a sunk cost, so passing the L/Q distinction through existing data flows (on "overloaded" API points) is a reasonable design pattern, for a prototype.

I also think (in this case) the Lookup API will, in the long term, look something like the current sketch; there won't be a separate Lookup.findValueGetter any more than there is a separate Lookup.findInterfaceVirtual.

— John