[External] : Re: Draft JVMS changes for Primitive Objects (JEP 401)

Wed Sep 8 00:15:08 UTC 2021

> On Aug 11, 2021, at 1:24 PM, Dan Heidinga <heidinga at redhat.com> wrote:
> 
> And continuing on the "long-overdue" theme, here's my long-overdue
> review of the spec changes.
> 
> A big thank you to you, Dan S., for the careful spec writeup efforts.
> I think this captures our discussions well.
> 
> --Dan
> 
> == Section 2.11.5 Object Creation and Manipulation
>> Create a new class instance: new, withfield.
> Should that also include "defaultvalue"? The semantics aren't quite
> the same because of the structural equality of primitive class types
> but it is conceptually very similar.  And in the instruction section,
> we state "The defaultvalue instruction is similar to the new
> instruction" which lends credence to including it in this list.

Hmm, that is a bit inconsistent. There are two conflicting perspectives:
- It's analogous to 'ldc' or 'aconst_null', putting a well-known constant on the stack (so belongs in 2.11.2)
- It's analogous to 'new', putting a fresh instance on the stack (so belongs in 2.11.5)

I think I lean towards emphasizing the "load a constant" nature of the operation, but you're right that it's not a consistent view throughout the document.

> == Section 4.1 The ClassFile Structure
> The  `ACC_PRIM_SUPER` flag is introduced and restrictions on classes
> with the flag are called out in various sections such as:
> 4.5 Fields > ACC_PRIM_SUPER flag set, each field must have its
> ACC_STATIC flag set.
> 4.6 Methods > In a primitive class, and in an abstract class that has
> its ACC_PRIM_SUPER flag set, a method that has its ACC_SYNCHRONIZED
> flag set must also have its ACC_STATIC flag set.
> 5.3 Creation and Loading > implements PrimitiveObject if the opposite
> is true (ACC_PRIM_SUPER, no instance initialization method).
> 
> I didn't see static constraints called out to enforce these
> restrictions (should they be?).  Having the handling of the
> ACC_PRIM_SUPER in one place would make the VM's job of validating it
> easier.

Most of Chapter 4 provides the specification for format checking—including 4.5 and 4.6. Compare the restrictions on fields and methods of interfaces appearing in 4.5 and 4.6.

There's a perspective shift here from what you're looking for—rather than saying "an ACC_PRIM_SUPER class file is validated as follows: ...", we treat ACC_PRIM_SUPER as a statement of fact, and then *other* things in the class file are validated with that fact in mind. This avoids forcing everything about the new feature into a sidebar, as if it's not part of the "core" JVM.

Anyway, I'd suggest implementing & thinking about validation in the same way you implement validation of ACC_INTERFACE-related constraints.

> == 4.6 Methods
>> Design discussion: this section requires that unnamed factory methods (named <new>) are static
>> and have a return type that matches their declaring class or interface. By restricting the
>> descriptor in this way, clients can rely on a predictable, useful return type.
>> 
>> Alternatively, we could allow a subtype or supertype as the return type, or impose no constraints
>> at all. One potential use case is a hidden class, which is incapable of naming its class type in
>> a descriptor.
> 
> Because these are static methods, I thought we had agreed they could
> name any superclass as the return value due to the hidden class
> requirements.  Even though this allows some strange behaviour (ie:
> after some bytecode manipulation) such as the following pseudo-code
> shows:
> ```
> primitive class Strange {
>    Strange() { //<new>()Ljava/lang/Object;
>        return new String();
>    }
> }
> ```
> The contract on `<new>` is more convention than requirement.  In cases
> where the return value needs to be used as a primitive value, it would
> need to go through a checkcast to validate it when a different return
> type is named.
> 
> While this doesn't give the hidden class full powers to be checked in
> the checkcast, it can still be checked against the PrimitiveObject
> interface or its ACC_PRIM_SUPER type.  Seems like a reasonable setup
> and avoids the VM having to check the name matches on the return type
> of the descriptor.

Something we need to investigate more is how factory methods get surfaced in reflection. I could imagine clients like reflection really wanting a guarantee that when you call Foo.<new>, you get a Foo. But it depends on how it is presented in the API.

So, still an open question, which is why I listed both alternatives.

Hidden classes did indeed push us to be more permissive, but in retrospect, that really shouldn't drive the choice: if you insist on putting a factory-like method in your hidden class, but can't follow the JVM's rules, you're fully capable of spinning your own static method using whatever name you want.

>> A method of a class or interface named <new> (2.9.4) must have its ACC_STATIC flag set.
> 
> Should interfaces be able to implement `<new>` or should we prevent
> that like we do for `<init>`?  Preventing it now gives us the most
> freedom later if we retcon this as a general object factory.
> 
>> If the name of the method is <new>, then the descriptor must denote a return type
>> that is a type of the current class or interface. For a primitive class, the return
>> type must be an inlinable reference type.
> 
> Same questions about requiring the return type to match (I don't think
> we should) and if we should prevent interfaces from implementing it (I
> think yes).

Yeah, more of the same question about what these things are really for, and how they are meant to be interpreted. I picked one point in the space ("a factory method creates an instance of this class or interface via ad hoc program logic"); there are others, ranging from "a factory method is an arbitrary method with a funny name" to "a factory method is, exclusively, the construction mechanism for a primitive class".

> == 4.7.31 The JavaFlags Attribute
>> We're having some good internal discussions about default values & null, and will send something out when that settles into something stable.
> I expect (hope?) this will change as the internal discussion
> solidifies.  I'm not a fan of this "kitchen sink" approach as it
> becomes an attractive nuisance to wedge other flags into.  The
> suggestion to use more focused attributes (`PrimitiveClassProperties`
> & `ReferenceDefaultPrimitiveClass`) matches the existing conventions
> for naming / using attributes.

Yes, worth revisiting when we get further along. The idea here is that javac just needs some space to stash its special metadata, and things have suffered because we don't have that space (see, e.g., ACC_ENUM). But as we tweak things, TBD how much stuff ends up being "javac special metadata".

> == 5.3.5 Deriving a Class from a class File Representation
>> Alternatively, we could more uniformly claim that the class is "considered to implement" the
>> expected interface, regardless of what it implements by inheritance. The difference in
>> behavior might be observable, say, via reflection.
> I think we need to honour programmer intent here.  If a class
> implements one of the interfaces by inheritance, then the programmer
> has specified their intent and we should go with it (or flag it as an
> error).

There would be no error. The scenario is whether:

class Foo implements IdentityObject
class Bar extends Foo

gets interpreted as

class Bar extends Foo [implements IdentityObject]

or just

class Bar extends Foo

In other words, how much do we "prune" redundant inferred supers? More work for you to prune them, but maybe the results of reflection are more intuitive with pruning.

>> An abstract class implements IdentityObject if it declares an instance initialization method
>> and does not have its ACC_PRIM_SUPER flag set; and implements PrimitiveObject if the opposite
>> is true (ACC_PRIM_SUPER, no instance initialization method). Instance initialization methods
>> and ACC_PRIM_SUPER represent two channels for subclass instance creation, and this analysis
>> determines whether only one channel is "open".
> 
> The rules expressed here don't cover the cases outlined in chapter 4 -
> namely that a class that has ACC_PRIM_SUPER must only have static
> fields and only its static methods can be synchronized.

Correct. The Chapter 4 rules are enforced at Step 2 of 5.3.5. The inference of supertypes (and related constraints) is left to Step 5.

>> Alternatively, we could ignore instance initialization methods and rely entirely on
>> ACC_PRIM_SUPER. In practice, abstract classes written in the Java programming language always
>> have instance initialization methods, so the difference in behavior is only relevant to classes
>> produced via other languages or tools.
> 
> My preference is for the VM to check the rules against the explicitly
> set ACC_PRIM_SUPER bit.  It means the language can own when to set it
> and the VM only has to do consistency checks on it.

Aesthetically, I like the idea that there are parallel mechanisms for inferring IdentityObject and inferring PrimitiveObject. But, yeah, our particular needs are met whether you can infer PrimitiveObject or not.

> == 5.4.3.1 Field Resolution
>> Thus, a field reference with a type like Qjava/lang/String; is permitted. Since it's impossible
>> to declare a field with such a type (see 5.4.2), resolution of the reference will fail anyway
>> with a NoSuchFieldError.
> 
> I'm a little confused by "resolution of the reference will fail with
> NoSuchFieldError" as my reading of 5.4.2 says we would reject any
> class that has a Q descriptor that doesn't name a Q type.  How would
> resolution of such a field reference ever occur?

The choices are:

1) Explicitly check descriptors of field references at resolution time for Q types in the descriptors that don't name valid primitive classes. If any are found, resolution fails with an ICCE.

2) Allow resolution to run its course, which will inevitably fail with a NSFE.

Just a question of which error happens (with associated rules for checking).

> == Bytecodes
> new > tolerable because the Identity class requires no initialization.
> Do we need changes to the JVMTI spec to indicate that Identity isn't
> passed to the ClassFileLoadHook and is not modifiable?  And maybe a
> rule in static constraints that says Identity has an abstract <init>
> method? (Or have we dropped that idea?)  We need a way to say, via the
> spec, that code in Identity.<init>()V will never run, no matter how a
> user adds it there.

At a minimum, the spec for 'new' describes what happens if you say 'new java/lang/Object', and if you try to hack the Identity class via JVMTI, you need to be aware of that spec.

Whether JVMTI wants to do require something more for sanity checking is a good question...

For constraining the standard library, eh, I'm pretty happy leaving that to the standard library. (And, no, we don't have abstract <init> methods anymore.) No need to make sure standard library classes are defined the way they're supposed to be defined. If somebody hacks those binaries, again, the spec for 'new' says what happens, the hacker shouldn't be surprised.

> withfield > use of the withfield instruction is restricted to
> nestmates of the field's declaring class.
> I'm glad to see this as discussion on this went around and around.
> Limiting to the nest still seems like the right choice to me.  So +1
> to this.

Cool. Perhaps someday we'll expand it with normal access controls, but it requires a separate mechanism from the (read-oriented) field access flags.