Value type companions, encapsulated

Sun Jul 3 12:25:22 UTC 2022

I fully agree on 
- having the restriction on array creation, not array access, 
- providing access to the companion class/default value through Lookup, and 
- building the reflection API on top of the Lookup API. 

One somewhat sad thing about CONSTANT_Class QC; is that we need it now, but once we have the new generics we will not need it anymore, because it can be expressed with a CONSTANT_Specialization_Linkage plus a constant dynamic. So it's a kind of temporary design. 

I wonder if it would not be "better" to separate checkcast from unbox/box, given that mixing them together results in a different resolution for all checkcasts (compared to anewarray). From the language POV, those two kinds of checkcast are different anyway. 

Rémi 

> From: "John Rose" <john.r.rose at oracle.com>
> To: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
> Sent: Sunday, July 3, 2022 5:24:19 AM
> Subject: Value type companions, encapsulated

> In this message Brian wrote out the major features
> of an emerging design for value classes:
>> From: Brian Goetz <brian.goetz at oracle.com>
>> To: … <valhalla-spec-experts at openjdk.java.net>
>> Subject: Re: User model stacking: current status
>> Date: Thu, 23 Jun 2022 15:01:24 -0400
> I think controlling the complexity by having a separate
> nested declaration of the value companion type will
> work very well.

> So what exactly does a private value companion do?
> What is it you can and cannot do with this type?
> What problems are prevented by privatizing it?
> How and when is privatization enforced?
> What other problems are created by those new rules?

> I have been pulling on this thread for a few days
> now, and I think I have some answers.

> http://cr.openjdk.java.net/~jrose/values/encapsulating-val.md
> http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html

> (The Hitchhiker’s Guide suddenly comes to mind. Don’t panic!)

> I expect I will be editing these files as we go.
> For reference here is a verbatim copy of the MD file
> as it stands right now (minus the header):

> Background

> (We will start with background information. The [
> https://partage.u-pem.fr/mail#privatization-to-the-rescue | new stuff comes
> afterward ] . Impatient readers can find a very quick [
> https://partage.u-pem.fr/mail#summary-of-restrictions | summary of
> restrictions ] at the end.)

> Affordances of C.ref

> Every class or interface C comes with a companion type, the
> reference type C.ref derived from C which describes any variable
> (argument, return value, array element, etc.) whose values are either
> null or of a concrete class derived from C . We are not in the habit
> of distinguishing C.ref from C , but the distinction is there. For
> example, if we call Object::getClass on a variable of type C.ref
> we might not get C.class ; we might even get a null pointer
> exception!
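
(A minimal sketch, in today's Java without any Valhalla syntax, of the getClass point above: a variable of a reference type may hold an instance of a subclass, or null. The class RefTypeDemo and its method names are made up for illustration.)

```java
// Hypothetical demo class; not part of the proposal.
class RefTypeDemo {
    // Returns the dynamic class of whatever the Number-typed variable holds.
    static Class<?> dynamicClassOf(Number n) {
        return n.getClass();   // throws NullPointerException if n is null
    }

    // Reports whether calling getClass on the variable fails with an NPE.
    static boolean getClassThrowsNpe(Number n) {
        try {
            n.getClass();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

Here the variable's type is Number, yet the class reported is that of the concrete object stored in it, and a null variable has no class to report at all.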

> We are so very used to working with reference types (for short,
> ref-types ) that we sometimes forget all that they do for us
> in addition to their linkage to specific classes:

>     * C.ref gives a starting point for accessing C 's members.
>     * C.ref provides abstraction: C or a subtype might not be loaded yet.
>     * C.ref provides the standard uninitialized value null .
>     * C.ref can link C objects into graphs, even circular ones.
>     * C.ref has a known size, one "machine word", carefully tuned by the JVM.
>     * C.ref allows a single large object to be shared from many locations.
>     * C.ref with an identity class can centralize access to mutable state.
>     * C.ref values uniformly convert to and from general types like Object .
>     * C.ref variable types can be reflected using Class mirror objects.
>     * C.ref is safe for publication if the fields of C are final .

> When I store a bunch of C objects into an object array or list, sort
> it, and then share it with another thread, I am using several of the
> above properties; if the other thread down-casts the items to C.ref
> and works on them it relies on those properties.

> If I implement C as a doubly-linked list data structure or
> (alternatively) a value-based class with tree structure, I am using
> yet more of the above properties of references.

> If my C object has a lot of state and I pass out many pointers to
> it, and perhaps compute and cache interesting values in its mutable
> fields, I am again relying on the special properties of references,
> as well as of identity classes (if fields are mutable).

> By the way, in the JVM, variables of type C.ref (some of them at
> least) are associated not with C simple, but with the so-called
> L-descriptor spelled LC; . When we talk about C.ref we are
> usually talking about those L-descriptors in the JVM, as well.

> I don't need to think much about this portfolio of properties as I go
> about my work. But if they were to somehow fail, I would notice bugs
> in my code sooner or later.

> One of the big consequences of this overall design is that I can write
> a class C which has full control over its instance states. If it is
> mutable, I can make its fields private and ensure that mutations occur
> only under appropriate locking conditions. Or if I declare it as a
> value-based class, I can ensure that its constructor only allows
> legitimate instances to be constructed. Under those conditions, I
> know that every single instance of my class will have been examined
> and accepted by the class constructor, and/or whatever factory and
> mutator methods I have created for it. If I did my job right, not
> even a race condition can create an invalid state in one of my
> objects.

> Any instance state of C which has been reached without being
> produced from a constructor, factory, mutator, or constant of C can
> be called non-constructed . Of course, inside a class any state
> whatever can be constructed, subject to the types of fields and so on.
> But the author of the class gets to decide which states are
> legitimate, and the decisions are enforced by access control at the
> boundaries of the encapsulation.

> So if I code my class right, using access control to keep bad states
> away from my clients, my class's external API will have no
> non-constructed states.

> Costs of C.ref

> In that case why have value types at all, if references are so
> powerful? The answer is that reference-based abstraction pays for its
> benefits with particular costs, costs that Java programmers do not
> always wish to pay:

>     * A reference (usually) requires storage for a pointer to the object.
>    * A reference (usually) requires storage for a header embedded inside the
>     object.
>    * Access to an object's fields (usually) requires extra cycles to chase the
>     pointer.
>    * The GC expends effort administering a singular "home location" for every
>     object.
>    * Cache line invalidation near that home location can cause useless memory
>     traffic.
>    * A reference must be able to represent null ; tightly-packed types like int and
>     long would need to add an extra bit somewhere to cover this.

> The major alternative to references, as provided by Valhalla, is flat
> objects, where object fields are laid out immediately in their
> containers, in place of a pointer which points to them stored
> elsewhere. Neither alternative is always better than the other, which
> is why Java has both int and Integer types and their arrays, and
> why Valhalla will offer a corresponding choice for value classes.

> Alternative affordances of C.val

> Now, instances of a value class can be laid out flat in their
> containing variables. But they can also be "boxed" in the heap, for
> classic reference-based access. Therefore, a value class C has not
> one but two companion types associated with it, not only the reference
> companion C.ref but also the value companion C.val . Only value
> classes have value companions, naturally. The companion C.val is
> called a value type (or val-type for short), by contrast with any
> reference type, whether Object.ref or C.ref .

> The two companion types are closely related and perform some of the
> same jobs:

>     * C.ref and C.val both give a starting point for accessing C 's members.
>     * C.ref and C.val can link C objects into acyclic graphs.
>     * C.ref and C.val values uniformly convert to and from general types like Object .
>     * C.ref and C.val variable types can be reflected using Class mirror objects.

> For these jobs, it usually doesn't matter which type companion does
> the work.

> Despite the similarities, many properties of a value companion type
> are subtly different from any reference type:

>     * C.val is non-abstract: You must load its class file before making a variable.
>     * C.val cannot nest except by reference; C cannot declare a C.val field.
>     * C.val does not represent the value null .
>     * C.val is routinely flattenable, avoiding headers and indirection pointers
>     * C.val has configurable size, depending on C 's non-static fields.
>     * C.val heap variables (fields, array elements) are initialized to all-zeroes.
>     * C.val might not be safe for publication (even though its fields are final ).
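
(The closest analogy available in today's Java for the null versus all-zeroes contrast in the two lists is Integer versus int, a pair the text itself mentions. The sketch below only illustrates that analogy; DefaultValueDemo is a made-up name and no C.val syntax is used.)

```java
// Hypothetical demo class; uses int/Integer as stand-ins for C.val/C.ref.
class DefaultValueDemo {
    // A fresh array of references starts out holding null in every element.
    static Integer freshRefElement() {
        return (new Integer[1])[0];
    }

    // A fresh array of flat primitives starts out all-zero; null is not representable.
    static int freshFlatElement() {
        return (new int[1])[0];
    }
}
```

Nobody ever constructed the zero in the int array; it simply appears, which is the behavior the text later calls a non-constructed state.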

> The JVM distinguishes C.val by giving it a different descriptor, a
> so-called Q-descriptor of the form QC; , and it also provides a
> so-called secondary mirror C.val.class which is similar to the
> built-in primitive mirrors like int.class .

> As the Valhalla performance model notes, flattening may be expected
> but is not fully guaranteed. A C.val stored in an Object
> container is likely to be boxed on the heap, for example. But C.val
> objects created as bytecode temporaries, arguments, and return values
> are likely to be flattened into machine registers, and C.val fields
> and array elements (at least below certain size thresholds) are also
> likely to be flattened into heap words.

> As a special feature, C.ref is potentially flattenable if C is a
> value class. There are additional terms and conditions for flattening
> C.ref , however. If C is not yet loaded, nothing can be done:
> Remember that reference types have full abstraction as one of their
> powers, and this means building data structures that can refer to them
> even before they are loaded. But a class file can request that the JVM
> "peek" at a class to see if it is a value class, and if this request
> is acted on early enough (at the JVM's discretion), then the JVM can
> choose to lay out some or all C.ref values as flattened C.val
> values plus a boolean or other sentinel value which indicates the
> null state.

> Pitfalls of C.val

> The advantages of value companion types imply some complementary
> disadvantages. Hopefully they are rarely significant, but they
> must sometimes be confronted.

>     * C.val might need to load a class file which is somehow unloadable
>    * C.val will fail to load if its instance layout directly or indirectly includes
>     a C.val field or subfield
>     * C.val will throw an exception if you try to assign a null to it.
>    * C.val may have surprising costs for multi-word footprint and assignment (and
>     so might C.ref if that is flattened)
>     * C.val is initialized to its all-zero value, which might be non-constructed
>    * C.val might allow data races on its components, creating values which are
>     non-constructed

> The footprint issue shows up most strongly if you have many copies of
> the same C.val value; each copy will duplicate all the fields, as
> opposed to many copies of the same C.ref reference, which are likely to
> all point to a single heap location with one copy of all the fields.

> Flat value size can also affect methods like Arrays.sort , which
> perform many assignments of the base type, and must move all fields on
> each assignment. If a C.val array has many words per element, then
> the costs of moving those words around may dominate a sort request.
> For array sorting there are ways to reduce such costs transparently,
> but it is still a "law of physics" that editing a whole data structure
> will have costs proportional to the size of the edited portions of the
> data structure, and C.ref arrays will often be somewhat more compact
> than C.val arrays. Programmers and library authors will have to use
> their heads when deciding between the new alternatives given by value
> classes.

> But the last two pitfalls are hardest to deal with, because they both
> have to do with non-constructed states. These states are the all-zero
> state with the second-to-last pitfall, and (with the last pitfall) the
> state obtained by mixing two previous states by means of a pair of
> racing writes to the same mutable C.val variable in the heap.
> Unlike reference types, value types can be manipulated to create these
> non-constructed states even in well-designed classes.
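
(A deterministic sketch of how "mixing two previous states" can break an invariant. Real tearing needs two racing threads; here the interleaving of component stores is written out by hand so the outcome is reproducible. The classes Range and TearingSketch, and the int[] used as a stand-in for a flattened slot, are assumptions for illustration only.)

```java
// Hypothetical value-like class whose constructor enforces lo <= hi.
final class Range {
    final int lo;
    final int hi;

    Range(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }
}

class TearingSketch {
    // Models a flat slot: element 0 holds lo, element 1 holds hi, no header or pointer.
    static int[] tornWrite() {
        int[] slot = new int[2];      // starts in the all-zero state (0, 0)
        Range a = new Range(1, 2);    // both values pass the constructor check
        Range b = new Range(5, 6);
        // Hand-written interleaving standing in for two racing writers:
        slot[0] = a.lo;   // writer A stores lo
        slot[0] = b.lo;   // writer B stores lo
        slot[1] = b.hi;   // writer B stores hi
        slot[1] = a.hi;   // writer A stores hi
        return slot;      // holds lo from B and hi from A
    }
}
```

The slot ends up with a lo larger than its hi, a combination the constructor would have rejected, even though every value written was legitimately constructed.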

> Now, it may be that a constructor (or factory) might be perfectly able
> to create one of the above non-constructed states as well, no strings
> attached. In that case, the class author is enforcing few or no
> invariants on the states of the value class. Many numeric classes,
> like complex numbers, are like this: Initialization to all-zeroes is
> no problem, and races between components are acceptable, compared to
> the costs of excluding races.
>> (The reader may recall that early JVMs accepted races on the high
> and low halves of 64-bit integers as well; this is no longer a
> widespread issue, but bigger value types like complex raise the same
> issue again, and we need to provide class authors the same solution,
> if it fits their class.)

> There are also some classes for which there are no good defaults, or
> for which a good default is definitely not the all-zero bit pattern.
> Authors of such types will often wish to make that bit pattern
> inaccessible to their clients and provide some factory or constant
> that gives the real default. We expect that such types will choose
> the C.ref companion, and rely on the extra null checks to ensure
> correct initialization.

> Other classes may need to avoid other non-constructed values that may
> arise from data races, perhaps for reasons of reliability or security.
> This is a subtle trade-off; very few class authors begin by asking
> themselves about the consequences of data races on mutable members,
> and even fewer will ask about races on whole instances of value
> types, especially given that fields in value types are always
> immutable. For this reason, we will set safety as the default, so
> that a class (like complex numbers) which is willing to tolerate data
> races must declare its tolerance explicitly. Only then will the JVM
> drop the internal costs of race exclusion.

> Whether to tolerate the all-zero bit pattern is a simpler decision.
> Still, it turns out to be useful to give a common single point of
> declarative control to handle all non-constructed states, both
> the default value of C.val and its mysterious data races.

> Privatization to the rescue

> (Here are the important details about the encapsulation of value
> types. The impatient reader may enjoy the very quick [
> https://partage.u-pem.fr/mail#summary-of-restrictions | summary of
> restrictions ] at the end of this document.)

> In order to hide non-constructed states, the value companion C.val
> may be privatized by the author of the class C . A privatized
> value companion is effectively withdrawn from clients and kept private
> to its own class (and to nestmates). Inside the class, the value
> companion can be used freely, fully under control of the class author.

> But untrusted clients are prevented from building uninitialized fields
> or arrays of type C.val . This prevents such clients from creating
> (either accidentally or purposefully) non-constructed values of type
> C.val . How privatization is declared and enforced is discussed in
> the rest of this document.
>> (To review, for those who skipped ahead, non-constructed values are
> those not created under control of the class C by constructors or
> other accessible API points. A non-constructed value may be either an
> uninitialized variable of C.val , or the result of a data race on a
> shared mutable variable of type C.val . The class itself can work
> internally with such values all day long, but we exclude external
> access to them by default.)

> Atomicity as well

> As a second tactic, a value class C may select whether or not the
> JVM enforces atomicity of all occurrences of its value companion
> C.val . A non-atomic value companion is subject to data races, and
> if it is not privatized, external code may misuse C.val variables
> (in arrays or mutable fields) to create non-constructed values via
> data races.

> A value companion which is atomic is not subject to data races. This
> will be the default if the class C does not explicitly request
> non-atomicity. This gives safety by default and limits
> non-constructed states to only the all-zero initial value. The
> techniques to support this are similar to the techniques for
> implementing non-tearing of variables which are declared volatile ;
> it is as if every variable of an atomic value type has some (not
> all) of the costs of volatility.

> The JVM is likely to flatten such an atomic value only up to the
> largest available atomically settable memory unit, usually 128 bits.
> Values larger than that are likely to be boxed, or perhaps treated
> with some other expensive transactional technique. Containers that
> are immutable can still be fully flattened, since they are not subject
> to data races.
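
(One way to picture "flatten only up to the largest atomically settable memory unit", sketched in today's Java: a pair of ints packed into one 64-bit word and stored through AtomicLongArray, so each element is read and written as a unit. This is an analogy, not how the JVM would implement atomic value companions; PackedPairArray and its methods are made-up names.)

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical "atomic flat array" of (x, y) int pairs, one 64-bit word per element.
class PackedPairArray {
    private final AtomicLongArray words;

    PackedPairArray(int length) {
        words = new AtomicLongArray(length);   // every element starts all-zero
    }

    // Packs x into the high 32 bits and y into the low 32 bits.
    static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    static int x(long word) { return (int) (word >>> 32); }

    static int y(long word) { return (int) word; }

    // A single atomic store writes both components together.
    void set(int i, int x, int y) { words.set(i, pack(x, y)); }

    // A single atomic load reads both components together.
    long get(int i) { return words.get(i); }
}
```

Because both fields travel in one word, a reader can never see x from one write and y from another; a pair wider than the largest atomic unit would need boxing or some other technique, as the text says.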

> The behavior of an atomic C.val is aligned with that of C.ref . A
> reference to a value class C never admits data races on C 's
> fields. The reason for this is simple: A C.ref value is a C.val
> instance boxed on the heap in a single immutable box-class field of
> type C.val . (Actually, the JVM may partially or wholly flatten the
> representation of C.ref if it can get away with it; full flattening
> is likely for JVM locals and stack values, but any such secret
> flattening is undetectable by the user.) Since it is final all the
> way down (to C 's fields) any C.ref value is safely published
> without any possibility of data races. Therefore, an extra
> declaration of non-atomicity in C affects only the value companion
> C.val .

> It seems that there are use cases which justify all four combinations
> of both choices (privatization and declared non-atomicity), although
> it is natural to try to boil down the size of the matrix.

>     * C.val private & atomic is the default, and safest configuration
>     hiding all non-constructed values outside of C and all data races
>     even inside of C . There are some runtime costs.

>     * C.val public & non-atomic is the opposite, with fewer runtime
>     costs. It must be explicitly declared. It is desirable for
>     numerics like complex numbers, where all possible bitwise states are
>     meaningful. It is analogous to the situation of a naturally
>     non-atomic primitive like long .

>     * C.val public & atomic allows everybody to see the all-zero
>     initial value but no other non-constructed states. This is
>     analogous to the situation of a naturally atomic primitive like
>     int .

>     * C.val private & non-atomic allows C complete control over the
>     visibility of non-constructed states, but C also has the ability
>     to work internally on arrays of non-atomic elements. C should
>     take care not to leak internally-created flat arrays to untrusted
>     clients, lest they use data races to hammer non-constructed values
>     into those arrays.

> It is logically possible, but there does not seem to be a need, for
> allowing a single class C to work with both kinds of arrays, atomic
> and non-atomic. (In principle, the dynamic typing of Java arrays
> would support this, as long as each array was configured at its
> creation.) The effect of this can be simulated by wrapping a
> non-atomic class C in another wrapper class WC which is atomic.
> Then C.val[] arrays are non-atomic and WC.val[] arrays are atomic,
> yet each kind of array can have the same "payload", a repeated
> sequence of the fields of C .

> Privatization in code

> For source code and bytecode, privatization is enforced by performing
> access checks on names.

> Privatization rules in the language

> We will stipulate that a value class C always has a value
> companion type C.val , even if it is never declared or used. And we
> give the author of C some control over how clients may use the type
> C.val , in a manner roughly similar to nested member classes like
> C.M .

> Specifically, the declaration of C always selects an access mode for
> its value companion C.val from one of the following three choices:

>     * C.val is declared private
>     * C.val is declared public
>     * C.val is declared, but neither public nor private

> If C.val is declared private, then only nestmates of C may access
> C.val . If it is neither public nor private, only classes in the
> same runtime package as C may access it. If it is declared public,
> then any class that can access C may also access C.val .
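
(The three access modes can be modeled as a small decision function. This is a sketch of the rule as stated in the paragraph above, not a proposed API: the enum CompanionAccess, the class CompanionAccessCheck, and the boolean inputs are all hypothetical, and a real check would derive them from the classes involved.)

```java
// Hypothetical model of the declared access mode of C.val.
enum CompanionAccess { PRIVATE, PACKAGE, PUBLIC }

class CompanionAccessCheck {
    // First the usual access check on C itself, then the check on the companion.
    static boolean mayAccessCompanion(boolean canAccessC,
                                      boolean nestmates,
                                      boolean samePackage,
                                      CompanionAccess access) {
        if (!canAccessC) return false;        // no access to C means no access to C.val
        switch (access) {
            case PUBLIC:  return true;        // anyone who can access C
            case PACKAGE: return samePackage; // same runtime package only
            default:      return nestmates;   // PRIVATE: nestmates only
        }
    }
}
```

For example, a client in another package that can see a public C gets access to C.val only when the companion is declared public.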

> As an independent choice, the declaration of C may select an atomicity for its
> value companion C.val from one of the following two choices:

>     * C.val is explicitly declared non-atomic
>     * C.val is not explicitly declared non-atomic, and is thus atomic

> If there is no explicit access declaration for C.val in the code of
> C , then C.val is declared private and atomic. That is, we set the
> default to the safest and most restrictive choice.

> In source code, these declarations are applied to explicit occurrences
> of the type name C.val . The access modification of C.val is also
> transferred to the implicitly declared name C.default .

> The syntax looks like this:
> class C {
>   //only one of the following lines may be specified
>   //the first line is the default
>   private value companion C.val;  //nestmates only
>   value companion C.val;          //package-mates only
>   public value companion C.val;   //all may access
>   // the non-atomic modifier may be present:
>   private non-atomic value companion C.val;
>   public non-atomic value companion C.val;
>   non-atomic value companion C.val;
> }

> When a type name C.val or an expression C.default is
> used by a class X , there are two access checks that occur. First,
> access from X to the class C is checked according to the usual
> rules of Java. If access to C is permitted, a second check is done
> if the companion is not declared public . If the companion is
> declared private , then X and C must be nestmates, or else access
> will fail. If the companion is neither public nor private , then
> X and C must be in the same package, or else access will fail.

> Example privatized value companion

> Here is an example of a class which refuses to construct its default
> value, and which prevents clients from seeing that state:
> class C {
>   int neverzero;
>   public C(int x) {
>     if (x == 0)  throw new IllegalArgumentException();
>     neverzero = x;
>   }
>   public void print() { System.out.println(this); }

>   private value companion C.val;  //privatized (also the default)

>   // some valid uses of C.val follow:
>   public C.val[] flatArray() { return new C.val[]{ this }; }
>   private static C.ref nonConstructedZero() {
>     return (new C.val[1])[0];  //OK:  C.val private but available
>   }
>   public static C.ref box(C.val val) { return val; }  //OK param type
>   public C.val unbox() { return this; }  //OK return type

>   // valid use of private C.default, with Lookup negotiation
>   public static
>   C.ref defaultValue(java.lang.invoke.MethodHandles.Lookup lookup) {
>     if (!lookup.in(C.class).hasFullPrivilegeAccess())
>       return null;     //…or throw
>     return C.default;  //OK: default for me and maybe also for thee
>   }
> }

> // non-nestmate client:
> class D {
>   static void passByValue(C x) {
>     C.ref ref = C.box(x);   //OK, although x is null-checked
>     if (false)  C.box((C.ref) null);  //would throw NPE
>     assert ref == x;
>   }

>   static Object useValue(C x) {
>     x.unbox().print();   //OK, invoke method on C.val expression
>     var xv = x.unbox();  //OK, although C.val is non-denotable
>     xv.print();          //OK
>     //> C.val xv = x.unbox();  //ERROR: C.val is private
>     return xv;  //OK, originally from legitimate method of C
>   }

>   static Object arrays(C x) {
>     var a = x.flatArray();
>     //> C.val[] va = a;  //ERROR: C.val is private
>     Arrays.toString(a);  //OK
>     C.ref[] a2 = a;      //covariant array assignment
>     C.ref[] na = new C.ref[1];
>     //> na = new C.val[1];  //ERROR: C.val is private
>     return a[0];  //constructed values only
>   }
> }

> The above code shows how a privatized value companion can and cannot
> be used. The type name may never be mentioned. Apart from that
> restriction, client code can work with the value companion type as it
> appears in parameters, return values, local variables, and array
> elements. In this, a privatized companion behaves like other
> non-denotable types in Java.
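
(The Lookup negotiation used by defaultValue in the example can be tried out in today's Java, minus the Valhalla parts. In this sketch a private string stands in for C.default; Holder, secretFor and ownLookup are made-up names, and hasFullPrivilegeAccess requires Java 14 or later.)

```java
import java.lang.invoke.MethodHandles;

// Hypothetical stand-in for a class guarding a private default value.
class Holder {
    private static final String SECRET = "default";

    // Releases the secret only to a caller presenting a Lookup that
    // keeps full privileges when narrowed to Holder.
    static String secretFor(MethodHandles.Lookup lookup) {
        if (!lookup.in(Holder.class).hasFullPrivilegeAccess())
            return null;   // or throw
        return SECRET;
    }

    // Demo only: a real class would not hand out its own Lookup.
    static MethodHandles.Lookup ownLookup() {
        return MethodHandles.lookup();
    }
}
```

A Lookup created inside Holder passes the check, while MethodHandles.publicLookup() does not, so the caller has to prove a capability rather than merely name the class.
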
>> Rationale: Note that a companion type is not a real class.
> Therefore it cannot appeal, precisely, to the existing provisions (in
> JLS or JVMS) for enforcing class accessibility. But because it is a
> type, and today nearly all types are classes (and interfaces), users
> have a right to expect that encapsulation of companion types will
> "feel like" encapsulation of type names. More precisely, users will
> hope to re-use their knowledge about how type name access works when
> reasoning about companion types. We aim to accommodate that hope. If
> it works, users won't have to think very often about the class-vs-type
> distinction. That is why the above design emulates pre-existing
> usage patterns for non-denotable types.

> Privatization in translation

> When a value class is compiled to a class file, some metadata is
> included to record the explicit declaration or implicit status of the
> value companion.

> The access selection of C 's value companion (public, package,
> private) is encoded in the value_flags field of the ValueClass
> attribute of the class information in the class file of C .

> The value_flags field (16 bits) has the following legitimate values:

>     * zero: C.val default access, non-atomic
>     * ACC_PUBLIC : C.val public access, non-atomic
>     * ACC_PRIVATE : C.val private access, non-atomic
>     * ACC_VOLATILE : C.val default access, atomic
>     * ACC_VOLATILE|ACC_PUBLIC : C.val public access, atomic
>     * ACC_VOLATILE|ACC_PRIVATE : C.val private access, atomic

> Other values are rejected when the class file is loaded.
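
(A sketch of the "other values are rejected" rule as a validity check. It assumes the ACC_ flags in value_flags would carry the same numeric values as the existing access flags, which java.lang.reflect.Modifier exposes as PUBLIC, PRIVATE and VOLATILE; the class ValueFlags and its methods are hypothetical, and the ValueClass attribute itself does not exist in current class files.)

```java
import java.lang.reflect.Modifier;

// Hypothetical checker for the six legitimate value_flags combinations.
class ValueFlags {
    // Assumption: same bit values as the existing access flags.
    static final int ACC_PUBLIC = Modifier.PUBLIC;
    static final int ACC_PRIVATE = Modifier.PRIVATE;
    static final int ACC_VOLATILE = Modifier.VOLATILE;   // reused here to mean "atomic"

    static final int ALLOWED = ACC_PUBLIC | ACC_PRIVATE | ACC_VOLATILE;

    static boolean isLegitimate(int flags) {
        if ((flags & ~ALLOWED) != 0) return false;       // unknown bits set
        boolean isPublic = (flags & ACC_PUBLIC) != 0;
        boolean isPrivate = (flags & ACC_PRIVATE) != 0;
        return !(isPublic && isPrivate);                 // at most one access bit
    }

    static boolean isAtomic(int flags) {
        return (flags & ACC_VOLATILE) != 0;
    }
}
```

Note that in this encoding zero means default access and non-atomic, so the language-level default (private and atomic) corresponds to ACC_VOLATILE|ACC_PRIVATE rather than to zero.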

> ( JVM ISSUE #0: Can we kill the ACC_VALUE modifier bit? Do we
> really care that jlr.Modifiers kind-of wants to own the reflection
> of the contextual modifier value ? Who are the customers of this
> modifier bit, as a bit? John doesn't care about it personally, and
> thinks that if we are going to have an attribute we can get rid of the
> flag bit. One implementation issue with killing ACC_VALUE is that
> class attributes are processed very late during class loading, while
> class modifiers are processed very early. It may be easier to do some
> kinds of structural checks on the fly during class loading even before
> class attributes are processed. Yet this also seems like a poor
> reason to use a modifier bit.)

> ( JVM ISSUE #1: What if the attribute is missing; do we reject the
> class file or do we infer value_flags=ACC_PRIVATE|ACC_VOLATILE ?
> Let's just reject the file.)

> ( JVM ISSUE #2: Is this ValueClass attribute really a good place
> to store the "atomic" bit as well? This attribute is a green-field
> for VM design, as opposed to the brown-field of modifier bits. The
> above language assumes the atomic bit belongs in there as well.)

> A use of a value companion C.val , in any source file, is generally
> translated to a use of a Q-descriptor QC; :

>     * a field declaration of C.val translates to a field-info with a Q-descriptor
>    * a method or constructor declaration that mentions C.val mentions a
>     corresponding Q-descriptor in its method descriptor
>     * a use of a field resolves a CONSTANT_Fieldref with a Q-descriptor component
>    * a use of a method or constructor uses a CONSTANT_Methodref (or
>     CONSTANT_InterfaceMethodref ) with a Q-descriptor component
>    * a CONSTANT_Class entry may contain a Q-descriptor or an array type whose
>     element type is a Q-descriptor
>    * a verifier type record may refer to CONSTANT_Class which contains a
>     Q-descriptor

> Privatization is enforced for these uses only as much as is needed to
> ensure that classes cannot create uninitialized values, fields, and
> arrays.

> If an access from bytecode to a privatized Q-descriptor fails, an
> exception is thrown; its type is IllegalAccessError , a subtype of
> IncompatibleClassChangeError . Generally speaking such an exception
> diagnoses an attempt by bytecode to make an access that would have
> been prevented by the static compiler, if the Java source program had
> been compiled together as a whole.

> When a field of Q-descriptor type is declared in a class file, the
> descriptor is resolved early, before the class is linked, and that
> resolution includes an access check which will fail unless the class
> being loaded has access to C.val , as determined by loading C and
> inspecting its ValueClass attribute. These checks prevent untrusted
> clients of C from creating non-constructed zero values, in any of
> their fields.

> The timing of these checks, on fields, is aligned with the internal
> logic of the JVM which consults the class file of C to answer other
> related questions about field types: (a) whether C is in fact a
> value class, and (b) what is the layout of C.val , in case the JVM
> wishes to flatten the value in a containing field. The third check,
> (c) whether the C.val companion is accessible, happens at the same time. This is
> early during class loading for non-static fields, and during
> class preparation for static fields.

> Privatization is not enforced for non-field Q-descriptors, that
> occur in method and constructor signatures, and in state descriptions
> for the verifier. This is because mere use of Q-descriptors to
> describe pre-existing values cannot (by itself) expose non-constructed
> values, when those values are on stack or in locals.
>> This can happen invisibly at the source-code level as well. An API
> might be designed to return values of a privatized type from its
> methods or fields, and/or accept values of a privatized type into its
> methods, constructors, or fields. In general, the bytecode for a
> client of such an API will work with a mix of Q-descriptor and
> L-descriptor values.

> The verifier's type system uses field descriptor types, and thus can
> "see" both Q-descriptors and L-descriptors. Clients of a class with a
> privatized companion are likely to work mostly with L-descriptor
> values but may also have Q-descriptor values in locals and on stack.

> When feeding an L-descriptor value to an API point that accepts a
> Q-descriptor, the verifier needs help to keep the types straight. In
> such cases, the bytecode compiler issues checkcast instructions to
> adjust types to keep the verifier happy, and in this case the operand
> of the checkcast would be of the form CONSTANT_Class["QC;"] .

> ( JVM ISSUE #3: The Q/L distinction in the verifier helps the
> interpreter avoid extra dynamic null checks around putfield ,
> putstatic , and the invoke instructions. This distinction requires
> an explicit bytecode to fix up Q/L mismatches; the checkcast
> bytecode serves this purpose. That means checkcast requires the
> ability to work with privatized types. It requires us to make the
> dynamic permission check when other bytecodes try to use the
> privatized type. All this seems acceptable, but we could try to make
> a different design in which CONSTANT_Class resolution fails immediately
> if it contains an inaccessible Q-descriptor. That design might
> require a new bytecode which does what checkcast does today on a
> Q-descriptor.)

> Meanwhile, arrays are rich sources of non-constructed zero values.
> They appear in bytecode as follows:

>    * A C.val[] array construction uses anewarray with a CONSTANT_Class type for the
>     Q-descriptor; this is new to Valhalla.
>    * Such an array construction may also use multianewarray with an appropriate
>     array type.
>    * An array element is read from heap to stack by aaload ; the verifier type of
>     the stacked value is copied from the verifier type of the array itself.
>    * An array element is written from stack to heap by aastore ; the verifier type
>     of the stored value is merely constrained to the type Object .

> Note that there are no static type annotations on array access
> instructions. The practical impact of this is that, if an array of a
> privatized type C.val is passed outside of C , then any values in
> that array become accessible outside of C . Moreover, if C.val is
> non-atomic, clients may be able to inflict data races on the array.

> Thus, the best point of control over misuse of arrays is their
> creation , not their access . Array creation is controlled by
> CONSTANT_Class constant pool entries and their access checking.
> When an anewarray or multianewarray tries to create an array,
> the CONSTANT_Class constant pool entry it uses must be consulted
> to see if the element type is privatized and inaccessible to the
> current class, and IllegalAccessError thrown if that is the case.

> All this leads to special rules for resolving an entry of the form
> CONSTANT_Class["QC;"] . When resolving such a constant, the class
> file for C is loaded, and C is access checked against the current
> class. (This is just what happens when CONSTANT_Class["C"] gets
> resolved.) Next, the ValueClass attribute for C is examined; it
> must exist, and if it indicates privatization of C.val , then access
> is checked for C.val against the current class.

> If that access to a privatized companion would fail, no exception is
> thrown, but the constant pool entry is resolved into a special
> restricted state. Thus, a resolved constant pool entry of the form
> CONSTANT_Class["QC;"] can have the following states:

>     * Error, because C is inaccessible or doesn't exist or is not a value class.
>     * Full resolution, so C.val is ready for general use in the current class.
>    * Restricted resolution, so C.val is ready for restricted use in the current
>     class.

> That last state happens when C is accessible but C.val is not.

> Likewise, a constant pool entry of the form CONSTANT_Class["[QC;"]
> (or a similar form with more leading array brackets) can have three
> states, error, full resolution, and restricted resolution.

> Pre-Valhalla CONSTANT_Class entries which do not mention
> Q-descriptors have only two resolved states, error and full
> resolution.

> As required above, the checkcast bytecode treats full resolution and
> restricted resolution states the same.

> But when the anewarray or multianewarray instruction is executed,
> it throws an access error if its CONSTANT_Class is not
> fully resolved (either it is an error or is restricted). This is how
> the JVM prevents creation of arrays whose component type is an
> inaccessible value companion type, even if the class file does
> not correspond to correct Java source code.

> Here are all the classfile constructs that could refer to a
> CONSTANT_Class constant in the restricted state, and whether they
> respect it (throwing IllegalAccessError ):

>     * checkcast ignores the restriction and proceeds
>     * instanceof ignores the restriction (consistent with checkcast )
>     * anewarray and multianewarray respect the restriction and throw
>     * ldc throws (consistent with C.val.class in source code)
>     * bootstrap arguments throw (consistent with ldc )
>     * verifier types ignore the restriction and continue checking
>     * (FIXME: There must be more than this.)
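
The resolution rules and the list of uses above can be summarized as a small decision table. The following is only a toy model in plain Java, not JVM code: the class `QClassConstantModel`, its enums, and its method names are invented for illustration, and the boolean inputs stand in for checks the JVM would make against the class file of C.

```java
import java.util.EnumSet;
import java.util.Set;

/** Toy model (hypothetical names) of the resolved states of CONSTANT_Class["QC;"]. */
final class QClassConstantModel {
    enum State { ERROR, FULL, RESTRICTED }

    /** The uses of the constant that the list above enumerates. */
    enum Use { CHECKCAST, INSTANCEOF, ANEWARRAY, MULTIANEWARRAY, LDC, BOOTSTRAP_ARGUMENT, VERIFIER_TYPE }

    // Uses that ignore the restricted state and simply proceed.
    private static final Set<Use> IGNORES_RESTRICTION =
            EnumSet.of(Use.CHECKCAST, Use.INSTANCEOF, Use.VERIFIER_TYPE);

    /** Check C first, then the ValueClass attribute, then access to C.val. */
    static State resolve(boolean classAccessible, boolean isValueClass,
                         boolean companionPrivatized, boolean companionAccessible) {
        if (!classAccessible || !isValueClass) {
            return State.ERROR;
        }
        if (companionPrivatized && !companionAccessible) {
            return State.RESTRICTED;   // no exception yet; the state is recorded
        }
        return State.FULL;
    }

    /** True if the use may proceed; false if it would fail with an error. */
    static boolean permits(State state, Use use) {
        switch (state) {
            case FULL:       return true;
            case RESTRICTED: return IGNORES_RESTRICTION.contains(use);
            default:         return false;   // resolution itself already failed
        }
    }
}
```

In this model a restricted constant still serves checkcast but not anewarray, which is the asymmetry the text relies on.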

> Q-descriptors not in CONSTANT_Class constants are naturally immune
> to privatization restrictions. In particular, CONSTANT_MethodType
> constants can successfully refer to mirrors of privatized companions.

> Uses of CONSTANT_Class constants which forbid Q-descriptors and
> their arrays are also naturally immune, since they will never
> encounter a constant resolved in the restricted state. These include
> new , aconst_init , the class sub-operands of CONSTANT_Methodref
> and its friends, exception catch-types, and various attributes like
> NestHost and InnerClasses : All of the above are allowed to refer
> only to proper classes, and not to their value companions or arrays.

> Nevertheless, an aconst_init bytecode must throw an access error when
> applied to a class with an inaccessible privatized value companion.
> This is worth noting because the constant pool entry for aconst_init
> does not mention a Q-descriptor, unlike the array construction
> bytecodes.
>> Perhaps regular class constants of the form CONSTANT_Class["C"] would
> also benefit slightly from a restricted state, which would be
> significant only to the aconst_init bytecode, and ignored by all
> the above "naturally immune" usages. If a JVM implementation takes
> this option, the same access check would be performed and recorded for
> both CONSTANT_Class["C"] and CONSTANT_Class["QC;"], but would be respected
> only by aconst_init (for the former) and anewarray and the other
> cases noted above (for the latter but not the former). On the other
> hand, the particular issue would become moot if aconst_init , like
> withfield , were restricted to the nest of its class, because then
> privatization would not matter.

> The net effect of these rules, so far, is that neither source code nor
> class files can directly make uninitialized variables of type C.val ,
> if the code or class file was not granted access to C.val via C .
> Specifically, fields of type C.val cannot be declared nor can arrays
> of type C.val[] be constructed.

> This includes class files as correctly derived from valid source code
> or as "spun" by dodgy compilers or even as derived validly from old
> source code that has changed (and revoked some access).
>> Remember that new nestmates can be injected at runtime via the
> Lookup API, which checks access and then loads new code that enjoys
> the same access. The level of access depends in detail on the
> selection of ClassOption.NESTMATE (for nestmate injection) or not
> (for package-mate injection). The JVM uses common rules for these
> injected nestmates or package-mates and for normally compiled ones.

> There are no restrictions on the use of C.ref , beyond the basic
> access restrictions imposed by the language and JVM on the name C .
> Access checks for regular references to classes and interfaces are
> unchanged throughout all of the above.

> There are more holes to be plugged, however. It will turn out that
> arrays are once again a problem. But first let's examine how
> reflection interacts with companion types and access control.

> Privatization and APIs

> Beyond the language there are libraries that must take account of the
> privatization of value companions. We start on the shared boundary
> between language and libraries, with reflection.

> Reflecting privatization

> Every companion type is reflected by a Java class mirror of type
> java.lang.Class . A Java class mirror also represents the class
> underlying the type. The distinction between the concept of class and
> companion type is relatively uninteresting, except for a value class
> C , which has two companion types and thus two mirrors.

> In Java source code the expression C.class obtains the mirror for
> both C and its companion C.ref . The expression C.val.class
> obtains the mirror for the value companion, if C is a value class.
> Both expressions check access to C as a whole, and C.val.class
> also checks access to the value companion (if it was privatized).

> But it is a generally recognized fact that Java class mirrors are less
> secure than the Java class types that the mirrors represent. It is
> easy to write code that obtains a mirror on a class C without
> directly mentioning the name C in source code. One can use
> reflective lookup to get such mirrors, and without even trying one may
> also "stumble upon" mirrors to inaccessible classes and companion
> types. Here are some simple examples:
> Class<?> lookup() throws ClassNotFoundException {
>   var name = "java.util.Arrays$ArrayList";
>   //or name = "java.lang.AbstractStringBuilder";
>   //> java.lang.invoke.MethodHandles.lookup().findClass(name);  //ERROR
>   return Class.forName(name);  //OK!
> }
> Class<?> stumble1() {
>   //> return java.util.Arrays.ArrayList.class;  //ERROR
>   return java.util.Arrays.asList().getClass();  //OK!
> }
> Class<?> stumble2() {
>   //> return java.lang.AbstractStringBuilder.class;  //ERROR
>   return StringBuilder.class.getSuperclass();  //OK!
> }
> Class<?> stumble3() {
>   //> return C.val.class;  //ERROR if C.val is privatized
>   return C.ref.class.asValueType();  //OK!
> }

> Therefore, access checking class names is not and cannot be the whole
> story for protecting classes and their companion types from reflective
> misuse. If a mirror is obtained that refers to an inaccessible
> non-public class or privatized companion, the mirror will "defend
> itself" against illegal access by checking whether the caller has
> appropriate permissions. The same goes for method, constructor, and
> field mirrors derived from the class mirror: You can reflect a method
> but when you try to call it all of the access checks (including the
> check against the class) are enforced against you, the caller of the
> reflective API.
>> The checking of the caller has two possible shapes. Either a
> caller-sensitive method looks directly at its caller, or the call is
> delegated through an API that requires negotiation with a
> MethodHandles.Lookup object that was previously checked against a
> caller.
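
Both shapes of caller checking can already be observed in today's JDK with an ordinary inaccessible class. The sketch below is not Valhalla code: it uses the private nested class `java.util.Arrays$ArrayList` as a stand-in for a privatized companion, and the class name `MirrorDefense` and its method names are invented for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a mirror is easy to obtain but defends itself on use. */
final class MirrorDefense {
    /** Shape 1: the caller-sensitive Method::invoke checks its immediate caller. */
    static boolean invokeIsRefused() {
        List<String> list = Arrays.asList("a", "b");
        try {
            // Obtaining the mirror and the method mirror succeeds...
            Method size = list.getClass().getMethod("size");
            // ...but invoking it is checked against this class and refused,
            // because the declaring class is not accessible from here.
            size.invoke(list);
            return false;
        } catch (IllegalAccessException refused) {
            return true;
        } catch (ReflectiveOperationException other) {
            throw new IllegalStateException(other);
        }
    }

    /** Shape 2: a Lookup object carries access rights checked when it was created. */
    static boolean lookupIsRefused() {
        try {
            MethodHandles.lookup().findClass("java.util.Arrays$ArrayList");
            return false;
        } catch (IllegalAccessException refused) {
            return true;
        } catch (ClassNotFoundException missing) {
            throw new IllegalStateException(missing);
        }
    }
}
```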

> Now, if a class C is accessible but its value companion C.val is
> privatized, all of C 's public methods and other API points are
> accessible (via both companion types), but access is restricted for those
> very specific operations that could create non-constructed instances
> (via a variable of companion type C.val ). And this boils down
> to a limitation on array creation. If you cannot use either source
> code or reflection to create an array of type C.val[] , then you
> cannot create the conditions necessary to build non-constructed
> instances.

> Reflective APIs should be available to report the declared properties
> of value companions. It is enough to add the following two methods:

>    * Class::isNonAtomic is true only of mirrors of value companions which
>     have been declared non-atomic. On some JVM implementations it may
>     additionally be true of long.class and/or double.class.
>    * Class::getModifiers, when applied to a mirror of a value companion,
>     will return a modifier bit-mask that reflects the declared access. (This
>     is compatible with the current behavior of HotSpot for primitive mirrors,
>     which appear as if they were somehow declared public, with abstract and
>     final thrown in to boot.)
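
The existing behavior for primitive mirrors mentioned above can be checked directly. This is a small sketch against the current reflection API; the class name `PrimitiveMirrorModifiers` and its helpers are invented for illustration, and nothing here involves the proposed value-companion mirrors.

```java
import java.lang.reflect.Modifier;

/** Illustrative only: inspects the modifier bits reported for a class mirror. */
final class PrimitiveMirrorModifiers {
    /** Renders the modifier bit-mask of a mirror as source-like keywords. */
    static String describe(Class<?> mirror) {
        return Modifier.toString(mirror.getModifiers());
    }

    /** True if the mirror reports public, abstract, and final all at once. */
    static boolean looksPublicAbstractFinal(Class<?> mirror) {
        int mods = mirror.getModifiers();
        return Modifier.isPublic(mods) && Modifier.isAbstract(mods) && Modifier.isFinal(mods);
    }
}
```

A call such as `PrimitiveMirrorModifiers.describe(int.class)` shows the bits as keywords; a value-companion mirror would presumably report its declared access in the same way.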

> (Note that most reflective access checking should take care to work
> with the reference mirror, not the value mirror, as the modifier bits
> of the two mirrors might differ.)

> Privatization and arrays

> There are a number of standard API points for creating Java array
> objects. When they create arrays containing uninitialized elements,
> then a non-constructed default value can appear. Even when they
> create properly initialized arrays, if the type is declared
> non-atomic, then non-constructed values can be created by races.

>    * java.lang.reflect.Array::newInstance takes an element mirror and length and
>    builds an array. The elements of the returned array are initialized to the
>     default value of the selected element type.
>    * java.util.Arrays::copyOf and copyOfRange can extend the length of an existing
>     array to include new uninitialized elements.
>    * A special overloading of java.util.Arrays::copyOf can request a different type
>     of the new array copy.
>    * java.util.Collection::toArray (an interface method) may extend the length of
>     an existing array, but does not add uninitialized elements.
>    * java.lang.invoke.MethodHandles.arrayConstructor creates a method handle that
>     creates uninitialized arrays of a given type, as if by the anewarray bytecode.
>    * The serialization API contains an operator for materializing arrays of
>     arbitrary type from the wire format.
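
To make the concern concrete, here is how several of these API points surface default-valued elements today. The sketch uses `long` and reference types as stand-ins, since the default of a privatized C.val cannot be shown on a current JDK; the class name `DefaultElementSources` and its method names are invented for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;
import java.util.Arrays;

/** Illustrative only: standard API points that hand out default-valued elements. */
final class DefaultElementSources {
    /** Array::newInstance fills every slot with the element type's default. */
    static long viaNewInstance() {
        long[] fresh = (long[]) Array.newInstance(long.class, 2);
        return fresh[0];
    }

    /** Arrays::copyOf pads the lengthened tail with default values. */
    static long[] viaCopyOf() {
        return Arrays.copyOf(new long[] {7L}, 3);
    }

    /** The type-changing overload pads the tail and also retypes the copy. */
    static Number[] viaRetypingCopyOf() {
        return Arrays.copyOf(new Integer[] {1}, 2, Number[].class);
    }

    /** MethodHandles::arrayConstructor behaves like the anewarray bytecode. */
    static Object[] viaArrayConstructor() {
        try {
            MethodHandle maker = MethodHandles.arrayConstructor(String[].class);
            return (Object[]) maker.invoke(2);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```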

> The basic policy for all these API points is to conservatively limit
> the creation of arrays of type C.val[] if C.val is not public.

>    * java.lang.reflect.Array::newInstance will throw IllegalArgumentException if
>     the element type is privatized. (See below for a possible caller-sensitive
>     enhancement.)
>    * java.util.Arrays::copyOf and copyOfRange will throw instead of creating
>     uninitialized elements, if the element type is privatized. If only
>     previously existing array elements are copied, there is no check, and this
>     is a common use case (e.g., in ArrayList::toArray).
>    * The special overloading of java.util.Arrays::copyOf will refuse to create
>     an array of any non-atomic privatized type. (This refusal protects against
>     non-constructed values arising from data races.) It also incorporates the
>     restrictions of its sibling methods, against creating uninitialized
>     elements (even of an atomic type).
>    * java.lang.invoke.MethodHandles.arrayConstructor will refuse to create a
>     factory method handle if the element type is privatized.
>    * java.util.Collection::toArray needs implementation review; as it is built
>     on top of the previous API points, it may possibly fail if asked to
>     lengthen an array of privatized type. Note that many implementations of
>     toArray use Arrays.copyOf in a safe manner, which does not create
>     uninitialized elements.
>    * java.util.stream.Stream::toArray, the various List::toArray, and other
>     clients of Arrays::copyOf or Array::newInstance need implementation review.
>     Where a generic API is involved, the assumption is often that non-flat
>     reference arrays are being created, and in that case no outage is possible,
>     since reference companion arrays can always be freely created. For
>     specialized generics with flat types, additional implementation work is
>     required, in general, to ensure that flat arrays can be created by parties
>     with the right to do so.
>    * The serialization API should restrict its array creation operator.
>     Serialization methods should not attempt to serialize flat arrays either.
>     It is enough to serialize arrays of the reference type.

> API ISSUE #1: Should we relax construction rules for zero-length
> arrays? This would add complexity but might be a friendly move for
> some use cases. A zero-length array cannot expose non-constructed
> values. It may, however, serve as a misleading "witness" that some
> code has gained permission to work with flat arrays. It's safer to
> disallow even zero-length arrays.

> API ISSUE #2: What about public value companions of non-public
> inaccessible classes? In source code, we do not allow arrays of
> private classes to be made, or of their public value companions.
> Should we be more permissive in this case? We could specify that
> where a value companion has to be checked against a client, its
> original class gets checked as well; this would exclude some use cases
> allowed by the above language, which only takes effect if the
> companion is privatized. An extra check for a public companion seems
> like busy-work and a source of unnecessary surprises, though. Let's
> not.

> There are probably legitimate use cases for arrays of privatized
> types, with which the new restrictions on the above API points would
> interfere. So as a backup, we will make API adjustments to work with
> privatized array types, with an extra handshake to perform the access
> check (via either caller sensitivity or negotiation with an instance
> of MethodHandles.Lookup ).

>    * java.lang.reflect.Array::newInstance should probably be made caller
>     sensitive, so it can refrain from throwing if a privatized element type is
>     accessible to the caller. (Alternatively, a new caller-sensitive API point
>     could be made, such as Array::newFlatInstance. But a new API point seems
>     unnecessary in this case, and caller-sensitivity is common practice in this
>     method's package.) Note that, as is typical of core reflection API points,
>     many uses of newInstance will not benefit from the caller sensitivity.
>    * java.util.Arrays::copyOf and copyOfRange may be joined by additional
>     "companion friendly" methods of a similar character which fill new array
>     elements with some other specified fill value, and/or which cyclically
>     replicate the contents of the original array, and/or which call a
>     functional interface to provide missing elements. The details of this are
>     a matter for library designers to decide. Adding caller sensitivity to
>     these API points is probably the wrong move.
>    * java.lang.invoke.MethodHandles::arrayConstructor will be joined by a method
>     of the same name on MethodHandles.Lookup which performs a companion check
>     before allowing the array constructor method handle to be returned. It will
>     not check the class, just the companion. Note that the use of caller
>     sensitivity in the Lookup API is concentrated on the factory method
>     Lookup::lookup, which is the starting point for Lookup-based negotiation.
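
The Lookup-based handshake can be approximated with today's API. The sketch below is an assumption-laden stand-in, not the proposed `Lookup::arrayConstructor`: it uses the existing `Lookup::accessClass` (a class-level check) where the proposal calls for a companion-only check, and the class `LookupGatedArrays` and its methods are hypothetical helpers.

```java
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;

/** Hypothetical helper: array creation gated by a caller-supplied Lookup. */
final class LookupGatedArrays {
    /**
     * Creates an array only if the Lookup may access the element type.
     * Lookup::accessClass stands in for the proposed companion check.
     */
    static Object newArray(MethodHandles.Lookup lookup, Class<?> elementType, int length)
            throws IllegalAccessException {
        lookup.accessClass(elementType);   // throws if the Lookup lacks access
        return Array.newInstance(elementType, length);
    }

    /** Convenience predicate: would newArray be permitted for this Lookup? */
    static boolean permitted(MethodHandles.Lookup lookup, Class<?> elementType) {
        try {
            newArray(lookup, elementType, 0);
            return true;
        } catch (IllegalAccessException refused) {
            return false;
        }
    }
}
```

The caller creates the Lookup once with `MethodHandles.lookup()` and passes it along, so the library method itself does not need to be caller sensitive.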
> Miscellaneous privatization checks

> Besides newly-created or extended arrays, there are a few API points
> in java.lang.invoke which expose default values of reflectively
> determined types. Like the array creation methods, they must simply
> refuse to expose default values of privatized value companions.

>    * MethodHandles::zero and MethodHandles::empty will simply refuse to
>     produce a result of a privatized C.val type. Clients with a legitimate
>     need to produce such default values can use
>     MethodHandles::filterReturnValue and/or MethodHandles::constant to create
>     equivalent handles, assuming they already possess the default value.
>    * MethodHandles::explicitCastArguments will refuse to convert from a
>     nullable reference to a privatized C.val type. Clients with a legitimate
>     need to convert nulls to privatized values can use conditional combinators
>     to do this "the hard way".
>    * The method Lookup::accessCompanion will be defined analogously to
>     Lookup::accessClass. If Lookup::accessClass is applied to a companion, it
>     will check both the class and the companion, whereas
>     Lookup::accessCompanion will look only at the possible privatization of
>     the companion. (Thus it can simply refer to
>     Reflection::verifyCompanionType.)
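
These java.lang.invoke API points already expose default values for existing types, which is why they need a check. The sketch below shows that behavior with `int` as a stand-in for a value companion, plus the `MethodHandles::constant` workaround for a client that already holds a value; the class `DefaultValueHandles` and its method names are invented for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

/** Illustrative only: how method handle combinators surface default values. */
final class DefaultValueHandles {
    /** MethodHandles::zero hands out the default value of the requested type. */
    static int zeroOfInt() {
        try {
            return (int) MethodHandles.zero(int.class).invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    /** explicitCastArguments converts a null reference into a zero value. */
    static int nullCastToInt() {
        try {
            MethodHandle cast = MethodHandles.explicitCastArguments(
                    MethodHandles.identity(int.class),
                    MethodType.methodType(int.class, Integer.class));
            return (int) cast.invokeExact((Integer) null);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    /** Workaround: a client that already holds a value can wrap it as a constant. */
    static String constantOf(String held) {
        try {
            return (String) MethodHandles.constant(String.class, held).invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```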

> To support reflective checks against array elements which may be
> privatized companion types, an internal method of the form
> jdk.internal.reflect.Reflection::verifyCompanionType may be defined.
> It will pass any reference type (regardless of class accessibility)
> and for a value companion it will check access of the companion (but
> not the class itself).

> Building companion-safe APIs

> The method Lookup::arrayConstructor gives enough of a "hook" to
> create all kinds of safe but friendly APIs in privileged JDK code.
> The methods in java.util could make use of this privileged API to
> quickly adapt their internal code to create arrays in cases they are
> refused by the existing methods Array.newInstance and
> Arrays.copyOf .

> For example, a checked method MethodHandles.Lookup::defaultValue(C)
> may be added to provide the default value C.default if its companion
> C.val is accessible. It will operate as if it first creates a
> one-element array of the desired type, and then loads the element.
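
The "one-element array" formulation can be sketched with existing reflection calls. This is a hypothetical analogue, not the proposed `Lookup::defaultValue`: `Lookup::accessClass` stands in for the companion check, primitives are treated as always accessible, and the class `DefaultViaArray` is invented for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;

/** Hypothetical analogue of a checked defaultValue operation. */
final class DefaultViaArray {
    /**
     * Returns the default value of the given type, as if by creating a
     * one-element array and loading its element. Throws
     * IllegalArgumentException if the Lookup cannot access the type.
     */
    static Object defaultValue(MethodHandles.Lookup lookup, Class<?> type) {
        if (!type.isPrimitive()) {
            try {
                lookup.accessClass(type);   // stand-in for the companion check
            } catch (IllegalAccessException refused) {
                throw new IllegalArgumentException("type not accessible: " + type, refused);
            }
        }
        Object oneElement = Array.newInstance(type, 1);
        return Array.get(oneElement, 0);    // boxes primitive defaults
    }
}
```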

> Or, a caller-sensitive method Class::defaultValue or Class::newArray
> could be added which checks the caller and returns the requested result.
> All such methods can be built on top of MethodHandles.Lookup .

> In general, a library API may be designed to preserve some aspect of
> companion safety, as it allows untrusted code to work with arrays of
> privatized value type, while preventing non-constructed values of that
> type from being materialized. Each such safe and friendly API has to
> make a choice about how to prevent clients from creating
> non-constructed states, or perhaps how to allow clients to gain
> privilege to do so. Some points are worth remembering:

>     * An unprivileged client must not obtain C.default if C.val is privatized.
>    * An unprivileged client must not obtain a non-empty C.val[] array if C.val is
>     privatized and non-atomic.
>    * It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable)
>     old arrays, if the default is not injected.
>    * If a new array is somehow frozen or wrapped so as to be effectively immutable, it
>     is safe as long as it does not expose C.default values.
>     * If a value companion is public , there is no need for any restriction.
>     * Also, unrestricted use can be gated by a Lookup object or caller sensitivity.
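
One way to honor the rule that the default must not be injected is a copy operation that pads with a value the caller supplies. The following is a sketch of such a "companion friendly" method over ordinary reference arrays; `FillingCopy.copyOfWithFill` is a hypothetical helper, not a JDK method, and a flat-array implementation would additionally have to avoid exposing the transient padding that `Arrays.copyOf` creates internally.

```java
import java.util.Arrays;

/** Hypothetical helper: lengthens an array without exposing default elements. */
final class FillingCopy {
    /** Like Arrays.copyOf, but new slots receive the caller-supplied fill value. */
    static <T> T[] copyOfWithFill(T[] original, int newLength, T fill) {
        T[] copy = Arrays.copyOf(original, newLength);
        if (newLength > original.length) {
            // Overwrite the padded tail before the array escapes to the caller.
            Arrays.fill(copy, original.length, newLength, fill);
        }
        return copy;
    }
}
```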

>> In the presence of a reconstruction capability, either in the
> language or in a library API or as provided by a single class,
> avoiding non-constructable objects includes allowing legitimate
> reconstruction requests; each legitimate reconstruction request must
> somehow preserve the intentions of the class's designer.
> Reconstruction should act as if field values had been legitimately
> (from C 's API) extracted, transformed, and then again legitimately
> (to C 's API) rebuilt into an instance of C . Serialization is an
> example of reconstruction, since field values can be edited in the
> wire format. Proposed with expressions for records are another
> example of reconstruction. The withfield bytecode is the primitive
> reconstruction operator, and must be restricted to nestmates of C
> since it can perform all physically possible field updates.
> Reconstruction operations defined outside of C must be designed with
> great care if they use elevated privileges beyond what C provides
> directly.

> Summary of user model

> A value class C has a value companion C.val which denotes the
> null-hostile (zero-initialized) fully flattenable value type for C .

> Like other type members of C , C.val can be declared with an access
> modifier ( public or private or neither). It is therefore quite
> possible that clients of C might be prevented from using the
> companion type.

> The operations on C.val are almost the same as the operations on
> plain C ( C.ref ), so a private C.val is usually not a burden.

> Operations which are unique to C.val , and which therefore may
> be restricted to you, are:

>     * declaring a field of type C.val
>     * making an array with element type C.val
>     * getting the default flat value C.default
>     * asking for the mirror C.val.class

> Library routines which create empty flattenable arrays of C.val
> might not work as expected, when C.val is not public. You'll have
> to find a workaround, such as:

>     * use a plain C reference array to hold your data
>     * use a different API point which is friendly to privatized C.val types
>     * ask C politely to build such an array for you
>     * crack into C with a reflective API and build your own

> If you look closely at the code for C , you might notice that it
> uses its private type C.val in its public API. This is allowed.
> Just be aware that null values will not flow through such API points.
> When you get a C.val value into your own code, you can work on it
> perfectly freely with the type C (which is C.ref ).

> If a value companion C.val is declared public , the class has
> declared that it is willing to encounter its own default value
> C.default coming from untrusted code. If it is declared private ,
> only the class's own nest can work with C.default . If the value
> companion is neither public nor private, the class has declared that
> it is willing to encounter its own default within its own package.
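
The three access levels reduce to a simple rule about who may encounter C.default. The following is a toy model only; the enum and method names are invented, and the two boolean inputs stand in for the nest and package relationships the JVM would determine.

```java
/** Toy model (hypothetical names) of who may use the default of C.val. */
final class CompanionAccessModel {
    enum Access { PUBLIC, PACKAGE, PRIVATE }

    /** True if a client with the given relationship to C may use C.default. */
    static boolean mayUseDefault(Access declared, boolean sameNest, boolean samePackage) {
        switch (declared) {
            case PUBLIC:  return true;
            case PACKAGE: return samePackage || sameNest;   // nestmates share a package
            default:      return sameNest;                  // PRIVATE: the nest only
        }
    }
}
```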

> If a class has declared its companion non-atomic, it is willing to
> encounter states arising from data races (across multiple fields) in
> the same places it is willing to encounter its default value.

> Summary of restrictions

> From the implementation point of view, the salient task is restricting
> clients from illegitimately obtaining non-constructed values of C ,
> if the author of C has asked for such restrictions. (Recall that a
> non-constructed value of C is one obtained without using C 's
> constructor or other public API.) Here are the generally enforced
> restrictions regarding a privatized type C.val :

>     * You cannot mention the name C.val or C.default in code.
>     * You cannot create and load bytecodes which would implement such a mention.
>     * You cannot obtain C.default from a mirror of C or C.val .
>     * You cannot create a new C.val[] array from a mirror of C or C.val .
>    * You cannot lengthen an existing C.val[] array to contain uninitialized
>     elements.
>    * You cannot copy an existing array as a new C.val[] array, if C.val is declared
>     non-atomic.

> Even so, let us suppose you are an accident-prone client of C .
> Ignoring the above restrictions, you might go about obtaining a
> non-constructed value of C in several ways, and there is an
> answer from the system in each case that stops you:

>     * You can mention the C.val or C.default directly in code, in various ways.
>    * After obtaining the mirror C.val.class (by one of several means), you can call
>     Class::defaultValue , MethodHandles::zero , or a similar API point.
>    * If you can declare a field of type C.val directly you can extract an initial
>     value (or a data-race result, if C.val is non-atomic).
>    * If you can indirectly create an array of type C.val , you can extract an
>     initial value (or a data-race result, if C.val is non-atomic).

> And there are a number of ways you might attempt to indirectly create
> an array of type C.val[] :

>    * Indirectly create it from a mirror using Array::newInstance or Arrays::copyOf
>     or MethodHandles::arrayConstructor or another similar API point.
>    * Create it from a pre-existing array of the same type using Object::clone or
>     Arrays::copyOf or another similar API point.
>     * Specify such an array on a serialization wire format and deserialize it.

> Using C.val or C.default directly is blocked if C privatizes its
> value companion, unless you are coding a nestmate or package-mate of
> C . These checks are applied both at compile time and when the JVM
> resolves names, so they apply equally to source code and bytecodes
> created by any means whatsoever.

> There are no realistic restrictions on obtaining a mirror to a
> companion type C.val . (Accidental and casual direct use of
> C.val.class is prevented by access restrictions on the type name
> C.val . But there are many ways to get around this limitation.)
> Therefore any method or API which could violate the above generally
> enforced restrictions must perform an appropriate dynamic access check
> on behalf of its mirror argument.

> Such a dynamic access check can be made negotiable by an appeal to
> caller sensitivity or a Lookup check, so a correctly configured call
> can avoid the restriction. For some simple methods (perhaps
> Arrays::copyOf or MethodHandles::zero ) there is no negotiation.
> Depending on the use case, access failure can be worked around via a
> "negotiable" API point like Lookup::arrayConstructor .