Value type companions, encapsulated
Dan Heidinga
heidinga at redhat.com
Mon Jul 4 13:45:53 UTC 2022
Sorry for top-posting but it was easier to track a list of issues as I
read through:
* Miscellaneous privatization checks
--> MethodHandle.asType(MT) and MethodHandle.invoke() will also need
to protect against the zero being introduced.
For example:
jshell> public class T { public static void z() {}}
| created class T
jshell> MethodHandles.lookup().findStatic(T.class, "z",
MethodType.methodType(void.class))
$28 ==> MethodHandle()void
jshell> Object o = $28.invoke()
o ==> null
Here we see invoke() converting a void to a reference (null) and
similarly for a primitive, to zero. Both of these APIs will need
treatment similar to that of ::explicitCastArguments.
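
The jshell session above can also be reproduced as a small standalone sketch. The class name VoidToZero and its helper methods are made up for illustration; only the java.lang.invoke calls are real API. It shows both conversions: invoke() introducing a null, and asType() introducing a zero.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; only the java.lang.invoke calls are real API.
class VoidToZero {
    static void z() {}

    // A handle of type ()void for the do-nothing method above.
    static MethodHandle handle() {
        try {
            return MethodHandles.lookup()
                    .findStatic(VoidToZero.class, "z", MethodType.methodType(void.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // invoke() adapts ()void to the call-site type ()Object; a null is introduced.
    static Object viaInvoke() {
        try {
            Object o = handle().invoke();
            return o;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // asType() adapts ()void to ()int; the adapted handle returns a zero.
    static int viaAsType() {
        try {
            MethodHandle asInt = handle().asType(MethodType.methodType(int.class));
            return (int) asInt.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

If the return type were a privatized C.val instead of int, the same adaptation would presumably hand out the all-zero value, which is the concern raised here.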
* Serialization
--> There's a mention of serialization but if Lambda taught us
anything, it's that serialization requires more thought than we
expected, even if we take that into account =) We should spend some
time on what serialization of a C.val actually means, any format
concerns, and how it interacts with default reconstitution behaviours.
Otherwise, we'll leave a hole here where unconstructed values can be
deserialized.
* C.default & Reflection
--> Is "default" a reflectively accessible field or compiler sugar?
If a user does C.val.class.getDeclaredFields will it find "default"?
Or maybe C.class.getDeclaredFields? I'm fine with it being a fiction
but I wasn't clear how far we were pushing that into the reflective
model as well. I think the intent is to expose this with
Class::defaultValue / Lookup::defaultValue APIs but clarification
would be good.
* Accessing C.val.class
--> Do we need restrictions here beyond those of accessing C.class?
The mirror may be required to create MethodTypes for use in
MethodHandle lookup().find* apis even by code that can't create a
C.val. Given that it will leak already as shown in the doc, do we
need the extra restrictions?
More thoughts and comments to follow after another read.
--Dan
On Sun, Jul 3, 2022 at 12:56 AM John Rose <john.r.rose at oracle.com> wrote:
>
> In this message Brian wrote out the major features
> of an emerging design for value classes:
>
> From: Brian Goetz brian.goetz at oracle.com
> To: … valhalla-spec-experts at openjdk.java.net
> Subject: Re: User model stacking: current status
> Date: Thu, 23 Jun 2022 15:01:24 -0400
>
> I think controlling the complexity by having a separate
> nested declaration of the value companion type will
> work very well.
>
> So what exactly does a private value companion do?
> What is it you can and cannot do with this type?
> What problems are prevented by privatizing it?
> How and when is privatization enforced?
> What other problems are created by those new rules?
>
> I have been pulling on this thread for a few days
> now, and I think I have some answers.
>
> http://cr.openjdk.java.net/~jrose/values/encapsulating-val.md
> http://cr.openjdk.java.net/~jrose/values/encapsulating-val.html
>
> (The Hitchhiker’s Guide suddenly comes to mind. Don’t panic!)
>
> I expect I will be editing these files as we go.
> For reference here is a verbatim copy of the MD file
> as it stands right now (minus the header):
>
> Background
>
> (We will start with background information. The new stuff comes
> afterward. Impatient readers can find a very quick summary of
> restrictions at the end.)
>
> Affordances of C.ref
>
> Every class or interface C comes with a companion type, the
> reference type C.ref derived from C which describes any variable
> (argument, return value, array element, etc.) whose values are either
> null or of a concrete class derived from C. We are not in the habit
> of distinguishing C.ref from C, but the distinction is there. For
> example, if we call Object::getClass on a variable of type C.ref
> we might not get C.class; we might even get a null pointer
> exception!
>
> We are so very used to working with reference types (for short,
> ref-types) that we sometimes forget all that they do for us
> in addition to their linkage to specific classes:
>
> C.ref gives a starting point for accessing C's members.
> C.ref provides abstraction: C or a subtype might not be loaded yet.
> C.ref provides the standard uninitialized value null.
> C.ref can link C objects into graphs, even circular ones.
> C.ref has a known size, one "machine word", carefully tuned by the JVM.
> C.ref allows a single large object to be shared from many locations.
> C.ref with an identity class can centralize access to mutable state.
> C.ref values uniformly convert to and from general types like Object.
> C.ref variable types can be reflected using Class mirror objects.
> C.ref is safe for publication if the fields of C are final.
>
> When I store a bunch of C objects into an object array or list, sort
> it, and then share it with another thread, I am using several of the
> above properties; if the other thread down-casts the items to C.ref
> and works on them it relies on those properties.
>
> If I implement C as a doubly-linked list data structure or
> (alternatively) a value-based class with tree structure, I am using
> yet more of the above properties of references.
>
> If my C object has a lot of state and I pass out many pointers to
> it, and perhaps compute and cache interesting values in its mutable
> fields, I am again relying on the special properties of references,
> as well as of identity classes (if fields are mutable).
>
> By the way, in the JVM, variables of type C.ref (some of them at
> least) are associated not with C simply, but with the so-called
> L-descriptor spelled LC;. When we talk about C.ref we are
> usually talking about those L-descriptors in the JVM, as well.
>
> I don't need to think much about this portfolio of properties as I go
> about my work. But if they were to somehow fail, I would notice bugs
> in my code sooner or later.
>
> One of the big consequences of this overall design is that I can write
> a class C which has full control over its instance states. If it is
> mutable, I can make its fields private and ensure that mutations occur
> only under appropriate locking conditions. Or if I declare it as a
> value-based class, I can ensure that its constructor only allows
> legitimate instances to be constructed. Under those conditions, I
> know that every single instance of my class will have been examined
> and accepted by the class constructor, and/or whatever factory and
> mutator methods I have created for it. If I did my job right, not
> even a race condition can create an invalid state in one of my
> objects.
>
> Any instance state of C which has been reached without being
> produced from a constructor, factory, mutator, or constant of C can
> be called non-constructed. Of course, inside a class any state
> whatever can be constructed, subject to the types of fields and so on.
> But the author of the class gets to decide which states are
> legitimate, and the decisions are enforced by access control at the
> boundaries of the encapsulation.
>
> So if I code my class right, using access control to keep bad states
> away from my clients, my class's external API will have no
> non-constructed states.
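
A minimal sketch of this kind of encapsulation in today's Java, using an ordinary identity class. The class NonZero and its helpers are hypothetical names chosen for illustration. The constructor is the only way in, and a fresh array of references holds nulls rather than zero-filled instances, so no non-constructed state reaches a client.

```java
// Hypothetical example class; not taken from the design document.
final class NonZero {
    private final int value;   // invariant: never zero

    NonZero(int x) {
        if (x == 0) throw new IllegalArgumentException("zero is not a legitimate state");
        this.value = x;
    }

    int value() { return value; }

    // A fresh reference array contains only nulls, never an unvalidated NonZero.
    static boolean freshArrayIsAllNull(int n) {
        NonZero[] a = new NonZero[n];
        for (NonZero e : a) {
            if (e != null) return false;
        }
        return true;
    }

    // The constructor refuses the one state the author has declared illegitimate.
    static boolean rejectsZero() {
        try {
            new NonZero(0);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }
}
```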
>
> Costs of C.ref
>
> In that case why have value types at all, if references are so
> powerful? The answer is that reference-based abstraction pays for its
> benefits with particular costs, costs that Java programmers do not
> always wish to pay:
>
> A reference (usually) requires storage for a pointer to the object.
> A reference (usually) requires storage for a header embedded inside the object.
> Access to an object's fields (usually) requires extra cycles to chase the pointer.
> The GC expends effort administering a singular "home location" for every object.
> Cache line invalidation near that home location can cause useless memory traffic.
> A reference must be able to represent null; tightly-packed types like int and long would need to add an extra bit somewhere to cover this.
>
> The major alternative to references, as provided by Valhalla, is flat
> objects, where object fields are laid out immediately in their
> containers, in place of a pointer which points to them stored
> elsewhere. Neither alternative is always better than the other, which
> is why Java has both int and Integer types and their arrays, and
> why Valhalla will offer a corresponding choice for value classes.
>
> Alternative affordances of C.val
>
> Now, instances of a value class can be laid out flat in their
> containing variables. But they can also be "boxed" in the heap, for
> classic reference-based access. Therefore, a value class C has not
> one but two companion types associated with it, not only the reference
> companion C.ref but also the value companion C.val. Only value
> classes have value companions, naturally. The companion C.val is
> called a value type (or val-type for short), by contrast with any
> reference type, whether Object.ref or C.ref.
>
> The two companion types are closely related and perform some of the
> same jobs:
>
> C.ref and C.val both give a starting point for accessing C's members.
> C.ref and C.val can link C objects into acyclic graphs.
> C.ref and C.val values uniformly convert to and from general types like Object.
> C.ref and C.val variable types can be reflected using Class mirror objects.
>
> For these jobs, it usually doesn't matter which type companion does
> the work.
>
> Despite the similarities, many properties of a value companion type
> are subtly different from any reference type:
>
> C.val is non-abstract: You must load its class file before making a variable.
> C.val cannot nest except by reference; C cannot declare a C.val field.
> C.val does not represent the value null.
> C.val is routinely flattenable, avoiding headers and indirection pointers.
> C.val has configurable size, depending on C's non-static fields.
> C.val heap variables (fields, array elements) are initialized to all-zeroes.
> C.val might not be safe for publication (even though its fields are final).
>
> The JVM distinguishes C.val by giving it a different descriptor, a
> so-called Q-descriptor of the form QC;, and it also provides a
> so-called secondary mirror C.val.class which is similar to the
> built-in primitive mirrors like int.class.
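
For readers less familiar with mirrors and descriptors, here is a small sketch using only what exists in Java today: the primitive mirror int.class next to Integer.class, and the method descriptor that results from mixing the two. The class name MirrorsAndDescriptors is hypothetical. Q-descriptors and C.val.class are not available in current JDK releases, so the int/Integer pair stands in as an analogy only.

```java
import java.lang.invoke.MethodType;

// Hypothetical demo class; uses int/Integer as a stand-in for C.val/C.ref.
class MirrorsAndDescriptors {
    // int.class is a primitive mirror, distinct from the class mirror Integer.class.
    static boolean primitiveMirrorIsDistinct() {
        return int.class.isPrimitive()
                && !Integer.class.isPrimitive()
                && int.class != Integer.class;
    }

    // A method type taking int and returning Integer: the parameter uses the
    // primitive descriptor "I", the return type an L-descriptor.
    static String descriptor() {
        return MethodType.methodType(Integer.class, int.class).toMethodDescriptorString();
    }
}
```

Here descriptor() yields "(I)Ljava/lang/Integer;", which shows a primitive descriptor and an L-descriptor side by side in one signature.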
>
> As the Valhalla performance model notes, flattening may be expected
> but is not fully guaranteed. A C.val stored in an Object
> container is likely to be boxed on the heap, for example. But C.val
> objects created as bytecode temporaries, arguments, and return values
> are likely to be flattened into machine registers, and C.val fields
> and array elements (at least below certain size thresholds) are also
> likely to be flattened into heap words.
>
> As a special feature, C.ref is potentially flattenable if C is a
> value class. There are additional terms and conditions for flattening
> C.ref, however. If C is not yet loaded, nothing can be done:
> Remember that reference types have full abstraction as one of their
> powers, and this means building data structures that can refer to them
> even before they are loaded. But a class file can request that the JVM
> "peek" at a class to see if it is a value class, and if this request
> is acted on early enough (at the JVM's discretion), then the JVM can
> choose to lay out some or all C.ref values as flattened C.val
> values plus a boolean or other sentinel value which indicates the
> null state.
>
> Pitfalls of C.val
>
> The advantages of value companion types imply some complementary
> disadvantages. Hopefully they are rarely significant, but they
> must sometimes be confronted.
>
> C.val might need to load a class file which is somehow unloadable.
> C.val will fail to load if its instance layout directly or indirectly includes a C.val field or subfield.
> C.val will throw an exception if you try to assign a null to it.
> C.val may have surprising costs for multi-word footprint and assignment (and so might C.ref if that is flattened).
> C.val is initialized to its all-zero value, which might be non-constructed.
> C.val might allow data races on its components, creating values which are non-constructed.
>
> The footprint issue shows up most strongly if you have many copies of
> the same C.val value; each copy will duplicate all the fields, as
> opposed to many copies of the same C.ref reference, which are likely to
> all point to a single heap location with one copy of all the fields.
>
> Flat value size can also affect methods like Arrays.sort, which
> perform many assignments of the base type, and must move all fields on
> each assignment. If a C.val array has many words per element, then
> the costs of moving those words around may dominate a sort request.
> For array sorting there are ways to reduce such costs transparently,
> but it is still a "law of physics" that editing a whole data structure
> will have costs proportional to the size of the edited portions of the
> data structure, and C.ref arrays will often be somewhat more compact
> than C.val arrays. Programmers and library authors will have to use
> their heads when deciding between the new alternatives given by value
> classes.
>
> But the last two pitfalls are hardest to deal with, because they both
> have to do with non-constructed states. These states are the all-zero
> state with the second-to-last pitfall, and (with the last pitfall) the
> state obtained by mixing two previous states by means of a pair of
> racing writes to the same mutable C.val variable in the heap.
> Unlike reference types, value types can be manipulated to create these
> non-constructed states even in well-designed classes.
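
To make the "mixing two previous states" case concrete, here is a deterministic replay of one possible interleaving. It is a single-threaded simulation, not an actual race. The class TornRange, its two-slot layout, and the invariant lo <= hi are all assumptions made for illustration. Two writers each store a legitimate pair, component by component, and the slot ends up holding a pair that neither of them wrote.

```java
// Hypothetical model of a flattened two-component value stored in the heap.
class TornRange {
    // Flat storage for one value; intended invariant: lo <= hi.
    private final int[] slot = new int[2];

    void writeLo(int lo) { slot[0] = lo; }
    void writeHi(int hi) { slot[1] = hi; }
    int lo() { return slot[0]; }
    int hi() { return slot[1]; }
    boolean invariantHolds() { return slot[0] <= slot[1]; }

    // Replays one interleaving of writer A storing (1, 2) and writer B storing (10, 20).
    static TornRange interleave() {
        TornRange r = new TornRange();
        r.writeLo(1);   // A writes its first component
        r.writeLo(10);  // B writes its first component
        r.writeHi(20);  // B writes its second component
        r.writeHi(2);   // A writes its second component
        return r;       // the slot now holds (10, 2)
    }
}
```

The result (10, 2) breaks the invariant even though both (1, 2) and (10, 20) satisfy it.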
>
> Now, it may be that a constructor (or factory) might be perfectly able
> to create one of the above non-constructed states as well, no strings
> attached. In that case, the class author is enforcing few or no
> invariants on the states of the value class. Many numeric classes,
> like complex numbers, are like this: Initialization to all-zeroes is
> no problem, and races between components are acceptable, compared to
> the costs of excluding races.
>
> (The reader may recall that early JVMs accepted races on the high
> and low halves of 64-bit integers as well; this is no longer a
> widespread issue, but bigger value types like complex raise the same
> issue again, and we need to provide class authors the same solution,
> if it fits their class.)
>
> There are also some classes for which there are no good defaults, or
> for which a good default is definitely not the all-zero bit pattern.
> Authors of such types will often wish to make that bit pattern
> inaccessible to their clients and provide some factory or constant
> that gives the real default. We expect that such types will choose
> the C.ref companion, and rely on the extra null checks to ensure
> correct initialization.
>
> Other classes may need to avoid other non-constructed values that may
> arise from data races, perhaps for reasons of reliability or security.
> This is a subtle trade-off; very few class authors begin by asking
> themselves about the consequences of data races on mutable members,
> and even fewer will ask about races on whole instances of value
> types, especially given that fields in value types are always
> immutable. For this reason, we will set safety as the default, so
> that a class (like complex numbers) which is willing to tolerate data
> races must declare its tolerance explicitly. Only then will the JVM
> drop the internal costs of race exclusion.
>
> Whether to tolerate the all-zero bit pattern is a simpler decision.
> Still, it turns out to be useful to give a common single point of
> declarative control to handle all non-constructed states, both
> the default value of C.val and its mysterious data races.
>
> Privatization to the rescue
>
> (Here are the important details about the encapsulation of value
> types. The impatient reader may enjoy the very quick summary of
> restrictions at the end of this document.)
>
> In order to hide non-constructed states, the value companion C.val
> may be privatized by the author of the class C. A privatized
> value companion is effectively withdrawn from clients and kept private
> to its own class (and to nestmates). Inside the class, the value
> companion can be used freely, fully under control of the class author.
>
> But untrusted clients are prevented from building uninitialized fields
> or arrays of type C.val. This prevents such clients from creating
> (either accidentally or purposefully) non-constructed values of type
> C.val. How privatization is declared and enforced is discussed in
> the rest of this document.
>
> (To review, for those who skipped ahead, non-constructed values are
> those not created under control of the class C by constructors or
> other accessible API points. A non-constructed value may be either an
> uninitialized variable of C.val, or the result of a data race on a
> shared mutable variable of type C.val. The class itself can work
> internally with such values all day long, but we exclude external
> access to them by default.)
>
> Atomicity as well
>
> As a second tactic, a value class C may select whether or not the
> JVM enforces atomicity of all occurrences of its value companion
> C.val. A non-atomic value companion is subject to data races, and
> if it is not privatized, external code may misuse C.val variables
> (in arrays or mutable fields) to create non-constructed values via
> data races.
>
> A value companion which is atomic is not subject to data races. This
> will be the default if the class C does not explicitly request
> non-atomicity. This gives safety by default and limits
> non-constructed states to only the all-zero initial value. The
> techniques to support this are similar to the techniques for
> implementing non-tearing of variables which are declared volatile;
> it is as if every variable of an atomic value type has some (not
> all) of the costs of volatility.
>
> The JVM is likely to flatten such an atomic value only up to the
> largest available atomically settable memory unit, usually 128 bits.
> Values larger than that are likely to be boxed, or perhaps treated
> with some other expensive transactional technique. Containers that
> are immutable can still be fully flattened, since they are not subject
> to data races.
>
> The behavior of an atomic C.val is aligned with that of C.ref. A
> reference to a value class C never admits data races on C's
> fields. The reason for this is simple: A C.ref value is a C.val
> instance boxed on the heap in a single immutable box-class field of
> type C.val. (Actually, the JVM may partially or wholly flatten the
> representation of C.ref if it can get away with it; full flattening
> is likely for JVM locals and stack values, but any such secret
> flattening is undetectable by the user.) Since it is final all the
> way down (to C's fields) any C.ref value is safely published
> without any possibility of data races. Therefore, an extra
> declaration of non-atomicity in C affects only the value companion
> C.val.
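
The reference-side guarantee can be sketched with today's Java: an immutable box with final fields, republished by swapping a single reference. The names BoxedPublication and Pair are hypothetical. The field is volatile only so that the reader sees updates promptly; the absence of mixed states comes from the final fields and the single-reference swap.

```java
// Hypothetical demo of publishing immutable boxes through one reference.
class BoxedPublication {
    // Immutable box: both components are final.
    static final class Pair {
        final int lo;
        final int hi;
        Pair(int lo, int hi) { this.lo = lo; this.hi = hi; }
    }

    private volatile Pair current = new Pair(1, 2);

    // A writer alternates between (1, 2) and (10, 20) while the caller reads;
    // returns how many times the reader saw anything other than those two pairs.
    static int countMixedStates(int iterations) {
        BoxedPublication box = new BoxedPublication();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                box.current = (i % 2 == 0) ? new Pair(10, 20) : new Pair(1, 2);
            }
        });
        writer.start();
        int mixed = 0;
        for (int i = 0; i < iterations; i++) {
            Pair p = box.current;   // one reference read, then two final-field reads
            boolean isA = p.lo == 1 && p.hi == 2;
            boolean isB = p.lo == 10 && p.hi == 20;
            if (!isA && !isB) mixed++;
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return mixed;
    }
}
```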
>
> It seems that there are use cases which justify all four combinations
> of both choices (privatization and declared non-atomicity), although
> it is natural to try to boil down the size of the matrix.
>
> C.val private & atomic is the default and safest configuration,
> hiding all non-constructed values outside of C and all data races
> even inside of C. There are some runtime costs.
>
> C.val public & non-atomic is the opposite, with fewer runtime
> costs. It must be explicitly declared. It is desirable for
> numerics like complex numbers, where all possible bitwise states are
> meaningful. It is analogous to the situation of a naturally
> non-atomic primitive like long.
>
> C.val public & atomic allows everybody to see the all-zero
> initial value but no other non-constructed states. This is
> analogous to the situation of a naturally atomic primitive like
> int.
>
> C.val private & non-atomic allows C complete control over the
> visibility of non-constructed states, but C also has the ability
> to work internally on arrays of non-atomic elements. C should
> take care not to leak internally-created flat arrays to untrusted
> clients, lest they use data races to hammer non-constructed values
> into those arrays.
>
> It is logically possible, but there does not seem to be a need, for
> allowing a single class C to work with both kinds of arrays, atomic
> and non-atomic. (In principle, the dynamic typing of Java arrays
> would support this, as long as each array was configured at its
> creation.) The effect of this can be simulated by wrapping a
> non-atomic class C in another wrapper class WC which is atomic.
> Then C.val[] arrays are non-atomic and WC.val[] arrays are atomic,
> yet each kind of array can have the same "payload", a repeated
> sequence of the fields of C.
>
> Privatization in code
>
> For source code and bytecode, privatization is enforced by performing
> access checks on names.
>
> Privatization rules in the language
>
> We will stipulate that a value class C always has a value
> companion type C.val, even if it is never declared or used. And we
> give the author of C some control over how clients may use the type
> C.val, in a manner roughly similar to nested member classes like
> C.M.
>
> Specifically, the declaration of C always selects an access mode for
> its value companion C.val from one of the following three choices:
>
> C.val is declared private
> C.val is declared public
> C.val is declared, but neither public nor private
>
> If C.val is declared private, then only nestmates of C may access
> C.val. If it is neither public nor private, only classes in the
> same runtime package as C may access it. If it is declared public,
> then any class that can access C may also access C.val.
>
> As an independent choice, the declaration of C may select an
> atomicity for its value companion C.val from one of the following
> two choices:
>
> C.val is explicitly declared non-atomic
> C.val is not explicitly declared non-atomic, and is thus atomic
>
> If there is no explicit access declaration for C.val in the code of
> C, then C.val is declared private and atomic. That is, we set the
> default to the safest and most restrictive choice.
>
> In source code, these declarations are applied to explicit occurrences
> of the type name C.val. The access modification of C.val is also
> transferred to the implicitly declared name C.default.
>
> The syntax looks like this:
>
> class C {
> //only one of the following lines may be specified
> //the first line is the default
> private value companion C.val; //nestmates only
> value companion C.val; //package-mates only
> public value companion C.val; //all may access
> // the non-atomic modifier may be present:
> private non-atomic value companion C.val;
> public non-atomic value companion C.val;
> non-atomic value companion C.val;
> }
>
> When a type name C.val or an expression C.default is
> used by a class X, there are two access checks that occur. First,
> access from X to the class C is checked according to the usual
> rules of Java. If access to C is permitted, a second check is done
> if the companion is not declared public. If the companion is
> declared private, then X and C must be nestmates, or else access
> will fail. If the companion is neither public nor private, then
> X and C must be in the same package, or else access will fail.
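
The two-step check described above can be written down as a small decision function. This is a sketch of the stated rule only, not of any compiler or JVM code. The names CompanionAccess, Mode, and mayUse are hypothetical, and the three boolean inputs are assumed to have been computed elsewhere.

```java
// Hypothetical model of the access rule for C.val and C.default.
class CompanionAccess {
    enum Mode { PRIVATE, PACKAGE, PUBLIC }

    // classAccessible: result of the usual Java access check from X to C.
    // nestmates / samePackage: relationship between X and C.
    static boolean mayUse(boolean classAccessible, Mode companionMode,
                          boolean nestmates, boolean samePackage) {
        if (!classAccessible) return false;       // first check: access to C itself
        switch (companionMode) {                  // second check: the companion's mode
            case PUBLIC:  return true;
            case PRIVATE: return nestmates;
            default:      return samePackage;     // neither public nor private
        }
    }
}
```

For example, mayUse(true, Mode.PRIVATE, false, true) is false: a package-mate that is not a nestmate can see C but cannot name a privatized C.val.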
>
> Example privatized value companion
>
> Here is an example of a class which refuses to construct its default
> value, and which prevents clients from seeing that state:
>
> class C {
> int neverzero;
> public C(int x) {
> if (x == 0) throw new IllegalArgumentException();
> neverzero = x;
> }
> public void print() { System.out.println(this); }
>
> private value companion C.val; //privatized (also the default)
>
> // some valid uses of C.val follow:
> public C.val[] flatArray() { return new C.val[]{ this }; }
> private static C.ref nonConstructedZero() {
> return (new C.val[1])[0]; //OK: C.val private but available
> }
> public static C.ref box(C.val val) { return val; } //OK param type
> public C.val unbox() { return this; } //OK return type
>
> // valid use of private C.default, with Lookup negotiation
> public static
> C.ref defaultValue(java.lang.invoke.MethodHandles.Lookup lookup) {
> if (!lookup.in(C.class).hasFullPrivilegeAccess())
> return null; //…or throw
> return C.default; //OK: default for me and maybe also for thee
> }
> }
>
> // non-nestmate client:
> class D {
> static void passByValue(C x) {
> C.ref ref = C.box(x); //OK, although x is null-checked
> if (false) C.box((C.ref) null); //would throw NPE
> assert ref == x;
> }
>
> static Object useValue(C x) {
> x.unbox().print(); //OK, invoke method on C.val expression
> var xv = x.unbox(); //OK, although C.val is non-denotable
> xv.print(); //OK
> //> C.val xv = x.unbox(); //ERROR: C.val is private
> return xv; //OK, originally from legitimate method of C
> }
>
> static Object arrays(C x) {
> var a = x.flatArray();
> //> C.val[] va = a; //ERROR: C.val is private
> Arrays.toString(a); //OK
> C.ref[] a2 = a; //covariant array assignment
> C.ref[] na = new C.ref[1];
> //> na = new C.val[1]; //ERROR: C.val is private
> return a[0]; //constructed values only
> }
> }
>
> The above code shows how a privatized value companion can and cannot
> be used. The type name may never be mentioned. Apart from that
> restriction, client code can work with the value companion type as it
> appears in parameters, return values, local variables, and array
> elements. In this, a privatized companion behaves like other
> non-denotable types in Java.
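
The Lookup negotiation used by defaultValue in the example can be tried out with ordinary classes. The sketch below assumes Java 14 or later for hasFullPrivilegeAccess. The classes Vault and Outsider and the guarded string are hypothetical stand-ins for C, a client, and C.default.

```java
import java.lang.invoke.MethodHandles;

// Hypothetical stand-in for a class guarding a private value behind a Lookup check.
class Vault {
    private static final String GUARDED = "guarded value";

    // Hands out the guarded value only if the caller's lookup keeps full
    // privilege when retargeted at Vault; otherwise returns null.
    static String guarded(MethodHandles.Lookup lookup) {
        if (!lookup.in(Vault.class).hasFullPrivilegeAccess()) {
            return null;
        }
        return GUARDED;
    }

    // A lookup created inside Vault keeps full privilege.
    static String fromInside() {
        return guarded(MethodHandles.lookup());
    }
}

// A separate top-level class: its lookup loses private access under in(Vault.class).
class Outsider {
    static String fromOutside() {
        return Vault.guarded(MethodHandles.lookup());
    }
}
```

Vault.fromInside() receives the value, while Outsider.fromOutside() receives null, because retargeting a lookup at a class outside its own top-level class drops private access.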
>
> Rationale: Note that a companion type is not a real class.
> Therefore it cannot appeal, precisely, to the existing provisions (in
> JLS or JVMS) for enforcing class accessibility. But because it is a
> type, and today nearly all types are classes (and interfaces), users
> have a right to expect that encapsulation of companion types will
> "feel like" encapsulation of type names. More precisely, users will
> hope to re-use their knowledge about how type name access works when
> reasoning about companion types. We aim to accommodate that hope. If
> it works, users won't have to think very often about the class-vs-type
> distinction. That is why the above design emulates pre-existing
> usage patterns for non-denotable types.
>
> Privatization in translation
>
> When a value class is compiled to a class file, some metadata is
> included to record the explicit declaration or implicit status of the
> value companion.
>
> The access selection of C's value companion (public, package,
> private) is encoded in the value_flags field of the ValueClass
> attribute of the class information in the class file of C.
>
> The value_flags field (16 bits) has the following legitimate values:
>
> zero: C.val default access, non-atomic
> ACC_PUBLIC: C.val public access, non-atomic
> ACC_PRIVATE: C.val private access, non-atomic
> ACC_VOLATILE: C.val default access, atomic
> ACC_VOLATILE|ACC_PUBLIC: C.val public access, atomic
> ACC_VOLATILE|ACC_PRIVATE: C.val private access, atomic
>
> Other values are rejected when the class file is loaded.
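
A sketch of how the six legitimate value_flags values might be decoded. The ValueClass attribute is only proposed here and no such parser exists in the JDK; the class ValueFlags and the method describe are hypothetical. The sketch assumes the ACC_PUBLIC, ACC_PRIVATE, and ACC_VOLATILE bits have the same numeric values as the corresponding java.lang.reflect.Modifier constants.

```java
import java.lang.reflect.Modifier;

// Hypothetical decoder for the proposed value_flags field.
class ValueFlags {
    // Returns "<access>, <atomicity>" for a legitimate value,
    // or throws IllegalArgumentException for anything else.
    static String describe(int flags) {
        int accessBits = Modifier.PUBLIC | Modifier.PRIVATE;
        int access = flags & accessBits;
        int unknown = flags & ~(accessBits | Modifier.VOLATILE);
        if (unknown != 0 || access == accessBits) {
            throw new IllegalArgumentException(
                    "illegal value_flags: 0x" + Integer.toHexString(flags));
        }
        String accessName = access == Modifier.PUBLIC ? "public"
                : access == Modifier.PRIVATE ? "private"
                : "package";
        String atomicity = (flags & Modifier.VOLATILE) != 0 ? "atomic" : "non-atomic";
        return accessName + ", " + atomicity;
    }
}
```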
>
> (JVM ISSUE #0: Can we kill the ACC_VALUE modifier bit? Do we
> really care that jlr.Modifiers kind-of wants to own the reflection
> of the contextual modifier value? Who are the customers of this
> modifier bit, as a bit? John doesn't care about it personally, and
> thinks that if we are going to have an attribute we can get rid of the
> flag bit. One implementation issue with killing ACC_VALUE is that
> class attributes are processed very late during class loading, while
> class modifiers are processed very early. It may be easier to do some
> kinds of structural checks on the fly during class loading even before
> class attributes are processed. Yet this also seems like a poor
> reason to use a modifier bit.)
>
> (JVM ISSUE #1: What if the attribute is missing; do we reject the
> class file or do we infer value_flags=ACC_PRIVATE|ACC_VOLATILE?
> Let's just reject the file.)
>
> (JVM ISSUE #2: Is this ValueClass attribute really a good place
> to store the "atomic" bit as well? This attribute is a green-field
> for VM design, as opposed to the brown-field of modifier bits. The
> above language assumes the atomic bit belongs in there as well.)
>
> A use of a value companion C.val, in any source file, is generally
> translated to a use of a Q-descriptor QC;:
>
> a field declaration of C.val translates to a field-info with a Q-descriptor
> a method or constructor declaration that mentions C.val mentions a corresponding Q-descriptor in its method descriptor
> a use of a field resolves a CONSTANT_Fieldref with a Q-descriptor component
> a use of a method or constructor uses a CONSTANT_Methodref (or CONSTANT_InterfaceMethodref) with a Q-descriptor component
> a CONSTANT_Class entry may contain a Q-descriptor or an array type whose element type is a Q-descriptor
> a verifier type record may refer to CONSTANT_Class which contains a Q-descriptor
>
> Privatization is enforced for these uses only as much as is needed to
> ensure that classes cannot create uninitialized values, fields, and
> arrays.
>
> If an access from bytecode to a privatized Q-descriptor fails, an
> exception is thrown; its type is IllegalAccessError, a subtype of
> IncompatibleClassChangeError. Generally speaking such an exception
> diagnoses an attempt by bytecode to make an access that would have
> been prevented by the static compiler, if the Java source program had
> been compiled together as a whole.
>
> When a field of Q-descriptor type is declared in a class file, the
> descriptor is resolved early, before the class is linked, and that
> resolution includes an access check which will fail unless the class
> being loaded has access to C.val, as determined by loading C and
> inspecting its ValueClass attribute. These checks prevent untrusted
> clients of C from creating non-constructed zero values, in any of
> their fields.
>
> The timing of these checks, on fields, is aligned with the internal
> logic of the JVM which consults the class file of C to answer other
> related questions about field types: (a) whether C is in fact a
> value class, and (b) what is the layout of C.val, in case the JVM
> wishes to flatten the value in a containing field. A third check,
> (c) whether the C.val companion is accessible, happens at the same
> time. This is early during class loading for non-static fields, and
> during class preparation for static fields.
>
> Privatization is not enforced for non-field Q-descriptors, that
> occur in method and constructor signatures, and in state descriptions
> for the verifier. This is because mere use of Q-descriptors to
> describe pre-existing values cannot (by itself) expose non-constructed
> values, when those values are on stack or in locals.
>
> This can happen invisibly at the source-code level as well. An API
> might be designed to return values of a privatized type from its
> methods or fields, and/or accept values of a privatized type into its
> methods, constructors, or fields. In general, the bytecode for a
> client of such an API will work with a mix of Q-descriptor and
> L-descriptor values.
>
> The verifier's type system uses field descriptor types, and thus can
> "see" both Q-descriptors and L-descriptors. Clients of a class with a
> privatized companion are likely to work mostly with L-descriptor
> values but may also have Q-descriptor values in locals and on stack.
>
> When feeding an L-descriptor value to an API point that accepts a
> Q-descriptor, the verifier needs help to keep the types straight. In
> such cases, the bytecode compiler issues checkcast instructions to
> adjust types to keep the verifier happy, and in this case the operand
> of the checkcast would be of the form CONSTANT_Class["QC;"].
>
> (JVM ISSUE #3: The Q/L distinction in the verifier helps the
> interpreter avoid extra dynamic null checks around putfield,
> putstatic, and the invoke instructions. This distinction requires
> an explicit bytecode to fix up Q/L mismatches; the checkcast
> bytecode serves this purpose. That means checkcast requires the
> ability to work with privatized types. It requires us to make the
> dynamic permission check when other bytecodes try to use the
> privatized type. All this seems acceptable, but we could try to make
> a different design in which CONSTANT_Class resolution fails immediately
> if it contains an inaccessible Q-descriptor. That design might
> require a new bytecode which does what checkcast does today on a
> Q-descriptor.)
>
> Meanwhile, arrays are rich sources of non-constructed zero values.
> They appear in bytecode as follows:
>
> A C.val[] array construction uses anewarray with a CONSTANT_Class type for the Q-descriptor; this is new to Valhalla.
> Such an array construction may also use multianewarray with an appropriate array type.
> An array element is read from heap to stack by aaload; the verifier type of the stacked value is copied from the verifier type of the array itself.
> An array element is written from stack to heap by aastore; the verifier type of the stored value is merely constrained to the type Object.
>
> Note that there are no static type annotations on array access
> instructions. The practical impact of this is that, if an array of a
> privatized type C.val is passed outside of C, then any values in
> that array become accessible outside of C. Moreover, if C.val is
> non-atomic, clients may be able to inflict data races on the array.
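A present-day analogy, using ordinary reference arrays rather than
Valhalla flat arrays: the sketch below shows that an array store is
checked against the array object's own run-time element type, not
against any type named at the access site. The class and method names
are made up for illustration.

```java
final class ArrayAccessDemo {
    /** Returns true if storing 'value' into slot 0 of 'array' is rejected at run time. */
    static boolean storeRejected(Object[] array, Object value) {
        try {
            array[0] = value;   // an aastore; the instruction itself names no element type
            return false;
        } catch (ArrayStoreException e) {
            return true;        // the array's own component type vetoed the store
        }
    }

    public static void main(String[] args) {
        Object[] view = new String[1];   // a String[] seen through an Object[] variable
        System.out.println(storeRejected(view, "ok"));
        System.out.println(storeRejected(view, 42));
    }
}
```

Whoever holds the array reference can read and write its elements, so
the only place left to apply an access rule is where the array is made.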
>
> Thus, the best point of control over misuse of arrays is their
> creation, not their access. Array creation is controlled by
> CONSTANT_Class constant pool entries and their access checking.
> When an anewarray or multianewarray tries to create an array,
> the CONSTANT_Class constant pool entry it uses must be consulted
> to see if the element type is privatized and inaccessible to the
> current class, and IllegalAccessError thrown if that is the case.
>
> All this leads to special rules for resolving an entry of the form
> CONSTANT_Class["QC;"]. When resolving such a constant, the class
> file for C is loaded, and C is access checked against the current
> class. (This is just what happens when CONSTANT_Class["C"] gets
> resolved.) Next, the ValueClass attribute for C is examined; it
> must exist, and if it indicates privatization of C.val, then access
> is checked for C.val against the current class.
>
> If that access to a privatized companion would fail, no exception is
> thrown, but the constant pool entry is resolved into a special
> restricted state. Thus, a resolved constant pool entry of the form
> CONSTANT_Class["QC;"] can have the following states:
>
> Error, because C is inaccessible or doesn't exist or is not a value class.
> Full resolution, so C.val is ready for general use in the current class.
> Restricted resolution, so C.val is ready for restricted use in the current class.
>
> That last state happens when C is accessible but C.val is not.
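The resolution rule above can be modelled as a small decision
function. This is only a sketch of the rule as described, not JVM
code; the four boolean inputs and all names are assumptions chosen for
illustration ("class does not exist" is folded into classAccessible).

```java
final class QClassResolution {
    enum State { ERROR, FULL, RESTRICTED }

    /**
     * Models resolving CONSTANT_Class["QC;"] from the point of view of
     * the current class. Parameter names are illustrative only.
     */
    static State resolve(boolean classAccessible, boolean isValueClass,
                         boolean companionPrivatized, boolean companionAccessible) {
        if (!classAccessible || !isValueClass) {
            return State.ERROR;        // same failure as CONSTANT_Class["C"], or no ValueClass attribute
        }
        if (companionPrivatized && !companionAccessible) {
            return State.RESTRICTED;   // no exception is thrown; the restriction is recorded
        }
        return State.FULL;
    }
}
```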
>
> Likewise, a constant pool entry of the form CONSTANT_Class["[QC;"]
> (or a similar form with more leading array brackets) can have three
> states, error, full resolution, and restricted resolution.
>
> Pre-Valhalla CONSTANT_Class entries which do not mention
> Q-descriptors have only two resolved states, error and full
> resolution.
>
> As required above, the checkcast bytecode treats full resolution and
> restricted resolution states the same.
>
> But when the anewarray or multianewarray instruction is executed,
> it throws an access error if its CONSTANT_Class is not
> fully resolved (either it is an error or is restricted). This is how
> the JVM prevents creation of arrays whose component type is an
> inaccessible value companion type, even if the class file does
> not correspond to correct Java source code.
>
> Here are all the classfile constructs that could refer to a
> CONSTANT_Class constant in the restricted state, and whether they
> respect it (throwing IllegalAccessError):
>
> checkcast ignores the restriction and proceeds
> instanceof ignores the restriction (consistent with checkcast)
> anewarray and multianewarray respect the restriction and throw
> ldc throws (consistent with C.val.class in source code)
> bootstrap arguments throw (consistent with ldc)
> verifier types ignore the restriction and continue checking
> (FIXME: There must be more than this.)
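The list above reads naturally as a lookup table. A minimal sketch,
assuming the enum constants below as stand-ins for the classfile
constructs (the names are mine, and the FIXME means the set is
probably incomplete):

```java
import java.util.EnumSet;
import java.util.Set;

final class RestrictedUses {
    enum Use {
        CHECKCAST, INSTANCEOF, ANEWARRAY, MULTIANEWARRAY,
        LDC, BOOTSTRAP_ARGUMENT, VERIFIER_TYPE
    }

    /** Uses that throw IllegalAccessError on a restricted constant, per the list above. */
    static final Set<Use> RESPECT_RESTRICTION =
        EnumSet.of(Use.ANEWARRAY, Use.MULTIANEWARRAY, Use.LDC, Use.BOOTSTRAP_ARGUMENT);

    /** Returns true if 'use' may proceed with a constant in the given state. */
    static boolean proceeds(Use use, boolean restricted) {
        return !(restricted && RESPECT_RESTRICTION.contains(use));
    }
}
```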
>
> Q-descriptors not in CONSTANT_Class constants are naturally immune
> to privatization restrictions. In particular, CONSTANT_MethodType
> constants can successfully refer to mirrors of privatized companions.
>
> Uses of CONSTANT_Class constants which forbid Q-descriptors and
> their arrays are also naturally immune, since they will never
> encounter a constant resolved in the restricted state. These include
> new, aconst_init, the class sub-operands of CONSTANT_Methodref
> and its friends, exception catch-types, and various attributes like
> NestHost and InnerClasses: All of the above are allowed to refer
> only to proper classes, and not to their value companions or arrays.
>
> Nevertheless, an aconst_init bytecode must throw an access error when
> applied to a class with an inaccessible privatized value companion.
> This is worth noting because the constant pool entry for aconst_init
> does not mention a Q-descriptor, unlike the array construction
> bytecodes.
>
> Perhaps regular class constants of the form CONSTANT_Class["C"] would
> also benefit slightly from a restricted state, which would be
> significant only to the aconst_init bytecode, and ignored by all
> the above "naturally immune" usages. If a JVM implementation takes
> this option, the same access check would be performed and recorded for
> both CONSTANT_Class["C"] and CONSTANT_Class["QC;"], but would be
> respected only by aconst_init (for the former) and anewarray and the
> other cases noted above (for the latter but not the former). On the
> other hand, the particular issue would become moot if aconst_init, like
> withfield, were restricted to the nest of its class, because then
> privatization would not matter.
>
> The net effect of these rules, so far, is that neither source code nor
> class files can directly make uninitialized variables of type C.val,
> if the code or class file was not granted access to C.val via C.
> Specifically, fields of type C.val cannot be declared nor can arrays
> of type C.val[] be constructed.
>
> This includes class files as correctly derived from valid source code
> or as "spun" by dodgy compilers or even as derived validly from old
> source code that has changed (and revoked some access).
>
> Remember that new nestmates can be injected at runtime via the
> Lookup API, which checks access and then loads new code that enjoys
> the same access. The level of access depends in detail on the
> selection of ClassOption.NESTMATE (for nestmate injection) or not
> (for package-mate injection). The JVM uses common rules for these
> injected nestmates or package-mates and for normally compiled ones.
>
> There are no restrictions on the use of C.ref, beyond the basic
> access restrictions imposed by the language and JVM on the name C.
> Access checks for regular references to classes and interfaces are
> unchanged throughout all of the above.
>
> There are more holes to be plugged, however. It will turn out that
> arrays are once again a problem. But first let's examine how
> reflection interacts with companion types and access control.
>
> Privatization and APIs
>
> Beyond the language there are libraries that must take account of the
> privatization of value companions. We start on the shared boundary
> between language and libraries, with reflection.
>
> Reflecting privatization
>
> Every companion type is reflected by a Java class mirror of type
> java.lang.Class. A Java class mirror also represents the class
> underlying the type. The distinction between the concept of class and
> companion type is relatively uninteresting, except for a value class
> C, which has two companion types and thus two mirrors.
>
> In Java source code the expression C.class obtains the mirror for
> both C and its companion C.ref. The expression C.val.class
> obtains the mirror for the value companion, if C is a value class.
> Both expressions check access to C as a whole, and C.val.class
> also checks access to the value companion (if it was privatized).
>
> But it is a generally recognized fact that Java class mirrors are less
> secure than the Java class types that the mirrors represent. It is
> easy to write code that obtains a mirror on a class C without
> directly mentioning the name C in source code. One can use
> reflective lookup to get such mirrors, and without even trying one may
> also "stumble upon" mirrors to inaccessible classes and companion
> types. Here are some simple examples:
>
> Class<?> lookup() throws ClassNotFoundException {
>     var name = "java.util.Arrays$ArrayList";
>     //or name = "java.lang.AbstractStringBuilder";
>     //> java.lang.invoke.MethodHandles.lookup().findClass(name); //ERROR
>     return Class.forName(name); //OK!
> }
> Class<?> stumble1() {
>     //> return java.util.Arrays.ArrayList.class; //ERROR
>     return java.util.Arrays.asList().getClass(); //OK!
> }
> Class<?> stumble2() {
>     //> return java.lang.AbstractStringBuilder.class; //ERROR
>     return StringBuilder.class.getSuperclass(); //OK!
> }
> Class<?> stumble3() {
>     //> return C.val.class; //ERROR if C.val is privatized
>     return C.ref.class.asValueType(); //OK!
> }
>
> Therefore, access checking class names is not and cannot be the whole
> story for protecting classes and their companion types from reflective
> misuse. If a mirror is obtained that refers to an inaccessible
> non-public class or privatized companion, the mirror will "defend
> itself" against illegal access by checking whether the caller has
> appropriate permissions. The same goes for method, constructor, and
> field mirrors derived from the class mirror: You can reflect a method
> but when you try to call it all of the access checks (including the
> check against the class) are enforced against you, the caller of the
> reflective API.
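The "defend itself" behaviour can be observed on a current JDK. A
small sketch (class and method names are mine): it stumbles on the
mirror of the private nested class java.util.Arrays$ArrayList, reflects
its constructor without complaint, and is only refused when it tries to
call it.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

final class MirrorDefense {
    /** Obtains the mirror of java.util.Arrays$ArrayList without naming the class. */
    static Class<?> stumble() {
        return Arrays.asList().getClass();
    }

    /** Returns true if invoking the reflected constructor is refused for this caller. */
    static boolean constructorRefused() {
        try {
            // Reflecting the member succeeds...
            Constructor<?> ctor = stumble().getDeclaredConstructor(Object[].class);
            // ...but the call is access-checked against us, the caller.
            ctor.newInstance((Object) new Object[0]);
            return false;
        } catch (IllegalAccessException e) {
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stumble().getName());
        System.out.println(constructorRefused());
    }
}
```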
>
> The checking of the caller has two possible shapes. Either a
> caller-sensitive method looks directly at its caller, or the call is
> delegated through an API that requires negotiation with a
> MethodHandles.Lookup object that was previously checked against a
> caller.
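Both shapes can be imitated with public APIs. In this sketch,
StackWalker stands in for the JDK-internal caller-sensitivity
mechanism, which is an assumption made for illustration; the class
names are hypothetical.

```java
import java.lang.invoke.MethodHandles;

final class CallerShapes {
    private static final StackWalker WALKER =
        StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE);

    /** Shape 1: a caller-sensitive method looks directly at its immediate caller. */
    static Class<?> checkedAgainstCaller() {
        return WALKER.getCallerClass();
    }

    /** Shape 2: the caller presents a Lookup that was created (and checked) earlier. */
    static Class<?> checkedAgainstLookup(MethodHandles.Lookup lookup) {
        return lookup.lookupClass();
    }

    /** A stand-in client that exercises both shapes. */
    static final class Client {
        static Class<?> direct()     { return checkedAgainstCaller(); }
        static Class<?> negotiated() { return checkedAgainstLookup(MethodHandles.lookup()); }
    }
}
```

In both cases the API ends up with the same Class to check against;
the difference is whether it discovers it or is handed it.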
>
> Now, if a class C is accessible but its value companion C.val is
> privatized, all of C's public methods and other API points are
> accessible (via both companion types), but access is restricted for
> those very specific operations that could create non-constructed
> instances (via a variable of companion type C.val). And this boils down
> to a limitation on array creation. If you cannot use either source
> code or reflection to create an array of type C.val[], then you
> cannot create the conditions necessary to build non-constructed
> instances.
>
> Reflective APIs should be available to report the declared properties
> of reference companions. It is enough to add the following two methods:
>
> Class::isNonAtomic is true only of mirrors of value companions
> which have been declared non-atomic. On some JVM implementations it
> may additionally be true of long.class and/or double.class.
>
> Class::getModifiers, when applied to a mirror of a value
> companion, will return a modifier bit-mask that reflects the
> declared access. (This is compatible with the current behavior of
> HotSpot for primitive mirrors, which appear as if they were somehow
> declared public, with abstract and final thrown in to boot.)
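The primitive-mirror behaviour mentioned in the parenthesis can be
checked on a current HotSpot JDK. A minimal sketch; the helper class
is hypothetical.

```java
import java.lang.reflect.Modifier;

final class PrimitiveMirrorModifiers {
    /** Renders the modifier bit-mask of a mirror as source-style keywords. */
    static String describe(Class<?> mirror) {
        return Modifier.toString(mirror.getModifiers());
    }

    /** True if the mirror reports public, abstract and final all at once. */
    static boolean publicAbstractFinal(Class<?> mirror) {
        int mods = mirror.getModifiers();
        return Modifier.isPublic(mods) && Modifier.isAbstract(mods) && Modifier.isFinal(mods);
    }

    public static void main(String[] args) {
        System.out.println(describe(int.class));
        System.out.println(describe(double.class));
    }
}
```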
>
> (Note that most reflective access checking should take care to work
> with the reference mirror, not the value mirror, as the modifier bits
> of the two mirrors might differ.)
>
> Privatization and arrays
>
> There are a number of standard API points for creating Java array
> objects. When they create arrays containing uninitialized elements,
> then a non-constructed default value can appear. Even when they
> create properly initialized arrays, if the type is declared
> non-atomic, then non-constructed values can be created by races.
>
> java.lang.reflect.Array::newInstance takes an element mirror and length and builds an array. The elements of the returned array are initialized to the default value of the selected element type.
> java.util.Arrays::copyOf and copyOfRange can extend the length of an existing array to include new uninitialized elements.
> A special overloading of java.util.Arrays::copyOf can request a different type of the new array copy.
> java.util.Collection::toArray (an interface method) may extend the length of an existing array, but does not add uninitialized elements.
> java.lang.invoke.MethodHandles.arrayConstructor creates a method handle that creates uninitialized arrays of a given type, as if by the anewarray bytecode.
> The serialization API contains an operator for materializing arrays of arbitrary type from the wire format.
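For reference, here is how four of the API points above hand out
default-valued elements on a current JDK. The sketch uses int and
Integer as stand-ins for a value companion, so the "default" that
appears is 0 or null; the class and method names are mine.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;
import java.util.Arrays;

final class DefaultElementSources {
    /** Array::newInstance: every element starts at the default. */
    static int[] viaNewInstance(int length) {
        return (int[]) Array.newInstance(int.class, length);
    }

    /** Arrays::copyOf: elements past the old length are defaults. */
    static int[] viaCopyOf(int[] original, int newLength) {
        return Arrays.copyOf(original, newLength);
    }

    /** The retyping overload of Arrays::copyOf. */
    static Number[] viaRetypingCopyOf(Integer[] original, int newLength) {
        return Arrays.copyOf(original, newLength, Number[].class);
    }

    /** MethodHandles::arrayConstructor: a factory of type (int)int[]. */
    static int[] viaArrayConstructor(int length) {
        MethodHandle factory = MethodHandles.arrayConstructor(int[].class);
        try {
            return (int[]) factory.invoke(length);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```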
>
> The basic policy for all these API points is to conservatively limit
> the creation of arrays of type C.val[] if C.val is not public.
>
> java.lang.reflect.Array::newInstance will throw
> IllegalArgumentException if the element type is privatized.
> (See below for a possible caller-sensitive enhancement.)
>
> java.util.Arrays::copyOf and copyOfRange will throw instead of
> creating uninitialized elements, if the element type is
> privatized. If only previously existing array elements are
> copied, there is no check, and this is a common use case (e.g., in
> ArrayList::toArray).
>
> The special overloading of java.util.Arrays::copyOf will refuse
> to create an array of any non-atomic privatized type. (This
> refusal protects against non-constructed values arising from data
> races.) It also incorporates the restrictions of its sibling
> methods, against creating uninitialized elements (even of an
> atomic type).
>
> java.lang.invoke.MethodHandles.arrayConstructor will refuse to
> create a factory method handle if the element type is privatized.
>
> java.util.Collection::toArray needs implementation review; as it
> is built on top of the previous API points, it may possibly fail
> if asked to lengthen an array of privatized type. Note that many
> implementations of toArray use Arrays.copyOf in a safe manner, which
> does not create uninitialized elements.
>
> java.util.stream.Stream::toArray, the various List::toArray,
> and other clients of Arrays::copyOf or Array::newInstance need
> implementation review. Where a generic API is involved, the
> assumption is often that non-flat reference arrays are being
> created, and in that case no outage is possible, since reference
> companion arrays can always be freely created. For specialized
> generics with flat types, additional implementation work is
> required, in general, to ensure that flat arrays can be created by
> parties with the right to do so.
>
> The serialization API should restrict its array creation operator.
> Serialization methods should not attempt to serialize flat arrays
> either. It is enough to serialize arrays of the reference type.
>
> API ISSUE #1: Should we relax construction rules for zero-length
> arrays? This would add complexity but might be a friendly move for
> some use cases. A zero-length array cannot expose non-constructed
> values. It may, however, serve as a misleading "witness" that some
> code has gained permission to work with flat arrays. It's safer to
> disallow even zero-length arrays.
>
> API ISSUE #2: What about public value companions of non-public
> inaccessible classes? In source code, we do not allow arrays of
> private classes to be made, or of their public value companions.
> Should we be more permissive in this case? We could specify that
> where a value companion has to be checked against a client, its
> original class gets checked as well; this would exclude some use cases
> allowed by the above language, which only takes effect if the
> companion is privatized. An extra check for a public companion seems
> like busy-work and a source of unnecessary surprises, though. Let's
> not.
>
> There are probably legitimate use cases for arrays of privatized
> types, with which the new restrictions on the above API points would
> interfere. So as a backup, we will make API adjustments to work with
> privatized array types, with an extra handshake to perform the access
> check (via either caller sensitivity or negotiation with an instance
> of MethodHandles.Lookup).
>
> java.lang.reflect.Array::newInstance should probably be made
> caller sensitive, so it can refrain from throwing if a privatized
> element type is accessible to the caller. (Alternatively, a new
> caller-sensitive API point could be made, such as
> Array::newFlatInstance. But a new API point seems unnecessary
> in this case, and caller-sensitivity is common practice in this
> method's package.) Note that, as is typical of core reflection
> API points, many uses of newInstance will not benefit from
> the caller sensitivity.
>
> java.util.Arrays::copyOf and copyOfRange may be joined by
> additional "companion friendly" methods of a similar character
> which fill new array elements with some other specified fill
> value, and/or which cyclically replicate the contents of the
> original array, and/or which call a functional interface to
> provide missing elements. The details of this are a matter for
> library designers to decide. Adding caller sensitivity to
> these API points is probably the wrong move.
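One possible shape for such "companion friendly" methods, sketched
with today's generic reference arrays. The three method names are
invented, and Arrays.copyOf merely stands in for whatever privileged
allocation a real implementation would need for a privatized type.

```java
import java.util.Arrays;
import java.util.function.IntFunction;

final class FriendlyCopies {
    /** Lengthens 'original', filling new slots with an explicit value. */
    static <T> T[] copyOfFilled(T[] original, int newLength, T fill) {
        T[] copy = Arrays.copyOf(original, newLength);
        if (newLength > original.length) {
            Arrays.fill(copy, original.length, newLength, fill);
        }
        return copy;
    }

    /** Lengthens 'original' by cyclically repeating its contents. */
    static <T> T[] copyOfCyclic(T[] original, int newLength) {
        if (original.length == 0 && newLength > 0) {
            throw new IllegalArgumentException("nothing to replicate");
        }
        T[] copy = Arrays.copyOf(original, newLength);
        for (int i = original.length; i < newLength; i++) {
            copy[i] = original[i % original.length];
        }
        return copy;
    }

    /** Lengthens 'original', asking 'generator' for each missing element by index. */
    static <T> T[] copyOfGenerated(T[] original, int newLength,
                                   IntFunction<? extends T> generator) {
        T[] copy = Arrays.copyOf(original, newLength);
        for (int i = original.length; i < newLength; i++) {
            copy[i] = generator.apply(i);
        }
        return copy;
    }
}
```

None of the three ever lets a caller observe an element it did not
supply, directly or indirectly, which is the property that matters.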
>
> java.lang.invoke.MethodHandles::arrayConstructor will be joined
> by a method of the same name on MethodHandles.Lookup which
> performs a companion check before allowing the array constructor
> method handle to be returned. It will not check the class, just
> the companion. Note that the use of caller sensitivity in the
> Lookup API is concentrated on the factory method Lookup::lookup,
> which is the starting point for Lookup-based negotiation.
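A rough imitation of the proposed Lookup-based arrayConstructor,
written against the current API. Big caveat: Lookup::accessClass is
used here as a stand-in for the companion check, which is the opposite
of what is proposed (the real method would check the companion and not
the class). The class GatedArrayConstructor is hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

final class GatedArrayConstructor {
    /** Returns an array factory only if 'lookup' may access the element type. */
    static MethodHandle arrayConstructor(MethodHandles.Lookup lookup, Class<?> arrayClass)
            throws IllegalAccessException {
        if (!arrayClass.isArray()) {
            throw new IllegalArgumentException("not an array class: " + arrayClass);
        }
        Class<?> element = arrayClass;
        while (element.isArray()) {
            element = element.getComponentType();   // strip all array dimensions
        }
        if (!element.isPrimitive()) {
            lookup.accessClass(element);            // assumption: stand-in for the companion check
        }
        return MethodHandles.arrayConstructor(arrayClass);
    }

    /** Convenience wrapper: true if the factory would be handed out. */
    static boolean permitted(MethodHandles.Lookup lookup, Class<?> arrayClass) {
        try {
            arrayConstructor(lookup, arrayClass);
            return true;
        } catch (IllegalAccessException e) {
            return false;
        }
    }
}
```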
>
> Miscellaneous privatization checks
>
> Besides newly-created or extended arrays, there are a few API points
> in java.lang.invoke which expose default values of reflectively
> determined types. Like the array creation methods, they must simply
> refuse to expose default values of privatized value companions.
>
> MethodHandles::zero and MethodHandles::empty will simply
> refuse to produce a result of a privatized C.val type. Clients
> with a legitimate need to produce such default values can use
> MethodHandles::filterReturnValue and/or MethodHandles::constant
> to create equivalent handles, assuming they already possess the
> default value.
>
> MethodHandles::explicitCastArguments will refuse to convert from
> a nullable reference to a privatized C.val type. Clients with a
> legitimate need to convert nulls to privatized values can use
> conditional combinators to do this "the hard way".
>
> The method Lookup::accessCompanion will be defined analogously
> to Lookup::accessClass. If Lookup::accessClass is applied to a
> companion, it will check both the class and the companion, whereas
> Lookup::accessCompanion will look only at the possible
> privatization of the companion. (Thus it can simply refer to
> Reflection::verifyCompanionType.)
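To see why zero, empty and explicitCastArguments are on the list,
here is what they do today with int and long standing in for a value
companion. The wrapper class is hypothetical; the MethodHandles calls
are the existing ones.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class DefaultValueLeaks {
    /** MethodHandles::zero yields a handle of type ()int returning the default. */
    static int viaZero() {
        MethodHandle zero = MethodHandles.zero(int.class);
        try {
            return (int) zero.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    /** MethodHandles::empty ignores its arguments and returns the default. */
    static long viaEmpty(String ignored) {
        MethodHandle empty =
            MethodHandles.empty(MethodType.methodType(long.class, String.class));
        try {
            return (long) empty.invokeExact(ignored);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    /** explicitCastArguments turns a null reference into a primitive zero. */
    static int viaExplicitCast(Integer boxed) {
        MethodHandle identity = MethodHandles.identity(int.class);   // (int)int
        MethodHandle cast = MethodHandles.explicitCastArguments(
            identity, MethodType.methodType(int.class, Integer.class));
        try {
            return (int) cast.invokeExact(boxed);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```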
>
> To support reflective checks against array elements which may be
> privatized companion types, an internal method of the form
> jdk.internal.reflect.Reflection::verifyCompanionType may be defined.
> It will pass any reference type (regardless of class accessibility)
> and for a value companion it will check access of the companion (but
> not the class itself).
>
> Building companion-safe APIs
>
> The method Lookup::arrayConstructor gives enough of a "hook" to
> create all kinds of safe but friendly APIs in privileged JDK code.
> The methods in java.util could make use of this privileged API to
> quickly adapt their internal code to create arrays in cases they are
> refused by the existing methods Array.newInstance and
> Arrays.copyOf.
>
> For example, a checked method MethodHandles.Lookup::defaultValue(C)
> may be added to provide the default value C.default if its companion
> C.val is accessible. It will operate as if it first creates a
> one-element array of the desired type, and then loads the element.
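The "as if" description translates almost literally into code. A
sketch against the current API, where Lookup::accessClass again stands
in for the proposed companion check (an assumption) and the class and
method names are hypothetical:

```java
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;

final class CheckedDefaults {
    /** Returns the default value of 'type' if 'lookup' is allowed to see it. */
    static Object defaultValue(MethodHandles.Lookup lookup, Class<?> type)
            throws IllegalAccessException {
        if (!type.isPrimitive()) {
            lookup.accessClass(type);             // assumption: stand-in for the companion check
        }
        Object oneElement = Array.newInstance(type, 1);   // create a one-element array...
        return Array.get(oneElement, 0);                  // ...and load the element
    }

    /** Renders the outcome as text, with "refused" for an access failure. */
    static String describe(MethodHandles.Lookup lookup, Class<?> type) {
        try {
            return String.valueOf(defaultValue(lookup, type));
        } catch (IllegalAccessException e) {
            return "refused";
        }
    }
}
```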
>
> Or, a caller-sensitive method Class::defaultValue or Class::newArray
> could be added which checks the caller and returns the requested result.
> All such methods can be built on top of MethodHandles.Lookup.
>
> In general, a library API may be designed to preserve some aspect of
> companion safety, as it allows untrusted code to work with arrays of
> privatized value type, while preventing non-constructed values of that
> type from being materialized. Each such safe and friendly API has to
> make a choice about how to prevent clients from creating
> non-constructed states, or perhaps how to allow clients to gain
> privilege to do so. Some points are worth remembering:
>
> An unprivileged client must not obtain C.default if C.val is privatized.
> An unprivileged client must not obtain a non-empty C.val[] array if C.val is privatized and non-atomic.
> It's safe to build new (non-empty, mutable) arrays from (non-empty, mutable) old arrays, if the default is not injected.
> If a new array is somehow frozen or wrapped so as to be effectively immutable, it is safe as long as it does not expose C.default values.
> If a value companion is public, there is no need for any restriction.
> Also, unrestricted use can be gated by a Lookup object or caller sensitivity.
>
> In the presence of a reconstruction capability, either in the
> language or in a library API or as provided by a single class,
> avoiding non-constructable objects includes allowing legitimate
> reconstruction requests; each legitimate reconstruction request must
> somehow preserve the intentions of the class's designer.
> Reconstruction should act as if field values had been legitimately
> (from C's API) extracted, transformed, and then again legitimately
> (to C's API) rebuilt into an instance of C. Serialization is an
> example of reconstruction, since field values can be edited in the
> wire format. Proposed with expressions for records are another
> example of reconstruction. The withfield bytecode is the primitive
> reconstruction operator, and must be restricted to nestmates of C
> since it can perform all physically possible field updates.
> Reconstruction operations defined outside of C must be designed with
> great care if they use elevated privileges beyond what C provides
> directly.
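A small illustration of legitimate reconstruction using an ordinary
record (not a value class): the "wither" extracts the fields, changes
one, and rebuilds through the constructor, so the class's invariant is
rechecked. Range and withHi are made-up names.

```java
final class ReconstructionDemo {
    /** A record whose constructor enforces the invariant lo <= hi. */
    record Range(int lo, int hi) {
        Range {
            if (lo > hi) {
                throw new IllegalArgumentException("lo > hi");
            }
        }

        /** Reconstruction that goes back through the constructor. */
        Range withHi(int newHi) {
            return new Range(lo, newHi);
        }
    }

    /** True if reconstructing 'r' with 'newHi' is rejected by the invariant. */
    static boolean rejected(Range r, int newHi) {
        try {
            r.withHi(newHi);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

A raw field update in the style of withfield would skip the check in
the constructor, which is why it has to stay inside the nest of C.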
>
> Summary of user model
>
> A value class C has a value companion C.val which denotes the
> null-hostile (zero-initialized) fully flattenable value type for C.
>
> Like other type members of C, C.val can be declared with an access
> modifier (public or private or neither). It is therefore quite
> possible that clients of C might be prevented from using the
> companion type.
>
> The operations on C.val are almost the same as the operations on
> plain C (C.ref), so a private C.val is usually not a burden.
>
> Operations which are unique to C.val, and which therefore may
> be restricted to you, are:
>
> declaring a field of type C.val
> making an array with element type C.val
> getting the default flat value C.default
> asking for the mirror C.val.class
>
> Library routines which create empty flattenable arrays of C.val
> might not work as expected, when C.val is not public. You'll have
> to find a workaround, such as:
>
> use a plain C reference array to hold your data
> use a different API point which is friendly to privatized C.val types
> ask C politely to build such an array for you
> crack into C with a reflective API and build your own
>
> If you look closely at the code for C, you might notice that it
> uses its private type C.val in its public API. This is allowed.
> Just be aware that null values will not flow through such API points.
> When you get a C.val value into your own code, you can work on it
> perfectly freely with the type C (which is C.ref).
>
> If a value companion C.val is declared public, the class has
> declared that it is willing to encounter its own default value
> C.default coming from untrusted code. If it is declared private,
> only the class's own nest can work with C.default. If the value
> companion is neither public nor private, the class has declared that
> it is willing to encounter its own default within its own package.
>
> If a class has declared its companion non-atomic, it is willing to
> encounter states arising from data races (across multiple fields) in
> the same places it is willing to encounter its default value.
>
> Summary of restrictions
>
> From the implementation point of view, the salient task is restricting
> clients from illegitimately obtaining non-constructed values of C,
> if the author of C has asked for such restrictions. (Recall that a
> non-constructed value of C is one obtained without using C's
> constructor or other public API.) Here are the generally enforced
> restrictions regarding a privatized type C.val:
>
> You cannot mention the name C.val or C.default in code.
> You cannot create and load bytecodes which would implement such a mention.
> You cannot obtain C.default from a mirror of C or C.val.
> You cannot create a new C.val[] array from a mirror of C or C.val.
> You cannot lengthen an existing C.val[] array to contain uninitialized elements.
> You cannot copy an existing array as a new C.val[] array, if C.val is declared non-atomic.
>
> Even so, let us suppose you are an accident-prone client of C.
> Ignoring the above restrictions, you might go about obtaining a
> non-constructed value of C in several ways, and there is an
> answer from the system in each case that stops you:
>
> You can mention C.val or C.default directly in code, in various ways.
> After obtaining the mirror C.val.class (by one of several means), you can call Class::defaultValue, MethodHandles::zero, or a similar API point.
> If you can declare a field of type C.val directly you can extract an initial value (or a data-race result, if C.val is non-atomic).
> If you can indirectly create an array of type C.val, you can extract an initial value (or a data-race result, if C.val is non-atomic).
>
> And there are a number of ways you might attempt to indirectly create
> an array of type C.val[]:
>
> Indirectly create it from a mirror using Array::newInstance or Arrays::copyOf or MethodHandles::arrayConstructor or another similar API point.
> Create it from a pre-existing array of the same type using Object::clone or Arrays::copyOf or another similar API point.
> Specify such an array on a serialization wire format and deserialize it.
>
> Using C.val or C.default directly is blocked if C privatizes its
> value companion, unless you are coding a nestmate or package-mate of
> C. These checks are applied both at compile time and when the JVM
> resolves names, so they apply equally to source code and bytecodes
> created by any means whatsoever.
>
> There are no realistic restrictions on obtaining a mirror to a
> companion type C.val. (Accidental and casual direct use of
> C.val.class is prevented by access restrictions on the type name
> C.val. But there are many ways to get around this limitation.)
> Therefore any method or API which could violate the above generally
> enforced restrictions must perform an appropriate dynamic access check
> on behalf of its mirror argument.
>
> Such a dynamic access check can be made negotiable by an appeal to
> caller sensitivity or a Lookup check, so a correctly configured call
> can avoid the restriction. For some simple methods (perhaps
> Arrays::copyOf or MethodHandles::zero) there is no negotiation.
> Depending on the use case, access failure can be worked around via a
> "negotiable" API point like Lookup::arrayConstructor.