access control for withfield bytecode, compared to putfield

Thu Apr 9 20:14:54 UTC 2020

This makes a lot of sense to me, John. withfield seems like a good
primitive operation to "edit" inline objects, and is the kind of thing
other JVM languages will surely want. It seems like a good idea to make it
more permissive than putfield-on-a-final, even if the Java language doesn't
(yet?) have a use for that greater flexibility.

On Wed, Apr 8, 2020 at 10:30 PM John Rose <john.r.rose at oracle.com> wrote:

> In the Java language fields can be final or not, and independently
> can be access controlled at one of four levels of access:  public,
> protected, package, and private.
>
> Final fields cannot be written to except under very narrow
> circumstances:  (a) In an initialization block (static initializer
> or constructor body), and (b) only if the static compiler can
> prove there has been no previous write (based on the rules
> of the language).
>
> We are adding inline classes, whose non-static fields are always
> final.  (There are possible meanings for non-final fields of inline
> classes, but nothing I’m saying today interacts or interferes
> with any known such meanings.)  Behaviorally, an inline class
> behaves like a class with all-final non-static fields, *and* it has
> its identity radically suppressed by the JVM.  In the language,
> a constructor for an inline class is approximately indistinguishable
> from a constructor for a regular class with all-final non-static fields.
> In particular, a constructor of any class (inline or regular identity)
> is empowered, by rules of the language, to set each of its (final,
> non-static) fields exactly once along any path through the constructor.
>
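
(Inline illustration of the constructor rule above. This is a minimal sketch, assuming an ordinary identity class with all-final fields as a stand-in for an inline class, since inline classes are not expressible in released Java. The class name Range and its fields are hypothetical.)

```java
// Sketch only: an identity class with all-final non-static fields,
// standing in for an inline class.
final class Range {
    final int lo;
    final int hi;

    Range(int a, int b) {
        // Each final field is assigned exactly once along every path
        // through the constructor, as the language requires.
        if (a <= b) {
            lo = a;
            hi = b;
        } else {
            lo = b;
            hi = a;
        }
        // A further assignment here, such as `lo = 0;`, would be
        // rejected by javac because lo might already be assigned.
    }
}
```

For example, new Range(5, 2) ends up with lo == 2 and hi == 5, and no path leaves either field unset.
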
> All of this hangs together nicely. When we translate to the JVM,
> the reading of any non-static field always uses the getfield instruction,
> and the access checks built into the JVM enforce the language access
> rules for that field—and this is true equally for inline and identity
> classes (the JVM doesn’t care).  However, we have to use distinct
> tactics for translating assignments to fields.  The existing putfield
> instruction has no possible applicability to inline classes, because
> it assumes you can pass it an instance pointer, execute it, and the
> *same instance pointer* will refer to the updated instance.  This
> cannot possibly work with inline classes (unless we add a whole
> new layer of “larval” states to inline classes—which would not be
> thrifty design).
>
> Instead, setting the field of an inline class needs a new bytecode,
> a new sibling of getfield and putfield, which we call withfield.
> Its output is a new instance of the same inline class whose
> field values are all identical to those in the old instance, except
> for the one field referred to by the withfield instruction.  Thus:
>
> * getfield consumes a reference and returns a value: (I) → (F)
> * putfield consumes both and returns a side effect: (I F) & state → () & state′
> * withfield consumes same as putfield and produces a new instance: (I F) → (I′)
>
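
(Inline illustration of the three shapes listed above, modeled as plain Java methods. This is a rough sketch, not how the bytecodes are implemented: withfield is approximated by a copying constructor call, and the names FixedPoint, MovablePoint, and FieldShapes are hypothetical.)

```java
// All-final class, standing in for an inline class.
final class FixedPoint {
    final int x;
    final int y;
    FixedPoint(int x, int y) { this.x = x; this.y = y; }
}

// Ordinary mutable identity class, the normal target of putfield.
final class MovablePoint {
    int x;
    int y;
}

final class FieldShapes {
    // getfield: (I) -> (F). Consumes a reference, returns the field value.
    static int getX(FixedPoint p) { return p.x; }

    // putfield: (I F) & state -> () & state'. Returns nothing; the same
    // reference now denotes the updated instance.
    static void putX(MovablePoint p, int x) { p.x = x; }

    // withfield: (I F) -> (I'). The old instance is untouched; the result
    // is a new instance differing only in the selected field.
    static FixedPoint withX(FixedPoint p, int x) { return new FixedPoint(x, p.y); }
}
```

The point of the contrast is that withX never changes what an existing reference observes, while putX does.
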
> The access checking rules are fairly uniform for all of these
> instructions.  If the field F of C has protection level P, then a client
> without access to level P of C cannot execute (cannot even resolve)
> the instruction that tries to access F.  In the case of putfield or
> withfield, if F is final (and for withfield that is currently always
> the case, though that could change), then an additional check
> is made, to ensure that F is only being set in a legitimate context.
> More in a moment on what “legitimate” means for this “context”.
> The getfield instruction only has to pass the access check, and then
> the client has full access to read the value of the field.  This works
> pleasingly like the source-level expression which fetches the field
> value.
>
> Currently, for a non-static final field, both “putfield” and “withfield”
> are generated only inside of constructors, which have rigid rules,
> in the source language, that ensure nothing too fishy can happen.
>
> For an identity class C, it would be extremely fishy if the classfile of
> C were able to execute putfield instructions outside of one of C’s
> constructors.  The reason for this is that a constructor of C would
> be able to produce a supposedly all-final instance of C, but then
> some other method of C would (in principle) be able to overwrite
> one of C’s supposedly final fields with some other value, by executing
> a putfield instruction in that other method.  Now, the JVM doesn’t
> fully trust final fields even today (because they change state at most
> once from default to some other value), but if maliciously spun
> classfiles were able to perform “putfield” at will on fully constructed
> objects, it might be possible to create paradoxes that could lead
> to unpredictable behavior.  For this reason, not only doesn’t the
> JVM fully trust final fields, but it also forbids classes from executing
> putfield on their own final fields, except inside of constructors.
> In essence, putfield on a final field is a special restricted operating
> mode of putfield which has unusually tight restrictions on its
> execution.  In this note I’d like to call it out with a special name,
> putfield-on-a-final.
>
> Note that the JVM does *not* fully enforce the Java source language
> rules for field initialization:  At the JVM level, a constructor can
> run putfield-on-a-final, on some given field, zero, one, or many
> times, where the Java language requires at most one, and exactly
> one on normal exits.  The JVM simply provides a reasonable backstop
> check, preventing certain failure modes due either to javac bugs
> or (what’s more sinister) intentionally broken class files.
>
> The main responsibility for ensuring the integrity of some class
> C is, and always will be, C’s compilation unit C.java, as faithfully
> compiled by javac into a nest of classes containing at least C.class
> and maybe other nestmates.
>
> This is an important point to back up and take notice of:  While
> the JVM can perform some basic checks to help some class C maintain
> its encapsulation boundary, the meaning of the encapsulation,
> and the restrictions and/or freedoms within that boundary,
> are the sole responsibility of the programmer of
> C.java.  If I, the author of C, am claiming that, of two fields, one
> is always non-null, then it is up to me to enforce that rule in all
> states of my class, including constructors (start states) and any methods
> which can create new states (whether constructors or regular methods).
>
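
(Inline illustration of the author's responsibility described above. This is a minimal sketch under the assumption that the private constructor stands in for a raw, unchecked field-setting step; the class Labeled, its fields, and the helper requireName are hypothetical.)

```java
final class Labeled {
    final String name;  // invariant claimed by the author: never null
    final String note;  // may be null

    // Unchecked step that just stores values. It enforces nothing by
    // itself, so it is kept private to the class.
    private Labeled(String name, String note) {
        this.name = name;
        this.note = note;
    }

    // Start state: the author checks the invariant here.
    static Labeled of(String name, String note) {
        return new Labeled(requireName(name), note);
    }

    // Any method that creates a new state must check it as well.
    Labeled withName(String newName) {
        return new Labeled(requireName(newName), note);
    }

    // This field carries no invariant, so no check is needed.
    Labeled withNote(String newNote) {
        return new Labeled(name, newNote);
    }

    private static String requireName(String n) {
        if (n == null) throw new IllegalArgumentException("name must be non-null");
        return n;
    }
}
```

Nothing outside Labeled can reach the unchecked step, so every instance that escapes has passed requireName.
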
> A working hypothesis on our project so far has been that withfield
> is so much like putfield, and inline instance fields are so much like
> final identity instance fields, that parallel restrictions are appropriate
> for the two instructions.  Penciling this out, we would get to a place
> where a class C can only issue putfield or withfield instructions inside
> its own constructors.  This is a consistent view, but I do not believe
> that it is the best view, and I’d like to decouple withfield from
> putfield-on-a-final to be more like plain old putfield, in some ways.
>
> My aim here is to keep withfield alive as a tool for likely future
> translation strategies (including of non-Java languages), which
> exposes, not the currently envisioned uses of withfield in Java
> constructors, but its natural set of capabilities in the JVM.
>
> What is the natural set of capabilities of withfield?  It is more
> basic and fundamental than putfield-on-a-final, and at the
> same time does *more* than putfield-on-a-final.  Note that
> putfield-on-a-final is just one operation out of a suite of
> required operations in a constructor of a class (since you
> need a putfield-on-a-final for each of the class’s final fields,
> according to Java rules).  Note on the other hand that
> withfield has the same effect as running a constructor
> which copies out all the old fields from the old instance
> and writes the new value into the selected field, then
> returns the new instance.  Seen from this point of view,
> withfield is both simpler and more powerful than
> putfield-on-a-final, and does not fit at all into an easy
> analogy.
>
> The withfield instruction is also inherently more secure
> than putfield-on-a-final, because its design does not allow
> it to invalidate any pre-existing instance; it can only ever
> create a new instance.  The set of security failure modes
> for withfield is completely different from putfield-on-a-final.
> This means that there is no particular reason to restrict
> withfield to execute only in constructors.
>
> What about creating an *invalid* new instance?  Well, that’s
> where the JVM says, “it’s not my responsibility”. As noted
> above, the sole responsibility for defining and enforcing the
> invariants of an encapsulation lies with the human author of the
> original source file.  The JVM protects this encapsulation,
> not by reading the user’s mind, but by enforcing boundaries,
> primarily the boundary around the nest of classes that result
> from the compilation of C.java.  Within the nest, any type
> can access any private member of any nestmate.  Outside
> the nest, private members are strictly inaccessible.
> (This strict rule can be bent by special reflection modes,
> and by nestmate injection, but it can’t be broken.)
>
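
(Inline illustration of the nest boundary described above. This is a small sketch assuming JDK 11 or later, where javac records nest membership and Class.isNestmateOf is available; the names Account and Teller are hypothetical.)

```java
final class Account {
    private final long balance;

    private Account(long balance) { this.balance = balance; }

    // Teller is compiled into the same nest as Account, so it may use
    // Account's private constructor and private field directly.
    static final class Teller {
        static Account open(long initial) { return new Account(initial); }
        static long balanceOf(Account a) { return a.balance; }
    }

    // Asks the JVM whether the two classes are nestmates (JDK 11+).
    static boolean sameNest() {
        return Teller.class.isNestmateOf(Account.class);
    }
}
```

A class compiled from a different source file would fail to compile if it referred to Account's private constructor or to balance.
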
> Under this theory, the withfield instruction is the elemental
> factory mechanism for creating new instances of inline classes.  The coder
> of the source file defining the field has full control to create
> new instances with arbitrary field settings.  In the current
> language, this still goes only through user-written constructors,
> but that could change.  In any case, the JVM design needs to
> support the language and *also* the natural abilities of the JVM.
>
> This leads me to what I think is the right design for
> withfield.  The permission to execute withfield should
> be derived, not from its placement within a constructor,
> but rather from its placement in a nest.  In effect, when
> you execute withfield, you should get access checked as if
> the field you were referring to is private, even if it has
> some other marking (public, protected, package).  That
> other marking is good and useful, but it pertains only
> to getfield.
>
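
(Inline illustration of the proposed rule, modeled at the source level. This is only an analogy: a public final field stands in for getfield's declared access, and a private method stands in for a withfield that is checked as if the field were private. The names Celsius, withDegrees, and Ops are hypothetical.)

```java
final class Celsius {
    // Declared public: anyone may read it (the getfield side).
    public final double degrees;

    private Celsius(double degrees) { this.degrees = degrees; }

    static Celsius of(double degrees) { return new Celsius(degrees); }

    // Stand-in for withfield: private no matter how the field itself is
    // declared, so only members of the nest can replace the field.
    private Celsius withDegrees(double newDegrees) {
        return new Celsius(newDegrees);
    }

    // A nestmate: allowed to use the private replacement operation,
    // and not required to be a constructor.
    static final class Ops {
        static Celsius warm(Celsius c, double delta) {
            return c.withDegrees(c.degrees + delta);
        }
    }
}
```

Code outside the nest can still read degrees, but a call to withDegrees from there would not compile, which mirrors the access check John describes.
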
> This doesn’t call for any change to today’s translation
> strategies, but it unlocks the JVM’s natural abilities
> for future strategies and features.
>
> Why make the change?  After all, restricting withfield
> like putfield-on-a-final doesn’t hurt anything today.
> Suppose some language feature in the future requires ad
> hoc field replacement. (I call one version of such a feature
> “reconstructors”, and another “with-expressions”.)
> In that case, javac can contrive synthetic constructors
> which isolate all required withfield instructions, so
> that the putfield-on-a-final constraints can be satisfied.
> But there’s a cost to this:  Those synthetic constructors
> become extra noise in the classfile, and if they are opened
> outside the nest, they can be security hazards.  Another
> cost is the loss of dynamicity:  You can’t inject a hidden
> class to work on your inline class if the hidden class can
> only define its own constructors, right?
>
> But I think we have learned some lessons about fancy
> compile-time adapters:  They are complex, they obscure
> the code for the JIT, they can open up surprise encapsulation
> flaws, they cannot be assigned dynamically.  The nestmate
> work mitigates all of these problems by uniformly defining
> private access to apply equally to all members of a nest,
> not just to a single class.  Although the nestmate access
> rules themselves are more complex than the original JVM
> rules for private access, the overall system is better because
> we can rip out the various synthetic bridges we used to
> require.  The overall model for “what does private mean?”
> is simpler, not more complex: “private means all nestmates
> are equal”.  On balance this helps security by simplifying
> the model, so that bridge methods can be dropped.
>
> I want to keep the model simple, and not introduce (today)
> a new kind of access control just for the withfield instruction,
> nor do I want it to mimic the baroque and complex access
> control for putfield-on-a-final.
>
> To summarize:  The simplest rule for access checking a
> withfield instruction is to say, “pretend the field was
> declared private, and perform access checks”.  That’s
> it; the rest follows from the rules we have already laid
> down.
>
> Thus, the security analysis of a class can concentrate
> on the access declarations of its fields.  There will be
> no pressure to generate adapter methods regardless
> of where the language goes.  Other languages can
> use the natural semantics of “withfield” to create
> and enforce their own notions of encapsulation.
> And future versions of Java can use indy, condy,
> hidden classes, and whatever else to create flexible
> methods, on the fly, that work with inline classes.
>
> There are two anchors to my argument here.
> One is that the access control of putfield-on-a-final
> is a bad model to replicate for a new instruction.
> The other is that we shouldn’t limit ourselves to
> the current uses of withfield (as a surrogate for
> putfield-on-a-final).  Let’s design for the future,
> or at least for the natural capabilities of the JVM,
> not for the exact output of today’s translation
> strategies.
>
> — John