access control for withfield bytecode, compared to putfield

Thu Apr 9 05:29:04 UTC 2020

In the Java language fields can be final or not, and independently
can be access controlled at one of four levels of access:  public,
protected, package, and private.

Final fields cannot be written to except under very narrow
circumstances:  (a) In an initialization block (static initializer
or constructor body), and (b) only if the static compiler can
prove there has been no previous write (based on the rules
of the language).

We are adding inline classes, whose non-static fields are always
final.  (There are possible meanings for non-final fields of inline
classes, but nothing I’m saying today interacts or interferes
with any known such meanings.)  Behaviorally, an inline class
behaves like a class with all-final non-static fields, *and* it has
its identity radically suppressed by the JVM.  In the language,
a constructor for an inline class is approximately indistinguishable
from a constructor for a regular class with all-final non-static fields.
In particular, a constructor of any class (inline or regular identity)
is empowered, by the rules of the language, to set each of its (final,
non-static) fields exactly once along any path through the constructor.
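
To make the exactly-once rule concrete, here is a minimal sketch in
today's Java.  Inline classes are not expressible in released Java,
so an ordinary final class stands in for one; the class name Celsius
and its fields are made up for illustration.

```java
// Sketch: every final field is assigned exactly once on each path
// through the constructor, which is what javac checks.
final class Celsius {
    final double degrees;
    final boolean clamped;

    Celsius(double requested) {
        if (requested < -273.15) {
            // Path 1: each final field is assigned once.
            degrees = -273.15;
            clamped = true;
        } else {
            // Path 2: each final field is assigned once.
            degrees = requested;
            clamped = false;
        }
        // Adding "clamped = false;" here would be rejected by javac,
        // because clamped has already been assigned on every path.
    }

    public static void main(String[] args) {
        Celsius c = new Celsius(-500.0);
        System.out.println(c.degrees + " " + c.clamped);
    }
}
```

The same constructor body would be legal, unchanged, in an inline
class as described here; only the translation to bytecode differs.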

All of this hangs together nicely. When we translate to the JVM,
the reading of any non-static field always uses the getfield instruction,
and the access checks built into the JVM enforce the language access
rules for that field—and this is true equally for inline and identity
classes (the JVM doesn’t care).  However, we have to use distinct
tactics for translating assignments to fields.  The existing putfield
instruction has no possible applicability to inline classes, because
it assumes you can pass it an instance pointer, execute it, and the
*same instance pointer* will refer to the updated instance.  This
cannot possibly work with inline classes (unless we add a whole
new layer of “larval” states to inline classes—which would not be
thrifty design).

Instead, setting the field of an inline class needs a new bytecode,
a new sibling of getfield and putfield, which we call withfield.
Its output is a new instance of the same inline class whose
field values are all identical to those in the old instance, except
for the one field referred to by the withfield instruction.  Thus:

* getfield consumes a reference and returns a value (I) → (F)
* putfield consumes a reference and a value and produces a side effect (I F) & state → () & state′
* withfield consumes the same inputs as putfield and produces a new instance (I F) → (I′)
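
The three shapes can be modeled in plain Java.  This is only an
analogy: there is no source-level withfield, so a hand-written
"wither" method plays its role, and every name here (FieldShapes,
MutablePoint, Point, withX, getX, putX) is hypothetical.

```java
// Hypothetical model of the getfield / putfield / withfield shapes.
final class FieldShapes {
    // Identity-style holder: putfield updates it in place.
    static final class MutablePoint {
        int x, y;
        MutablePoint(int x, int y) { this.x = x; this.y = y; }
    }

    // Stand-in for an inline class: all instance fields are final.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        // Models withfield on x: (I F) -> (I')
        Point withX(int newX) { return new Point(newX, this.y); }
    }

    // Models getfield: (I) -> (F)
    static int getX(Point p) { return p.x; }

    // Models putfield: (I F) & state -> () & state'
    static void putX(MutablePoint p, int newX) { p.x = newX; }

    public static void main(String[] args) {
        MutablePoint m = new MutablePoint(1, 2);
        putX(m, 10);            // the same reference now sees the new x
        Point p = new Point(1, 2);
        Point q = p.withX(10);  // p is untouched; q carries the new x
        System.out.println(m.x + " " + getX(p) + " " + getX(q));
    }
}
```

Note that after withX the old value p is still intact, which is the
property that distinguishes withfield from putfield.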

The access checking rules are fairly uniform for all of these
instructions.  If the field F of C has protection level P, unless a client
has access to level P of C, then it cannot execute (cannot even resolve)
the instruction that tries to access F.  In the case of putfield or
withfield, if F is final (and for withfield that is currently always
the case, though that could change), then an additional check
is made, to ensure that F is only being set in a legitimate context.
More in a moment on what “legitimate” means for this “context”.
The getfield instruction only has to pass the access check, and then
the client has full access to read the value of the field.  This works
pleasingly like the source-level expression which fetches the field
value.

Currently, for a non-static final field, both “putfield” and “withfield”
are generated only inside of constructors, which have rigid rules,
in the source language, that ensure nothing too fishy can happen.

For an identity class C, it would be extremely fishy if the classfile of
C were able to execute putfield instructions outside of one of C’s
constructors.  The reason for this is that a constructor of C would
be able to produce a supposedly all-final instance of C, but then
some other method of C would (in principle) be able to overwrite
one of C’s supposedly final fields with some other value, by executing
a putfield instruction in that other method.  Now, the JVM doesn’t
fully trust final fields even today (because they change state at most
once from default to some other value), but if maliciously spun
classfiles were able to perform “putfield” at will on fully constructed
objects, it might be possible to create paradoxes that could lead
to unpredictable behavior.  For this reason, not only doesn’t the
JVM fully trust final fields, but it also forbids classes from executing
putfield on their own final fields, except inside of constructors.
In essence, putfield on a final field is a special restricted operating
mode of putfield which has unusually tight restrictions on its
execution.  In this note I’d like to call it out with a special name,
putfield-on-a-final.

Note that the JVM does *not* fully enforce the Java source language
rules for field initialization:  At the JVM level, a constructor can
run putfield-on-a-final, on some given field, zero, one, or many
times, where the Java language requires at most one, and exactly
one on normal exits.  The JVM simply provides a reasonable backstop
check, preventing certain failure modes due either to javac bugs
or (what’s more sinister) intentionally broken class files.

The main responsibility for ensuring the integrity of some class
C is, and always will be, C’s compilation unit C.java, as faithfully
compiled by javac into a nest of classes containing at least C.class
and maybe other nestmates.

This is an important point to back up and take notice of:  While
the JVM can perform some basic checks to help some class C maintain
its encapsulation boundary, the meaning of the encapsulation, and
the restrictions and/or freedoms within that boundary, are the
sole responsibility of the programmer of C.java.  If I, the author
of C, am claiming that, of two fields, one is always non-null, then
it is up to me to enforce that rule in all
states of my class, including constructors (start states) and any methods
which can create new states (whether constructors or regular methods).

A working hypothesis on our project so far has been that withfield
is so much like putfield, and inline instance fields are so much like
final identity instance fields, that parallel restrictions are appropriate
for the two instructions.  Penciling this out, we would get to a place
where a class C can only issue putfield or withfield instructions inside
its own constructors.  This is a consistent view, but I do not believe
that it is the best view, and I’d like to decouple withfield from
putfield-on-a-final to be more like plain old putfield, in some ways.

My aim here is to keep withfield alive as a tool for likely future
translation strategies (including of non-Java languages), which
exposes, not the currently envisioned uses of withfield in Java
constructors, but its natural set of capabilities in the JVM.

What is the natural set of capabilities of withfield?  It is more
basic and fundamental than putfield-on-a-final, and at the
same time does *more* than putfield-on-a-final.  Note that
putfield-on-a-final is just one operation out of a suite of
required operations in a constructor of a class (since you
need a putfield-on-a-final for each of the class’s final fields,
according to Java rules).  Note on the other hand that
withfield has the same effect as running a constructor
which copies out all the old fields from the old instance
and writes the new value into the selected field, then
returns the new instance.  Seen from this point of view,
withfield is both simpler and more powerful than
putfield-on-a-final, and does not fit at all into an easy
analogy.

The withfield instruction is also inherently more secure
than putfield-on-a-final, because its design does not allow
it to invalidate any pre-existing instance; it can only ever
create a new instance.  The set of security failure modes
for withfield is completely different from putfield-on-a-final.
This means that there is no particular reason to restrict
withfield to execute only in constructors.

What about creating an *invalid* new instance?  Well, that’s
where the JVM says, “it’s not my responsibility”. As noted
above, the sole responsibility for defining and enforcing the
invariants of an encapsulation lies with the human author of the
original source file.  The JVM protects this encapsulation,
not by reading the user’s mind, but by enforcing boundaries,
primarily the boundary around the nest of classes that result
from the compilation of C.java.  Within the nest, any type
can access any private member of any nestmate.  Outside
the nest, private members are strictly inaccessible.
(This strict rule can be bent by special reflection modes,
and by nestmate injection, but it can’t be broken.)
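
As a rough illustration of the nest boundary in today's Java (11 or
later), the sketch below uses an ordinary class in place of an
inline class.  Range, Shifter, and the lo <= hi invariant are
invented for the example; the point is only that the nestmate
touches private state directly and that the author, not the JVM,
keeps the invariant.

```java
// Sketch: private members are open to nestmates and closed to
// everything outside the nest.
final class Range {
    private final int lo;
    private final int hi;

    // Private: only nestmates of Range can call this directly.
    private Range(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Factory where the author chooses to enforce lo <= hi.
    static Range of(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        return new Range(lo, hi);
    }

    int lo() { return lo; }
    int hi() { return hi; }

    // A nestmate compiled from the same source file.
    static final class Shifter {
        static Range shift(Range r, int delta) {
            // Reads private fields and calls the private constructor.
            // addExact throws on overflow, so lo <= hi is preserved;
            // that care is the author's job, not the JVM's.
            return new Range(Math.addExact(r.lo, delta),
                             Math.addExact(r.hi, delta));
        }
    }

    public static void main(String[] args) {
        Range r = Shifter.shift(Range.of(1, 5), 10);
        System.out.println(r.lo() + ".." + r.hi());
        System.out.println(Shifter.class.getNestHost() == Range.class);
    }
}
```

A class outside this nest can call Range.of, lo(), and hi() (within
the package, as written), but it cannot reach the private
constructor or fields.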

Under this theory, the withfield instruction is the elemental
factory mechanism for creating new instances of inline classes.  The coder
of the source file defining the field has full control to create
new instances with arbitrary field settings.  In the current
language, this still goes only through user-written constructors,
but that could change.  In any case, the JVM design needs to
support the language and *also* the natural abilities of the JVM.

This leads me to what I think is the right design for
withfield.  The permission to execute withfield should
be derived, not from its placement within a constructor,
but rather from its placement in a nest.  In effect, when
you execute withfield, you should get access checked as if
the field you were referring to is private, even if it has
some other marking (public, protected, package).  That
other marking is good and useful, but it pertains only
to getfield.
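
The proposed rule can be written down as a toy model using
reflection.  This is not the JVM's resolution logic: the getfield
check below is deliberately simplified (it ignores modules, class
loaders, and the accessibility of the declaring class), and the
names WithfieldAccessModel, mayGetfield, and mayWithfield are
assumptions made for the sketch.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Toy model of the proposed access rule; requires Java 11 or later.
final class WithfieldAccessModel {
    // Stand-in for an inline class with a public final field.
    static final class Point {
        public final int x;
        Point(int x) { this.x = x; }
    }

    // Simplified getfield check: honors the declared access level.
    static boolean mayGetfield(Class<?> client, Field f) {
        int mods = f.getModifiers();
        Class<?> owner = f.getDeclaringClass();
        if (Modifier.isPublic(mods)) return true;
        if (Modifier.isPrivate(mods)) return client.isNestmateOf(owner);
        boolean samePackage =
            client.getPackageName().equals(owner.getPackageName());
        if (Modifier.isProtected(mods)) {
            return samePackage || owner.isAssignableFrom(client);
        }
        return samePackage; // package access
    }

    // Proposed withfield check: act as if the field were private,
    // whatever its declared access level is.
    static boolean mayWithfield(Class<?> client, Field f) {
        return client.isNestmateOf(f.getDeclaringClass());
    }

    public static void main(String[] args) throws NoSuchFieldException {
        Field x = Point.class.getDeclaredField("x");
        // An unrelated class may read the public field ...
        System.out.println(mayGetfield(String.class, x));
        // ... but may not replace it ...
        System.out.println(mayWithfield(String.class, x));
        // ... while a nestmate of Point may.
        System.out.println(mayWithfield(WithfieldAccessModel.class, x));
    }
}
```

In this model the declared marking on x affects only mayGetfield;
mayWithfield never consults it.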

This doesn’t call for any change to today’s translation
strategies, but it unlocks the JVM’s natural abilities
for future strategies and features.

Why make the change?  After all, restricting withfield
like putfield-on-a-final doesn’t hurt anything today.
Suppose some language feature in the future requires ad
hoc field replacement. (I call one version of such a feature
“reconstructors”, and another “with-expressions”.)
In that case, javac can contrive synthetic constructors
which isolate all required withfield instructions, so
that the putfield-on-a-final constraints can be satisfied.
But there’s a cost to this:  Those synthetic constructors
become extra noise in the classfile, and if they are opened
outside the nest, they can be security hazards.  Another
cost is the loss of dynamicity:  You can’t inject a hidden
class to work on your inline class if the hidden class can
only define its own constructors, right?

But I think we have learned some lessons about fancy
compile-time adapters:  They are complex, they obscure
the code for the JIT, they can open up surprise encapsulation
flaws, and they cannot be assigned dynamically.  The nestmate
work mitigates all of these problems by uniformly defining
private access to apply equally to all members of a nest,
not just to a single class.  Although the nestmate access
rules themselves are more complex than the original JVM
rules for private access, the overall system is better because
we can rip out the various synthetic bridges we used to
require.  The overall model for “what does private mean?”
is simpler, not more complex: “private means all nestmates
are equal”.  On balance this helps security by simplifying
the model, so that bridge methods can be dropped.

I want to keep the model simple, and not introduce (today)
a new kind of access control just for the withfield instruction,
nor do I want it to mimic the baroque and complex access
control for putfield-on-a-final.

To summarize:  The simplest rule for access checking a
withfield instruction is to say, “pretend the field was
declared private, and perform access checks”.  That’s
it; the rest follows from the rules we have already laid
down.

Thus, the security analysis of a class can concentrate
on the access declarations of its fields.  There will be
no pressure to generate adapter methods regardless
of where the language goes.  Other languages can
use the natural semantics of “withfield” to create
and enforce their own notions of encapsulation.
And future versions of Java can use indy, condy,
hidden classes, and whatever else to create flexible
methods, on the fly, that work with inline classes.

There are two anchors to my argument here.
One is that the access control of putfield-on-a-final
is a bad model to replicate for a new instruction.
The other is that we shouldn’t limit ourselves to
the current uses of withfield (as a surrogate for
putfield-on-a-final).  Let’s design for the future,
or at least for the natural capabilities of the JVM,
not for the exact output of today’s translation
strategies.

— John