Proposal: Static/final constructors for bucket-3 primitive classes.
John Rose
john.r.rose at oracle.com
Thu Dec 9 04:30:50 UTC 2021
We have considered, at various points in the last six years or more,
allowing user-defined primitive types to define (under user control)
their own default values. The syntax is unimportant, but the concept is
simple: Surely the user who defines a primitive type can also define
default initializer expressions for each of the fields.
But this would be a trail of tears, which we have chosen to avoid, each
time the suggestion comes up.
This feature is often visualized as a predefined bit pattern, which the
JVM would keep handy, and just stamp down wherever a default initializer
is needed. It can’t really be that simple, but even such a bit
pattern is problematic.
First of all is the problem of declaring the bit pattern. Java natively
uses the side effects of `<clinit>` to define constants using ad hoc
bytecodes; it also defines (for some types but not others) a concept of
constant expression. Neither of those fits well into a classfile that
would define a primitive with a default bit pattern.
If the bit pattern is defined using ad hoc bytecode, it must be defined
in a new pseudo-method (not `<clinit>`), to execute not *during* the
initialization of the newly-declared primitive class, but *before*.
(Surely not! a reader might exclaim, but this is the sort of subtlety we
have to deal with.) During initialization of a class C, all fields of
its own type C must be initialized *before* the first bytecode of
`<clinit>` executes, so that the static initializer code has something
to write on. So there must be a “default value definition” phase,
call it `<defaultvalueinit>`, added after linking and before
initialization of C, so C’s `<clinit>` method has something to work
with. This `<defaultvalueinit>` is really the body of a no-argument
constructor of C, or its twin. A no-argument constructor of C is not a
problem, but having it execute before C’s `<clinit>` block is a huge
irregularity, which the JVM spec is not organized to support, at
present.
This would turn into complexity in both the JVMS and the JLS, and more odd
corners (and odd states) in the Java user experience. Sure, a user will
say, “but I promise not to do anything odd; I just want *this field*
to be the value `(int)1`”. Yes, but a spec. must define not only the
expected usages, but all possible usages, with no poorly-defined states.
OK, so if `<defaultvalueinit>` is not the place to define this
elusive bit pattern, what about something more declarative, like a
`ConstantValue` attribute? Surely we could put a similarly structured
`DefaultValue` attribute on every non-static field of a value type, and
that would give the JVM enough information to synthesize the required
bit pattern *before* it runs `<clinit>`.
Consider the user model here: A primitive declaration would allow its
fields to have non-zero default values, *but only drawn from the
restricted set of constant expressions*, because those are the ones
which fit in the `ConstantValue` attribute. (They are true bit patterns
in the constant pool, plus `String` constants.) There is no previous
place in Java where we make such a restriction, except `case` labels.
Can you hear the groans of users as we try to explain why only constant
expressions are allowed in that context? That’s the muzak of the
trail of tears I mentioned above.
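
To make the constant-expression restriction concrete, here is a small sketch of how it already shows up in `case` labels today. The class and field names are made up for illustration. `ONE` is initialized by a constant expression, so javac records it in a `ConstantValue` attribute; `ALSO_ONE` has the same value but is computed in `<clinit>`, so it is not usable where a constant expression is required.

```java
public class ConstantExpressionDemo {
    // Constant expression: stored in a ConstantValue attribute on the field.
    static final int ONE = 1;

    // Same value, but not a constant expression: assigned by <clinit> at initialization.
    static final int ALSO_ONE = Integer.parseInt("1");

    static String classify(int x) {
        switch (x) {
            case ONE:
                return "one";
            // case ALSO_ONE:   // would not compile: constant expression required
            //     return "also one";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(ALSO_ONE)); // prints "one"
        System.out.println(classify(2));        // prints "other"
    }
}
```

A `DefaultValue` attribute modeled on `ConstantValue` would push the same distinction into every primitive class declaration.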
But we have condy to fix that (someone will surely say). That, too, is
problematic, because the resolution of constant pool constants of a
class C requires C to be at least linked, and if the condy expression
makes a self-reference to C itself, that will trigger C’s
initialization, at an awkward moment. Have you ever debugged a tangled
initialization circularity, marked by mysterious NPEs on variables you
*know* you initialized? I have. It’s a stop on the trail of tears I
mentioned.
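
For readers who have not hit one of these, here is a minimal sketch of such a circularity using ordinary static initializers. The classes `Config` and `Registry` are hypothetical. It is not the condy case itself, only the same shape of bug: a class is observed by its own thread while it is still mid-initialization, so a field that is "obviously" initialized reads as null.

```java
public class InitCycleDemo {
    static class Config {
        // Initializing Config runs this first, which triggers Registry's initialization...
        static final String NAME = Registry.describe();
        // ...so this field has not been assigned yet when Registry looks at it.
        static final String SUFFIX = computeSuffix();

        static String computeSuffix() {
            return "-v1";
        }
    }

    static class Registry {
        // Reads Config.SUFFIX while Config is still being initialized by the same thread.
        // The recursive initialization request returns immediately, and the field is still null.
        static final String SEEN = "suffix=" + Config.SUFFIX;

        static String describe() {
            return SEEN;
        }
    }

    public static void main(String[] args) {
        System.out.println(Config.NAME);   // prints "suffix=null"
        System.out.println(Config.SUFFIX); // prints "-v1"
    }
}
```

No exception is thrown here; the null simply leaks into `NAME`. In a larger program the NPE shows up later and far from the cause.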
But if we really worked hard, and added a bunch of stuff to the JVMS and
JLS, and persuaded users not to bother us about the odd restrictions (to
constant expressions, or expressions which “don’t touch the class
itself”), we *could* define some sort of declarative default value
initialization.
What then? Well, ask the JVM engineers how they initialize heap
variables, because those are the affected paths. Those parts of the JVM
are among the most performance-sensitive. Currently, when a new object
or array is created, its whole body (except the header) is sprayed with
a nice even coat of all-zero-bit machine words. This is pretty fast,
and it’s important to keep it fast. What if creating an array
required painting some beautifully crafted arabesque of a bit pattern
defined by a creative user? Well, it’s doable, but much more
complicated. You need to load the bit pattern into live registers and
(if it’s an array of C) keep them live while you paint the whole
array. That’s got to be more expensive than spraying zeroes.
(There’s even hardware that’s good for spraying zeroes, on some
machines.) Basically, if we generously allowed users even a limited set
of pre-defined default primitive values, we would be inviting them to
create mysterious performance problems *for their clients*.
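
A rough user-level model of the difference, as a sketch only: the method names and the pattern value are invented, and a real JVM would do this work inside its allocator rather than with `Arrays.fill`. The point is just that the non-zero case adds a second pass over memory that the zero case gets for free.

```java
import java.util.Arrays;

public class DefaultFillSketch {
    // Today's behavior: the array comes back from the allocator already all-zero.
    static long[] zeroDefault(int n) {
        return new long[n];
    }

    // Model of a hypothetical non-zero default: allocate (zeroed), then paint
    // the user's pattern over every element before anyone may see the array.
    static long[] patternDefault(int n, long pattern) {
        long[] a = new long[n];
        Arrays.fill(a, pattern);
        return a;
    }

    public static void main(String[] args) {
        long pattern = 0x0101010101010101L; // hypothetical user-chosen default bits
        System.out.println(Arrays.toString(zeroDefault(3)));
        System.out.println(Long.toHexString(patternDefault(3, pattern)[2]));
    }
}
```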
Reflective creation of objects and arrays is also complicated by
non-zero defaults, of course. When you reflectively create a heap node,
today you compute its size, allocate its memory, store some metadata to
its header, and paint the rest zero. That turns into something more
complicated (see above about live registers) and metadata-driven, in the
presence of non-zero defaults.
I haven’t yet mentioned *reference* fields, but those are another can
of worms. The JVM vigorously tracks references. Suppose your primitive
had a String-valued field, and you were allowed to declare a non-null
default value for it, say `"empty"`. If one of your customers creates
an array of these things, suddenly there is a GC card mark (for many
GCs) on *every element of the array*, and that is *before you do
anything useful with it*.
References also support circularity, including indirect cycles from an
instance of C back to C itself. Can you guarantee that the computation
of some tricky reference for your default value of `C.foo` won’t
require linking of C itself, and a vicious circularity? No, you
can’t, and you won’t like the feeling of debugging such a thing
either. Trail of tears, again.
Finally, depending on which of the above flawed tactics is chosen for
representing user-selected default values, there is the possibility that
JVM code can observe a variable V of type C in its pre-initialization
state, because (a) C’s initialization specification is being loaded or
evaluated somehow, and (b) the variable V has been allocated but is
waiting for an initialization bit pattern. (V might be a static of C,
or something in a related dependent class. Also it could be a
multi-threading situation, where V is being observed via a race
condition; those are very hard to keep straight.) During those moments,
if V is loaded, then (voila!) it will have either garbage or those good
old all-zero bits in it. And the abstraction we were laboring to secure
will be subverted. This usually doesn’t happen, but when it’s an
accident it’s a very subtle bug, and when it’s on purpose it turns
into a security escalation.
It’s best to keep the simple default all-zero conventions. They are
robust and understandable and regular. When they are inconvenient,
users will find workarounds.
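
One such workaround, sketched under assumptions: choose the field encoding so that the all-zero bit pattern already means the default you want. Primitive classes are not available in released Java, so an ordinary final class stands in here, and `zeroBits()` is a hypothetical stand-in for the all-zero value the JVM would supply. The names `Scale` and `factorMinusOne` are illustrative only.

```java
public class ZeroDefaultWorkaround {
    static final class Scale {
        // Stored as (factor - 1), so that all-zero bits decode to a factor of 1.
        private final int factorMinusOne;

        private Scale(int factorMinusOne) {
            this.factorMinusOne = factorMinusOne;
        }

        // Ordinary construction path: callers think in terms of the real factor.
        static Scale of(int factor) {
            return new Scale(factor - 1);
        }

        // Stand-in for the JVM-supplied default, whose fields are all zero.
        static Scale zeroBits() {
            return new Scale(0);
        }

        int factor() {
            return factorMinusOne + 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(Scale.zeroBits().factor()); // prints 1
        System.out.println(Scale.of(5).factor());      // prints 5
    }
}
```

The cost is an add on each read, which is the kind of inconvenience the author of the type pays once, instead of every client paying for non-zero allocation.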
I hope this helps.
— John
On 5 Dec 2021, at 10:36, Brian Goetz wrote:
> The following was received on valhalla-spec-comments.
>
> Summary: Various syntax options for no-arg constructors of "bucket 3"
> primitives, to enable users to pick a default value other than zero.
>
> Analysis: The suggestion is well-intentioned, but it is built on some
> significant misunderstandings of the problem we are facing.
>
> It assumes that it is sensible to allow a non-zero default value of a
> primitive to be specified by the class declaration. While it is
> entirely understandable why one would want this, the problem is not
> that there isn't a good syntax for it (there obviously is), nor that
> running the constructor multiple times is the problem --
> it is deeper than that. Numerous safety properties derive from the
> fact that newly allocated objects and arrays are bulk-initialized to
> zero; compromising this seems likely to lead to exploits.