Proposal: Static/final constructors for bucket-3 primitive classes.

John Rose john.r.rose at oracle.com
Thu Dec 9 04:30:50 UTC 2021


We have considered, at various points in the last six years or more, 
allowing user-defined primitive types to define (under user control) 
their own default values.  The syntax is unimportant, but the concept is 
simple:  Surely the user who defines a primitive type can also define 
default initializer expressions for each of the fields.

But this would be a trail of tears, which we have chosen to avoid, each 
time the suggestion comes up.

This feature is often visualized as a predefined bit pattern, which the 
JVM would keep handy, and just stamp down wherever a default initializer 
is needed.  It can’t really be that simple, but even such a bit 
pattern is problematic.

First of all is the problem of declaring the bit pattern.  Java natively 
uses the side effects of `<clinit>` to define constants using ad hoc 
bytecodes; it also defines (for some types but not others) a concept of 
constant expression.  Neither of those fits well into a classfile that 
would define a primitive with a default bit pattern.

If the bit pattern is defined using ad hoc bytecode, it must be defined 
in a new pseudo-method (not `<clinit>`), to execute not *during* the 
initialization of the newly-declared primitive class, but *before*.  
(Surely not! a reader might exclaim, but this is the sort of subtlety we 
have to deal with.)  During initialization of a class C, all fields of 
its own type C must be initialized *before* the first bytecode of 
`<clinit>` executes, so that the static initializer code has something 
to write on.  So there must be a “default value definition” phase, 
call it `<defaultvalueinit>`, added after linking and before 
initialization of C, so C’s `<clinit>` method has something to work 
with.  This `<defaultvalueinit>` is really the body of a no-argument 
constructor of C, or its twin.  A no-argument constructor of C is not a 
problem, but having it execute before C’s `<clinit>` block is a huge 
irregularity, which the JVM spec is not organized to support, at 
present.

This would turn into both JVMS and JLS spec. complexity, and more odd 
corners (and odd states) in the Java user experience.  Sure, a user will 
say, “but I promise not to do anything odd; I just want *this field* 
to be the value `(int)1`”.  Yes, but a spec. must define not only the 
expected usages, but all possible usages, with no poorly-defined states.

OK, so if `<defaultvalueinit>` is not the place to define this 
elusive bit pattern, what about something more declarative, like a 
`ConstantValue` attribute?  Surely we could put a similarly structured 
`DefaultValue` attribute on every non-static field of a value type, and 
that would give the JVM enough information to synthesize the required 
bit pattern *before* it runs `<clinit>`.

Consider the user model here:  A primitive declaration would allow its 
fields to have non-zero default values, *but only drawn from the 
restricted set of constant expressions*, because those are the ones 
which fit in the `ConstantValue` attribute.  (They are true bit patterns 
in the constant pool, plus `String` constants.)  There is no previous 
place in Java where we make such a restriction, except `case` labels.  
Can you hear the groans of users as we try to explain why only constant 
expressions are allowed in that context?  That’s the muzak of the 
trail of tears I mentioned above.

But we have condy to fix that (someone will surely say).  But that’s 
problematic, because the resolution of constant pool constants of a 
class C requires C to be at least linked, and if the condy expression 
makes a self-reference to C itself, that will trigger C’s 
initialization, at an awkward moment.  Have you ever debugged a tangled 
initialization circularity, marked by mysterious NPEs on variables you 
*know* you initialized?  I have.  It’s a stop on the trail of tears I 
mentioned.
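
For readers who have not met this hazard, here is a minimal sketch in 
today's Java.  It is an illustration only, not part of any proposal, and 
every name in it (`InitCycleDemo`, `A`, `B`, `snapshot`) is hypothetical.  
It shows a constant-expression field being visible during class 
initialization while a non-constant field of the same class still reads 
as `null`.

```java
// Illustrative sketch; all class and method names are hypothetical.
final class InitCycleDemo {
    static class A {
        // Constant expression: recorded in a ConstantValue attribute and
        // inlined by javac at use sites.
        static final int CONSTANT = 1;
        // Calls into B while A.<clinit> is only partly done.
        static final String SEEN_BY_B = B.snapshot();
        // Not a constant expression: assigned later in A.<clinit>.
        static final Integer BOXED = Integer.valueOf(1);
    }

    static class B {
        // Runs on the thread that is currently initializing A, so the JVM
        // lets it read A's statics in whatever state they are in.
        static String snapshot() {
            return "CONSTANT=" + A.CONSTANT + ", BOXED=" + A.BOXED;
        }
    }

    // Touching A.SEEN_BY_B triggers A's initialization.
    static String observe() {
        return A.SEEN_BY_B;
    }

    public static void main(String[] args) {
        System.out.println(observe()); // prints "CONSTANT=1, BOXED=null"
    }
}
```

The `null` is the all-zero default showing through a field that the 
source code plainly initializes.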

But if we really worked hard, and added a bunch of stuff to the JVMS and 
JLS, and persuaded users not to bother us about the odd restrictions (to 
constant expressions, or expressions which “don’t touch the class 
itself”), we *could* define some sort of declarative default value 
initialization.

What then?  Well, ask the JVM engineers how they initialize heap 
variables, because those are the affected paths.  Those parts of the JVM 
are among the most performance-sensitive.  Currently, when a new object 
or array is created, its whole body (except the header) is sprayed with 
a nice even coat of all-zero-bit machine words.  This is pretty fast, 
and it’s important to keep it fast.  What if creating an array 
required painting some beautifully crafted arabesque of a bit pattern 
defined by a creative user?  Well, it’s doable, but much more 
complicated.  You need to load the bit pattern into live registers and 
(if it’s an array of C) keep them live while you paint the whole 
array.  That’s got to be more expensive than spraying zeroes.  
(There’s even hardware that’s good for spraying zeroes, on some 
machines.)  Basically, if we generously allowed users even a limited set 
of pre-defined default primitive values, we would be inviting them to 
create mysterious performance problems *for their clients*.
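
As a rough user-level analogy (a sketch only: the JVM's allocation path 
is native code, not Java, and the names `DefaultFillDemo`, `zeroDefault` 
and `paintedDefault` are hypothetical), compare an array that arrives 
already zeroed with one that needs a second pass to paint a non-zero 
pattern over every element:

```java
import java.util.Arrays;

// Illustrative sketch; class and method names are hypothetical.
final class DefaultFillDemo {
    // Today's behavior: allocation alone yields all-zero elements.
    static int[] zeroDefault(int length) {
        return new int[length];
    }

    // Emulates a non-zero default: allocate, then paint every element
    // with the pattern in an extra pass over the whole array.
    static int[] paintedDefault(int length, int pattern) {
        int[] array = new int[length];
        Arrays.fill(array, pattern);
        return array;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(zeroDefault(3)));       // prints "[0, 0, 0]"
        System.out.println(Arrays.toString(paintedDefault(3, 1))); // prints "[1, 1, 1]"
    }
}
```

With a multi-field primitive the painted pattern would be several words 
wide, which is where the live-register cost described above comes from.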

Reflective creation of objects and arrays is also complicated by 
non-zero defaults, of course.  When you reflectively create a heap node, 
today you compute its size, allocate its memory, store some metadata to 
its header, and paint the rest zero.  That turns into something more 
complicated (see above about live registers) and metadata-driven, in the 
presence of non-zero defaults.
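
The reflective path is visible from Java through 
`java.lang.reflect.Array.newInstance`.  The sketch below (the wrapper 
class `ReflectiveDefaultsDemo` and its `newArray` helper are 
hypothetical) shows that the current contract needs nothing from the 
component type beyond its size: primitive elements come back as zero and 
reference elements as `null`.

```java
import java.lang.reflect.Array;

// Illustrative sketch; class and method names are hypothetical.
final class ReflectiveDefaultsDemo {
    // Creates an array reflectively; elements get the all-zero default.
    static Object newArray(Class<?> componentType, int length) {
        return Array.newInstance(componentType, length);
    }

    public static void main(String[] args) {
        long[] longs = (long[]) newArray(long.class, 2);
        String[] strings = (String[]) newArray(String.class, 2);
        System.out.println(longs[0] + " " + strings[0]); // prints "0 null"
    }
}
```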

I haven’t yet mentioned *reference* fields, but those are another can 
of worms.  The JVM vigorously tracks references.  Suppose your primitive 
had a String-valued field, and you were allowed to declare a non-null 
default value for it, say `"empty"`.  If one of your customers creates 
an array of these things, suddenly there is a GC card mark (for many 
GCs) on *every element of the array*, and that is *before you do 
anything useful with it*.

References also support circularity, including indirect cycles from an 
instance of C back to C itself.  Can you guarantee that the computation 
of some tricky reference for your default value of `C.foo` won’t 
require linking of C itself, and a vicious circularity?  No, you 
can’t, and you won’t like the feeling of debugging such a thing 
either.  Trail of tears, again.

Finally, depending on which of the above flawed tactics is chosen for 
representing user-selected default values, there is the possibility that 
JVM code can observe a variable V of type C in its pre-initialization 
state, because (a) C’s initialization specification is being loaded or 
evaluated somehow, and (b) the variable V has been allocated but is 
waiting for an initialization bit pattern.  (V might be a static of C, 
or something in a related dependent class.  Also it could be a 
multi-threading situation, where V is being observed via a race 
condition; those are very hard to keep straight.)  During those moments, 
if V is loaded, then (voila!) it will have either garbage or those good 
old all-zero bits in it.  And the abstraction we were laboring to secure 
will be subverted.  This usually doesn’t happen, but when it’s an 
accident it’s a very subtle bug, and when it’s on purpose it turns 
into a security escalation.

It’s best to keep the simple default all-zero conventions.  They are 
robust and understandable and regular.  When they are inconvenient, 
users will find workarounds.
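
One possible workaround, sketched here with hypothetical names 
(`ZeroMeansOne`, `encode`, `decode`) and not taken from any Valhalla 
design, is to choose the stored encoding so that the all-zero bit 
pattern decodes to the default the user wants.  For the "I just want 
this field to be `(int)1`" case, store the value minus one:

```java
// Illustrative sketch; class and method names are hypothetical.
final class ZeroMeansOne {
    // Stored form is (logical - 1), so stored zero means logical one.
    // Wrapping int arithmetic keeps the mapping one-to-one for every int.
    static int encode(int logical) {
        return logical - 1;
    }

    static int decode(int stored) {
        return stored + 1;
    }

    public static void main(String[] args) {
        int[] storage = new int[3];             // zero-filled by the JVM
        System.out.println(decode(storage[0])); // prints "1"
        storage[1] = encode(42);
        System.out.println(decode(storage[1])); // prints "42"
    }
}
```

The allocation path stays a plain zero fill, and the small cost of the 
encoding is paid in the accessors of the type that wants the non-zero 
default.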

I hope this helps.

— John

On 5 Dec 2021, at 10:36, Brian Goetz wrote:

> The following was received on valhalla-spec-comments.
>
> Summary: Various syntax options for no-arg constructors of "bucket 3" 
> primitives, to enable users to pick a default value other than zero.
>
> Analysis: The suggestion is well-intentioned, but it is built on some 
> significant misunderstandings of the problem we are facing.
>
> It assumes that it is sensible to allow a non-zero default value of a 
> primitive to be specified by the class declaration.  While it is 
> entirely understandable why one would want this, the problem is not 
> that there isn't a good syntax for it (there obviously is), nor that 
> running the constructor multiple times is the problem --
> it is deeper than that.  Numerous safety properties derive from the 
> fact that newly allocated objects and arrays are bulk-initialized to 
> zero; compromising this seems likely to lead to exploits.


More information about the valhalla-spec-observers mailing list