Proposal: Static/final constructors for bucket-3 primitive classes.
Remi Forax
forax at univ-mlv.fr
Thu Dec 9 07:12:17 UTC 2021
> From: "John Rose" <john.r.rose at oracle.com>
> To: "Brian Goetz" <brian.goetz at oracle.com>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>, "clement cherlin" <clement.cherlin at gmail.com>
> Sent: Thursday, December 9, 2021 5:30:50 AM
> Subject: Re: Proposal: Static/final constructors for bucket-3 primitive classes.
> We have considered, at various points in the last six years or more, allowing
> user-defined primitive types to define (under user control) their own default
> values. The syntax is unimportant, but the concept is simple: Surely the user
> who defines a primitive type can also define default initializer expressions
> for each of the fields.
> But this would be a trail of tears, which we have chosen to avoid, each time the
> suggestion comes up.
> This feature is often visualized as a predefined bit pattern, which the JVM
> would keep handy, and just stamp down wherever a default initializer is needed.
> It can’t really be that simple, but even such a bit pattern is problematic.
> First of all is the problem of declaring the bit pattern. Java natively uses the
> side effects of <clinit> to define constants using ad hoc bytecodes; it also
> defines (for some types but not others) a concept of constant expression.
> Neither of those fits well into a classfile that would define a primitive with
> a default bit pattern.
> If the bit pattern is defined using ad hoc bytecode, it must be defined in a new
> pseudo-method (not <clinit>), to execute not during the initialization of the
> newly-declared primitive class, but before. (Surely not! a reader might
> exclaim, but this is the sort of subtlety we have to deal with.) During
> initialization of a class C, all fields of its own type C must be initialized
> before the first bytecode of <clinit> executes, so that the static initializer
> code has something to write on. So there must be a “default value definition”
> phase, call it <defaultvalueinit>, added after linking and before
> initialization of C, so C’s <clinit> method has something to work with. This
> <defaultvalueinit> is really the body of a no-argument constructor of C, or its
> twin. A no-argument constructor of C is not a problem, but having it execute
> before C’s <clinit> block is a huge irregularity, which the JVM spec is not
> organized to support, at present.
> This would add complexity to both the JVMS and the JLS, and more odd corners
> (and odd states) to the Java user experience. Sure, a user will say, “but I
> promise not to do anything odd; I just want this field to be the value (int)1”.
> Yes, but a spec must define not only the expected usages, but all possible
> usages, with no poorly-defined states.
> OK, so if <defaultvalueinit> is not the place to define this elusive
> bit pattern, what about something more declarative, like a ConstantValue
> attribute? Surely we could put a similarly structured DefaultValue attribute on
> every non-static field of a value type, and that would give the JVM enough
> information to synthesize the required bit pattern before it runs <clinit>.
> Consider the user model here: A primitive declaration would allow its fields to
> have non-zero default values, but only drawn from the restricted set of
> constant expressions, because those are the ones which fit in the
> ConstantValue attribute. (They are true bit patterns in the constant pool, plus
> String constants.) There is no previous place in Java where we make such a
> restriction, except case labels. Can you hear the groans of users as we try to
> explain why only constant expressions are allowed in that context? That’s the
> muzak of the trail of tears I mentioned above.
> But we have condy to fix that (someone will surely say).
you read my mind :)
> But that’s problematic, because the resolution of constant pool constants of a
> class C requires C to be at least linked, and if the condy expression makes a
> self-reference to C itself, that will trigger C’s initialization, at an awkward
> moment. Have you ever debugged a tangled initialization circularity, marked by
> mysterious NPEs on variables you know you initialized? I have. It’s a stop on
> the trail of tears I mentioned.
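The flavor of circularity described here can be reproduced without condy, using two ordinary classes whose static initializers depend on each other. This is a sketch of the general hazard, not of the condy scenario itself, and the class and method names are made up for illustration. Because A is already being initialized by the same thread, the read inside B sees A's field before it is assigned, and the resulting NPE surfaces as an ExceptionInInitializerError on first use of A:

```java
final class InitCycleDemo {
    static final class A {
        // A's <clinit> runs static initializers in textual order, so this call
        // happens while A.name is still null.
        static final int NAME_LENGTH = B.lengthOfAName();
        static String name = "A";
    }

    static final class B {
        static int lengthOfAName() {
            // A is already being initialized by this same thread, so the JVM neither
            // blocks nor restarts A's <clinit>; the read just sees the current value, null.
            return A.name.length(); // throws NullPointerException
        }
    }

    private static String outcome; // memoized: A can only fail its initialization once

    static synchronized String firstAccessOutcome() {
        if (outcome == null) {
            try {
                outcome = "initialized, name = " + A.name;
            } catch (ExceptionInInitializerError e) {
                // The NPE escaped A's <clinit> and was wrapped by the JVM.
                outcome = e.getCause().getClass().getSimpleName();
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        System.out.println(firstAccessOutcome()); // NullPointerException
    }
}
```

Nothing in A's source looks wrong on its own; the failure only appears once B's read of A is taken into account, which is what makes such bugs hard to track down.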
> But if we really worked hard, and added a bunch of stuff to the JVMS and JLS,
> and persuaded users not to bother us about the odd restrictions (to constant
> expressions, or expressions which “don’t touch the class itself”), we could
> define some sort of declarative default value initialization.
> What then? Well, ask the JVM engineers how they initialize heap variables,
> because those are the affected paths. Those parts of the JVM are among the most
> performance-sensitive. Currently, when a new object or array is created, its
> whole body (except the header) is sprayed with a nice even coat of all-zero-bit
> machine words. This is pretty fast, and it’s important to keep it fast. What if
> creating an array required painting some beautifully crafted arabesque of a bit
> pattern defined by a creative user? Well, it’s doable, but much more
> complicated. You need to load the bit pattern into live registers and (if it’s
> an array of C) keep them live while you paint the whole array. That’s got to be
> more expensive than spraying zeroes. (There’s even hardware that’s good for
> spraying zeroes, on some machines.) Basically, if we generously allowed users
> even a limited set of pre-defined default primitive values, we would be
> inviting them to create mysterious performance problems for their clients.
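A rough model of the extra work, with an int[] standing in for a flattened array of a hypothetical two-field primitive class (re, im) whose user-declared default is not all zeros. This is not JVM allocator code and says nothing about real timings; it only shows that zeroed memory needs no further pass, while a user-defined pattern needs a second pass that keeps the pattern words live for the whole array:

```java
import java.util.Arrays;

final class FlatArrayPainting {
    // Today: the allocator hands back memory that is already all zero bits,
    // so there is nothing left to do after allocation.
    static int[] allocateZeroed(int elements) {
        return new int[elements * 2]; // two int words per (re, im) element
    }

    // Hypothetical: the default of each element is (defaultRe, defaultIm).
    // After allocation, every element must be painted with that pattern.
    static int[] allocateWithPattern(int elements, int defaultRe, int defaultIm) {
        int[] flat = new int[elements * 2]; // still zeroed first, as today
        for (int i = 0; i < flat.length; i += 2) {
            flat[i] = defaultRe;     // both pattern values stay live
            flat[i + 1] = defaultIm; // across the entire loop
        }
        return flat;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(allocateZeroed(3)));            // [0, 0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(allocateWithPattern(3, 1, 0))); // [1, 0, 1, 0, 1, 0]
    }
}
```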
> Reflective creation of objects and arrays is also complicated by non-zero
> defaults, of course. When you reflectively create a heap node, today you
> compute its size, allocate its memory, store some metadata to its header, and
> paint the rest zero. That turns into something more complicated (see above
> about live registers) and metadata-driven, in the presence of non-zero
> defaults.
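For reflective allocation, java.lang.reflect.Array.newInstance works uniformly today because every component type has the same all-zero default (0, false, '\0', or null). The sketch below contrasts that with a hypothetical variant that consults a per-type table of user-declared defaults; HYPOTHETICAL_DEFAULTS and allocateWithDefaults do not exist in the JDK and only illustrate the extra metadata lookup and per-element stores:

```java
import java.lang.reflect.Array;
import java.util.Map;

final class ReflectiveAlloc {
    // Today: one uniform rule, whatever the component type.
    static Object allocateToday(Class<?> componentType, int length) {
        return Array.newInstance(componentType, length);
    }

    // Hypothetical registry of user-declared defaults; nothing like this exists in the JDK.
    static final Map<Class<?>, Object> HYPOTHETICAL_DEFAULTS =
            Map.of(int.class, 1, String.class, "empty");

    static Object allocateWithDefaults(Class<?> componentType, int length) {
        Object array = Array.newInstance(componentType, length); // zeroed, as today
        Object dflt = HYPOTHETICAL_DEFAULTS.get(componentType);  // extra metadata lookup
        if (dflt != null) {
            for (int i = 0; i < length; i++) {
                // Array.set unwraps the Integer for an int[]; for a String[] this is
                // a reference store into every element.
                Array.set(array, i, dflt);
            }
        }
        return array;
    }
}
```

The String entry also previews the reference-field case: every element ends up holding a non-null reference before the caller has done anything with the array.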
> I haven’t yet mentioned reference fields, but those are another can of worms.
> The JVM vigorously tracks references. Suppose your primitive had a
> String-valued field, and you were allowed to declare a non-null default value
> for it, say "empty" . If one of your customers creates an array of these
> things, suddenly there is a GC card mark (for many GCs) on every element of the
> array, and that is before you do anything useful with it.
> References also support circularity, including indirect cycles from an instance
> of C back to C itself. Can you guarantee that the computation of some tricky
> reference for your default value of C.foo won’t require linking of C itself,
> and a vicious circularity? No, you can’t, and you won’t like the feeling of
> debugging such a thing either. Trail of tears, again.
> Finally, depending on which of the above flawed tactics is chosen for
> representing user-selected default values, there is the possibility that JVM
> code can observe a variable V of type C in its pre-initialization state,
> because (a) C’s initialization specification is being loaded or evaluated
> somehow, and (b) the variable V has been allocated but is waiting for an
> initialization bit pattern. (V might be a static of C, or something in a
> related dependent class. Also it could be a multi-threading situation, where V
> is being observed via a race condition; those are very hard to keep straight.)
> During those moments, if V is loaded, then (voila!) it will have either garbage
> or those good old all-zero bits in it. And the abstraction we were laboring to
> secure will be subverted. This usually doesn’t happen, but when it’s an
> accident it’s a very subtle bug, and when it’s on purpose it turns into a
> security escalation.
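A same-thread version of this hazard can be shown with today's classes: a dependent class that copies a static of C while C is still being initialized captures the all-zero value and keeps it after C finishes. The class names Config, Dependent and StaleZeroDemo are illustrative, and the result depends on which class is touched first:

```java
final class Config {
    // Runs first in Config's <clinit> and triggers Dependent's initialization.
    static final int SNAPSHOT = Dependent.copyOfOne();
    // "I just want this field to be (int)1", but it is assigned only afterwards.
    static int ONE = 1;
}

final class Dependent {
    // Copied once, while Config is still being initialized: sees 0, not 1.
    static final int CACHED_ONE = Config.ONE;

    static int copyOfOne() {
        return CACHED_ONE;
    }
}

final class StaleZeroDemo {
    static int[] observe() {
        // Touch Config first; starting from Dependent would produce a different result.
        return new int[] {Config.ONE, Config.SNAPSHOT, Dependent.CACHED_ONE};
    }

    public static void main(String[] args) {
        int[] seen = observe();
        System.out.println("Config.ONE = " + seen[0] + ", Dependent.CACHED_ONE = " + seen[2]);
    }
}
```

Declaring the field as static final int ONE = 1 would make it a constant variable, initialized from a ConstantValue attribute before <clinit> runs (and inlined by javac into Dependent), so the stale zero would disappear; that escape hatch exists only for constant expressions, the restriction discussed earlier in the message.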
> It’s best to keep the simple default all-zero conventions. They are robust and
> understandable and regular. When they are inconvenient, users will find
> workarounds.
> I hope this helps.
I fully agree. I think it's better to do the opposite: require that every primitive value class (Bucket 3) have a default constructor, and that this constructor consist of a fixed sequence of bytecode instructions.
If a user does not provide a constructor without parameters, the compiler will provide one, and the verifier will check that this constructor exists.
If a user wants to provide that constructor, for example to be able to add javadoc to it, it should contain only one instruction, a call to default() with no parameters,
something like
public primitive value class Complex {
    public Complex() {
        default();
    }
}
From the VM POV, it's an initfactory consisting of a defaultvalue (or whatever that bytecode ends up being named) followed by an areturn,
so this can easily be checked by the VM.
The idea of requiring such a constructor is to help users keep in mind that, whatever they do, people will still be able to create an empty B3.
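Since the proposed constructor body is fixed, the check really is trivial. A toy model of it, with a method body represented as a list of opcodes; the Op names are placeholders (the bytecode name is explicitly left open above), and a real verifier would inspect class-file bytes rather than an enum list:

```java
import java.util.List;

final class DefaultCtorCheck {
    // Placeholder opcodes: DEFAULTVALUE pushes the all-zero instance, WITHFIELD stands
    // for a field-update instruction, ARETURN returns the reference on the stack.
    enum Op { DEFAULTVALUE, ICONST_1, WITHFIELD, ARETURN }

    // The proposed rule: the no-argument factory of a primitive class must be
    // exactly "push the default value, return it".
    static boolean isCanonicalDefaultFactory(int parameterCount, List<Op> body) {
        return parameterCount == 0 && body.equals(List.of(Op.DEFAULTVALUE, Op.ARETURN));
    }

    public static void main(String[] args) {
        System.out.println(isCanonicalDefaultFactory(0, List.of(Op.DEFAULTVALUE, Op.ARETURN))); // true
        // An attempt to sneak in a non-zero field value is rejected.
        System.out.println(isCanonicalDefaultFactory(0,
                List.of(Op.DEFAULTVALUE, Op.ICONST_1, Op.WITHFIELD, Op.ARETURN)));        // false
    }
}
```

The design choice is that the constructor documents the default rather than defining it: the only accepted body yields the all-zero value, so there is nothing for a user to customize.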
> — John
Rémi
> On 5 Dec 2021, at 10:36, Brian Goetz wrote:
>> The following was received on valhalla-spec-comments.
>> Summary: Various syntax options for no-arg constructors of "bucket 3"
>> primitives, to enable users to pick a default value other than zero.
>> Analysis: The suggestion is well-intentioned, but it is built on some
>> significant misunderstandings of the problem we are facing.
>> It assumes that it is sensible to allow a non-zero default value of a primitive
>> to be specified by the class declaration. While it is entirely understandable
>> why one would want this, the problem is not that there isn't a good syntax for
>> it (there obviously is), nor that running the constructor multiple times is the
>> problem -- it is deeper than that. Numerous safety properties derive from the fact that
>> newly allocated objects and arrays are bulk-initialized to zero; compromising
>> this seems likely to lead to exploits.
More information about the valhalla-spec-observers mailing list