Proposal: Static/final constructors for bucket-3 primitive classes.

Clement Cherlin ccherlin at gmail.com
Sun Dec 5 23:09:20 UTC 2021


On Sun, Dec 5, 2021 at 12:36 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>
> The following was received on valhalla-spec-comments.
>
> Summary: Various syntax options for no-arg constructors of "bucket 3"
> primitives, to enable users to pick a default value other than zero.
>
> Analysis: The suggestion is well-intentioned, but it is built on some
> significant misunderstandings of the problem we are facing.
>
> It assumes that it is sensible to allow a non-zero default value of a
> primitive to be specified by the class declaration.  While it is
> entirely understandable why one would want this, the problem is not that
> there isn't a good syntax for it (there obviously is), nor that running
> the constructor multiple times is the problem -- it is deeper than
> that.  Numerous safety properties derive from the fact that newly
> allocated objects and arrays are bulk-initialized to zero; compromising
> this seems likely to lead to exploits.

Thank you for your feedback. However, far from leading to new exploits,
my suggestion is aimed at fixing flaws inherent in the current design,
flaws that make primitive classes extremely, and unnecessarily, difficult
for their authors to use correctly.

The current design assumes that the all-zeroes value can and should be the
default value for every single primitive class. Initializing to zero is
simple, unambiguous and efficient. It is perfectly reasonable to have
all-zeroes as the "default default", so to speak. However, it is
completely unacceptable to make the "default default" the one and only
default, because it creates a value that was never constructed.
Numerous safety properties of existing classes also derive from the fact
that every instance was initialized by a constructor; compromising this
will inevitably lead to the same kinds of exploits that serialization did.

Consider a very slowly growing, but not constant, set of values that
ought to be expandable at runtime, such as media type codes for a
transcoding server that supports dynamic plugins. It's not
constant, so it can't be an enum. We must validate any new instance
against a canonical list of permitted values before allowing it to be
constructed, lest invalid (possibly malicious) values sneak into the
system.

public primitive record MediaCode(byte b1, byte b2, byte b3, byte b4) {
    // Compact constructor: reject any code not on the canonical list
    public MediaCode {
        if (!isValidMediaCode(b1, b2, b3, b4))
            throw new IllegalArgumentException("unregistered media code");
    }
}

An invalid MediaCode of 0,0,0,0 can now be obtained trivially, perhaps
accidentally, without the constructor ever running:

MediaCode[] mediaCodes = new MediaCode[numMediaCodes];
// time passes, mediaCodes is partially but not completely filled...
MediaCode whoops = mediaCodes[numMediaCodes - 1];

This permits injecting NUL bytes into, say, a byte stream that will be
deserialized by C code expecting null-terminated strings, or getting
something that is very much not a media file recognized as one.
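To make the mechanism concrete, here is a rough model in today's Java, which has no primitive classes, so the layout is an assumption rather than the actual Valhalla implementation: a flattened MediaCode[] treated as a zero-filled byte array holding four bytes per element. The class FlatMediaCodes and its methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical model of a flattened MediaCode[]: each element occupies four
// consecutive bytes, and allocation zero-fills the whole backing array.
class FlatMediaCodes {
    private final byte[] storage;

    FlatMediaCodes(int length) {
        storage = new byte[length * 4]; // zero-filled by the JVM
    }

    // Writes a code that is assumed to have been validated already.
    void set(int index, byte b1, byte b2, byte b3, byte b4) {
        int base = index * 4;
        storage[base] = b1;
        storage[base + 1] = b2;
        storage[base + 2] = b3;
        storage[base + 3] = b4;
    }

    // Reads a slot back with no validation: the raw bytes, whatever they are.
    byte[] get(int index) {
        int base = index * 4;
        return Arrays.copyOfRange(storage, base, base + 4);
    }

    public static void main(String[] args) {
        FlatMediaCodes codes = new FlatMediaCodes(3);
        codes.set(0, (byte) 'a', (byte) 'v', (byte) 'c', (byte) '1');
        // Slots 1 and 2 were never written, so they hold four NUL bytes.
        System.out.println(Arrays.toString(codes.get(2))); // prints [0, 0, 0, 0]
    }
}
```

Reading an unfilled slot never touches any constructor, so a check like isValidMediaCode never gets a chance to reject it.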

That sounds to me like it could easily lead to an exploit. And class
authors are helpless to prevent this easily foreseeable error. Even
making the constructor private won't help, because the zero default
cannot be suppressed, hidden or prevented in any way.

I've seen the suggestion "Make the class private". If the only solution
to the problem is to hide from it, that is a tacit admission that the
current design is unworkable.

Now consider the problems caused by the unwanted but mandatory
implicit initializers in this class:

public primitive class LongRational {
    private long numerator = 0;    // implicit and mandatory
    private long denominator = 0;  // implicit and mandatory: the default is 0/0
    ...
}

With a denominator of zero, the default value is 0/0; I don't think I
need to elaborate on why that is a problem.

These are just two examples I thought of off the top of my head. I can
invent dozens more plausible ways that the all-zeroes default will
create exploitable bugs with very little effort, and you know that the,
ahem, professional bug exploiters will have even less trouble.

The following excerpt is from "Towards Better Serialization"
(Brian Goetz, June 2019),
https://cr.openjdk.java.net/~briangoetz/amber/serialization.html

> In an object-oriented system, the role of the constructor is to initialize
> an object with its invariants established; this allows the rest of the
> system to assume a basic degree of object integrity. In theory, we
> should be able to reason about the possible states an object might be
> in by reading the code for its constructors and any methods that
> mutate the object's state. But because serialization constitutes a
> hidden public constructor, you have to also reason about the state
> that objects might be in based on previous versions of the code
> (whose source code might not even exist any more, to say nothing
> of maliciously constructed bytestreams). By bypassing constructors,
> serialization completely subverts the integrity of the object model.

Strong words. "The role of the constructor is to initialize an object
with its invariants established." "Serialization constitutes a hidden
public constructor...", and "...bypassing constructors... completely
subverts the integrity of the object model."

I fully agree with all of those statements and sentiments.

Unless authors waste up to 8 bytes of space in every instance (a boolean
plus alignment padding) by including an "isConstructed" flag, or waste
time revalidating the state of every instance in every method call, the
integrity of the object model is subverted. Is avoiding footguns not an
important goal? Is maintaining the integrity of the object model not an
important goal?
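A minimal sketch of the isConstructed workaround in today's Java. Current Java cannot produce an unconstructed instance, so the private constructor behind the hypothetical allZeroesDefault() factory stands in for the JVM-supplied default; the printable-ASCII rule in isValidMediaCode is likewise a placeholder for a real canonical list.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the "isConstructed" workaround: an extra flag that only the
// validating constructor sets, checked at the top of every method.
final class GuardedMediaCode {
    private final byte b1, b2, b3, b4;
    // Costs at least one extra byte per instance, more once padding is added.
    private final boolean constructed;

    GuardedMediaCode(byte b1, byte b2, byte b3, byte b4) {
        if (!isValidMediaCode(b1, b2, b3, b4))
            throw new IllegalArgumentException("unregistered media code");
        this.b1 = b1;
        this.b2 = b2;
        this.b3 = b3;
        this.b4 = b4;
        this.constructed = true;
    }

    // Hypothetical stand-in for the all-zeroes default: every field is zero
    // (so constructed == false) and no validation runs.
    private GuardedMediaCode() {
        this.b1 = 0;
        this.b2 = 0;
        this.b3 = 0;
        this.b4 = 0;
        this.constructed = false;
    }

    static GuardedMediaCode allZeroesDefault() {
        return new GuardedMediaCode();
    }

    // Hypothetical validity rule standing in for the canonical list.
    static boolean isValidMediaCode(byte... bytes) {
        for (byte b : bytes)
            if (b < 0x20 || b > 0x7E) return false;
        return true;
    }

    // The per-call check every method has to pay for.
    private void requireConstructed() {
        if (!constructed)
            throw new IllegalStateException("MediaCode was never constructed");
    }

    String asString() {
        requireConstructed();
        return new String(new byte[] {b1, b2, b3, b4}, StandardCharsets.US_ASCII);
    }
}
```

The guard works, but only at the cost of the extra field and a branch in every method, which is exactly the overhead described above.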

There will be a compromise somewhere, but forcing all-zeroes on every
primitive class is the *wrong* compromise.

How is having the JVM bulk-initialize an array to an author-controlled
default value via memcpy (or equivalent) likely to lead to exploits?
Specifically, how is it any more likely to lead to exploits than having
the JVM initialize an array to an arbitrary, uncontrolled, possibly
inherently invalid default value via calloc (or equivalent)?

If static or final were required on the no-arg constructors of primitive
classes (or if there were another way to initialize an array; more on that
later), then there would be no way for an exception to be thrown partway
through array initialization, because the default would already have been
constructed before any array was allocated. Is that not safe?

If you really want belt-and-suspenders safety, the JVM can initialize to
zero, then reinitialize with a constructed default. I don't see the need
for it, but it's a possibility.
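As a rough sketch of the "static" variant in today's Java: the default is built once during class initialization, and each new array is first filled by the JVM and then filled with copies of that default, which is the belt-and-suspenders order described above. Rational, DEFAULT and newArray are hypothetical names. Because records are identity objects today, Arrays.fill copies a reference rather than the value's bits, as a flattened primitive array would.

```java
import java.util.Arrays;

// Sketch of the proposed "static" no-arg constructor semantics, emulated with
// a static final default that is constructed once and then copied.
record Rational(long numerator, long denominator) {
    Rational {
        if (denominator == 0)
            throw new ArithmeticException("zero denominator");
    }

    // Built exactly once, during class initialization. If it threw, the
    // failure would surface there, never partway through filling an array.
    static final Rational DEFAULT = new Rational(0, 1);

    // Hypothetical helper standing in for JVM-level bulk initialization.
    static Rational[] newArray(int length) {
        Rational[] array = new Rational[length]; // JVM fills with null first
        Arrays.fill(array, DEFAULT);             // then copies in the default
        return array;
    }
}
```

No constructor runs during newArray itself, so the cost per element is a plain copy.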

Really think about the LongRational case. If default-zero initialization
can make a simple numeric type (one of the primary anticipated use
cases for primitive classes) so unsafe that the *default instance* will
throw ArithmeticException if one so much as looks at it, what are we
doing?
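A sketch of the LongRational problem in today's Java. The ALL_ZEROES constant is a hypothetical stand-in for the default a primitive class would receive without running its constructor; ZERO (0/1) is the default an author would presumably choose if the language allowed it. floor() uses Math.floorDiv, which throws ArithmeticException on a zero divisor.

```java
// Models why an all-zeroes LongRational is dangerous: its denominator is 0,
// so ordinary arithmetic on the default instance throws ArithmeticException.
final class LongRational {
    private final long numerator;
    private final long denominator;

    // Hypothetical stand-in for the JVM-supplied default: fields are zero and
    // the validating constructor never runs. Today's Java has no such value.
    static final LongRational ALL_ZEROES = new LongRational(0, 0, false);

    // What an author would pick as the default if the class could declare one.
    static final LongRational ZERO = new LongRational(0, 1);

    LongRational(long numerator, long denominator) {
        this(numerator, denominator, true);
    }

    private LongRational(long numerator, long denominator, boolean validate) {
        if (validate && denominator == 0)
            throw new ArithmeticException("zero denominator");
        this.numerator = numerator;
        this.denominator = denominator;
    }

    // Largest long not greater than numerator / denominator.
    long floor() {
        return Math.floorDiv(numerator, denominator); // throws if denominator == 0
    }
}
```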

Decreeing that primitive classes cannot ever opt out of an unsanitized,
unvalidated, all-zeroes value will render them completely unsuitable
for some roles that they would otherwise be ideal for. At that point,
we might as well drop Bucket 3 entirely and stick with nullable value
classes, since those have a preexisting, if unfortunate, default: null.

I do not want to see primitive class initialization become a foreseeable
and preventable disaster like serialization was. Any mistakes in the
design will be a lasting part of Java, for future developers to curse
and future blackhat hackers to exploit.

Cheers,
Clement Cherlin

> -------- Forwarded Message --------
> Subject:        Proposal: Static/final constructors for bucket-3 primitive classes.
> Date:   Fri, 3 Dec 2021 21:15:50 -0600
> From:   Clement Cherlin <clement.cherlin at gmail.com>
> To:     valhalla-spec-comments at openjdk.java.net
>
>
>
> Motivation: A concern with primitive classes (bucket 3) is that the
> all-zeroes default value may be inappropriate or even invalid in some
> cases. This proposal suggests a language enhancement to give primitive
> class authors control over the default value of their class without,
> in most cases, requiring a constructor call to create an instance.
>
> Proposed language change:
> Primitive classes can apply either the keyword "static" or the
> keyword "final", but not both, to their no-argument constructor.
>
> A "final" no-arg constructor is evaluated once, at compile time. The
> constructed object is treated as a static final constant, and can be
> folded as a constant, or copied verbatim whenever a default value of
> that class is instantiated.
>
> A "static" no-arg constructor is evaluated once, when the class is loaded.
> The constructed object is copied verbatim whenever a default value of that
> class is instantiated.
>
> Justification:
> Presuming that non-zero default values need to exist, and we're going
> to be constructing lots and lots of primitive objects and arrays of
> primitive objects, it behooves us to make initialization of default
> values as efficient as possible. Much of the time, there will be no
> need to call a constructor / factory method, just make a copy of a
> pre-existing default value (perhaps lazily).
>
> Related work:
> For classes without sensible default values, I have another proposal I
> am working on to make initializing arrays of primitive objects possible
> and efficient, without resorting to the all-zeroes default.
>
> Cheers,
> Clement Cherlin

