FAQ: why no user-defined default instances?
John Rose
john.r.rose at oracle.com
Wed Apr 19 22:51:35 UTC 2023
This is a recurrent question, so I wrote an answer.
https://cr.openjdk.org/~jrose/values/why-zero-defaults.html
To avoid link problems, here is the markdown source as well:
--------
Question: Why not just allow me, the Valhalla user, to define my own
default instance for my class?
After all, for a `Person` value class, the `String name` field should be
an empty string by default, not `null`. And for a `RationalNumber`
value class, the `int denominator` of the default instance should be `1`
not `0`, and if it is a `BigInteger` it should be `BigInteger.ONE`. Why
is Java refusing to give me control over those decisions?
Since the default instance is derived from a special constructor in the
class declaration, why can't that allow the user to specify a different
(not all-zero) default instance, as an ad hoc constructor body? How
hard can it be for a class to collect some default values for fields,
and then “stamp out” their bit pattern into every new uninitialized
variable of the class? Isn’t it an unwelcome burden for users to cope
with a mandated default value for their class?
Answer: After long discussion and multiple re-evaluations, the Valhalla
design team will support default values already naturally present in the
Java language, but not others. In short, fields will always start out
with their natural zero value (which is `null` for references).
Valhalla values can expose their own natural default values (bundles of
zero field-values) but no other defaults.
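
As a reminder of what "natural zero value" means in Java today, here is a minimal sketch. The class and field names (`Point`, `ZeroDefaultsDemo`) are hypothetical and exist only for illustration. It uses ordinary identity classes rather than Valhalla value classes.

```java
// Hypothetical class: no constructor body and no field initializers,
// so every field keeps the zero value the language assigns by default.
final class Point {
    int x;          // starts at 0
    double scale;   // starts at 0.0
    boolean seen;   // starts at false
    String label;   // starts at null (the zero value for references)
}

final class ZeroDefaultsDemo {
    // Collects the defaults of fields and of freshly created array elements.
    static String describe() {
        Point p = new Point();
        Point[] points = new Point[2];   // reference elements start as null
        int[] counts = new int[2];       // primitive elements start as 0
        return p.x + " " + p.scale + " " + p.seen + " " + p.label
                + " " + points[0] + " " + counts[1];
    }
}
```

Calling `ZeroDefaultsDemo.describe()` returns `0 0.0 false null null 0`.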
This is a compromise of complexity versus expressiveness. (Such
compromises are typical in Java.) There are a number of reasons for
this one.
1. The complexity at the VM level does not seem to be justified since
there are simple workarounds at the user level. (See below for notes on
workarounds and VM complexity.)
2. “Stamping out” zeroes is fundamentally simpler to do than
“stamping out” some other pattern. This is especially true of managed references:
If you “stamp out” GC-managed references, you have to have a
store-barrier conversation with the GC at each reference, if the GC is
designed that way. This factor all by itself will slow array creation
down, no matter what other design decisions are taken.
3. Also, the all-zero value will be visible in some states, even though
we are trying to make it disappear. At least while initial field
initializers are executed, and probably at other times as well. That
will lead to confusion.
Note that none of the options being discussed here makes any use of
field initializers, because those apply to all constructors. A new form
of field initializer would be required to create a low-ceremony syntax
for non-zero default instances.
The VM-defined default field values are admittedly not suitable, in many
cases. A class abstraction author has a right and duty to define and
enforce the valid states of fields, and that may not include nulls or
zeroes in the class fields. In refusing to supply user-defined default
field values, Valhalla requires users to employ workarounds for
unsuitably initialized fields.
1. The simplest workaround is not to expose the default value. Allow
`null` to be the default value of your class, just as with all
pre-Valhalla classes. The standard workarounds for the `null` value
apply in all cases. Valhalla can succeed even if it doesn’t fully
solve all problems with `null`.
2. A field being null (or some other zero) can be kept inside the class
abstraction by defining an access method. There is no particular reason
why class fields must be made accessible to the public. Instead, an
access method can detect the unsuitable state (often `null`) and replace
it with a better state, defined by the class author. This involves a
little more ceremony (a test and branch) than offered by a special
syntax (one-time assignment) but it works. It is of course a workaround
used even before Valhalla.
3. As a variation of the test-and-branch workaround, a numeric zero
value, if unsuitable, can be converted using exclusive OR to a different
value, yielding code which is probably as fast as a raw field read. The
class would define a private static final constant `FDV` of the
preferred field default value, and encode and decode field values
appropriately, as `this.f0=f^FDV` and then `return f0^FDV`, where `f0`
is the zero-default field in the class’s private implementation.
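
The second and third workarounds can be sketched as follows. This is an illustration under stated assumptions, not a Valhalla API. The classes `Person` and `Rational`, the field name `denominator0`, and the `zeroState()` factory are hypothetical. They are written as ordinary classes so the code compiles today, and `zeroState()` only simulates the all-zero default instance that a value class could expose.

```java
// Workaround 2: an access method hides the unsuitable null state.
final class Person {
    private final String name;   // null in the all-zero state

    Person(String name) { this.name = name; }

    // Test and branch: replace null with the author's preferred default.
    String name() { return name == null ? "" : name; }
}

// Workaround 3: store the field XORed with the preferred default (FDV),
// so that all-zero bits decode to FDV without a branch.
final class Rational {
    private static final int FDV = 1;   // preferred default denominator

    private final int numerator;
    private final int denominator0;     // encoded as denominator ^ FDV

    Rational(int numerator, int denominator) {
        this.numerator = numerator;
        this.denominator0 = denominator ^ FDV;   // encode on write
    }

    // Simulates the all-zero default instance (assumption for this sketch).
    private Rational() {
        this.numerator = 0;
        this.denominator0 = 0;
    }

    static Rational zeroState() { return new Rational(); }

    int numerator() { return numerator; }

    int denominator() { return denominator0 ^ FDV; }   // decode on read
}
```

With these definitions, `new Person(null).name()` yields the empty string and `Rational.zeroState().denominator()` yields `1`, while `new Rational(3, 4).denominator()` still yields `4`.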
We could allow an ad-hoc constructor body written by the user to
“poke” non-default values into fields, and run it exactly once,
capturing the instance state to use for all future default
initializations. But this would have a number of problems.
1. The initial value of `this` in fact has to be an all-zero default,
which means the `aconst_init` bytecode, used at the beginning of all
constructors including the initial one, must always yield the all-zero
default, regardless of what the language says. Therefore, there is
always a risk of the all-zero default showing up later, due to an
insufficiently protected use of `aconst_init`.
2. If the ad-hoc constructor body itself builds arrays or other objects
requiring the default value, there is a vicious circularity, requiring
new JVM specification language.
3. The ad-hoc constructor body must be assigned, by the JVM
specification, an order to execute relative to the whole of `<clinit>`,
and this again is not trivial to specify or implement. Normally, object
constructors run after a class is fully initialized (unless they are
initiated during the `<clinit>` activity itself). But in this case an
object constructor, just the one special one, would presumably have to
run either before all `<clinit>` activity (as an early initialization
step) or else lazily in a separately specified and implemented phase of
class setup.
In any case, array creation is likely to take some kind of
performance hit. The issue is that every array type will have to
store the bit pattern to “stamp out” when creating an array that has
an element class with a non-zero default. One might think that this is
a pay-as-you-go feature, incurring cost only to classes which declare
non-zero defaults, but that is not completely true. There are places in
the JVM and JDK where arrays (and other objects) are created from
dynamically selected types. Such places are likely to need a new
slow-path branch to handle the possibility that the selected type
requires special handling for a non-zero default. Tests and
branches are cheap but it cannot be assumed that they are free.
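
One familiar example of array creation from a dynamically selected type is the reflective `java.lang.reflect.Array.newInstance`. The sketch below uses a hypothetical wrapper class, `DynamicArrayDemo`, to show why such a code path cannot be specialized per element class: the element type arrives as a run-time value. The branch described in the comment is an assumption about what a non-zero-default design might need, not existing JVM behavior.

```java
import java.lang.reflect.Array;

final class DynamicArrayDemo {
    // The element type is only known at run time, so this one code path
    // serves every class. Today it can always zero-fill the new array.
    // With user-defined defaults it would presumably first have to check
    // whether elementType needs a non-zero pattern (hypothetical branch).
    static Object newArray(Class<?> elementType, int length) {
        return Array.newInstance(elementType, length);
    }
}
```

For instance, `DynamicArrayDemo.newArray(int.class, 2)` produces an `int[]` whose elements are zero, and `DynamicArrayDemo.newArray(String.class, 3)` produces a `String[]` whose elements are `null`.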
As noted above, managed references are a particularly difficult cost to
control, when considering the initialization of non-zero defaults,
particularly for arrays. It is somewhat dispiriting to contemplate a
flurry of GC activity to create the default state of an array,
immediately followed by a copy of non-default values, initiating a
second flurry on the same locations. One might avoid this particular
problem by allowing only primitive values to be “stamped out”, but
that reduces the utility of a user-defined default, and forces users
into workarounds anyway.