Revisiting default values
Gernot Neppert
mcnepp02 at googlemail.com
Wed Mar 17 16:09:11 UTC 2021
I like your idea of having the programmer mark explicitly which primitive
classes should support the 'zero-default' case.
However, I suggest inverting the meaning of the marker interface, naming it
'ZeroDefaultable'.
Why? Because it better matches the idea that an implementing type is more
capable than a type that does not implement it.
Then it can be used as a type-bound for generic functions.
As an example, have a look at Collection#toArray(IntFunction<T[]> generator).
This could then have the signature:
<T extends ZeroDefaultable> T[] toArray(IntFunction<T[]> generator)
The following compile-time rules would apply:
For a type "ND" that does not implement 'ZeroDefaultable', the compiler
would ensure two things:
1. Force initialization of a member of that type, either at declaration
point or in a constructor, exactly as it does now for final members.
2. Forbid the expression "new ND[N]". This also includes disallowing the
lambda-expression "ND[]::new".
To allow writing generic code that creates arrays of any type, the
JDK would provide a standard function in class java.util.Arrays such as
static <T> T[] newArray(int dimension, T initializer)
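
A minimal sketch of how such a helper might behave, written against today's
Java. The class name ArraySketch is made up for illustration, and taking the
component type from the initializer's runtime class is an assumption; a real
JDK method would need a more careful way to choose it.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

final class ArraySketch {
    private ArraySketch() {}

    // Hypothetical stand-in for the proposed java.util.Arrays.newArray.
    // Every element is set to 'initializer', so no element is ever
    // observable in its zero-default state.
    @SuppressWarnings("unchecked")
    static <T> T[] newArray(int dimension, T initializer) {
        // Assumption: the initializer's runtime class is used as the
        // component type. If the initializer is a subclass instance of T,
        // the array is narrower than T[] and later stores of other T
        // values may fail with ArrayStoreException.
        T[] result = (T[]) Array.newInstance(initializer.getClass(), dimension);
        Arrays.fill(result, initializer);
        return result;
    }
}
```

For example, ArraySketch.newArray(3, "x") yields a String[] of length 3 whose
elements are all "x". A null initializer is rejected with a
NullPointerException, since there is no class to take the component type from.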
This would leave us with the corner case of accessing uninitialized
variables of derived classes via constructors of base-classes (or
indirectly via virtual dispatch from such a constructor).
In my observation, this case is extremely rare, and can and should be
neglected, as it represents a programming error already today.
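
The corner case can already be reproduced with ordinary classes today. The
sketch below uses made-up names (BaseSketch, DerivedSketch); it only
illustrates the ordering problem and is not part of any proposal.

```java
class BaseSketch {
    final String seen;

    BaseSketch() {
        // Virtual dispatch from a constructor: this runs
        // DerivedSketch.describe() before DerivedSketch has assigned
        // its own fields.
        seen = describe();
    }

    String describe() {
        return "base";
    }
}

final class DerivedSketch extends BaseSketch {
    private final String name;

    DerivedSketch(String name) {
        super();            // 'name' still holds its default (null) here
        this.name = name;
    }

    @Override
    String describe() {
        return "name=" + name;
    }
}
```

Here new DerivedSketch("x").seen holds "name=null", because the base-class
constructor observed the field before the derived constructor assigned it.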
On Wed, Mar 17, 2021 at 16:14, Brian Goetz <
brian.goetz at oracle.com> wrote:
> Let me propose another strategy for Bucket 3. It could be implemented
> at either the VM or language level, but the latter probably needs some
> help from the VM anyway. The idea is that the default value is
> _indistinguishable from null_. Strawman:
>
> - Classes can be marked as default-hostile (e.g., `primitive class X
> implements NoGoodDefault`);
> - Prior to dereferencing a default-hostile class, a check is made
> against the default value, and an NPE is thrown if it is the default value;
> - When widening to a reference type, a check is made if it is the
> default value, and if so, is converted to null;
> - When narrowing from a reference type, a check is made for null, and
> if so, converted to the default value;
> - It is allowable to compare `x == null`, which is interpreted as
> "widen x to X.ref, and compare";
> - (optional) the interface NoGoodDefault could have a method that
> optimizes the check, such as by using a pivot field, or the language/VM
> could try to automatically pick a pivot field.
>
> Classes which opt for NoGoodDefault will be slower than those that do
> not due to the check, but they will flatten. Essentially, this lets
> authors choose between "zero means default" and "zero means null", at
> some cost.
>
> A risk here is that ignorant users who don't understand the tradeoffs
> will say "oh, great, there's my nullable primitive types", overuse them,
> and then say "primitive types are slow, java sucks." The goal here
> would be to provide _safety_ for primitive types for which the default
> is dangerous.
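
The strawman above can be emulated by hand in today's Java to see what the
checks amount to. This is only a sketch: ProcessId, ZERO, widen and narrow
are hypothetical names, an ordinary class stands in for a primitive class,
and the conversions that the compiler or VM would insert are written out as
explicit static methods.

```java
final class ProcessId {
    // Stand-in for the all-zero instance the VM would produce.
    static final ProcessId ZERO = new ProcessId(0);

    private final long pid;

    private ProcessId(long pid) {
        this.pid = pid;
    }

    static ProcessId of(long pid) {
        if (pid == 0) {
            throw new IllegalArgumentException("pid must be non-zero");
        }
        return new ProcessId(pid);
    }

    // "Pivot field" check: one field decides whether this is the default.
    boolean isZero() {
        return pid == 0;
    }

    // Dereference: check against the default first, NPE if it matches.
    long pid() {
        if (isZero()) {
            throw new NullPointerException("default ProcessId");
        }
        return pid;
    }

    // Widening to the reference type: the default becomes null.
    static ProcessId widen(ProcessId value) {
        return value.isZero() ? null : value;
    }

    // Narrowing from the reference type: null becomes the default.
    static ProcessId narrow(ProcessId ref) {
        return ref == null ? ZERO : ref;
    }
}
```

With these conversions, `x == null` corresponds to
ProcessId.widen(x) == null, which holds exactly for the zero instance.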
>
>
> On 3/15/2021 11:52 AM, Brian Goetz wrote:
> > Picking this issue up again. To summarize Dan's buckets:
> >
> > Bucket 1 -- the zero default is in the domain, and is a sensible
> > default value. Zero for numerics, empty optionals.
> >
> > Bucket 2 -- there is a sensible default value, but all-zero-bits isn't
> > it.
> >
> > Bucket 3 -- there simply is no sensible default value.
> >
> >
> > Ultimately, though, this is not about defaults; it is about
> > _uninitialized variables_. The default only comes into play when the
> > user uses an uninitialized variable, which usually means (a)
> > uninitialized fields or (b) uninitialized array elements. It is
> > possible that the language could give us seat belts to dramatically
> > narrow the chance of uninitialized fields, but uninitialized array
> > elements are much harder to stamp out.
> >
> > It is an attractive distraction to get caught up in designing
> > mechanisms for supplying an alternate default ("just let the user
> > declare a no-arg constructor"), but this is focusing on the "writing
> > code" part of the problem, not the "keeping code safe" part of the
> > problem.
> >
> > In some sense, it is the existence (and size) of Bucket 1 that causes
> > the problem; Bucket 1 is what gives us our sense that it is safe to
> > use uninitialized variables. In the current language, uninitialized
> > reference variables are also safe in that if you use them before they
> > are initialized, you get an exception before anything bad can happen.
> > Uninitialized primitives in today's language are more dangerous,
> > because we may interpret the uninitialized value, but this has been a
> > problem we've been able to live with because today's primitives are
> > pretty limited and zero is usually a good-enough default in most
> > domains. As we extend primitives to look more like objects, with
> > behavior, this gets harder.
> >
> >
> > Both buckets 2 and 3 can be remediated without help from the language
> > or VM, perhaps inconveniently, by careful coding on the part of the
> > author of the primitive class:
> >
> > - don't expose fields to users (a good practice anyway)
> > - check for zero on entry to each method
> >
> > These are options A and E. The difference between Buckets 2 (A) and 3
> > (E) in this model is what do we do when we find a zero; for bucket 2,
> > we substitute some pre-baked value and use that, and for bucket 3, we
> > throw something (what we throw is a separate discussion.) The various
> > remediation techniques Dan offers represent a menu which allows us to
> > trade off reliability/cost/intrusiveness.
> >
> > I think we should lean on the model currently implemented by reference
> > types, where _accessing_ an uninitialized field is OK, but _using_ the
> > value in the field is not. If we have:
> >
> > String s;
> >
> > All of the following are fine:
> >
> > String t = s;
> > if (s == null) { ... }
> > if (s == t) { ... }
> >
> > The thing that is not fine is s-dot-something. These are the E/F/G
> > options, not the H/I options.
> >
> > Secondarily, H/I, which attempt to hide the default, create another
> > problem down the road: when we get to specialized generics,
> > `T.default` would become partial.
> >
> > Some of the solutions for Bucket 3 generalize well enough to Bucket 2
> > that we might consider merging them (though there are still messy
> > details). Option F, for example, injects code at the top of each
> > method body:
> >
> > int m() {
> > if (this == <zero-value>)
> > throw new NullPointerException();
> > /* body of m */
> > }
> >
> > A corresponding feature for Bucket 2
> > might inject slightly different code:
> >
> > int m() {
> > if (this == <zero-value>)
> > return <better-default>.m();
> > /* body of m */
> > }
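
Written out by hand in today's Java, the two injected guards might look as
follows. This is a sketch under assumptions: GuardedMonth, ZERO and JANUARY
are illustrative names, and a field comparison stands in for the
`this == <zero-value>` test.

```java
final class GuardedMonth {
    static final GuardedMonth ZERO = new GuardedMonth(0);     // <zero-value>
    static final GuardedMonth JANUARY = new GuardedMonth(1);  // <better-default>

    private final int number;

    GuardedMonth(int number) {
        this.number = number;
    }

    private boolean isZero() {
        return number == 0;
    }

    // Bucket 3 flavour: using the zero value fails fast.
    int strictNumber() {
        if (isZero()) {
            throw new NullPointerException("uninitialized GuardedMonth");
        }
        return number;
    }

    // Bucket 2 flavour: the zero value forwards to a better default.
    int lenientNumber() {
        if (isZero()) {
            return JANUARY.lenientNumber();
        }
        return number;
    }
}
```

The only difference between the two methods is what happens once the zero
value has been detected, which is why the two buckets could share a mechanism.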
> >
> >
> > Another thing that has evolved since we started this discussion is
> > recognizing the difference between .val and .ref projections. Imagine
> > you could declare your membership in bucket 3:
> >
> > __bucket_3 primitive class NGD { ... }
> >
> > If, in addition to some way of generating an NPE on dereference (F, G,
> > etc), we mucked with the conversion of NGD.val to NGD.ref (which the
> > compiler can inject code on), we could actually put a null on top of
> > the stack. Then, code like:
> >
> > if (ngd == null) { ... }
> >
> > would actually work, because to do the comparison, we'd first promote
> > ngd to a reference type (null is already a reference), and we'd
> > compare two nulls.
> >
> >
> >
> > On 7/10/2020 2:23 PM, Dan Smith wrote:
> >> Brian pointed out that my list of candidate inline classes in the
> Identity Warnings JEP (JDK-8249100) includes a number of classes that,
> despite being "value-based classes" and disavowing their identity, might
> not end up as inline classes. The problem? Default values.
> >>
> >> This might be a good time to revisit the open design issues surrounding
> default values and see if we can make some progress.
> >>
> >> Background/status quo: every inline class has a default instance, which
> provides the initial value of fields and array components that have the
> inline type (e.g., in 'new Point[10]'). It's also the prototype instance
> used to create all other instances (start with 'vdefault', then apply
> 'withfield' as needed). The default value is, by fiat, the class instance
> produced by setting all fields to *their* default values. Often, but not
> always, this means field/array initialization amounts to setting all the
> bits to 0. Importantly, no user code is involved in creating a default
> instance.
> >>
> >> Real code is always useful for grounding design discussions, so let's
> start there. Among the classes I listed as inline class candidates, we can
> put them in three buckets:
> >>
> >> Bucket #1: Have a reasonable default, as declared.
> >> - wrapper classes (the primitive zeros)
> >> - Optional & friends (empty)
> >> - From java.time: Instant (start of 1970-01-01), LocalTime (midnight),
> Duration (0s), Period (0d), Year (1 BC, if that's acceptable)
> >>
> >> Bucket #2: Could have a reasonable default after re-interpreting fields.
> >> - From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime,
> ZonedDateTime, OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion,
> MinguoDate, HijrahDate, JapaneseDate, ThaiBuddhistDate (months and days
> should be nonzero; null Strings, ZoneIds, HijrahChronologies, and
> JapaneseEras require special handling)
> >> - ListN, SetN, MapN (null array interpreted as empty)
> >>
> >> Bucket #3: No good default.
> >> - Runtime.Version (need a non-null List<Integer>)
> >> - ProcessHandleImpl (need a valid process ID)
> >> - List12, Set12, Map1 (need a non-null value)
> >> - All ConstantDesc implementations (need real class & method names,
> etc.)
> >>
> >> There's some subjectivity between the 2nd and 3rd buckets, but the idea
> behind the 2nd is that, with some translation layer between physical fields
> and interpretation of those fields, we can come up with an intuitive
> default (e.g., "0 means January"; "a null String means time zone 'UTC'").
> In contrast, in the third bucket, any attempt to define a default value is
> going to be pretty unintuitive ("A null method name means 'toString'").
> >>
> >> The question here is how much work the JVM and language are willing to
> do, or how much work we're willing to ask clients to do, in order to
> support use cases that don't fall into Bucket #1.
> >>
> >> I don't think totally excluding Buckets #2 and #3 is a very good
> outcome. It means that, in many cases, inline classes need to be built up
> exclusively from primitives or other inline types, because if you use
> reference types, your default value will have a null field. (Sometimes, as
> in Optional, null fields have straightforward interpretations, but most of
> the time programs are designed to prevent them.)
> >>
> >> Whether we support Bucket #2 but not Bucket #3 is a harder question. It
> wouldn't be so bad if none of the examples above in Bucket #3 become inline
> classes—for the most part they're handled via interfaces, anyway.
> (Counterpoint: inline class instances that are immediately typed with
> interface types still potentially provide a performance boost.) But I'm
> also not sure this is representative. We've noted before that many use
> cases, like database records or data structure cursors, don't have
> meaningful defaults (what's a default mailing address?). The ConstantDesc
> classes really illustrate this, even though they happen to not be public.
> >>
> >> Another observation is that if we support Bucket #3 but not Bucket #2,
> that's probably not a big deal—I'm not sure anybody really *wants* to deal
> with the default instance; it's just the price you pay for being an inline
> class. If there's a way to opt out of that extra weirdness and move from
> Bucket #2 to Bucket #3, great.
> >>
> >> With that discussion in mind, here are some summaries of approaches
> we've considered, or that I think we ought to consider, for supporting
> buckets #2 and #3. (This is as best as I recall. If there's something I've
> missed, add it to the list!)
> >>
> >> [Weighing in for myself: my current preference is to do one of F, G, or
> I. I'm not that interested in supporting Bucket #2, for reasons given
> above, although Option A works for programmers who really want it.]
> >>
> >>
> >>
> >> === Solutions to support Bucket #2 ===
> >>
> >> Two broad strategies here: re-interpreting fields (A, B), and
> re-interpreting the default instance (C, D).
> >>
> >> ---
> >>
> >> Option A: Encourage programmers to re-interpret fields
> >>
> >> Guidance to programmers: when you declare an inline class, identify any
> fields for which the default instance should hold something other than
> zero/null; define a mapping for your implementation from zero/null to the
> value you want.
> >>
> >> One way to do this is to define a (possibly private) getter for each
> field, and include logic like 'return month + 1' or 'return id == null ?
> "UTC" : id'. Or maybe you inline that logic, as long as you're careful to
> do so everywhere. Importantly, you also need to reverse the logic in your
> constructor—for the sake of '==', if somebody manually creates the default
> instance, you should set fields to zero/null.
> >>
> >> This doesn't work if you want public fields, but that's life as an OO
> programmer.
> >>
> >> In this approach, it would be important that inline classes be expected
> to document their default instance in Javadoc (perhaps with a new Javadoc
> tag)—the interpretation of the default instance is less apparent to users
> than "all zeros".
> >>
> >> Limitations:
> >>
> >> - It's a fairly error-prone approach. Programmers will absolutely
> forget to apply the mapping in one place, and everything will be fine until
> somebody tries to invoke a particular method on the default instance. Put
> that bug in a security-sensitive context, and maybe you have an exploit.
> (Something that could help some is choosing good names—call your field
> 'monthIndex', not plain 'month', to remind yourself that it's zero-based.)
> >>
> >> - Performance impact of an extra layer of computation on all field
> accesses. Probably not a big deal in general, but all those null checks,
> etc., could have a negative impact in certain contexts. And the
> *appearance* of extra cost might scare programmers away from doing the
> right thing ("eh, I probably won't use the default value anyway, I'll just
> ignore it to make my code faster").
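
A sketch of Option A in today's Java, assuming a made-up class ZonedMonth
with a zero-based month field and a zone id where null stands for "UTC".
zeroInstance() stands in for the instance the VM would create without
running any user code.

```java
import java.util.Objects;

final class ZonedMonth {
    private final int monthIndex;  // physical field: 0 means January
    private final String zoneId;   // physical field: null means "UTC"

    private ZonedMonth(int monthIndex, String zoneId) {
        this.monthIndex = monthIndex;
        this.zoneId = zoneId;
    }

    // Stand-in for the all-zero default instance.
    static ZonedMonth zeroInstance() {
        return new ZonedMonth(0, null);
    }

    static ZonedMonth of(int month, String zone) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month: " + month);
        }
        Objects.requireNonNull(zone);
        // Reverse mapping: the logical default must be stored as zero/null
        // so that it compares equal to the default instance.
        return new ZonedMonth(month - 1, zone.equals("UTC") ? null : zone);
    }

    int month() {
        return monthIndex + 1;
    }

    String zone() {
        return zoneId == null ? "UTC" : zoneId;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ZonedMonth
                && ((ZonedMonth) o).monthIndex == monthIndex
                && Objects.equals(((ZonedMonth) o).zoneId, zoneId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(monthIndex, zoneId);
    }
}
```

ZonedMonth.of(1, "UTC") and ZonedMonth.zeroInstance() are equal because the
factory reverses the mapping. Any method that read monthIndex or zoneId
directly, without going through the accessors, would reintroduce the kind of
bug the first limitation describes.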
> >>
> >> ---
> >>
> >> Option B: Language support for field re-interpretation
> >>
> >> The language allows inline classes to declare fields with mappings
> to/from an internal representation. Just like Option A, but with guarantees
> that the internal representation isn't inappropriately accessed directly.
> >>
> >> This pulls on a thread we explored a bit for Amber a while back, some
> form of "abstract fields" or "virtual fields". Maybe there's something
> there, but it seems like a general-purpose feature, and one we're not
> likely to reach a final solution on anytime soon.
> >>
> >> ---
> >>
> >> Option C: Language support for a designated default
> >>
> >> The language provides some way for programmers to declare the "logical"
> default instance (something like a special static field). The compiler
> inserts a test for the "physical" default on any field/array access, and
> replaces it with the logical default.
> >>
> >> That is:
> >>
> >> Point p = points[3];
> >>
> >> compiles to
> >>
> >> Point p$0 = points[3];
> >> Point p = (p$0 == [vdefault Point]) ? Point.DEFAULT : p$0;
> >>
> >> This is much less bug-prone than Option A—the compiler does all the
> work—and much more achievable in the short/medium term than Option B.
> >>
> >> Compared to Option B, this pushes the computation overhead from inline
> class field accesses to reads of the inline type from fields/arrays. I
> don't know if that's good or bad—maybe a wash, heavily dependent on the use
> case.
> >>
> >> A few big problems:
> >>
> >> - The physical default still exists, and malicious bytecode can use it.
> If programmers want strong guarantees, they'll have to check and throw
> wherever an untrusted instance is provided. (Clients with access to the
> inline class's fields have to do so, too.)
> >>
> >> - Covariant arrays mean every read from any array type that might be
> flattened (Object[], Runnable[], ConstantDesc[], ...) has to go through
> translation logic.
> >>
> >> - There's an assumption here that the programmer doesn't intend to use
> the physical default as a valid non-default instance. That's hard for the
> compiler to enforce, and weird stuff happens in fields/arrays if the
> programmer doesn't prevent it. (Could be mitigated with extra implicit
> logic on field/array writes or in constructors.)
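
The read translation of Option C, and the third problem in particular, can be
imitated with ordinary classes. The names below (PointSketch, PHYSICAL_ZERO,
newFlatArray, read) are hypothetical; an array pre-filled with the zero
instance stands in for a flattened array, and read() stands in for the code
the compiler would insert.

```java
import java.util.Arrays;

final class PointSketch {
    static final PointSketch PHYSICAL_ZERO = new PointSketch(0, 0); // [vdefault Point]
    static final PointSketch DEFAULT = new PointSketch(1, 1);       // logical default

    final int x;
    final int y;

    PointSketch(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Imitates a flattened array: every slot starts out as the zero instance.
    static PointSketch[] newFlatArray(int length) {
        PointSketch[] points = new PointSketch[length];
        Arrays.fill(points, PHYSICAL_ZERO);
        return points;
    }

    // What the compiler would wrap around 'points[index]'.
    static PointSketch read(PointSketch[] points, int index) {
        PointSketch raw = points[index];
        return raw.equals(PHYSICAL_ZERO) ? DEFAULT : raw;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PointSketch
                && ((PointSketch) o).x == x
                && ((PointSketch) o).y == y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}
```

Storing new PointSketch(0, 0) on purpose and reading it back through read()
yields DEFAULT, not the stored point: the translation cannot tell an
intentional zero from an uninitialized slot.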
> >>
> >> ---
> >>
> >> Option D: JVM support for a designated default
> >>
> >> The VM allows inline classes to designate a logical default instance,
> and the field/array access instructions map from the physical default to
> the logical default. The 'vdefault' instruction produces the logical
> default instance; something else is used by the class's factories to build
> from the physical default.
> >>
> >> This addresses the first two problems with Option C—the VM gives strong
> guarantees, and can make the translation a virtual operation of certain
> arrays.
> >>
> >> To address the third problem, it seems like we'd need the more complex
> logic I hinted at: on writes, map the physical default to the logical
> default, and map the logical default to the physical default. Do the
> reverse on reads.
> >>
> >> The problem here is bytecode complexity/slowdowns. We've already added
> some complexity to 'aaload'/'aastore' (covariant flattened arrays), and
> anticipate similar changes to 'putfield'/'getfield' (specialized fields),
> so maybe that means we might as well do more. Or maybe it means we're
> already over budget. :-)
> >>
> >> From the users' perspective, if any performance reduction on
> reads/writes can be limited to the inline classes in Bucket #2, *all* the
> options have a similar cost, whether imposed by the programmer, language,
> or VM. So, to a first approximation, slower opcode execution is fine.
> >>
> >>
> >>
> >> === Solutions to support Bucket #3 ===
> >>
> >> Two broad strategies here: rejecting member accesses on the default
> instance (E, F, G), and preventing programs from ever seeing the default
> instance (H, I).
> >>
> >> ---
> >>
> >> Option E: Encourage programmers to guard against default instances
> >>
> >> Guidance to programmers: if you don't like your class's default
> instance, check for it in your methods and throw. Maybe Java SE defines a
> new RuntimeException to encourage this.
> >>
> >> The simple way to do this is with some boilerplate at the start of all
> your methods:
> >>
> >> if (this == MyClass.default) throw new InvalidDefaultException();
> >>
> >> More permissive classes could just do some validation on the fields
> that are relevant to a particular operation. (E.g., 'getMonth' doesn't care
> if 'zoneId' is null.)
> >>
> >> This doesn't work if you want public fields, but that's life as an OO
> programmer.
> >>
> >> It's not ideal that an invalid instance can float around a program
> until somebody trips on one of these checks, rather than detecting the
> invalid value earlier—we're propagating the NPE problem. And it takes some
> getting used to that there are two null-like values in the reference type's
> domain.
> >>
> >> ---
> >>
> >> Option F: Language support for default instance guards
> >>
> >> An inline class declaration can indicate that the default instance is
> invalid. The compiler generates guards, as in Option E, at the start of all
> instance method bodies, and perhaps on all field accesses outside of those
> methods.
> >>
> >> Programmers give up finer-grained control, but get more safety. I'm
> sure most would be happy with that trade.
> >>
> >> Improper/separately-compiled bytecode can skip the field access checks,
> but that's a minor concern.
> >>
> >> Same issues as Option E regarding adding a "new NPE" to the platform.
> >>
> >> ---
> >>
> >> Option G: JVM support for default instance guards
> >>
> >> Inline class files can indicate that their default instance is invalid.
> All attempts to operate on that instance (via field/method accesses, other
> than 'withfield') result in an exception.
> >>
> >> This tightens up Option F, making it just as impossible to access
> members of the default instance as it is to access members of 'null'.
> >>
> >> Same issues as Option E regarding adding a "new NPE" to the platform.
> >>
> >> ---
> >>
> >> Option H: Language checks on field/array reads
> >>
> >> An inline class declaration can indicate that the default instance is
> invalid. Every field and array access that may involve an uninitialized
> field/array component of that inline type gets augmented with a check that
> rejects reads of the default value (treating it as "you forgot to
> initialize this variable").
> >>
> >> That is:
> >>
> >> Point p = points[3];
> >>
> >> compiles to
> >>
> >> Point p$0 = points[3];
> >> if (p$0 == [vdefault Point]) throw new UninitializedVariableException();
> >> Point p = p$0;
> >>
> >> This is much like Option C, and has roughly the same
> advantages/problems. There's not a strong guarantee that the default value
> won't pop up from untrusted bytecode (or unreliable inline class authors),
> and lots of array types need guards.
> >>
> >> ---
> >>
> >> Option I: JVM checks on field/array reads
> >>
> >> Inline class files can indicate that their default instance is invalid.
> When reading from a field/array component of the inline type
> ('getfield'/'getstatic'/'aaload'), an exception is thrown if the default
> value is found (treating it as "you forgot to initialize this variable").
> The 'vdefault' instruction, like 'withfield', is illegal outside of the
> inline class's nest.
> >>
> >> Better than Option H in that it can be optimized to occur on only
> certain reads, and in that it provides strong guarantees—only the inline
> class can ever "see" the default instance.
> >>
> >> Well, unless the inline class chooses to share that instance with the
> world. Not sure how we prevent that. But maybe at that point, anything
> bad/weird that happens is the author's own fault. (E.g., putting the
> default value in an array will make that component effectively
> "uninitialized" again.)
> >>
> >> Like Option D, there's a question of whether we're willing to add this
> complexity to the 'getfield'/'getstatic'/'aaload' instructions. My sense is
> that at least it's less complexity than you have in Option D.
> >>
> >
>
>
More information about the valhalla-spec-observers
mailing list