From webseiten.designer at googlemail.com Wed May 4 16:46:13 2022
From: webseiten.designer at googlemail.com (Tim Feuerbach)
Date: Wed, 4 May 2022 18:46:13 +0200
Subject: Default-zeroness as a special case of non-nullability
Message-ID:

Reading the user model stacking thread of April, the proposed splitting of
the three buckets into individual "knobs" sounds like a promising solution,
though I wonder whether some options might sound scary (or not scary
enough!) to those of us who prefer not to think about the intricacies of
memory layout in the JVM. Factoring out the atomicity decision is a good
idea; tearing could lead to some nasty surprises.

However, I want to address the topic of zero-defaultness of (former?) B3
primitives, since it introduces an enticing property previously denied to
user types: non-nullability. It's something people have wanted from Java
for years, and it makes them look jealously towards other languages.
Primitive classes will be considered not only for their performance
improvements, but also for their null safety. Some people *will* ignore all
the red flags (all-zero default, non-atomicity in the old user model) and
declare primitives just for this property, even though their default makes
no sense or even violates the class's invariants. But one uninitialized
field later, instead of a NullPointerException, they will get a good old
January 1st, 1970 for their CustomInstant, which is not much better.

So: why not offer the full choice of nullability at the use site, where we
can force the user to pick a value? Obviously, this is out of scope for
Valhalla, but I'd like to ask whether the proposed user model can leave
that door open for the future, seeing that Brian Goetz has thrown 'T!' into
the ring as a possible spelling of .val/.zero (I know, bikeshedding!). In a
world where non-nullability and zero-default are separate things, 'Instant!'
would just mean the former; the only difference would be how fields and
arrays of this type are created. Zero-default types get away with not
initializing any of them. Mere non-nullable type declarations, on the other
hand, could require the user to specify a value, following the same rules
as final fields. Trying to assign null to a non-null field or variable at
any point in time would fail.

The tricky part is arrays; but here Kevin Bourrillion proposed requiring a
fill value:

On Wed, Apr 27, 2022 at 9:36 PM Kevin Bourrillion wrote:

> By the way, as we talk about this zero problem, these are the example cases
> that go through my head:
>
> (Type R) e.g. Rational, EmployeeId: the default value is illegal; can't
> even construct it on purpose. Every method on it *should* call
> `checkValid()` first. Might as well repurpose it as a pseudo-null. Bugs
> could be prevented by some analogue of aftermarket nullness analysis.
>
> (Type I) e.g. Instant: the default value is legal, but it's a bad default
> value (while moderately guessable, it's arbitrary/meaningless). This makes
> the strongest case for being reference-only. Or it has to add a `boolean
> isValid` field (always set to true) to join Type R above.
>
> (Type C) e.g. Complex: the default value is a decent choice -- guessable,
> but probably not the identity of the *most* common reduction op (which I
> would guess is multiplication).
>
> (Type O) e.g. Optional, OptionalInt, UnsignedLong: the default value is the
> best possible kind -- guessable, and the identity of the presumably most
> common reduction operation.
>
> For type I, we would probably ban nonempty array instance creation
> expressions! This would force the arrays to be created by
> `Collection.toArray()` or by new alternative value-capable versions of
> `Arrays.fill()` and `Arrays.setAll()` which accept a size instead of a
> premade array.
> Actually, if the new Arrays.fill() could short-circuit when passed
> `TheType.default` then we might want to do this for types C and O too;
> why not make users be explicit.

(Personally, I would prefer zero-ok types to always require explicitly
assigning the default, not only for arrays. It always felt wrong that
linters and IDEs would yell at me for making "0" and "false" explicit.)

There is, of course, the topic of serialization. While missing primitive
fields have a fallback, arbitrary non-null types do not.

Summarizing the proposal:

* a non-null B1 does not differ from a nullable B1, except for additional
  constraints on declaration and assignment
* a non-null B2 does not need a null channel if the compiler can prove
  that access of an uninitialized value is impossible (no leaking 'this'
  before assignment, no subclass/static access), increasing the chance of
  being flattened on the heap
* a non-null B3 is a non-null B2 with an optimized default value that may
  be accessed before initialization; atomicity is a separate knob

From scolebourne at joda.org Thu May 5 09:48:34 2022
From: scolebourne at joda.org (Stephen Colebourne)
Date: Thu, 5 May 2022 10:48:34 +0100
Subject: User model: terminology
In-Reply-To:
References:
Message-ID:

On Wed, 4 May 2022 at 16:06, Brian Goetz wrote:

Overall, things are looking positive in Valhalla. I agree 100% with
Kevin's document, particularly the initial definitions of Values,
Variables, Containers, Kinds of Values and the essential characteristics
of Objects.

I still believe there is a profitable direction of thought in imagining
what Java would be like if references were visible in source code:

  class Person {
    private ref name;
    private ref birthDate;
    private int customerScore;
  }

This way of thinking makes it more obvious why == does what it does, for
example.
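[Editor's note: the point that the references-in-source mental model explains == can be illustrated in today's Java, without any Valhalla features. A minimal sketch; the class and variable names are mine, not from the thread:]

```java
public class RefEquality {
    public static void main(String[] args) {
        // Under the "variables hold references" model, == compares the
        // reference stored in the variable, not the object it points to:
        String a = new String("x");
        String b = new String("x");
        System.out.println(a == b);      // false: two distinct references
        System.out.println(a.equals(b)); // true: the referred-to values match

        // Primitive variables hold their value directly, so == compares values:
        int i = 5;
        int j = 5;
        System.out.println(i == j);      // true
    }
}
```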
I still think it would be worth exposing an actual `ref` type as part of
Valhalla, as it helps join the whole model together in people's minds and
allows null to be managed better (maybe it is just a new name for
`Optional`?).

I'm on board with the idea that authors can opt in to accepting
non-atomicity, although I worry that it will be overused and not properly
understood. I'm also on board with the idea that authors can opt in to
accepting that zero-is-ok. I think that both are significant problems,
thus I want both the syntax and the defaults to reflect the safe choice
(B2: safe atomicity, zero-not-ok).

> class B1 { }             // ref only
> value class B3 { }       // ref and val projections
> value-based class B2 { } // ref only

Given the above, while this reads very nicely, it runs far too big a risk
of exposing bad zeroes.

> - A term for all non-identity classes. (Previously, all classes had
> identity.)

After reading Kevin's document I'm a bit wary of using "value" for this
term, because, as per the document, values have a quite specific meaning
in the value-variable-reference model, and I think "value class" might
well be muddying the water. "inline" is OK, but a bit meh. "primitive"
isn't right at all. I'd like to propose "struct":

  public class B1 {...}
  public struct B2 {...}

I know that "struct" has some baggage and is usually used for a compound
set of fields, but I don't see any particular reason why it couldn't fit
here. As a word it evokes ideas of a "bundle of memory" and "being passed
around without overhead". It goes nicely in text and code.

I actually think it is better to *not* use "class" here. Java already has
a construct where one declaration produces multiple classes - enums - and
we generally call them "enum", not "enum class". This is another reason
not to use "value class".

> - A term for what we've been calling atomicity: that instances cannot
> appear to be torn, even when published under race. (Previously, all
> classes had this property.)
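[Editor's note: for readers who have not met "tearing" before: under the current memory model (JLS 17.7), plain long and double accesses may be split into two 32-bit halves, so a racy reader can observe a value no thread ever wrote. A deterministic sketch of what a torn value would look like; the bit arithmetic below merely constructs the mixed value rather than staging an actual race:]

```java
public class TearingSketch {
    public static void main(String[] args) {
        // Two values that racing writers might store into a shared,
        // non-volatile long field:
        long a = 0x0000_0000_0000_0000L;
        long b = 0xFFFF_FFFF_FFFF_FFFFL;

        // A torn read could combine the high half of one write with the
        // low half of the other, producing a value neither thread wrote:
        long torn = (a & 0xFFFF_FFFF_0000_0000L) | (b & 0x0000_0000_FFFF_FFFFL);

        System.out.println(Long.toHexString(torn)); // ffffffff
        System.out.println(torn == a || torn == b); // false: "out of thin air"
    }
}
```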
I'd like to propose "raw" for this term (non-atomicity). I've used it in
my mental model for a few days and it seems to work quite well. IMO
"fragile" or "unsafe" don't quite fit with what I'd expect from a
programming language term. "non-atomic" would be OK, but is a bit verbose.

  public raw struct B2a {...}

Some phrases:
"All existing primitives are raw"
"The JVM may choose to treat a struct as raw even if not defined as such,
but only where it can prove it is safe to do so."

I don't think there is any need to mention raw at the use site, but if
there were, it works well - `Complex.raw c = ...`

> - A term for those non-identity classes which do not _require_ a
> reference. These must have a valid zero, and give rise to two types,
> what we've been calling the "ref" and "val" projections.

To me, the term "primitive" is most closely associated with the fact that
zero-is-ok and that there are two projections (int and Integer). Thus I
think the term here is "primitive".

  public primitive struct B3 {...}
  public raw primitive struct B3a {...}

Although there is a case for dropping "struct" in code, the *term* is
clearer with struct. Saying "primitive struct" clearly delineates them
from the original primitives.

For naming, I would say that the ".ref" type would get the good name,
leading to ".primitive" as the projection (or perhaps ".val"). A type
aliasing feature could then be used:

  alias int = Integer.primitive

or

  public primitive int struct Integer {...}

I prefer aliasing as it is needed to make sense of the int/Integer and
char/Character mess we already have.

HTH
Stephen
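[Editor's note: the int/Integer mess alluded to above is visible in today's boxing behaviour. A small illustration; note that box identity outside -128..127 is implementation-dependent per JLS 5.1.7, hence the hedged comment:]

```java
public class BoxingMess {
    public static void main(String[] args) {
        // int and Integer are related by boxing conversions, not by being
        // two projections of one type, which leads to oddities like these:
        Integer a = 127;   // within the mandated Integer cache: same box
        Integer b = 127;
        Integer c = 1000;  // outside the mandated cache range
        Integer d = 1000;

        System.out.println(a == b);      // true: cached boxes are identical
        System.out.println(c == d);      // typically false: distinct boxes
        System.out.println(c.equals(d)); // true: value comparison
    }
}
```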