Revisiting default values

Tue Jul 28 22:42:21 UTC 2020

> On Jul 28, 2020, at 11:33 AM, Tobi Ajila <Tobi_Ajila at ca.ibm.com> wrote:
> 
> > Bucket #3 classes must be reference-default, and fields/arrays of their inline type are illegal outside of the declaring class. The declaring class can provide a flat array factory if it wants to. (A new idea from Tobi, he'll write it up for the thread.)

I've since come to see this as a variant of Option L or Option M: we apply some restrictions + analysis to guarantee that uninitialized fields/arrays are never exposed. In this case, the guarantee is easy to prove because nobody can declare fields/arrays at all, except the class author.

> This approach is appealing for the following reasons: no additional JVM complexity (i.e., no bytecode checks for the bad default value) and no javac boilerplate (i.e., no guards on member access, guards on method entries, etc.). On the other hand, there are two big drawbacks: no instance field flattening for these types, and creating flattened arrays is a bit unnatural since it has to be done via a factory.

The biggest problem I see with approaches that prevent use of 'anewarray' is that they violate our uniform bytecode design, which is crucial to specialization. That is: how do I allocate a flat array of T in something like ArrayList? I can't be calling arbitrary factory methods depending on T.
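To make the specialization concern concrete, here is a minimal sketch in today's (erased) Java. `SimpleList`, `FactoryList`, and the `IntFunction<T[]>` parameter are illustrative assumptions, not anything proposed in the thread. A generic list allocates its backing store with one uniform instruction no matter what `T` is. If flat arrays of some element types could only come from a factory on the element class, generic code could no longer allocate on its own, and that factory would have to be threaded through every generic API that allocates. The sketch assumes Java 9 or later.

```java
import java.util.Arrays;
import java.util.function.IntFunction;

// Today: one uniform allocation, independent of T. After erasure,
// 'new Object[4]' compiles to a single 'anewarray java/lang/Object'.
final class SimpleList<T> {
    private Object[] elems = new Object[4];
    private int size;

    void add(T t) {
        if (size == elems.length) {
            elems = Arrays.copyOf(elems, size * 2);
        }
        elems[size++] = t;
    }

    @SuppressWarnings("unchecked")
    T get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException(i);
        return (T) elems[i];
    }

    int size() { return size; }
}

// Hypothetical alternative: if arrays of T could only be created by a
// per-type factory, every client of the generic class must supply it.
final class FactoryList<T> {
    private final IntFunction<T[]> newArray; // supplied per element type
    private T[] elems;
    private int size;

    FactoryList(IntFunction<T[]> newArray) {
        this.newArray = newArray;
        this.elems = newArray.apply(4);
    }

    void add(T t) {
        if (size == elems.length) {
            T[] bigger = newArray.apply(size * 2);
            System.arraycopy(elems, 0, bigger, 0, size);
            elems = bigger;
        }
        elems[size++] = t;
    }

    T get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException(i);
        return elems[i];
    }

    int size() { return size; }
}

class Demo {
    public static void main(String[] args) {
        SimpleList<String> a = new SimpleList<>();
        FactoryList<String> b = new FactoryList<>(String[]::new);
        for (int i = 0; i < 10; i++) {
            a.add("x" + i);
            b.add("x" + i);
        }
        System.out.println(a.get(9) + " " + b.get(9)); // prints "x9 x9"
    }
}
```

The extra constructor argument on `FactoryList` is the non-uniformity in miniature: the allocation strategy now depends on `T`, which is exactly what specialized generic code wants to avoid.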

There's also a problem of exactly what these array factory methods are supposed to do. Sure, we can blame the author if they choose to leak garbage data through the factory. But... what are they going to put in the array, if not garbage data? This is really more of a Bucket #2 solution, where there exists some reasonable default to fill the array with.
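A minimal sketch of the Bucket #2 case, using an ordinary Java class as a stand-in for an inline class (today's javac has no inline-class syntax, so `Money`, `ZERO`, and `newArray` are all hypothetical names). When the type has a meaningful zero, the factory has something legitimate to fill the array with.

```java
import java.util.Arrays;

// Stand-in for a "Bucket #2" type: it has a natural, valid zero value.
final class Money {
    static final Money ZERO = new Money(0L);

    private final long cents;

    Money(long cents) { this.cents = cents; }

    long cents() { return cents; }

    Money plus(Money other) { return new Money(cents + other.cents); }

    // Hypothetical array factory: every slot starts as a valid value,
    // so no uninitialized element can ever be observed.
    static Money[] newArray(int length) {
        Money[] a = new Money[length];
        Arrays.fill(a, ZERO);
        return a;
    }
}

class Demo {
    public static void main(String[] args) {
        Money[] ledger = Money.newArray(3);
        ledger[1] = ledger[1].plus(new Money(250));
        long total = 0;
        for (Money m : ledger) total += m.cents();
        System.out.println(total); // prints 250
    }
}
```

For a type such as a non-null name wrapper, there is no analogous `ZERO` to hand to `Arrays.fill`, so the factory would be left choosing between garbage and an invented sentinel.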

> I think it would help if we had a clear sense as to what proportion of inline types we think will have this "bad default" problem. Last year, when we discussed null-default inline types, the thinking was that about 75% of the motivation for null-defaults was migrating VBCs, 20% was security, and 5% was "I want null in my value set." My assumption is that the vast majority of inline types will not be migrated types; they will be new types. If this is correct, then it would appear that the default value problem is really a problem for a minority of inline types.

My two cents: this is not about migrated vs. new types. This is about what's being modeled. A certain subset of inline classes will model some sort of numeric quantity with a natural "zero" value. Many others—I'd predict more than 50%, though it will depend a lot on how accommodating we are to these use cases—will represent non-numeric data without any "zero" analog. These will often wrap non-null references (strings, for example).

(Challenge: can we think of any use cases for inline classes that have a natural all-zeros default value *other than* a numeric zero, a singleton with no fields, or the equivalent of Optional.empty()? Maybe a collection of boolean flags? Once you've got references, it's pretty unusual to expect them to be null.)
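To illustrate the distinction, here is a hedged sketch with plain Java classes standing in for inline classes. `AccessFlags` and `UserName` are made-up names, and the `allZeros()` methods are hypothetical helpers that simulate the all-zeros state (`false`, `null`) a JVM-supplied default would have. For a bundle of boolean flags, all-zeros means "nothing set" and is a perfectly valid value. For a wrapper around a non-null reference, all-zeros puts a `null` exactly where the class's invariant says there must never be one.

```java
import java.util.Objects;

// Flags: the all-false state is a meaningful value ("nothing set").
final class AccessFlags {
    private final boolean read, write, execute;

    AccessFlags(boolean read, boolean write, boolean execute) {
        this.read = read;
        this.write = write;
        this.execute = execute;
    }

    // What an all-zeros default would look like: still a valid value.
    static AccessFlags allZeros() { return new AccessFlags(false, false, false); }

    boolean canRead() { return read; }
    boolean none() { return !read && !write && !execute; }
}

// Wrapper of a non-null reference: all-zeros means value == null,
// which the public constructor never allows.
final class UserName {
    private final String value;

    UserName(String value) {
        this.value = Objects.requireNonNull(value, "value");
    }

    // Hypothetical: simulates the all-zeros default by bypassing the check.
    private UserName() { this.value = null; }

    static UserName allZeros() { return new UserName(); }

    int length() { return value.length(); } // throws NPE on the all-zeros instance
}

class Demo {
    public static void main(String[] args) {
        System.out.println(AccessFlags.allZeros().none()); // prints true
        System.out.println(new UserName("ada").length());  // prints 3
        try {
            UserName.allZeros().length();
        } catch (NullPointerException e) {
            System.out.println("all-zeros UserName is not a valid value");
        }
    }
}
```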

Within the subset that doesn't have a good default, it's often the case that the class has limited exposure, and some programmers might happily trade safety guarantees for performance, knowing they can trust all clients (or if there's a bug, they'll catch it in testing). So maybe they'll be fine with the all-zeros default story. But any class that belongs to a public API, or even that has significant non-public exposure, is going to want to be confident that it's operating on valid data.
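One way such a class can gain that confidence today is with hand-written guards, the second approach in the list of options. A minimal sketch, assuming a hypothetical `EmailAddress` type with no sensible default and a hypothetical `allZeros()` helper that simulates the default instance: every method first rejects the all-zeros state with a clear error, rather than letting a `null` surface somewhere deeper.

```java
import java.util.Objects;

// Hypothetical no-good-default type guarded by hand-written checks.
final class EmailAddress {
    private final String local;   // never null in a properly constructed value
    private final String domain;  // never null in a properly constructed value

    EmailAddress(String local, String domain) {
        this.local = Objects.requireNonNull(local);
        this.domain = Objects.requireNonNull(domain);
    }

    // Hypothetical: simulates the all-zeros default (both fields null).
    private EmailAddress() {
        this.local = null;
        this.domain = null;
    }

    static EmailAddress allZeros() { return new EmailAddress(); }

    // The hand-coded guard. Checking one field suffices because the
    // public constructor never lets either field be null.
    private void checkInitialized() {
        if (local == null) {
            throw new IllegalStateException("uninitialized EmailAddress");
        }
    }

    String domain() {
        checkInitialized();
        return domain;
    }

    String format() {
        checkInitialized();
        return local + "@" + domain;
    }
}

class Demo {
    public static void main(String[] args) {
        System.out.println(new EmailAddress("ada", "example.org").format()); // prints ada@example.org
        try {
            EmailAddress.allZeros().domain();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints uninitialized EmailAddress
        }
    }
}
```

The cost is a branch on every method entry, written and maintained by hand, which is the boilerplate an automatic scheme would try to remove.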

> I would argue that the costs should be limited to types that want to opt-in to not expose their default value or un-initialized value.

Yes, agreed. Major demerits for any approach that imposes costs on programs that don't make use of no-default inline classes.

> I think it's important to decide if we want this kind of feature, but also what we are willing to give up to get it.

The right way to think about it is this: there exist many classes that don't need identity and also don't have natural defaults. We're not going to make those classes cease to exist. It's not a "yes or no" choice, it's a "what is the sanctioned approach?" choice.

The "yes or no" framing leads to attempts to compare performance with or without checks. But the "which approach" choice means choosing between performance of:
- An identity class
- A class with hand-coded checks in methods
- A class that automatically checks member accesses, like we do with null
- A dynamic requirement that fields/arrays of a certain class type have to be initialized before they're read
- Etc.
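The "initialized before read" option can be emulated in library code to get a feel for its shape. This is a software stand-in, not the JVM mechanism under discussion. `InitCheckedArray` is a hypothetical name, and the `BitSet` of written slots is an assumption about one possible bookkeeping strategy. The extra bitmap and the branch on every read are the kind of cost the comparison above is weighing. The sketch assumes Java 9 or later for `Objects.checkIndex`.

```java
import java.util.BitSet;
import java.util.Objects;

// Emulation of "slots must be written before they are read":
// a backing array plus a bitmap recording which slots have been written.
final class InitCheckedArray<T> {
    private final Object[] slots;
    private final BitSet written;

    InitCheckedArray(int length) {
        slots = new Object[length];
        written = new BitSet(length);
    }

    void set(int i, T value) {
        slots[i] = value; // the array store performs the bounds check
        written.set(i);
    }

    @SuppressWarnings("unchecked")
    T get(int i) {
        Objects.checkIndex(i, slots.length);
        if (!written.get(i)) {
            throw new IllegalStateException("slot " + i + " read before initialization");
        }
        return (T) slots[i];
    }

    int length() { return slots.length; }
}

class Demo {
    public static void main(String[] args) {
        InitCheckedArray<String> names = new InitCheckedArray<>(3);
        names.set(0, "ada");
        System.out.println(names.get(0)); // prints ada
        try {
            names.get(1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints slot 1 read before initialization
        }
    }
}
```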