Revisiting default values

Dan Smith daniel.smith at oracle.com
Tue Jul 21 18:41:11 UTC 2020


> On Jul 20, 2020, at 10:27 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
> 
> That said, doing so in the language is potentially more viable.  It would mean, for classes that opt into this treatment:
> 
>  - Ensuring that `C.default` evaluates to the right thing
>  - Preventing `this` from escaping the constructor (which might be a good thing to enforce for inline classes anyway)
>  - Ensuring all fields are DA (which we do already), and that assignments to fields in ctors are not their default value 
>  - Translating `new Foo[n]` (and reflective equivalent) with something that initializes the array elements
> 
> The goal is to keep default instances from being observed.  If we lock down `this` from constructors, the major cost here is instantiating arrays of these things, but we already optimize array initialization loops like this pretty well.  
> 
> Overall this doesn't seem terrible.  It means that the cost of this is borne by the users of classes that opt into this treatment, and keeps the complexity out of the VM.  It does mean that "attackers" can generate bytecode to generate bad instances (a problem we have with multiple vectors today.)  
> 
> Call this "L".  

More letters!

Expanding on ways to support Bucket #3 by ensuring initialization of fields/arrays:

---

Option L: Language requires field/array initialization

An inline class may be declared to have no default. Fields and arrays of that class's inline type must be provably initialized (via compiler analysis) before they are read or published.

Instance fields of the class's inline type must be initialized before a method call involving 'this' occurs. (It's already illegal to allow the constructor to return before initialization.)
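To make the 'this' rule concrete, here is a minimal sketch in today's Java. There are no inline classes yet, so a final reference field stands in for a field of a default-less inline type, and null stands in for the default instance. The names Address, Customer and homeIsDefault are made up for illustration.

```java
// Hypothetical stand-in for a default-less inline class. A reference type is
// used because current Java has no inline classes; null plays the role of
// the default instance that must never be observed.
final class Address {
    final String street;

    Address(String street) {
        this.street = street;
    }
}

class Customer {
    private final Address home;
    // Records what a method invoked on 'this' observed during construction.
    final boolean sawDefaultDuringConstruction;

    Customer(Address home) {
        // A method call involving 'this' before 'home' is assigned. This
        // compiles today; the rule sketched for Option L would reject it.
        sawDefaultDuringConstruction = homeIsDefault();
        this.home = home;
    }

    boolean homeIsDefault() {
        return home == null;
    }
}
```

Constructing a Customer this way records that the method saw the unassigned field, even though 'home' is final and definitely assigned by the end of the constructor.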

Static fields... seem hopeless, so maybe they must have a reference type (perhaps implicitly). Maybe we can do an analysis that permits some very simple cases, but once you allow method calls of almost any sort, you've lost. (We'd have to prove that no initialization of *other* classes triggered by <clinit> refers to the field before it has been initialized.)
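The <clinit> problem can be reproduced in today's Java with two classes whose initializers refer to each other. This is only a sketch; Holder and Helper are hypothetical names, and null again stands in for the default instance.

```java
class Holder {
    // Evaluating this initializer triggers Helper.<clinit> while
    // Holder.<clinit> is still in progress.
    static final boolean HELPER_SAW_NULL = Helper.SAW_NULL;

    // Assigned only after the line above has finished.
    static final Object ORIGIN = new Object();
}

class Helper {
    // When reached from Holder.<clinit>, the recursive request to initialize
    // Holder returns immediately on the same thread, so ORIGIN is read in
    // its default state.
    static final boolean SAW_NULL = Holder.ORIGIN == null;
}
```

If Holder is the first of the two classes to be initialized, Helper reads ORIGIN before it has been assigned. The compiler sees nothing wrong inside either class, which is why a local analysis does not get you very far.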

Arrays must be initialized at creation time, either with an array initializer ("Address[] as = { x, y, z };") or via a trusted API ("Address[] as = Arrays.of(i -> x);"). We might introduce some language sugar for the trusted API ("Address[] as = { i -> x };"). We *could* support two-stage initialization via things like 'Arrays.fill', but analysis to track uninitialized arrays from creation to filling doesn't seem worthwhile.
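For a sense of what a trusted array API could look like, here is a rough sketch written against today's Java. 'Arrays.of' does not exist in java.util.Arrays; the InitializedArrays class, its 'of' method, and the explicit Class parameter are all assumptions made for illustration, and the sketch only handles reference component types.

```java
import java.lang.reflect.Array;
import java.util.Objects;
import java.util.function.IntFunction;

// Hypothetical trusted factory: the array is not visible to any caller
// until every component has been assigned.
final class InitializedArrays {
    private InitializedArrays() {
    }

    @SuppressWarnings("unchecked")
    static <T> T[] of(Class<T> componentType, int length, IntFunction<? extends T> init) {
        // Assumes componentType is a reference type; a primitive type such
        // as int.class would make the cast below fail.
        T[] array = (T[]) Array.newInstance(componentType, length);
        for (int i = 0; i < length; i++) {
            // Reject null so that no component is left in its default state.
            array[i] = Objects.requireNonNull(init.apply(i), "initializer returned null");
        }
        // Published only after the loop completes; if 'init' throws, the
        // partially filled array is never returned.
        return array;
    }
}
```

A call such as 'InitializedArrays.of(String.class, 3, i -> "x" + i)' yields a fully populated String[] of length 3. The guarantee here comes from the shape of the method, not from any compiler or verifier analysis.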

This is less expressive, obviously. In particular, many comfortable idioms for initializing an array won't work. As a case study: what happens in generic code like ArrayList? When it wants to allocate its array (we're in a specialized world where T has been specialized to 'QAddress;'), what value does it fill the array with? Nothing is available, because at this point the list is empty, and it's just allocating storage for later. I guess ArrayList (and similar data structures) has to have a special back door, and we're left to trust the author not to expose the uninitialized payload.
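The ArrayList situation looks roughly like the following sketch, again in today's Java with null standing in for the uninitialized payload. SlotList is a hypothetical, stripped-down list, not the real java.util.ArrayList implementation.

```java
import java.util.Arrays;

// Hypothetical minimal list that allocates storage ahead of use.
final class SlotList<T> {
    // Early allocation: these slots hold no meaningful value yet.
    private Object[] slots = new Object[4];
    private int size;

    void add(T value) {
        if (size == slots.length) {
            // Growing the array creates more unfilled slots.
            slots = Arrays.copyOf(slots, size * 2);
        }
        slots[size++] = value;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        // The bounds check against 'size' (not 'slots.length') is the only
        // thing keeping unfilled slots from being exposed.
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return (T) slots[index];
    }

    int size() {
        return size;
    }
}
```

Nothing outside the class can reach a slot at or beyond 'size', but that is a property the author maintains by hand, which is the kind of trust the back door would rely on.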

As with all language features, there's also the question of what happens when a class file doesn't conform to the language's rules. Option L can't really stand alone—it needs to be backed up by some other option when the language's guarantees fail.

---

Option M: JVM requires field/array initialization

Inline class files can indicate that their default instance is invalid. Fields and arrays of that class's inline type must be provably initialized (via verification or related analysis) before they are read or published.

All the compile-time analysis of Option L applies here, because the language compiler needs to be sure its generated class files are valid.

We can use some new verification types to track the initialization status of 'this', the way we do to require 'super' calls today. You don't have a fully formed 'Foo', capable of being passed to other methods, etc., until all fields are initialized. This would also apply to 'defaultvalue' for an inline class with a field of a default-less inline type.

Again, static fields are hopeless: it's an error to use the inline type as a static field type.

'anewarray' of the inline type is illegal, except within a trusted API. That API promises to initialize every array component before publishing the array. (We won't try to guarantee this with an analysis—the API is trusted because it has been vetted by humans.) In addition to some standard factory methods, we could decide that the inline class itself is always a trusted API.

(A related approach was discussed at our last EG meeting, but with much less expressiveness: inline-typed fields are always illegal, and arrays can only be allocated by the class author.)

This closes the back door of other bytecode not playing by the language's rules. The expressiveness problems of Option L remain: for example, ArrayList's early allocation strategy is impossible.



More information about the valhalla-spec-observers mailing list