Experimentation with build time and runtime class initialization in qbicc

Fri May 27 18:11:16 UTC 2022

Thanks for the explanation.

I realize that qbicc is largely an exploration, and the current status 
is a proof-of-concept, so I'm not trying to denigrate the work you've 
done or score cheap points by picking holes; I am sure this hole is one 
you are already well aware of.  Instead, I'm trying to highlight (as 
DanH also has) the fact that, no matter how much we might like to think 
of this as a "tool-based transformation", what is really going on is 
that a new language is being invented, and its semantics are being 
retroactively applied to the existing Java language -- which is of 
course a dangerous and difficult game.  (The issue I raised demonstrates 
that this particular new language is not yet up to the level of safety 
we expect from Java, but again, the current goal of qbicc is to explore 
the possibilities, not to seriously propose a new model for 
initialization.  So, all good.)

As a language designer, I have no quibble with inventing new languages, 
or with evolving the Java language to do new things -- indeed, that's 
what I do every day.  But if we are to evolve the Java language, we must 
do so honestly and holistically.  How we approach shifting the timing 
constraints of initialization is going to require careful thought, and 
likely multiple iterations, before we arrive at a programming model that 
meets the safety, transparency, and performance requirements such a 
feature would demand.

As to scope, as already indicated in Mark's announcement, we intend to 
take an incremental approach.  We are definitely interested in exploring 
whether there is a sufficiently safe and transparent programming model 
that could get us to build-time initialization, but it is not currently 
the first priority of Leyden -- there are more foundational (and less 
intrusive) things we should address first.  But the topic is definitely 
fair game for discussion, and I suspect it will take several iterations 
before we would get to anything we could consider putting in 
"everybody's Java."

Cheers,
-Brian

On 5/27/2022 12:15 PM, David P Grove wrote:
>
> From: Brian Goetz<brian.goetz at oracle.com>
>>  From reading your notes, it seems that at build time, you start with the root class(es), execute their <clinit>,
>> which will cause loading of more classes, more <clinits>, and you iterate until there are no new classes to initialize.
>> You then treat the statics as roots, and serialize those objects to the initial heap image.  But before doing that,
>> you exclude (zero out) any which are marked as "reinitialize at runtime."
> This is correct.  In addition, as qbicc serializes each object, it also looks for annotations on instance fields that indicate that instead of serializing the build-time value of the instance field it should substitute a different value (FileDescriptor is a motivating example...we want to serialize a closed FileDescriptor to ensure any runtime reads/writes through it will result in the proper exception being raised).
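
A minimal sketch of the field-substitution idea described above, in plain Java with reflection. It is not qbicc's implementation or API: the annotation name SerializeAs, the FakeFileDescriptor class, and the map-based "image" are all hypothetical stand-ins chosen for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class SubstitutionSketch {
    // Hypothetical annotation: "write this value to the image instead of the build-time value".
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SerializeAs {
        int value();
    }

    // Hypothetical stand-in for java.io.FileDescriptor; -1 plays the role of "closed".
    static final class FakeFileDescriptor {
        @SerializeAs(-1) int fd;
        boolean append;

        FakeFileDescriptor(int fd, boolean append) {
            this.fd = fd;
            this.append = append;
        }
    }

    // Copies the instance fields of o into a map that models the heap image,
    // substituting the annotated value wherever @SerializeAs is present.
    static Map<String, Object> serialize(Object o) {
        Map<String, Object> image = new LinkedHashMap<>();
        for (Field f : o.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            f.setAccessible(true);
            SerializeAs sub = f.getAnnotation(SerializeAs.class);
            try {
                image.put(f.getName(), sub != null ? (Object) sub.value() : f.get(o));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return image;
    }

    public static void main(String[] args) {
        FakeFileDescriptor open = new FakeFileDescriptor(7, true);
        System.out.println(serialize(open));
    }
}
```

The build-time object keeps fd == 7; only the serialized copy carries the substituted value, so any runtime use of the restored descriptor would see it as closed.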
>
>> ... what happens in cases like this:
>>
>> class Aliased {
>>         @RuntimeInitialized private static final Socket s = ...;
>>         private static final Socket copy = s;
>> }
>
> First, I'll say what this code snippet would do with qbicc, then I'll say what the program should be to get the semantics the programmer probably intended.
>
> At build time, qbicc will execute the <clinit> of Aliased; presumably a Socket object will be allocated by ... and references to that Socket object will be stored in s and copy.  Any build-time use of either s or copy via a build-time executed getstatic will get a reference to that Socket object in the build-time heap.  The @RuntimeInitialized annotation has no impact on the build-time execution of code.
>
> At the end of compilation, when we serialize the static fields for Aliased, we will write null for s and a reference to the serialized Socket object for copy.  In the generated code, every getstatic of s will be preceded by a check to ensure that the <rtinit> method for s has been executed (similar to how a clinit check would be generated in a JVM).  Since copy does not have a <rtinit>, a getstatic of copy in the generated code will not be preceded by any check.
>
> The first time s is accessed at runtime, the ... code will be executed by the <rtinit> method and a new Socket object will be created and stored in s.  The fields s and copy will now point to distinct Socket objects.  Uses of the Socket object reachable from copy would likely result in an exception, because the backing FileDescriptor for that Socket would have been modified during serialization so that its instance fields have values as if the FileDescriptor had been closed.
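
The sequence described above can be modelled in ordinary Java to make the loss of aliasing concrete. This is only a simulation under stated assumptions: FakeSocket stands in for Socket, the methods clinit, writeImage, and rtinit are hypothetical names for the three phases, and nothing here calls into qbicc.

```java
class AliasingSketch {
    // Hypothetical stand-in for java.net.Socket; only identity and a "closed" flag matter.
    static final class FakeSocket {
        boolean closed;
    }

    // Model of the two static fields of Aliased.
    static FakeSocket s;
    static FakeSocket copy;
    static boolean rtinitDone;

    // Build-time <clinit>: one object, two references.
    static void clinit() {
        s = new FakeSocket();
        copy = s;
    }

    // Heap serialization as described: the annotated field is written as null,
    // and the object still reachable from copy is written as if it were closed.
    static void writeImage() {
        s = null;
        copy.closed = true;
    }

    // Runtime <rtinit> for s only: re-runs the initializer expression.
    static void rtinit() {
        s = new FakeSocket();
        rtinitDone = true;
    }

    // Models a read of s guarded by an <rtinit> check.
    static FakeSocket readS() {
        if (!rtinitDone) {
            rtinit();
        }
        return s;
    }

    // Models an unguarded read of copy.
    static FakeSocket readCopy() {
        return copy;
    }

    // Runs the three phases and reports what an observer would see.
    static boolean[] simulate() {
        rtinitDone = false;
        clinit();
        boolean aliasedAtBuildTime = (s == copy);
        writeImage();
        boolean aliasedAtRuntime = (readS() == readCopy());
        return new boolean[] { aliasedAtBuildTime, aliasedAtRuntime, readCopy().closed };
    }

    public static void main(String[] args) {
        boolean[] r = simulate();
        System.out.println("aliased at build time: " + r[0]);
        System.out.println("aliased at runtime:    " + r[1]);
        System.out.println("copy looks closed:     " + r[2]);
    }
}
```

In this model the two fields are identical at build time, distinct at runtime, and the object behind copy appears closed, which matches the behaviour described for the single-annotation version of Aliased.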
>
> Using the syntax above, one would need to write this code to get the intended aliasing at both build-time and runtime.
> class Aliased {
>          @RuntimeInitialized private static final Socket s = ...;
>          @RuntimeInitialized private static final Socket copy = s;
> }
>
> The way we would actually write this pattern in qbicc today is a little more indirect because (1) we don't want to change javac and (2) we don't want to directly edit OpenJDK source code (to make it easier to consume updates). Therefore, we define a "patch class" with a @RuntimeAspect annotation that qbicc combines with the unmodified bytecodes of the Aliased class to get what we need.  I've added a third field just to emphasize that we need to allow the <rtinit> of a class to be a subset of its <clinit>.
>
> class Aliased {
>          private static final Socket s = ...;
>          private static final Socket copy = s;
>          private static final Object anotherField = ...;
> }
>
> @RuntimeAspect(Aliased.class)
> class Aliased_RT {
>          private static final Socket s = ...;
>          private static final Socket copy = s;
> }
>
> The only part of the Aliased_RT class we are interested in is the <clinit> method that javac generated for it.  The qbicc compiler takes Aliased_RT's <clinit> and uses it as the <rtinit> method for the fields s and copy of the Aliased class.  The rest of the Aliased_RT class is ignored.
>
> If one were able to change javac, then the simpler @RuntimeInitialized syntax you had used would be better.  From a single class definition, javac could generate both a <clinit> method that initialized s, copy, and anotherField and an <rtinit> method that initialized s and copy.
>
> Finally, qbicc does not attempt to recognize when an object that is directly referred to by a @RuntimeInitialized static field is also reachable in some other (perhaps deeply nested) way.  As a result, it is certainly possible to write programs where the build-time and runtime identity (==) of two access paths differs.  So far, this hasn't been an issue for us, but it is one of the ways in which one could detect at runtime that something non-standard has happened.
>
> Hope this explains more clearly without being tediously long,
>
> --dave
>
>