Experimentation with build time and runtime class initialization in qbicc

Sat May 28 17:15:23 UTC 2022

Certainly everything comes with a tradeoff.

But I would argue that the cost of the current workaround to influence 
static field initializations - make a separate static inner class for a 
static field that should be initialized separately - is even higher 
because it requires a full class data structure just to hold a single 
field. Even in the JDK, the number of inner classes named "Lazy" is 
growing. A more fine-grained initialization of fields within a class can 
help to reduce such overhead.
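
For reference, the workaround in question is the usual holder idiom. A minimal sketch with hypothetical names (`Config`, `Lazy`, `EXPENSIVE`, `load()`), not copied from the JDK:

```java
import java.util.concurrent.atomic.AtomicInteger;

class Config {
    // Ordinary static field: set when Config itself is initialized.
    // It only counts how often the expensive initializer runs.
    static final AtomicInteger loads = new AtomicInteger();

    // A whole extra class whose only job is to delay one field.
    private static class Lazy {
        static final Object EXPENSIVE = load();
    }

    static Object expensive() {
        // The first call initializes Lazy, and only then runs load().
        return Lazy.EXPENSIVE;
    }

    private static Object load() {
        loads.incrementAndGet();
        return new Object();
    }
}
```

Initializing `Config` does not run `load()`; only the first call to `expensive()` does, at the price of one more class per deferred field.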

-Christian

On 5/28/22 09:58, Brian Goetz wrote:
> I too agree that the "soupy" nature of <clinit> makes 
> reverse-engineering difficult, and that this alternate translation 
> would make things easier for an after-the-fact analysis tool that is 
> trying to reason about what computations could be safely shifted in time.
>
> But, keep in mind that it's not a free lunch.  To point out the 
> obvious tradeoff: this turns into a startup hit for every dynamically 
> executed Java program (larger classfiles, more bytecodes, more 
> methods).  This is a tradeoff we would have to consider carefully, 
> since making Java startup slower in general is not a cost we should 
> take on lightly, especially given the charter of this project.  So, 
> something for the "could consider" list, but not a slam-dunk.
>
>
>
>
>
> On 5/28/2022 12:39 PM, Christian Wimmer wrote:
>> Hi,
>>
>> I agree with the "soupy nature" of <clinit> methods mentioned below. 
>> This makes it impossible in general to reverse-engineer which parts 
>> of <clinit> initialize which static field. One suggestion for how that 
>> could be improved: Instead of emitting a single <clinit> method, 
>> javac can emit separate <clinit_XXX> methods for each static field 
>> that is initialized inline as part of the field declaration, as well 
>> as each static{} block. With a consistent naming scheme of these 
>> methods, it would be much easier to run some initializations at build 
>> time and some at run time. For compatibility, the <clinit> method 
>> could be a chain of invocations of the <clinit_XXX> methods (or maybe 
>> <clinit> itself is no longer necessary at all).
>>
>> So for example, for a class
>>
>> class MyClass {
>>   static Object o1 = "abc";
>>   static {
>>     foo();
>>   }
>>   static Object o2 = 42;
>> }
>>
>> the Java compiler would create the methods (written here with 
>> disassembled bytecode)
>>
>> <clinit_o1>() {
>>   o1 = "abc";
>> }
>> <clinit_$0>() {
>>   foo();
>> }
>> <clinit_o2>() {
>>   o2 = 42;
>> }
>> <clinit>() {
>>   <clinit_o1>();
>>   <clinit_$0>();
>>   <clinit_o2>();
>> }
>>
>> Why such a scheme? It is much easier to prove here that the field o2 
>> can be initialized at build time regardless of what foo() is doing, 
>> and then remove the run-time initialization of o2 by replacing 
>> <clinit_o2> with an empty method. All of that can be done without 
>> analyzing and modifying the bytecode soup of the current <clinit> 
>> method.
>>
>> -Christian
>>
>>
>> On 5/27/22 08:35, Dan Heidinga wrote:
>>> On Thu, May 26, 2022 at 9:01 PM Brian Goetz <brian.goetz at oracle.com> 
>>> wrote:
>>>> Thanks for providing this.
>>>>
>>>> Something about the qbicc approach here doesn't seem to add up to me.
>>>> Maybe you can tell me what I'm missing.
>>>>
>>>>   From reading your notes, it seems that at build time, you start with
>>>> the root class(es), execute their <clinit>, which will cause 
>>>> loading of
>>>> more classes, more <clinits>, and you iterate until there are no new
>>>> classes to initialize.
>>> With qbicc we embraced the closed-world constraint and mandated that
>>> all class initialization happens at build time.  While we started with
>>> runtime class initialization to bootstrap being able to run more code,
>>> we quickly switched to being all-in on build time init (BTI) due to
>>> the virtuous cycle between BTI and dead code elimination.
>>>
>>>> You then treat the statics as roots, and
>>>> serialize those objects to the initial heap image.  But before doing
>>>> that, you exclude (zero out) any which are marked as "reinitialize at
>>>> runtime."
>>> Right.
>>>
>>>> The rationale for this clearly is that you want to continue the graph
>>>> walk to find all the loadable classes, but then don't want to use the
>>>> polluted value.  But what happens in cases like this:
>>>>
>>>>       class Aliased {
>>>>           @RuntimeInitialized private static final Socket s = ...;
>>>>           private static final Socket copy = s;
>>>>       }
>>>>
>>>> Do you throw on reads of runtime-initialized fields from a 
>>>> <clinit>?  Do
>>>> you walk the heap and find aliases to runtime-initialized values, and
>>>> replace them with something (if so, what?)  Or is the Aliased class
>>>> above just "broken" according to this model, and I encounter a
>>>> stale/nonworking socket in `copy` at runtime, and one that is not
>>>> properly aliased to `s`?  Once an object is initialized at build time,
>>>> its state can escape into all sorts of other places, and just zeroing
>>>> out the static root isn't enough to stamp it out.
>>> This is where the "soupy" nature of <clinit> becomes evident. <clinit>
>>> is a single method that has tremendous side effects, setting static
>>> fields, initializing other classes, starting threads, caching computed
>>> values, etc.  It's very hard to automatically reason about what has
>>> happened in a <clinit> method and what the user intends for those side
>>> effects (if they're even aware of what they all may be!).
>>>
>>> What was the user's intent when they initialized 'copy'?  To record
>>> what the original Socket connection - set up at build time - had been
>>> rather than separately storing the address/port?  If they had a
>>> semantic meaning for `copy` even after `s` had been nulled out, then
>>> automatically resetting `copy` would violate their expectation.
>>>
>>> We need the user to tell us their intent.  If they wanted both `s` &
>>> `copy` to be reset, then they need to be explicit about that and
>>> annotate both fields.  We don't attempt to null all copies of the
>>> value of a @RuntimeInitialized field.
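
The aliasing behaviour described above can be modelled in plain Java. This is only a sketch, not qbicc code: `Resource` and `serializeForRuntime()` are hypothetical stand-ins for a build-time object and for the zeroing step of heap serialization.

```java
class AliasedSketch {
    // Stand-in for a resource that was created at build time.
    static final class Resource {
        boolean usable = true;
    }

    static Resource s = new Resource(); // imagine this one is @RuntimeInitialized
    static Resource copy = s;           // alias, not annotated

    // Models serialization: only the annotated root is zeroed.
    static void serializeForRuntime() {
        s = null;
    }
}
```

After `serializeForRuntime()`, `s` is null while `copy` still refers to the build-time object, which is why both fields have to be annotated if both are meant to be reset.
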
>>>
>>>> Am I missing something?
>>> You seem to have grasped it correctly =)
>>>
>>> If that field had been a primitive, such as a long, we'd be unable to
>>> track down which other longs in the heap were copies of it or derived
>>> from it.  We wouldn't reset some other location with the value 42
>>> because a @RuntimeInitialized field was set to 42 at build time.  The
>>> programmer has to take responsibility for which fields need to be
>>> reset.  With qbicc, that's annotations.  With Leyden we may be able to
>>> give them a better way to group fields and express how & when they
>>> should be initialized.
>>>
>>> --Dan
>>>
>>>> Thanks,
>>>> -Brian
>>>>
>>>>
>>>> On 5/26/2022 4:22 PM, David P Grove wrote:
>>>>> Hi,
>>>>>
>>>>> I’ve appended the contents of the referenced 
>>>>> wiki page in this email.  Apologies in advance if the formatting 
>>>>> doesn’t come through as intended.
>>>>>
>>>>> There is a full implementation of this (GPLv2 + 
>>>>> Classpath exception) as part of the qbicc project on GitHub.  
>>>>> There is also a GitHub discussion in the qbicc project that links 
>>>>> to various GitHub issues that capture the history that led to the 
>>>>> current design.  I will not hyperlink to those here so that if 
>>>>> people have any IP concerns, they can avoid seeing them.  They are 
>>>>> easily findable.
>>>>>
>>>>> Regards,
>>>>>
>>>>> --dave
>>>>>
>>>>> ## Overview
>>>>>
>>>>> One of the goals of the qbicc project is to explore technical 
>>>>> approaches for adapting Java's specification of class 
>>>>> initialization to fully support native image compilation.  
>>>>> Enabling build-time evaluation of complex class initialization 
>>>>> logic is essential for obtaining much of the benefits of native 
>>>>> image compilation: reduced memory footprint and fast startup.  
>>>>> However, both the core JDK and many frameworks will not 
>>>>> primarily be used in native image scenarios.  Therefore, it is 
>>>>> essential that the approach taken for build-time initialization 
>>>>> enables both the existing runtime class initialization and the new 
>>>>> build-time class initialization logic to co-exist. Furthermore, 
>>>>> for as many cases as possible, the class initialization code 
>>>>> should be shared between the two usage scenarios and have 
>>>>> non-surprising semantics in both.
>>>>>
>>>>> ## Build-time Initialization
>>>>>
>>>>> In qbicc, all classes are initialized at build-time. Class 
>>>>> initialization at build time is performed according to the 
>>>>> existing semantics of Java class initialization driven by 
>>>>> build-time execution of the `<clinit>` methods of reachable 
>>>>> classes. The set of reachable classes is determined iteratively, 
>>>>> starting with the program entrypoints and adding the methods and 
>>>>> classes they utilize until no further reachable classes are 
>>>>> discovered (a fixed point is reached).
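
The fixed-point iteration can be sketched as a worklist loop. This is an illustration under stated assumptions, not the qbicc implementation: the `uses` map (class name to referenced class names) and the `Reachability` class are hypothetical, and the sketch only follows edges where a real build would also execute each class's initializer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Reachability {
    // uses: for each class name, the classes its code refers to (hypothetical input).
    static Set<String> reachable(Map<String, List<String>> uses, List<String> entryPoints) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>(entryPoints);
        while (!worklist.isEmpty()) {
            String cls = worklist.poll();
            if (!seen.add(cls)) {
                continue; // already processed
            }
            // Follow every reference out of this class.
            for (String dep : uses.getOrDefault(cls, List.of())) {
                if (!seen.contains(dep)) {
                    worklist.add(dep);
                }
            }
        }
        // Worklist is empty: no new classes were discovered, so this is the fixed point.
        return seen;
    }
}
```

Classes that no entry point leads to never enter `seen`, which is the link between build-time initialization and dead code elimination mentioned earlier in the thread.
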
>>>>>
>>>>> After build-time initialization has completed, a build-time heap 
>>>>> has been constructed that contains the objects that were created 
>>>>> during the build-time execution of the `<clinit>` methods.  Using 
>>>>> the reachable static fields of the reachable program as roots, 
>>>>> this build-time heap is serialized into the native image.  This 
>>>>> set of objects will form the initial runtime heap of the program 
>>>>> when it is executed.
>>>>>
>>>>> ## Runtime Initializers
>>>>>
>>>>> There are cases where one or more initialization actions of a 
>>>>> class **must** be executed at program runtime.  Most typically 
>>>>> these involve the creation of native resources (open files, 
>>>>> threads, etc) that cannot be successfully serialized into the 
>>>>> build time heap.
>>>>>
>>>>> Qbicc supports runtime initialization by allowing static fields of 
>>>>> a class to be declared as runtime initialized. These fields will 
>>>>> be initialized lazily, at first access, by executing a runtime 
>>>>> initializer (`<rtinit>`) associated with the accessed field.  
>>>>> Runtime initialization is localized: accessing a particular static 
>>>>> field will cause its runtime initializer to be executed but has no 
>>>>> implications for other runtime initializers defined either in the 
>>>>> field's defining class or any superclass or implemented interface 
>>>>> of the field's defining class.
>>>>>
>>>>> When serialized from the build-time heap to the runtime heap, all 
>>>>> runtime-initialized fields will be serialized with the zero 
>>>>> (uninitialized) value appropriate for their type.
>>>>>
>>>>> Qbicc allows related static fields in the same class to share a 
>>>>> common `<rtinit>` method. The first access to any of the fields 
>>>>> will cause the execution of the associated `<rtinit>` method and 
>>>>> the initialization of all the fields.
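
A source-level approximation of a shared runtime initializer, assuming accessor methods stand in for field accesses. All names here (`RtInitSketch`, `rtinit()`, `rtinitRuns`) are hypothetical, and how qbicc actually guards the access is not described in this text.

```java
class RtInitSketch {
    private static boolean rtinitDone; // guard for the shared initializer
    private static long startNanos;    // runtime-initialized field 1
    private static Thread owner;       // runtime-initialized field 2
    static int rtinitRuns;             // bookkeeping for this example only

    // Stand-in for a shared <rtinit>: sets every field in the group, once.
    private static synchronized void rtinit() {
        if (rtinitDone) {
            return;
        }
        startNanos = System.nanoTime();
        owner = Thread.currentThread();
        rtinitRuns++;
        rtinitDone = true;
    }

    // Accessors model "first access to any field triggers the initializer".
    static long startNanos() { rtinit(); return startNanos; }
    static Thread owner()    { rtinit(); return owner; }
}
```

Whichever accessor is called first initializes both fields; later calls through either accessor find the guard set and do nothing further.
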
>>>>>
>>>>> ## Adjusting Heap Serialization
>>>>>
>>>>> For some objects it is necessary to initialize them during 
>>>>> build-time initialization, but "reset" them before they are used 
>>>>> at runtime.
>>>>> Qbicc supports this by allowing fields to be annotated to be 
>>>>> serialized as the type-appropriate zero value or as a primitive 
>>>>> constant value. This value replacement happens as the build time 
>>>>> heap is serialized.
>>>>>
>>>>> One common scenario is to invalidate objects that are wrapping 
>>>>> native resources. For example, when a `FileDescriptor` is 
>>>>> serialized its `fd` and `handle` instance fields are serialized as 
>>>>> `-1` and its `closed` field is serialized as `true`. Thus, any 
>>>>> attempt to use the build-time FileDescriptor at runtime will raise 
>>>>> the appropriate exception.
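
The effect of this value replacement can be modelled as follows. `FdSketch` is a hypothetical stand-in for `FileDescriptor`, and `asSerialized()` only imitates what the serializer would write; the real mechanism is annotation-driven and is not shown here.

```java
import java.io.IOException;

class FdSketch {
    int fd;
    long handle;
    boolean closed;

    FdSketch(int fd, long handle) {
        this.fd = fd;
        this.handle = handle;
    }

    // Models serialization: field values are replaced, not copied.
    FdSketch asSerialized() {
        FdSketch out = new FdSketch(-1, -1L);
        out.closed = true;
        return out;
    }

    int read() throws IOException {
        if (closed || fd < 0) {
            throw new IOException("Stream Closed");
        }
        return 0; // placeholder for a real read
    }
}
```

A descriptor opened at build time therefore arrives in the runtime heap already invalid, and the first attempt to use it fails with an exception instead of touching a stale handle.
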
>>>>>
>>>>> ## Patching: Migration for Existing Classes
>>>>>
>>>>> The runtime initialization mechanisms described above are 
>>>>> currently enabled via a set of annotations.  This allows qbicc to 
>>>>> implement the desired semantics without requiring any changes to 
>>>>> the Java compiler, class file format, or language specification. 
>>>>> In the long term, we believe small modifications to the Java 
>>>>> specification, for example defining a `rtinit { ... }` similar to 
>>>>> the existing `static { ... }` construct could enable a simpler 
>>>>> specification.
>>>>>
>>>>> The primary annotation for runtime initialization is 
>>>>> `RuntimeAspect`.  This annotation is defined on a class and is 
>>>>> interpreted as meaning that the `<clinit>` method of the class 
>>>>> should be interpreted as an `<rtinit>` method.  This method will 
>>>>> not be executed during build-time initialization and instead will 
>>>>> be deferred until the first access of one of the static fields 
>>>>> defined in the class.
>>>>>
>>>>> To allow us to "externally" modify JDK core classes for qbicc, we 
>>>>> have developed an annotation-driven patcher infrastructure. The 
>>>>> patcher allows the declaration of patch classes that add, remove, 
>>>>> and modify the methods and fields of an existing class.  This 
>>>>> modification includes the replacement of the `<clinit>` method and 
>>>>> the declaration of multiple `RuntimeAspect` patch classes.
>>>>>
>>>>> The best way to explore what is possible with the patcher is to 
>>>>> examine the java.base/src directory in the qbicc-class-library 
>>>>> project. It makes extensive use of the patcher annotations to 
>>>>> adapt the core JDK classes to qbicc while still allowing us to 
>>>>> consume the upstream OpenJDK code base via an unmodified git 
>>>>> submodule.
>>>>>
>>>>> ## Design Alternatives
>>>>>
>>>>> A number of alternatives were considered before arriving at the 
>>>>> final design documented here.  The technical discussions and 
>>>>> options considered can be explored starting in qbicc discussion 
>>>>> #764 on GitHub.
>>>>>
>>>>>
>>>>> From: Brian Goetz <brian.goetz at oracle.com>
>>>>> Date: Thursday, May 26, 2022 at 2:21 PM
>>>>> To: David P Grove <groved at us.ibm.com>, "leyden-dev at openjdk.java.net" 
>>>>> <leyden-dev at openjdk.java.net>
>>>>> Subject: [EXTERNAL] Re: Experimentation with build time and 
>>>>> runtime class initialization in qbicc
>>>>>
>>>>> Hi David;
>>>>>
>>>>> Would like to understand more about this, but first, from an 
>>>>> IP-hygiene perspective, documents linked from this list should be 
>>>>> under the OpenJDK terms and conditions.  Can you post the contents 
>>>>> of that document here, so there are no issues there?
>>>>>
>>>>> Thanks,
>>>>> -Brian
>>>>> On 5/26/2022 12:35 PM, David P Grove wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> In the qbicc project, we’ve been exploring options for adapting 
>>>>> Java’s class initialization semantics for native images.  In 
>>>>> particular, we are trying to arrive at a non-surprising semantics 
>>>>> that in native-image scenarios allows most initialization to 
>>>>> happen at build-time while still enabling runtime initialization 
>>>>> of selected static fields.
>>>>>
>>>>>
>>>>>
>>>>> Our current design and experience are captured here: 
>>>>> <https://github.com/qbicc/qbicc/wiki/Class-Initialization-in-qbicc>. 
>>>>> In a nutshell, the idea is to initialize classes via build-time 
>>>>> execution of existing <clinit> methods as per normal Java 
>>>>> semantics while adding per-static-field <rtinit> methods to 
>>>>> provide a capability for runtime-reinitialization of a field 
>>>>> before its first access.
>>>>>
>>>>>
>>>>>
>>>>> --dave
>>>>>
>>>>>
>>>>>
>>>>>
>