Experimentation with build time and runtime class initialization in qbicc

Brian Goetz brian.goetz at oracle.com
Sat May 28 16:58:37 UTC 2022


I too agree that the "soupy" nature of <clinit> makes 
reverse-engineering difficult, and that this alternate translation would 
make things easier for an after-the-fact analysis tool that is trying to 
reason about what computations could be safely shifted in time.

But, keep in mind that it's not a free lunch.  To point out the obvious 
tradeoff: this turns into a startup hit for every dynamically executed 
Java program (larger classfiles, more bytecodes, more methods).  This is 
a tradeoff we would have to consider carefully, since making Java 
startup slower in general is not a cost we should take on lightly, 
especially given the charter of this project.  So, something for the 
"could consider" list, but not a slam-dunk.





On 5/28/2022 12:39 PM, Christian Wimmer wrote:
> Hi,
>
> I agree with the "soupy nature" of <clinit> methods mentioned below. 
> This makes it impossible in general to reverse-engineer which parts of 
> <clinit> initialize which static field. One suggestion for how that could 
> be improved: Instead of emitting a single <clinit> method, javac can 
> emit separate <clinit_XXX> methods for each static field that is 
> initialized inline as part of the field declaration, as well as each 
> static{} block. With a consistent naming scheme of these methods, it 
> would be much easier to run some initializations at build time and 
> some at run time. For compatibility, the <clinit> method could be a 
> chain of invocations of the <clinit_XXX> methods (or maybe <clinit> 
> itself is no longer necessary at all).
>
> So, for example, given a class
>
> class MyClass {
>   static Object o1 = "abc";
>   static {
>     foo();
>   }
>   static Object o2 = 42;
> }
>
> the Java compiler would create the methods (written here with 
> disassembled bytecode)
>
> <clinit_o1>() {
>   o1 = "abc";
> }
> <clinit_$0>() {
>   foo();
> }
> <clinit_o2>() {
>   o2 = 42;
> }
> <clinit>() {
>   <clinit_o1>();
>   <clinit_$0>();
>   <clinit_o2>();
> }
>
> Why such a scheme? It is much easier to prove here that the field o2 
> can be initialized at build time regardless of what foo() is doing, 
> and then remove the run-time initialization of o2 by replacing 
> <clinit_o2> with an empty method. All of that can be done without 
> analyzing and modifying the bytecode soup of the current <clinit> method.
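
To make the proposed splitting concrete, here is a minimal Java sketch. The `<clinit_XXX>` names are not legal Java identifiers, so the sketch models each piece as a named `Runnable` in an ordered table. The class name `SplitInitSketch`, the helpers `pieces()` and `runAll()`, and the `fooCalls` counter are all hypothetical; they illustrate the idea only and are not something javac or any build tool emits today.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical emulation of the proposed per-field initializer methods.
class SplitInitSketch {
    static Object o1;
    static Object o2;
    static int fooCalls = 0;

    // Stand-in for the arbitrary side effects of the static {} block.
    static void foo() { fooCalls++; }

    // One entry per initializer piece, kept in textual (declaration) order.
    static Map<String, Runnable> pieces() {
        Map<String, Runnable> m = new LinkedHashMap<>();
        m.put("clinit_o1", () -> o1 = "abc");
        m.put("clinit_$0", SplitInitSketch::foo);
        m.put("clinit_o2", () -> o2 = 42);
        return m;
    }

    // Stand-in for <clinit>: a plain chain of invocations.
    static void runAll(Map<String, Runnable> pieces) {
        for (Runnable piece : pieces.values()) {
            piece.run();
        }
    }
}
```

In this model, a build-time tool that has proven `o2 = 42` safe could store the value in the image and swap the `clinit_o2` entry for an empty `Runnable` (`pieces.put("clinit_o2", () -> {})`), leaving the other two pieces to run at run time in their original order.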
>
> -Christian
>
>
> On 5/27/22 08:35, Dan Heidinga wrote:
>> On Thu, May 26, 2022 at 9:01 PM Brian Goetz <brian.goetz at oracle.com> 
>> wrote:
>>> Thanks for providing this.
>>>
>>> Something about the qbicc approach here doesn't seem to add up to me.
>>> Maybe you can tell me what I'm missing.
>>>
>>> From reading your notes, it seems that at build time, you start with
>>> the root class(es), execute their <clinit>, which will cause loading of
>>> more classes, more <clinits>, and you iterate until there are no new
>>> classes to initialize.
>> With qbicc we embraced the closed-world constraint and mandated that
>> all class initialization happens at build time.  While we started with
>> runtime class initialization to bootstrap being able to run more code,
>> we quickly switched to being all-in on build time init (BTI) due to
>> the virtuous cycle between BTI and dead code elimination.
>>
>>> You then treat the statics as roots, and
>>> serialize those objects to the initial heap image.  But before doing
>>> that, you exclude (zero out) any which are marked as "reinitialize at
>>> runtime."
>> Right.
>>
>>> The rationale for this clearly is that you want to continue the graph
>>> walk to find all the loadable classes, but then don't want to use the
>>> polluted value.  But what happens in cases like this:
>>>
>>>       class Aliased {
>>>           @RuntimeInitialized private static final Socket s = ...;
>>>           private static final Socket copy = s;
>>>       }
>>>
>>> Do you throw on reads of runtime-initialized fields from a <clinit>?  Do
>>> you walk the heap and find aliases to runtime-initialized values, and
>>> replace them with something (if so, what?)  Or is the Aliased class
>>> above just "broken" according to this model, and I encounter a
>>> stale/nonworking socket in `copy` at runtime, and one that is not
>>> properly aliased to `s`?  Once an object is initialized at build time,
>>> its state can escape into all sorts of other places, and just zeroing
>>> out the static root isn't enough to stamp it out.
>> This is where the "soupy" nature of <clinit> becomes evident. <clinit>
>> is a single method that has tremendous side effects, setting static
>> fields, initializing other classes, starting threads, caching computed
>> values, etc.  It's very hard to automatically reason about what has
>> happened in a <clinit> method and what the user intends for those side
>> effects (if they're even aware of what they all may be!).
>>
>> What was the user's intent when they initialized 'copy'?  To record
>> what the original Socket connection - set up at build time - had been
>> rather than separately storing the address/port?  If they had a
>> semantic meaning for `copy` even after `s` had been nulled out, then
>> automatically resetting `copy` would violate their expectation.
>>
>> We need the user to tell us their intent.  If they wanted both `s` &
>> `copy` to be reset, then they need to be explicit about that and
>> annotate both fields.  We don't attempt to null all copies of the
>> value of a @RuntimeInitialized field.
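
The behavior described here can be modeled in a few lines of plain Java. This is only a sketch under stated assumptions: `Resource` is a hypothetical stand-in for a `Socket`, the comment marks where the `@RuntimeInitialized` annotation would go, and `zeroAnnotatedRoots()` is an invented helper that mimics zeroing the annotated static at serialization time. It is not qbicc code.

```java
// Hypothetical model of the Aliased example: only the annotated root is reset.
class AliasedSketch {
    // Stand-in for a native resource such as a Socket.
    static final class Resource {
        final String createdAt;
        Resource(String createdAt) { this.createdAt = createdAt; }
    }

    // Imagine this field carries @RuntimeInitialized.
    static Resource s = new Resource("build time");
    // Not annotated: keeps whatever reference it captured at build time.
    static Resource copy = s;

    // Mimics serialization zeroing out annotated static roots only.
    static void zeroAnnotatedRoots() {
        s = null;
    }
}
```

After `zeroAnnotatedRoots()` runs, `s` is `null` while `copy` still refers to the object created at build time, which is why both fields need the annotation if both are meant to be reset.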
>>
>>> Am I missing something?
>> You seem to have grasped it correctly =)
>>
>> If that field had been a primitive, such as a long, we'd be unable to
>> track down which other longs in the heap were copies of it or derived
>> from it.  We wouldn't reset some other location with the value 42
>> because a @RuntimeInitialized field was set to 42 at build time.  The
>> programmer has to take responsibility for which fields need to be
>> reset.  With qbicc, that's annotations.  With Leyden we may be able to
>> give them a better way to group fields and express how & when they
>> should be initialized.
>>
>> --Dan
>>
>>> Thanks,
>>> -Brian
>>>
>>>
>>> On 5/26/2022 4:22 PM, David P Grove wrote:
>>>> Hi,
>>>>
>>>> I’ve appended the contents of the referenced wiki 
>>>> page in this email.  Apologies in advance if the formatting doesn’t 
>>>> come through as intended.
>>>>
>>>> There is a full implementation of this (GPLv2 + 
>>>> Classpath exception) as part of the qbicc project on GitHub.  There 
>>>> is also a GitHub discussion in the qbicc project that links to 
>>>> various GitHub issues that capture the history that led to the 
>>>> current design.  I will not hyperlink to those here so that if 
>>>> people have any IP concerns, they can avoid seeing them.  They are 
>>>> easily findable.
>>>>
>>>> Regards,
>>>>
>>>> --dave
>>>>
>>>> ## Overview
>>>>
>>>> One of the goals of the qbicc project is to explore technical 
>>>> approaches for adapting Java's specification of class 
>>>> initialization to fully support native image compilation.  Enabling 
>>>> build-time evaluation of complex class initialization logic is 
>>>> essential for obtaining many of the benefits of native image 
>>>> compilation: reduced memory footprint and fast startup.  However, 
>>>> both the core JDK and many frameworks will not primarily be used 
>>>> in native image scenarios.  Therefore, it is essential that the 
>>>> approach taken for build-time initialization enables both the 
>>>> existing runtime class initialization and the new build-time class 
>>>> initialization logic to co-exist. Furthermore, for as many cases as 
>>>> possible, the class initialization code should be shared between 
>>>> the two usage scenarios and have non-surprising semantics in both.
>>>>
>>>> ## Build-time Initialization
>>>>
>>>> In qbicc, all classes are initialized at build time. Class 
>>>> initialization at build time is performed according to the existing 
>>>> semantics of Java class initialization driven by build-time 
>>>> execution of the `<clinit>` methods of reachable classes. The set 
>>>> of reachable classes is determined iteratively, starting with the 
>>>> program entrypoints and adding the methods and classes they utilize 
>>>> until no further reachable classes are discovered (a fixed point is 
>>>> reached).
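
The iteration to a fixed point can be sketched as a standard worklist traversal. The sketch below assumes a precomputed map from each class name to the classes it uses, which is a simplification: a real tool discovers those edges while analyzing and executing code. `ReachabilitySketch`, `reachable()`, and the `uses` map are hypothetical names, not part of qbicc.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical worklist computation of the reachable class set.
class ReachabilitySketch {
    static Set<String> reachable(Map<String, List<String>> uses,
                                 Collection<String> entrypoints) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(entrypoints);
        while (!work.isEmpty()) {
            String cls = work.poll();
            if (!seen.add(cls)) {
                continue; // already processed
            }
            // A real build would run the <clinit> of cls at this point.
            for (String used : uses.getOrDefault(cls, List.of())) {
                work.add(used);
            }
        }
        // The worklist is empty: no new classes were found, so this is the fixed point.
        return seen;
    }
}
```

Classes that nothing reachable refers to never enter the worklist, so they are never initialized and never contribute objects to the build-time heap.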
>>>>
>>>> After build-time initialization has completed, a build-time heap 
>>>> has been constructed that contains the objects that were created 
>>>> during the build-time execution of the `<clinit>` methods.  Using 
>>>> the reachable static fields of the reachable program as roots, this 
>>>> build-time heap is serialized into the native image.  This set of 
>>>> objects will form the initial runtime heap of the program when it 
>>>> is executed.
>>>>
>>>> ## Runtime Initializers
>>>>
>>>> There are cases where one or more initialization actions of a class 
>>>> **must** be executed at program runtime.  Most typically these 
>>>> involve the creation of native resources (open files, threads, etc.) 
>>>> that cannot be successfully serialized into the build-time heap.
>>>>
>>>> Qbicc supports runtime initialization by allowing static fields of 
>>>> a class to be declared as runtime initialized. These fields will 
>>>> be initialized lazily, at first access, by executing a runtime 
>>>> initializer (`<rtinit>`) associated with the accessed field.  
>>>> Runtime initialization is localized: accessing a particular static 
>>>> field will cause its runtime initializer to be executed but has no 
>>>> implications for other runtime initializers defined either in the 
>>>> field's defining class or any superclass or implemented interface 
>>>> of the field's defining class.
>>>>
>>>> When serialized from the build-time heap to the runtime heap, all 
>>>> runtime-initialized fields will be serialized with the zero 
>>>> (uninitialized) value appropriate for their type.
>>>>
>>>> Qbicc allows related static fields in the same class to share a 
>>>> common `<rtinit>` method. The first access to any of the fields 
>>>> will cause the execution of the associated `<rtinit>` method and 
>>>> the initialization of all the fields.
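
A rough Java analogue of a shared `<rtinit>` is shown below. Plain Java cannot intercept static field reads, so accessor methods stand in for the field accesses; the class `RtInitSketch`, its fields, the `rtinitRuns` counter, and the placeholder host name are all hypothetical and only illustrate the lazy, grouped behavior described above.

```java
// Hypothetical emulation of two static fields sharing one runtime initializer.
class RtInitSketch {
    private static boolean rtinitDone = false; // guard for the shared initializer
    private static long startNanos;            // zero value in the "image"
    private static String hostName;            // null in the "image"
    static int rtinitRuns = 0;                 // counts initializer executions

    // Stand-in for the shared <rtinit>: runs at most once.
    private static synchronized void rtinit() {
        if (rtinitDone) {
            return;
        }
        startNanos = System.nanoTime();
        hostName = "localhost"; // placeholder value for the sketch
        rtinitRuns++;
        rtinitDone = true;
    }

    // Accessors model "first access to any of the fields".
    static long startNanos() { rtinit(); return startNanos; }
    static String hostName() { rtinit(); return hostName; }
}
```

Whichever accessor is called first triggers the initializer, both fields are set together, and later accesses skip it.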
>>>>
>>>> ## Adjusting Heap Serialization
>>>>
>>>> For some objects it is necessary to initialize them during 
>>>> build-time initialization, but "reset" them before they are used at 
>>>> runtime.
>>>> Qbicc supports this by allowing fields to be annotated to be 
>>>> serialized as the type-appropriate zero value or as a primitive 
>>>> constant value. This value replacement happens as the build-time 
>>>> heap is serialized.
>>>>
>>>> One common scenario is to invalidate objects that are wrapping 
>>>> native resources. For example, when a `FileDescriptor` is 
>>>> serialized its `fd` and `handle` instance fields are serialized as 
>>>> `-1` and its `closed` field is serialized as `true`. Thus, any 
>>>> attempt to use the build-time FileDescriptor at runtime will raise 
>>>> the appropriate exception.
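
The `FileDescriptor` case can be illustrated with a small model. `FakeFileDescriptor` is a hypothetical stand-in (the real `java.io.FileDescriptor` keeps these fields private), and `serialize()` is an invented helper that mimics value replacement during heap serialization; the actual mechanism in qbicc is annotation-driven and is not shown here.

```java
import java.io.IOException;

// Hypothetical model of resetting a native-resource wrapper at serialization.
class FdResetSketch {
    static final class FakeFileDescriptor {
        int fd;
        long handle;
        boolean closed;

        FakeFileDescriptor(int fd, long handle) {
            this.fd = fd;
            this.handle = handle;
        }

        // Any use of an invalidated descriptor fails fast.
        void ensureOpen() throws IOException {
            if (closed || fd < 0) {
                throw new IOException("Stream Closed");
            }
        }
    }

    // Mimics serialization: the image copy gets the replacement values,
    // regardless of what the build-time object held.
    static FakeFileDescriptor serialize(FakeFileDescriptor buildTime) {
        FakeFileDescriptor image = new FakeFileDescriptor(-1, -1L);
        image.closed = true;
        return image;
    }
}
```

A descriptor that was open at build time therefore arrives in the runtime heap already closed, and any attempt to use it throws instead of touching a stale native handle.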
>>>>
>>>> ## Patching: Migration for Existing Classes
>>>>
>>>> The runtime initialization mechanisms described above are currently 
>>>> enabled via a set of annotations.  This allows qbicc to implement 
>>>> the desired semantics without requiring any changes to the Java 
>>>> compiler, class file format, or language specification. In the long 
>>>> term, we believe small modifications to the Java specification, for 
>>>> example defining an `rtinit { ... }` block similar to the existing 
>>>> `static { ... }` construct, could enable a simpler specification.
>>>>
>>>> The primary annotation for runtime initialization is 
>>>> `RuntimeAspect`.  This annotation is applied to a class and means 
>>>> that the `<clinit>` method of the class should be treated as an 
>>>> `<rtinit>` method.  This method will 
>>>> not be executed during build-time initialization and instead will 
>>>> be deferred until the first access of one of the static fields 
>>>> defined in the class.
>>>>
>>>> To allow us to "externally" modify JDK core classes for qbicc, we 
>>>> have developed an annotation-driven patcher infrastructure. The 
>>>> patcher allows the declaration of patch classes that add, remove, 
>>>> and modify the methods and fields of an existing class.  This 
>>>> modification includes the replacement of the `<clinit>` method and 
>>>> the declaration of multiple `RuntimeAspect` patch classes.
>>>>
>>>> The best way to explore what is possible with the patcher is to 
>>>> examine the java.base/src directory in the qbicc-class-library 
>>>> project. It makes extensive use of the patcher annotations to adapt 
>>>> the core JDK classes to qbicc while still allowing us to consume 
>>>> the upstream OpenJDK code base via an unmodified git submodule.
>>>>
>>>> ## Design Alternatives
>>>>
>>>> A number of alternatives were considered before arriving at the 
>>>> final design documented here.  The technical discussions and 
>>>> options considered can be explored starting in qbicc discussion 
>>>> #764 on GitHub.
>>>>
>>>>
>>>> From: Brian Goetz <brian.goetz at oracle.com>
>>>> Date: Thursday, May 26, 2022 at 2:21 PM
>>>> To: David P Grove <groved at us.ibm.com>, "leyden-dev at openjdk.java.net" 
>>>> <leyden-dev at openjdk.java.net>
>>>> Subject: [EXTERNAL] Re: Experimentation with build time and runtime 
>>>> class initialization in qbicc
>>>>
>>>>
>>>> Hi David;
>>>>
>>>> Would like to understand more about this, but first, from an 
>>>> IP-hygiene perspective, documents linked from this list should be 
>>>> under the OpenJDK terms and conditions.  Can you post the contents 
>>>> of that document here, so there are no issues there?
>>>>
>>>> Thanks,
>>>> -Brian
>>>> On 5/26/2022 12:35 PM, David P Grove wrote:
>>>>
>>>> Hi,
>>>>
>>>> In the qbicc project, we’ve been exploring options for adapting 
>>>> Java’s class initialization semantics for native images.  In 
>>>> particular, we are trying to arrive at a non-surprising semantics 
>>>> that in native-image scenarios allows most initialization to 
>>>> happen at build time while still enabling runtime initialization of 
>>>> selected static fields.
>>>>
>>>> Our current design and experience are captured here: 
>>>> <https://github.com/qbicc/qbicc/wiki/Class-Initialization-in-qbicc>. 
>>>> In a nutshell, the idea is to initialize classes via build-time 
>>>> execution of existing <clinit> methods as per normal Java semantics 
>>>> while adding per-static-field <rtinit> methods to provide a 
>>>> capability for runtime-reinitialization of a field before its first 
>>>> access.
>>>>
>>>> --dave
>>>>

