Experimentation with build time and runtime class initialization in qbicc
Christian Wimmer
christian.wimmer at oracle.com
Sat May 28 16:39:13 UTC 2022
Hi,
I agree with the "soupy nature" of <clinit> methods mentioned below.
This makes it impossible in general to reverse-engineer which parts of
<clinit> initialize which static field. One suggestion for how that could be
improved: Instead of emitting a single <clinit> method, javac could emit
separate <clinit_XXX> methods for each static field that is initialized
inline as part of the field declaration, as well as each static{} block.
With a consistent naming scheme of these methods, it would be much
easier to run some initializations at build time and some at run time.
For compatibility, the <clinit> method could be a chain of invocations
of the <clinit_XXX> methods (or maybe <clinit> itself is no longer
necessary at all).
So, for example, for a class

    class MyClass {
        static Object o1 = "abc";
        static {
            foo();
        }
        static Object o2 = 42;
    }

the Java compiler would create the methods (written here with
disassembled bytecode)

    <clinit_o1>() {
        o1 = "abc";
    }
    <clinit_$0>() {
        foo();
    }
    <clinit_o2>() {
        o2 = 42;
    }
    <clinit>() {
        <clinit_o1>();
        <clinit_$0>();
        <clinit_o2>();
    }

Why such a scheme? It is much easier to prove here that the field o2 can
be initialized at build time regardless of what foo() is doing, and then
remove the run-time initialization of o2 by replacing <clinit_o2> with
an empty method. All of that can be done without analyzing and modifying
the bytecode soup of the current <clinit> method.
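
To make the idea concrete, here is a minimal sketch in plain Java. It only
simulates the scheme: javac emits no such methods today, and the names
SplitClinitSketch, pieces and runClinit are made up for this illustration.
Each map entry stands in for one proposed <clinit_XXX> method, and a build
tool that already placed o2 in the image skips that piece by name, without
inspecting the code of the other pieces.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of the proposed split initializers.
final class SplitClinitSketch {
    static Object o1;
    static Object o2;
    static int fooCalls;

    // Stand-in for an initializer with arbitrary side effects.
    static void foo() {
        fooCalls++;
    }

    // One entry per proposed <clinit_XXX> method, kept in textual order.
    static Map<String, Runnable> pieces() {
        Map<String, Runnable> m = new LinkedHashMap<>();
        m.put("clinit_o1", () -> o1 = "abc");
        m.put("clinit_$0", () -> foo());
        m.put("clinit_o2", () -> o2 = 42);
        return m;
    }

    // Equivalent of the chained <clinit>, except that every piece named in
    // `skipped` behaves like an empty method.
    static void runClinit(Set<String> skipped) {
        pieces().forEach((name, body) -> {
            if (!skipped.contains(name)) {
                body.run();
            }
        });
    }

    public static void main(String[] args) {
        o2 = 42;                        // value assumed to come from the build-time image
        runClinit(Set.of("clinit_o2")); // run-time initialization skips that piece
        System.out.println(o1 + " " + o2 + " " + fooCalls);
    }
}
```

The point of the sketch is that the decision to skip clinit_o2 is made purely
by name; whatever foo() does never has to be analyzed.
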
-Christian
On 5/27/22 08:35, Dan Heidinga wrote:
> On Thu, May 26, 2022 at 9:01 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>> Thanks for providing this.
>>
>> Something about the qbicc approach here doesn't seem to add up to me.
>> Maybe you can tell me what I'm missing.
>>
>> From reading your notes, it seems that at build time, you start with
>> the root class(es), execute their <clinit>, which will cause loading of
>> more classes, more <clinits>, and you iterate until there are no new
>> classes to initialize.
> With qbicc we embraced the closed-world constraint and mandated that
> all class initialization happens at build time. While we started with
> runtime class initialization to bootstrap being able to run more code,
> we quickly switched to being all-in on build time init (BTI) due to
> the virtuous cycle between BTI and dead code elimination.
>
>> You then treat the statics as roots, and
>> serialize those objects to the initial heap image. But before doing
>> that, you exclude (zero out) any which are marked as "reinitialize at
>> runtime."
> Right.
>
>> The rationale for this clearly is that you want to continue the graph
>> walk to find all the loadable classes, but then don't want to use the
>> polluted value. But what happens in cases like this:
>>
>> class Aliased {
>>     @RuntimeInitialized private static final Socket s = ...;
>>     private static final Socket copy = s;
>> }
>>
>> Do you throw on reads of runtime-initialized fields from a <clinit>? Do
>> you walk the heap and find aliases to runtime-initialized values, and
>> replace them with something (if so, what)? Or is the Aliased class
>> above just "broken" according to this model, and I encounter a
>> stale/nonworking socket in `copy` at runtime, and one that is not
>> properly aliased to `s`? Once an object is initialized at build time,
>> its state can escape into all sorts of other places, and just zeroing
>> out the static root isn't enough to stamp it out.
> This is where the "soupy" nature of <clinit> becomes evident. <clinit>
> is a single method that has tremendous side effects, setting static
> fields, initializing other classes, starting threads, caching computed
> values, etc. It's very hard to automatically reason about what has
> happened in a <clinit> method and what the user intends for those side
> effects (if they're even aware of what they all may be!).
>
> What was the user's intent when they initialized 'copy'? To record
> what the original Socket connection - set up at build time - had been
> rather than separately storing the address/port? If they had a
> semantic meaning for `copy` even after `s` had been nulled out, then
> automatically resetting `copy` would violate their expectation.
>
> We need the user to tell us their intent. If they wanted both `s` &
> `copy` to be reset, then they need to be explicit about that and
> annotate both fields. We don't attempt to null all copies of the
> value of a @RuntimeInitialized field.
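
The behaviour described above can be pictured with a small plain-Java sketch.
It is not qbicc code: Connection is a made-up stand-in for a native resource,
and resetAnnotatedRoot merely models serializing the annotated field as its
zero value. Only the annotated root is cleared; the alias keeps the
build-time object.

```java
// Hypothetical model of resetting one annotated static root.
final class AliasResetSketch {
    // Stand-in for a native resource such as a Socket.
    static final class Connection {
        final String endpoint;

        Connection(String endpoint) {
            this.endpoint = endpoint;
        }
    }

    // Mirrors the Aliased class: `copy` refers to the object created for `s`.
    static Connection s = new Connection("build-host:8080");
    static Connection copy = s;

    // Models writing only the annotated field `s` as null into the image.
    static void resetAnnotatedRoot() {
        s = null;
    }

    public static void main(String[] args) {
        resetAnnotatedRoot();
        System.out.println("s = " + s);
        System.out.println("copy.endpoint = " + copy.endpoint);
    }
}
```
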
>
>> Am I missing something?
> You seem to have grasped it correctly =)
>
> If that field had been a primitive, such as a long, we'd be unable to
> track down which other longs in the heap were copies of it or derived
> from it. We wouldn't reset some other location with the value 42
> because a @RuntimeInitialized field was set to 42 at build time. The
> programmer has to take responsibility for which fields need to be
> reset. With qbicc, that's annotations. With Leyden we may be able to
> give them a better way to group fields and express how & when they
> should be initialized.
>
> --Dan
>
>> Thanks,
>> -Brian
>>
>>
>> On 5/26/2022 4:22 PM, David P Grove wrote:
>>> Hi,
>>>
>>> I’ve appended the contents of the referenced wiki page in this email. Apologies in advance if the formatting doesn’t come through as intended.
>>>
>>> There is a full implementation of this (GPLv2 + Classpath exception) as part of the qbicc project on GitHub. There is also a GitHub discussion in the qbicc project that links to various GitHub issues that capture the history that led to the current design. I will not hyperlink to those here so that if people have any IP concerns, they can avoid seeing them. They are easily findable.
>>>
>>> Regards,
>>>
>>> --dave
>>>
>>> ## Overview
>>>
>>> One of the goals of the qbicc project is to explore technical approaches for adapting Java's specification of class initialization to fully support native image compilation. Enabling build-time evaluation of complex class initialization logic is essential for obtaining many of the benefits of native image compilation: reduced memory footprint and fast startup. However, both the core JDK and many frameworks will not primarily be used in native image scenarios. Therefore, it is essential that the approach taken for build-time initialization enables both the existing runtime class initialization and the new build-time class initialization logic to co-exist. Furthermore, for as many cases as possible, the class initialization code should be shared between the two usage scenarios and have non-surprising semantics in both.
>>>
>>> ## Build-time Initialization
>>>
>>> In qbicc, all classes are initialized at build time. Class initialization at build time follows the existing semantics of Java class initialization, driven by build-time execution of the `<clinit>` methods of reachable classes. The set of reachable classes is determined iteratively, starting with the program entrypoints and adding the methods and classes they use until no further reachable classes are discovered (a fixed point is reached).
>>>
>>> After build-time initialization has completed, a build-time heap has been constructed that contains the objects that were created during the build-time execution of the `<clinit>` methods. Using the reachable static fields of the reachable program as roots, this build-time heap is serialized into the native image. This set of objects will form the initial runtime heap of the program when it is executed.
>>>
>>> ## Runtime Initializers
>>>
>>> There are cases where one or more initialization actions of a class **must** be executed at program runtime. Most typically these involve the creation of native resources (open files, threads, etc.) that cannot be successfully serialized into the build-time heap.
>>>
>>> Qbicc supports runtime initialization by allowing static fields of a class to be declared as runtime initialized. These fields will be initialized lazily, at first access, by executing a runtime initializer (`<rtinit>`) associated with the accessed field. Runtime initialization is localized: accessing a particular static field will cause its runtime initializer to be executed but has no implications for other runtime initializers defined either in the field's defining class or any superclass or implemented interface of the field's defining class.
>>>
>>> When serialized from the build-time heap to the runtime heap, all runtime-initialized fields will be serialized with the zero (uninitialized) value appropriate for their type.
>>>
>>> Qbicc allows related static fields in the same class to share a common `<rtinit>` method. The first access to any of the fields will cause the execution of the associated `<rtinit>` method and the initialization of all the fields.
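
As a rough analogy for a shared, lazily executed `<rtinit>`, the following sketch uses the standard initialization-on-demand holder idiom in plain Java. It is not how qbicc implements `<rtinit>`; the names `RtInitSketch`, `NativeState`, `startNanos` and `worker` are hypothetical. The nested class groups two related fields, and the JVM runs its initializer once, at the first access to either field.

```java
// Hypothetical illustration of first-access initialization for a field group.
final class RtInitSketch {
    // Counts how often the shared initializer has run (demonstration only).
    static int rtinitRuns;

    // Group of related fields. The JVM initializes this nested class on the
    // first access to one of its fields, not when RtInitSketch is initialized.
    static final class NativeState {
        static final long startNanos;
        static final Thread worker;

        static { // plays the role of the shared <rtinit>
            rtinitRuns++;
            startNanos = System.nanoTime();
            worker = new Thread(() -> { });
        }
    }

    public static void main(String[] args) {
        System.out.println("before first access: " + rtinitRuns);
        Thread t = NativeState.worker;   // first access triggers the initializer
        long n = NativeState.startNanos; // group already initialized; nothing runs again
        System.out.println("after two accesses: " + rtinitRuns);
    }
}
```

Accessing a field of the outer class leaves `NativeState` untouched, which loosely mirrors the localized behaviour described in this section.
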
>>>
>>> ## Adjusting Heap Serialization
>>>
>>> For some objects it is necessary to initialize them during build-time initialization, but "reset" them before they are used at runtime.
>>> Qbicc supports this by allowing fields to be annotated so that they are serialized as the type-appropriate zero value or as a primitive constant value. This value replacement happens as the build-time heap is serialized.
>>>
>>> One common scenario is to invalidate objects that are wrapping native resources. For example, when a `FileDescriptor` is serialized its `fd` and `handle` instance fields are serialized as `-1` and its `closed` field is serialized as `true`. Thus, any attempt to use the build-time FileDescriptor at runtime will raise the appropriate exception.
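
The effect of such a substitution can be sketched in plain Java. This is a simplified model under stated assumptions, not qbicc's serializer: `Descriptor` is a hypothetical stand-in for the relevant state of `java.io.FileDescriptor`, and `serializeForImage` hard-codes the replacement values that qbicc derives from annotations.

```java
import java.io.IOException;

// Hypothetical model of invalidating a resource wrapper during serialization.
final class HeapResetSketch {
    // Stand-in for the relevant state of java.io.FileDescriptor.
    static final class Descriptor {
        final int fd;
        final long handle;
        final boolean closed;

        Descriptor(int fd, long handle, boolean closed) {
            this.fd = fd;
            this.handle = handle;
            this.closed = closed;
        }

        int read() throws IOException {
            if (closed || fd < 0) {
                throw new IOException("descriptor is closed");
            }
            return 0; // a real descriptor would read from the native resource
        }
    }

    // Models the substitution applied while the build-time heap is written:
    // the live values are dropped and the invalidating constants are stored.
    static Descriptor serializeForImage(Descriptor buildTime) {
        return new Descriptor(-1, -1L, true);
    }

    public static void main(String[] args) {
        Descriptor live = new Descriptor(3, 3L, false);
        Descriptor inImage = serializeForImage(live);
        try {
            inImage.read();
        } catch (IOException e) {
            System.out.println("rejected at run time: " + e.getMessage());
        }
    }
}
```
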
>>>
>>> ## Patching: Migration for Existing Classes
>>>
>>> The runtime initialization mechanisms described above are currently enabled via a set of annotations. This allows qbicc to implement the desired semantics without requiring any changes to the Java compiler, class file format, or language specification. In the long term, we believe small modifications to the Java specification, for example defining an `rtinit { ... }` block similar to the existing `static { ... }` construct, could enable a simpler specification.
>>>
>>> The primary annotation for runtime initialization is `RuntimeAspect`. This annotation is defined on a class and means that the `<clinit>` method of the class should be treated as an `<rtinit>` method. This method will not be executed during build-time initialization and instead will be deferred until the first access of one of the static fields defined in the class.
>>>
>>> To allow us to "externally" modify JDK core classes for qbicc, we have developed an annotation-driven patcher infrastructure. The patcher allows the declaration of patch classes that add, remove, and modify the methods and fields of an existing class. This modification includes the replacement of the `<clinit>` method and the declaration of multiple `RuntimeAspect` patch classes.
>>>
>>> The best way to explore what is possible with the patcher is to examine the java.base/src directory in the qbicc-class-library project. It makes extensive use of the patcher annotations to adapt the core JDK classes to qbicc while still allowing us to consume the upstream OpenJDK code base via an unmodified git submodule.
>>>
>>> ## Design Alternatives
>>>
>>> A number of alternatives were considered before arriving at the final design documented here. The technical discussions and options considered can be explored starting in qbicc discussion #764 on GitHub.
>>>
>>>
>>> From: Brian Goetz <brian.goetz at oracle.com>
>>> Date: Thursday, May 26, 2022 at 2:21 PM
>>> To: David P Grove <groved at us.ibm.com>, "leyden-dev at openjdk.java.net" <leyden-dev at openjdk.java.net>
>>> Subject: [EXTERNAL] Re: Experimentation with build time and runtime class initialization in qbicc
>>>
>>> Hi David;
>>>
>>> Would like to understand more about this, but first, from an IP-hygiene perspective, documents linked from this list should be under the OpenJDK terms and conditions. Can you post the contents of that document here, so there are no issues there?
>>>
>>> Thanks,
>>> -Brian
>>> On 5/26/2022 12:35 PM, David P Grove wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> In the qbicc project, we’ve been exploring options for adapting Java’s class initialization semantics for native images. In particular, we are trying to arrive at non-surprising semantics that, in native-image scenarios, allow most initialization to happen at build time while still enabling runtime initialization of selected static fields.
>>>
>>>
>>>
>>> Our current design and experience are captured here: https://github.com/qbicc/qbicc/wiki/Class-Initialization-in-qbicc. In a nutshell, the idea is to initialize classes via build-time execution of existing <clinit> methods as per normal Java semantics while adding per-static-field <rtinit> methods to provide a capability for runtime-reinitialization of a field before its first access.
>>>
>>>
>>>
>>> --dave
>>>
>>>
>>>
>>>