Experimentation with build time and runtime class initialization in qbicc
Christian Wimmer
christian.wimmer at oracle.com
Sat May 28 17:15:23 UTC 2022
Certainly everything comes with a tradeoff.
But I would argue that the cost of the current workaround to influence
static field initializations - make a separate static inner class for a
static field that should be initialized separately - is even higher
because it requires a full class data structure just to hold a single
field. Even in the JDK, the number of inner classes named "Lazy" is
growing. A more fine-grained initialization of fields within a class can
help to reduce such overhead.
-Christian
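The workaround Christian mentions is the familiar holder idiom. This is a minimal sketch with hypothetical class names (`Config`, `Config.Lazy`, `LazyHolderDemo`). The separately initialized field gets a whole nested class of its own, and the JVM initializes that class only when it is first used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of the "Lazy" holder workaround: the separately
// initialized field lives in its own nested class, which the JVM initializes
// only when that class is first actively used.
class Config {
    // Records the order in which class initializers run (illustration only).
    static final List<String> LOG = new ArrayList<>();

    static final String NAME;

    static {
        NAME = "config";
        LOG.add("Config.<clinit>");
    }

    // An entire class exists just to hold one field.
    private static final class Lazy {
        static final Object RESOURCE;

        static {
            RESOURCE = new Object(); // stands in for an expensive computation
            LOG.add("Config.Lazy.<clinit>");
        }
    }

    static Object resource() {
        return Lazy.RESOURCE; // first call triggers initialization of Lazy
    }
}

class LazyHolderDemo {
    public static void main(String[] args) {
        System.out.println(Config.NAME); // initializes Config, but not Config.Lazy
        System.out.println(Config.LOG);  // [Config.<clinit>]
        Config.resource();               // now Config.Lazy is initialized
        System.out.println(Config.LOG);  // [Config.<clinit>, Config.Lazy.<clinit>]
    }
}
```

The extra nested class is the per-field overhead that finer-grained initialization within a single class could avoid.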
On 5/28/22 09:58, Brian Goetz wrote:
> I too agree that the "soupy" nature of <clinit> makes
> reverse-engineering difficult, and that this alternate translation
> would make things easier for an after-the-fact analysis tool that is
> trying to reason about what computations could be safely shifted in time.
>
> But, keep in mind that it's not a free lunch. To point out the
> obvious tradeoff: this turns into a startup hit for every dynamically
> executed Java program (larger classfiles, more bytecodes, more
> methods). This is a tradeoff we would have to consider carefully,
> since making Java startup slower in general is not a cost we should
> take on lightly, especially given the charter of this project. So,
> something for the "could consider" list, but not a slam-dunk.
>
> On 5/28/2022 12:39 PM, Christian Wimmer wrote:
>> Hi,
>>
>> I agree with the "soupy nature" of <clinit> methods mentioned below.
>> This makes it impossible in general to reverse-engineer which parts
>> of <clinit> initialize which static field. One suggestion how that
>> could be improved: Instead of emitting a single <clinit> method,
>> javac can emit separate <clinit_XXX> methods for each static field
>> that is initialized inline as part of the field declaration, as well
>> as each static{} block. With a consistent naming scheme of these
>> methods, it would be much easier to run some initializations at build
>> time and some at run time. For compatibility, the <clinit> method
>> could be a chain of invocations of the <clinit_XXX> methods (or maybe
>> <clinit> itself is no longer necessary at all).
>>
>> So, for example, for a class
>>
>> class MyClass {
>>     static Object o1 = "abc";
>>     static {
>>         foo();
>>     }
>>     static Object o2 = 42;
>> }
>>
>> the Java compiler would create the following methods (written here
>> as Java-like pseudocode rather than disassembled bytecode)
>>
>> <clinit_o1>() {
>>     o1 = "abc";
>> }
>> <clinit_$0>() {
>>     foo();
>> }
>> <clinit_o2>() {
>>     o2 = 42;
>> }
>> <clinit>() {
>>     <clinit_o1>();
>>     <clinit_$0>();
>>     <clinit_o2>();
>> }
>>
>> Why such a scheme? It is much easier to prove here that the field o2
>> can be initialized at build time regardless of what foo() is doing,
>> and then remove the run-time initialization of o2 by replacing
>> <clinit_o2> with an empty method. All of that can be done without
>> analyzing and modifying the bytecode soup of the current <clinit>
>> method.
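To make the proposal concrete, here is a toy model that assumes nothing about how javac or qbicc would actually implement it. `SplitInitializer`, `Part`, and `preinitialize` are hypothetical names, and each `Part` plays the role of one `<clinit_XXX>` method. A build tool runs the parts it has proven safe ahead of time and keeps the rest, in their original order, for run time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model (hypothetical, not javac or qbicc code) of the proposed split:
// a class initializer is an ordered list of named parts, one per field
// initializer or static{} block.
final class SplitInitializer {
    static final class Part {
        final String name;   // e.g. "o1", "$0", "o2"
        final Runnable body; // the code that would live in <clinit_name>

        Part(String name, Runnable body) {
            this.name = name;
            this.body = body;
        }
    }

    private final List<Part> parts;

    SplitInitializer(List<Part> parts) {
        this.parts = List.copyOf(parts);
    }

    // Runs the parts named in buildTimeSafe now (the "build time" step) and
    // returns the remaining parts, in original order, for run time. Leaving a
    // part out of the result models replacing it with an empty method.
    List<Part> preinitialize(Set<String> buildTimeSafe) {
        List<Part> runtimeParts = new ArrayList<>();
        for (Part p : parts) {
            if (buildTimeSafe.contains(p.name)) {
                p.body.run();
            } else {
                runtimeParts.add(p);
            }
        }
        return runtimeParts;
    }
}

class SplitInitializerDemo {
    static Object o1;
    static Object o2;

    public static void main(String[] args) {
        SplitInitializer init = new SplitInitializer(List.of(
                new SplitInitializer.Part("o1", () -> o1 = "abc"),
                new SplitInitializer.Part("$0", () -> System.out.println("foo()")),
                new SplitInitializer.Part("o2", () -> o2 = 42)));

        // Suppose an analysis showed that o2 = 42 does not depend on foo().
        List<SplitInitializer.Part> atRuntime = init.preinitialize(Set.of("o2"));
        System.out.println(o2); // 42, set at "build time"
        for (SplitInitializer.Part p : atRuntime) {
            p.body.run(); // run-time <clinit>: o1, then $0 (prints "foo()")
        }
        System.out.println(o1); // abc
    }
}
```

Because each part is a separate unit, dropping `o2 = 42` from the run-time chain never requires slicing a combined `<clinit>`; proving that a part is safe remains the tool's job and is not modeled here.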
>>
>> -Christian
>>
>>
>> On 5/27/22 08:35, Dan Heidinga wrote:
>>> On Thu, May 26, 2022 at 9:01 PM Brian Goetz <brian.goetz at oracle.com>
>>> wrote:
>>>> Thanks for providing this.
>>>>
>>>> Something about the qbicc approach here doesn't seem to add up to me.
>>>> Maybe you can tell me what I'm missing.
>>>>
>>>> From reading your notes, it seems that at build time, you start with
>>>> the root class(es), execute their <clinit>, which will cause loading
>>>> of more classes, more <clinit>s, and you iterate until there are no
>>>> new classes to initialize.
>>> With qbicc we embraced the closed-world constraint and mandated that
>>> all class initialization happens at build time. While we started with
>>> runtime class initialization to bootstrap being able to run more code,
>>> we quickly switched to being all-in on build time init (BTI) due to
>>> the virtuous cycle between BTI and dead code elimination.
>>>
>>>> You then treat the statics as roots, and
>>>> serialize those objects to the initial heap image. But before doing
>>>> that, you exclude (zero out) any which are marked as "reinitialize at
>>>> runtime."
>>> Right.
>>>
>>>> The rationale for this clearly is that you want to continue the graph
>>>> walk to find all the loadable classes, but then don't want to use the
>>>> polluted value. But what happens in cases like this:
>>>>
>>>> class Aliased {
>>>>     @RuntimeInitialized private static final Socket s = ...;
>>>>     private static final Socket copy = s;
>>>> }
>>>>
>>>> Do you throw on reads of runtime-initialized fields from a
>>>> <clinit>? Do you walk the heap to find aliases to runtime-initialized
>>>> values and replace them with something (if so, what)? Or is the
>>>> Aliased class above just "broken" according to this model, so that I
>>>> encounter a stale, nonworking socket in `copy` at runtime, one that is
>>>> not properly aliased to `s`? Once an object is initialized at build
>>>> time, its state can escape into all sorts of other places, and just
>>>> zeroing out the static root isn't enough to stamp it out.
>>> This is where the "soupy" nature of <clinit> becomes evident. <clinit>
>>> is a single method that has tremendous side effects, setting static
>>> fields, initializing other classes, starting threads, caching computed
>>> values, etc. It's very hard to automatically reason about what has
>>> happened in a <clinit> method and what the user intends for those side
>>> effects (if they're even aware of what they all may be!).
>>>
>>> What was the user's intent when they initialized 'copy'? To record
>>> what the original Socket connection - set up at build time - had been
>>> rather than separately storing the address/port? If they had a
>>> semantic meaning for `copy` even after `s` had been nulled out, then
>>> automatically resetting `copy` would violate their expectation.
>>>
>>> We need the user to tell us their intent. If they wanted both `s` &
>>> `copy` to be reset, then they need to be explicit about that and
>>> annotate both fields. We don't attempt to null all copies of the
>>> value of a @RuntimeInitialized field.
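A small sketch of the behavior described above, under stated assumptions: `RuntimeInitialized` is defined locally here (it is not qbicc's annotation type), a plain `Object` stands in for the `Socket`, and `StaticSnapshot.snapshot` is a hypothetical reflection-based stand-in for heap serialization that only records static field values. Only the annotated root is reset; the alias keeps the build-time object.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker defined only for this sketch; it borrows the name used
// in the thread but is not qbicc's actual annotation type.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface RuntimeInitialized {}

class Aliased {
    // A plain Object stands in for the Socket so the sketch performs no I/O.
    @RuntimeInitialized private static final Object s = new Object();
    private static final Object copy = s;
}

final class StaticSnapshot {
    // Records the build-time value of each static field of c, writing null
    // (the zero value for references) for fields marked @RuntimeInitialized.
    // Nothing else is rewritten, so an alias such as `copy` keeps the
    // build-time object.
    static Map<String, Object> snapshot(Class<?> c) {
        Map<String, Object> image = new LinkedHashMap<>();
        for (Field f : c.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            f.setAccessible(true);
            try {
                boolean reset = f.isAnnotationPresent(RuntimeInitialized.class);
                image.put(f.getName(), reset ? null : f.get(null));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return image;
    }

    public static void main(String[] args) {
        Map<String, Object> image = snapshot(Aliased.class);
        System.out.println(image.get("s") == null);    // true: explicitly reset
        System.out.println(image.get("copy") != null); // true: stale alias survives
    }
}
```

Resetting `copy` as well would require annotating it too, which matches the explicit-intent approach described here.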
>>>
>>>> Am I missing something?
>>> You seem to have grasped it correctly =)
>>>
>>> If that field had been a primitive, such as a long, we'd be unable to
>>> track down which other longs in the heap were copies of it or derived
>>> from it. We wouldn't reset some other location with the value 42
>>> because a @RuntimeInitialized field was set to 42 at build time. The
>>> programmer has to take responsibility for which fields need to be
>>> reset. With qbicc, that's annotations. With Leyden we may be able to
>>> give them a better way to group fields and express how & when they
>>> should be initialized.
>>>
>>> --Dan
>>>
>>>> Thanks,
>>>> -Brian
>>>>
>>>>
>>>> On 5/26/2022 4:22 PM, David P Grove wrote:
>>>>> Hi,
>>>>>
>>>>> I’ve appended the contents of the referenced
>>>>> wiki page in this email. Apologies in advance if the formatting
>>>>> doesn’t come through as intended.
>>>>>
>>>>> There is a full implementation of this (GPLv2 +
>>>>> Classpath exception) as part of the qbicc project on GitHub.
>>>>> There is also a GitHub discussion in the qbicc project that links
>>>>> to various GitHub issues that capture the history that led to the
>>>>> current design. I will not hyperlink to those here so that if
>>>>> people have any IP concerns, they can avoid seeing them. They are
>>>>> easily findable.
>>>>>
>>>>> Regards,
>>>>>
>>>>> --dave
>>>>>
>>>>> ## Overview
>>>>>
>>>>> One of the goals of the qbicc project is to explore technical
>>>>> approaches for adapting Java's specification of class
>>>>> initialization to fully support native image compilation.
>>>>> Enabling build-time evaluation of complex class initialization
>>>>> logic is essential for obtaining many of the benefits of native
>>>>> image compilation: reduced memory footprint and fast startup.
>>>>> However, both the core JDK and many frameworks will not
>>>>> primarily be used in native image scenarios. Therefore, it is
>>>>> essential that the approach taken for build-time initialization
>>>>> enables both the existing runtime class initialization and the new
>>>>> build-time class initialization logic to co-exist. Furthermore,
>>>>> for as many cases as possible, the class initialization code
>>>>> should be shared between the two usage scenarios and have
>>>>> non-surprising semantics in both.
>>>>>
>>>>> ## Build-time Initialization
>>>>>
>>>>> In qbicc, all classes are initialized at build-time. Class
>>>>> initialization at build time is performed according to the
>>>>> existing semantics of Java class initialization driven by
>>>>> build-time execution of the `<clinit>` methods of reachable
>>>>> classes. The set of reachable classes is determined iteratively,
>>>>> starting with the program entrypoints and adding the methods and
>>>>> classes they utilize until no further reachable classes are
>>>>> discovered (a fixed point is reached).
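The fixed-point discovery can be pictured as a standard worklist algorithm. This sketch assumes the "uses" relation is already available as a hypothetical `Map` input; a real tool derives it from bytecode and from executing `<clinit>` methods, which is not modeled here. `Reachability` is an illustrative name.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of fixed-point class discovery: start from the entry points and
// keep adding whatever the discovered classes use until nothing new appears.
final class Reachability {
    static Set<String> reachable(Map<String, List<String>> uses, Collection<String> entryPoints) {
        Set<String> seen = new LinkedHashSet<>(entryPoints);
        Deque<String> work = new ArrayDeque<>(seen);
        while (!work.isEmpty()) {
            String c = work.removeFirst();
            for (String d : uses.getOrDefault(c, List.of())) {
                if (seen.add(d)) { // newly discovered class: process it later
                    work.addLast(d);
                }
            }
        }
        return seen; // an empty worklist means a fixed point has been reached
    }

    public static void main(String[] args) {
        Map<String, List<String>> uses = Map.of(
                "Main", List.of("Config", "Logger"),
                "Config", List.of("Logger"),
                "Logger", List.of(),
                "Unused", List.of("Config"));
        System.out.println(reachable(uses, List.of("Main"))); // [Main, Config, Logger]
    }
}
```

`Unused` is never reached from `Main`, so its initializer would never run; this is the link between build-time initialization and dead code elimination that Dan mentions.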
>>>>>
>>>>> After build-time initialization has completed, a build-time heap
>>>>> has been constructed that contains the objects that were created
>>>>> during the build-time execution of the `<clinit>` methods. Using
>>>>> the reachable static fields of the reachable program as roots,
>>>>> this build-time heap is serialized into the native image. This
>>>>> set of objects will form the initial runtime heap of the program
>>>>> when it is executed.
>>>>>
>>>>> ## Runtime Initializers
>>>>>
>>>>> There are cases where one or more initialization actions of a
>>>>> class **must** be executed at program runtime. Most typically
>>>>> these involve the creation of native resources (open files,
>>>>> threads, etc.) that cannot be successfully serialized into the
>>>>> build-time heap.
>>>>>
>>>>> Qbicc supports runtime initialization by allowing static fields of
>>>>> a class to be declared as runtime initialized. These fields will
>>>>> be initialized lazily, at first access, by executing a runtime
>>>>> initializer (`<rtinit>`) associated with the accessed field.
>>>>> Runtime initialization is localized: accessing a particular static
>>>>> field will cause its runtime initializer to be executed but has no
>>>>> implications for other runtime initializers defined either in the
>>>>> field's defining class or any superclass or implemented interface
>>>>> of the field's defining class.
>>>>>
>>>>> When serialized from the build-time heap to the runtime heap, all
>>>>> runtime-initialized fields will be serialized with the zero
>>>>> (uninitialized) value appropriate for their type.
>>>>>
>>>>> Qbicc allows related static fields in the same class to share a
>>>>> common `<rtinit>` method. The first access to any of the fields
>>>>> will cause the execution of the associated `<rtinit>` method and
>>>>> the initialization of all the fields.
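One way to emulate a shared `<rtinit>` in plain Java, as a hand-written sketch rather than qbicc's generated code. Both fields start at their zero values, as they would after serialization, and whichever accessor is called first runs the shared initializer exactly once. `NativeResources`, `rtinit()`, and the accessor names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hand-written emulation, not qbicc's generated code: two runtime-initialized
// static fields share one initializer.
final class NativeResources {
    // Counts how often the shared initializer has run (illustration only).
    static final AtomicInteger RTINIT_RUNS = new AtomicInteger();

    private static boolean initialized; // false until rtinit() has run
    private static long handle;         // 0 until rtinit() has run
    private static String name;         // null until rtinit() has run

    // Stand-in for the shared <rtinit> method. Locking on every access keeps
    // the sketch simple and thread-safe; a real implementation would want a
    // cheaper fast path once the fields are set.
    private static synchronized void rtinit() {
        if (!initialized) {
            RTINIT_RUNS.incrementAndGet();
            handle = 0x1234L; // e.g. a native resource opened at run time
            name = "resource";
            initialized = true;
        }
    }

    static long handle() {
        rtinit();
        return handle;
    }

    static String name() {
        rtinit();
        return name;
    }
}

class RtinitDemo {
    public static void main(String[] args) {
        System.out.println(NativeResources.name());            // resource
        System.out.println(NativeResources.handle());          // 4660
        System.out.println(NativeResources.RTINIT_RUNS.get()); // 1
    }
}
```

The initialization is local to this group of fields: nothing else in the class, its superclasses, or its interfaces is triggered by the accessors.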
>>>>>
>>>>> ## Adjusting Heap Serialization
>>>>>
>>>>> For some objects it is necessary to initialize them during
>>>>> build-time initialization, but "reset" them before they are used
>>>>> at runtime.
>>>>> Qbicc supports this by allowing fields to be annotated so that they
>>>>> are serialized as the type-appropriate zero value or as a primitive
>>>>> constant value. This value replacement happens as the build-time
>>>>> heap is serialized.
>>>>>
>>>>> One common scenario is to invalidate objects that are wrapping
>>>>> native resources. For example, when a `FileDescriptor` is
>>>>> serialized its `fd` and `handle` instance fields are serialized as
>>>>> `-1` and its `closed` field is serialized as `true`. Thus, any
>>>>> attempt to use the build-time `FileDescriptor` at runtime will raise
>>>>> the appropriate exception.
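A sketch of serialize-time value replacement modeled on the `FileDescriptor` example. Everything here is hypothetical: `SerializeAs` is not a real qbicc annotation, `FakeFileDescriptor` is a simplified stand-in for `java.io.FileDescriptor`, and `HeapImage.serialize` turns an object into a field map instead of writing a real image.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation invented for this sketch (qbicc's real annotations
// may look different): the serializer writes the given constant instead of
// the field's build-time value.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SerializeAs {
    String value();
}

// Simplified stand-in for java.io.FileDescriptor; not the JDK class.
class FakeFileDescriptor {
    @SerializeAs("-1") int fd;
    @SerializeAs("-1") long handle;
    @SerializeAs("true") boolean closed;

    FakeFileDescriptor(int fd, long handle) {
        this.fd = fd;
        this.handle = handle;
    }
}

final class HeapImage {
    // Converts the annotation's text to a value of the field's primitive type.
    private static Object constant(Field f, String text) {
        Class<?> t = f.getType();
        if (t == int.class) return Integer.parseInt(text);
        if (t == long.class) return Long.parseLong(text);
        if (t == boolean.class) return Boolean.parseBoolean(text);
        throw new IllegalArgumentException("unsupported field type: " + t);
    }

    // Serializes the instance fields of obj into a name -> value map, applying
    // replacements as it goes; the live build-time object is left unchanged.
    static Map<String, Object> serialize(Object obj) {
        Map<String, Object> image = new LinkedHashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            f.setAccessible(true);
            SerializeAs replace = f.getAnnotation(SerializeAs.class);
            try {
                image.put(f.getName(), replace != null ? constant(f, replace.value()) : f.get(obj));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return image;
    }

    public static void main(String[] args) {
        FakeFileDescriptor live = new FakeFileDescriptor(3, 42L);
        // fd and handle are written as -1 and closed as true
        System.out.println(serialize(live));
    }
}
```

Raising the exception on later use is left to the class's own methods, which would check `closed` or `fd` as usual.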
>>>>>
>>>>> ## Patching: Migration for Existing Classes
>>>>>
>>>>> The runtime initialization mechanisms described above are
>>>>> currently enabled via a set of annotations. This allows qbicc to
>>>>> implement the desired semantics without requiring any changes to
>>>>> the Java compiler, class file format, or language specification.
>>>>> In the long term, we believe small modifications to the Java
>>>>> specification, for example defining an `rtinit { ... }` block
>>>>> similar to the existing `static { ... }` construct, could enable a
>>>>> simpler specification.
>>>>>
>>>>> The primary annotation for runtime initialization is
>>>>> `RuntimeAspect`. This annotation is applied to a class and means
>>>>> that the `<clinit>` method of the class should be treated as an
>>>>> `<rtinit>` method. This method will
>>>>> not be executed during build-time initialization and instead will
>>>>> be deferred until the first access of one of the static fields
>>>>> defined in the class.
>>>>>
>>>>> To allow us to "externally" modify JDK core classes for qbicc, we
>>>>> have developed an annotation-driven patcher infrastructure. The
>>>>> patcher allows the declaration of patch classes that add, remove,
>>>>> and modify the methods and fields of an existing class. This
>>>>> modification includes the replacement of the `<clinit>` method and
>>>>> the declaration of multiple `RuntimeAspect` patch classes.
>>>>>
>>>>> The best way to explore what is possible with the patcher is to
>>>>> examine the java.base/src directory in the qbicc-class-library
>>>>> project. It makes extensive use of the patcher annotations to
>>>>> adapt the core JDK classes to qbicc while still allowing us to
>>>>> consume the upstream OpenJDK code base via an unmodified git
>>>>> submodule.
>>>>>
>>>>> ## Design Alternatives
>>>>>
>>>>> A number of alternatives were considered before arriving at the
>>>>> final design documented here. The technical discussions and
>>>>> options considered can be explored starting in qbicc discussion
>>>>> #764 on GitHub.
>>>>>
>>>>>
>>>>> From: Brian Goetz <brian.goetz at oracle.com>
>>>>> Date: Thursday, May 26, 2022 at 2:21 PM
>>>>> To: David P Grove <groved at us.ibm.com>, "leyden-dev at openjdk.java.net"
>>>>> <leyden-dev at openjdk.java.net>
>>>>> Subject: [EXTERNAL] Re: Experimentation with build time and
>>>>> runtime class initialization in qbicc
>>>>>
>>>>> Hi David;
>>>>>
>>>>> Would like to understand more about this, but first, from an
>>>>> IP-hygiene perspective, documents linked from this list should be
>>>>> under the OpenJDK terms and conditions. Can you post the contents
>>>>> of that document here, so there are no issues there?
>>>>>
>>>>> Thanks,
>>>>> -Brian
>>>>> On 5/26/2022 12:35 PM, David P Grove wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> In the qbicc project, we’ve been exploring options for adapting
>>>>> Java’s class initialization semantics for native images. In
>>>>> particular, we are trying to arrive at non-surprising semantics
>>>>> that, in native-image scenarios, allow most initialization to
>>>>> happen at build time while still enabling runtime initialization
>>>>> of selected static fields.
>>>>>
>>>>> Our current design and experience are captured here:
>>>>> https://github.com/qbicc/qbicc/wiki/Class-Initialization-in-qbicc.
>>>>> In a nutshell, the idea is to initialize classes via build-time
>>>>> execution of existing <clinit> methods as per normal Java
>>>>> semantics while adding per-static-field <rtinit> methods to
>>>>> provide a capability for runtime-reinitialization of a field
>>>>> before its first access.
>>>>>
>>>>> --dave
>>>>>
>
More information about the leyden-dev
mailing list