Experimentation with build time and runtime class initialization in qbicc
Brian Goetz
brian.goetz at oracle.com
Sat May 28 16:58:37 UTC 2022
I too agree that the "soupy" nature of <clinit> makes
reverse-engineering difficult, and that this alternate translation would
make things easier for an after-the-fact analysis tool that is trying to
reason about what computations could be safely shifted in time.
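To make the alternate translation concrete, here is a minimal hand-written sketch in ordinary Java. It is not what javac emits today: the method names (clinit_o1, clinit_block0, clinit_o2, clinit) and the trace list are assumptions made up for illustration, standing in for the per-field initializer methods proposed in the quoted message below.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical source-level analogue of the proposed translation.
// All method names here are illustrative; javac generates none of them.
final class SplitInitDemo {
    // Records the order in which the initializer pieces run.
    static final List<String> trace = new ArrayList<>();

    static Object o1;
    static Object o2;

    // One initializer per field declaration or static block, in source order.
    static void clinit_o1()     { trace.add("o1");     o1 = "abc"; }
    static void clinit_block0() { trace.add("block0"); foo(); }
    static void clinit_o2()     { trace.add("o2");     o2 = 42; }

    static void foo() { /* arbitrary side effects would go here */ }

    // Stand-in for <clinit>: nothing but a chain of the pieces above.
    static void clinit() {
        clinit_o1();
        clinit_block0();
        clinit_o2();
    }

    public static void main(String[] args) {
        clinit();
        // prints: [o1, block0, o2] o1=abc o2=42
        System.out.println(trace + " o1=" + o1 + " o2=" + o2);
    }
}
```

With this shape, an analysis tool could in principle look at clinit_o2 in isolation, without having to untangle it from whatever foo() does.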
But, keep in mind that it's not a free lunch. To point out the obvious
tradeoff: this turns into a startup hit for every dynamically executed
Java program (larger classfiles, more bytecodes, more methods). This is
a tradeoff we would have to consider carefully, since making Java
startup slower in general is not a cost we should take on lightly,
especially given the charter of this project. So, something for the
"could consider" list, but not a slam-dunk.
On 5/28/2022 12:39 PM, Christian Wimmer wrote:
> Hi,
>
> I agree with the "soupy nature" of <clinit> methods mentioned below.
> This makes it impossible in general to reverse-engineer which parts of
> <clinit> initialize which static field. One suggestion for how that could
> be improved: Instead of emitting a single <clinit> method, javac can
> emit separate <clinit_XXX> methods for each static field that is
> initialized inline as part of the field declaration, as well as each
> static{} block. With a consistent naming scheme of these methods, it
> would be much easier to run some initializations at build time and
> some at run time. For compatibility, the <clinit> method could be a
> chain of invocations of the <clinit_XXX> methods (or maybe <clinit>
> itself is no longer necessary at all).
>
> So, for example, for a class
>
> class MyClass {
>     static Object o1 = "abc";
>     static {
>         foo();
>     }
>     static Object o2 = 42;
> }
>
> the Java compiler would create the methods (written here with
> disassembled bytecode)
>
> <clinit_o1>() {
>     o1 = "abc";
> }
> <clinit_$0>() {
>     foo();
> }
> <clinit_o2>() {
>     o2 = 42;
> }
> <clinit>() {
>     <clinit_o1>();
>     <clinit_$0>();
>     <clinit_o2>();
> }
>
> Why such a scheme? It is much easier to prove here that the field o2
> can be initialized at build time regardless of what foo() is doing,
> and then remove the run-time initialization of o2 by replacing
> <clinit_o2> with an empty method. All of that can be done without
> analyzing and modifying the bytecode soup of the current <clinit> method.
>
> -Christian
>
>
> On 5/27/22 08:35, Dan Heidinga wrote:
>> On Thu, May 26, 2022 at 9:01 PM Brian Goetz <brian.goetz at oracle.com>
>> wrote:
>>> Thanks for providing this.
>>>
>>> Something about the qbicc approach here doesn't seem to add up to me.
>>> Maybe you can tell me what I'm missing.
>>>
>>> From reading your notes, it seems that at build time, you start with
>>> the root class(es), execute their <clinit>, which will cause loading of
>>> more classes, more <clinits>, and you iterate until there are no new
>>> classes to initialize.
>> With qbicc we embraced the closed-world constraint and mandated that
>> all class initialization happens at build time. While we started with
>> runtime class initialization to bootstrap being able to run more code,
>> we quickly switched to being all-in on build time init (BTI) due to
>> the virtuous cycle between BTI and dead code elimination.
>>
>>> You then treat the statics as roots, and
>>> serialize those objects to the initial heap image. But before doing
>>> that, you exclude (zero out) any which are marked as "reinitialize at
>>> runtime."
>> Right.
>>
>>> The rationale for this clearly is that you want to continue the graph
>>> walk to find all the loadable classes, but then don't want to use the
>>> polluted value. But what happens in cases like this:
>>>
>>> class Aliased {
>>>     @RuntimeInitialized private static final Socket s = ...;
>>>     private static final Socket copy = s;
>>> }
>>>
>>> Do you throw on reads of runtime-initialized fields from a <clinit>?
>>> Do you walk the heap and find aliases to runtime-initialized values,
>>> and replace them with something (if so, what)? Or is the Aliased class
>>> above just "broken" according to this model, and I encounter a
>>> stale/nonworking socket in `copy` at runtime, and one that is not
>>> properly aliased to `s`? Once an object is initialized at build time,
>>> its state can escape into all sorts of other places, and just zeroing
>>> out the static root isn't enough to stamp it out.
>> This is where the "soupy" nature of <clinit> becomes evident. <clinit>
>> is a single method that has tremendous side effects, setting static
>> fields, initializing other classes, starting threads, caching computed
>> values, etc. It's very hard to automatically reason about what has
>> happened in a <clinit> method and what the user intends for those side
>> effects (if they're even aware of what they all may be!).
>>
>> What was the user's intent when they initialized 'copy'? To record
>> what the original Socket connection - set up at build time - had been
>> rather than separately storing the address/port? If they had a
>> semantic meaning for `copy` even after `s` had been nulled out, then
>> automatically resetting `copy` would violate their expectation.
>>
>> We need the user to tell us their intent. If they wanted both `s` &
>> `copy` to be reset, then they need to be explicit about that and
>> annotate both fields. We don't attempt to null all copies of the
>> value of a @RuntimeInitialized field.
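The behaviour described here can be imitated in plain Java. This is only a sketch under stated assumptions: FakeSocket is a made-up stand-in for a native resource, and resetForRuntime() is a hypothetical helper that mimics zeroing one annotated field during heap serialization. Neither is part of qbicc.

```java
// Illustrative stand-in for an object wrapping a native resource.
final class FakeSocket {
    final String openedAt;
    FakeSocket(String openedAt) { this.openedAt = openedAt; }
}

final class AliasedDemo {
    // Imagine this field carries the runtime-initialized annotation...
    static FakeSocket s = new FakeSocket("build time");
    // ...while this alias does not.
    static FakeSocket copy = s;

    // Hypothetical helper: zeroes only the annotated field, as the
    // serializer is described as doing. The alias is left untouched.
    static void resetForRuntime() { s = null; }

    public static void main(String[] args) {
        resetForRuntime();
        // prints: s=null, copy opened at build time
        System.out.println("s=" + s + ", copy opened at " + copy.openedAt);
    }
}
```

After the reset, `copy` still refers to the object created at build time, which is why both fields would need to be annotated if both are meant to be reset.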
>>
>>> Am I missing something?
>> You seem to have grasped it correctly =)
>>
>> If that field had been a primitive, such as a long, we'd be unable to
>> track down which other longs in the heap were copies of it or derived
>> from it. We wouldn't reset some other location with the value 42
>> because a @RuntimeInitialized field was set to 42 at build time. The
>> programmer has to take responsibility for which fields need to be
>> reset. With qbicc, that's annotations. With Leyden we may be able to
>> give them a better way to group fields and express how & when they
>> should be initialized.
>>
>> --Dan
>>
>>> Thanks,
>>> -Brian
>>>
>>>
>>> On 5/26/2022 4:22 PM, David P Grove wrote:
>>>> Hi,
>>>>
>>>> I’ve appended the contents of the referenced wiki
>>>> page in this email. Apologies in advance if the formatting doesn’t
>>>> come through as intended.
>>>>
>>>> There is a full implementation of this (GPLv2 +
>>>> Classpath exception) as part of the qbicc project on GitHub. There
>>>> is also a GitHub discussion in the qbicc project that links to
>>>> various GitHub issues that capture the history that led to the
>>>> current design. I will not hyperlink to those here so that if
>>>> people have any IP concerns, they can avoid seeing them. They are
>>>> easily findable.
>>>>
>>>> Regards,
>>>>
>>>> --dave
>>>>
>>>> ## Overview
>>>>
>>>> One of the goals of the qbicc project is to explore technical
>>>> approaches for adapting Java's specification of class
>>>> initialization to fully support native image compilation. Enabling
>>>> build-time evaluation of complex class initialization logic is
>>>> essential for obtaining much of the benefits of native image
>>>> compilation: reduced memory footprint and fast startup. However,
>>>> both the core JDK and many frameworks will not primarily be used
>>>> in native image scenarios. Therefore, it is essential that the
>>>> approach taken for build-time initialization enables both the
>>>> existing runtime class initialization and the new build-time class
>>>> initialization logic to co-exist. Furthermore, for as many cases as
>>>> possible, the class initialization code should be shared between
>>>> the two usage scenarios and have non-surprising semantics in both.
>>>>
>>>> ## Build-time Initialization
>>>>
>>>> In qbicc, all classes are initialized at build-time. Class
>>>> initialization at build time is performed according to the existing
>>>> semantics of Java class initialization driven by build-time
>>>> execution of the `<clinit>` methods of reachable classes. The set
>>>> of reachable classes is determined iteratively, starting with the
>>>> program entrypoints and adding the methods and classes they utilize
>>>> until no further reachable classes are discovered (a fixed point is
>>>> reached).
>>>>
>>>> After build-time initialization has completed, a build-time heap
>>>> has been constructed that contains the objects that were created
>>>> during the build-time execution of the `<clinit>` methods. Using
>>>> the reachable static fields of the reachable program as roots, this
>>>> build-time heap is serialized into the native image. This set of
>>>> objects will form the initial runtime heap of the program when it
>>>> is executed.
>>>>
>>>> ## Runtime Initializers
>>>>
>>>> There are cases where one or more initialization actions of a class
>>>> **must** be executed at program runtime. Most typically these
>>>> involve the creation of native resources (open files, threads, etc.)
>>>> that cannot be successfully serialized into the build time heap.
>>>>
>>>> Qbicc supports runtime initialization by allowing static fields of
>>>> a class to be declared as runtime initialized. These fields will
>>>> be initialized lazily, at first access, by executing a runtime
>>>> initializer (`<rtinit>`) associated with the accessed field.
>>>> Runtime initialization is localized: accessing a particular static
>>>> field will cause its runtime initializer to be executed but has no
>>>> implications for other runtime initializers defined either in the
>>>> field's defining class or any superclass or implemented interface
>>>> of the field's defining class.
>>>>
>>>> When serialized from the build-time heap to the runtime heap, all
>>>> runtime-initialized fields will be serialized with the zero
>>>> (uninitialized) value appropriate for their type.
>>>>
>>>> Qbicc allows related static fields in the same class to share a
>>>> common `<rtinit>` method. The first access to any of the fields
>>>> will cause the execution of the associated `<rtinit>` method and
>>>> the initialization of all the fields.
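As a rough illustration of lazy, grouped runtime initialization, the sketch below emulates the effect by hand. It assumes nothing about how qbicc actually rewrites field accesses: the accessor methods, the rtinit() name, the guard flag, and the rtinitRuns counter are all hypothetical.

```java
// Hand-written emulation of two static fields sharing one runtime initializer.
final class LazyGroupDemo {
    private static boolean initialized; // guard for the shared initializer
    private static long startNanos;     // would be 0 in the serialized image
    private static String hostName;     // would be null in the serialized image
    static int rtinitRuns;              // counts runs, for demonstration only

    // Stand-in for a shared <rtinit>: initializes every field in the group.
    private static synchronized void rtinit() {
        if (initialized) {
            return;
        }
        startNanos = System.nanoTime();
        hostName = "host-" + startNanos; // placeholder for a real lookup
        rtinitRuns++;
        initialized = true;
    }

    // Every field access goes through the guard first.
    static long startNanos() { rtinit(); return startNanos; }
    static String hostName() { rtinit(); return hostName; }

    public static void main(String[] args) {
        hostName();   // first access triggers the shared initializer
        startNanos(); // later accesses find the group already initialized
        System.out.println("rtinit runs: " + rtinitRuns);
    }
}
```

The first access to either field initializes both, and subsequent accesses only pay for the guard check.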
>>>>
>>>> ## Adjusting Heap Serialization
>>>>
>>>> For some objects it is necessary to initialize them during
>>>> build-time initialization, but "reset" them before they are used at
>>>> runtime.
>>>> Qbicc supports this by allowing fields to be annotated to be
>>>> serialized as the type-appropriate zero value or as a primitive
>>>> constant value. This value replacement happens as the build time
>>>> heap is serialized.
>>>>
>>>> One common scenario is to invalidate objects that are wrapping
>>>> native resources. For example, when a `FileDescriptor` is
>>>> serialized its `fd` and `handle` instance fields are serialized as
>>>> `-1` and its `closed` field is serialized as `true`. Thus, any
>>>> attempt to use the build-time FileDescriptor at runtime will raise
>>>> the appropriate exception.
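The invalidation idea can be sketched with a small wrapper class. Everything here is an assumption for illustration: HandleDemo is not `FileDescriptor`, serializeForImage() is a made-up helper imitating field-value replacement at serialization time, and IllegalStateException is a stand-in for whatever exception the real class raises.

```java
// Hypothetical resource wrapper, loosely modelled on the FileDescriptor case.
final class HandleDemo {
    int fd;
    boolean closed;

    HandleDemo(int fd) { this.fd = fd; }

    // Refuses to operate on an invalidated handle.
    int read() {
        if (closed || fd < 0) {
            throw new IllegalStateException("handle is closed");
        }
        return 0; // placeholder for a real read
    }

    // Imitates the serializer: the copy written to the image has
    // fd replaced by -1 and closed replaced by true.
    HandleDemo serializeForImage() {
        HandleDemo image = new HandleDemo(-1);
        image.closed = true;
        return image;
    }

    public static void main(String[] args) {
        HandleDemo live = new HandleDemo(3);
        System.out.println("build-time read: " + live.read());
        try {
            live.serializeForImage().read();
        } catch (IllegalStateException e) {
            System.out.println("runtime read failed: " + e.getMessage());
        }
    }
}
```

The point of the replacement values is that a stale build-time handle fails loudly at runtime instead of silently referring to a resource that no longer exists.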
>>>>
>>>> ## Patching: Migration for Existing Classes
>>>>
>>>> The runtime initialization mechanisms described above are currently
>>>> enabled via a set of annotations. This allows qbicc to implement
>>>> the desired semantics without requiring any changes to the Java
>>>> compiler, class file format, or language specification. In the long
>>>> term, we believe small modifications to the Java specification, for
>>>> example defining an `rtinit { ... }` block similar to the existing
>>>> `static { ... }` construct, could enable a simpler specification.
>>>>
>>>> The primary annotation for runtime initialization is
>>>> `RuntimeAspect`. This annotation is defined on a class and is
>>>> interpreted as meaning that the `<clinit>` method of the class
>>>> should be interpreted as an `<rtinit>` method. This method will
>>>> not be executed during build-time initialization and instead will
>>>> be deferred until the first access of one of the static fields
>>>> defined in the class.
>>>>
>>>> To allow us to "externally" modify JDK core classes for qbicc, we
>>>> have developed an annotation-driven patcher infrastructure. The
>>>> patcher allows the declaration of patch classes that add, remove,
>>>> and modify the methods and fields of an existing class. This
>>>> modification includes the replacement of the `<clinit>` method and
>>>> the declaration of multiple `RuntimeAspect` patch classes.
>>>>
>>>> The best way to explore what is possible with the patcher is to
>>>> examine the java.base/src directory in the qbicc-class-library
>>>> project. It makes extensive use of the patcher annotations to adapt
>>>> the core JDK classes to qbicc while still allowing us to consume
>>>> the upstream OpenJDK code base via an unmodified git submodule.
>>>>
>>>> ## Design Alternatives
>>>>
>>>> A number of alternatives were considered before arriving at the
>>>> final design documented here. The technical discussions and
>>>> options considered can be explored starting in qbicc discussion
>>>> #764 on GitHub.
>>>>
>>>>
>>>> From: Brian Goetz <brian.goetz at oracle.com>
>>>> Date: Thursday, May 26, 2022 at 2:21 PM
>>>> To: David P Grove <groved at us.ibm.com>, "leyden-dev at openjdk.java.net"
>>>> <leyden-dev at openjdk.java.net>
>>>> Subject: [EXTERNAL] Re: Experimentation with build time and runtime
>>>> class initialization in qbicc
>>>>
>>>> Hi David;
>>>>
>>>> Would like to understand more about this, but first, from an
>>>> IP-hygiene perspective, documents linked from this list should be
>>>> under the OpenJDK terms and conditions. Can you post the contents
>>>> of that document here, so there are no issues there?
>>>>
>>>> Thanks,
>>>> -Brian
>>>> On 5/26/2022 12:35 PM, David P Grove wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> In the qbicc project, we’ve been exploring options for adapting
>>>> Java’s class initialization semantics for native images. In
>>>> particular, we are trying to arrive at a non-surprising semantics
>>>> that in native-image scenarios allows most initialization to
>>>> happen at build-time while still enabling runtime initialization of
>>>> selected static fields.
>>>>
>>>>
>>>>
>>>> Our current design and experience are captured here:
>>>> https://github.com/qbicc/qbicc/wiki/Class-Initialization-in-qbicc
>>>> In a nutshell, the idea is to initialize classes via build-time
>>>> execution of existing <clinit> methods as per normal Java semantics
>>>> while adding per-static-field <rtinit> methods to provide a
>>>> capability for runtime-reinitialization of a field before its first
>>>> access.
>>>>
>>>>
>>>>
>>>> --dave
>>>>
>>>>
>>>>
>>>>