Experimentation with build time and runtime class initialization in qbicc
Volker Simonis
volker.simonis at gmail.com
Fri May 27 19:46:37 UTC 2022
On Fri, May 27, 2022, 08:36, Dan Heidinga <heidinga at redhat.com> wrote:
> On Thu, May 26, 2022 at 9:01 PM Brian Goetz <brian.goetz at oracle.com>
> wrote:
> >
> > Thanks for providing this.
> >
> > Something about the qbicc approach here doesn't seem to add up to me.
> > Maybe you can tell me what I'm missing.
> >
> > From reading your notes, it seems that at build time, you start with
> > the root class(es), execute their <clinit>, which will cause loading of
> > more classes, more <clinits>, and you iterate until there are no new
> > classes to initialize.
>
> With qbicc we embraced the closed-world constraint and mandated that
> all class initialization happens at build time. While we started with
> runtime class initialization to bootstrap being able to run more code,
> we quickly switched to being all-in on build time init (BTI) due to
> the virtuous cycle between BTI and dead code elimination.
>
> > You then treat the statics as roots, and
> > serialize those objects to the initial heap image. But before doing
> > that, you exclude (zero out) any which are marked as "reinitialize at
> > runtime."
>
> Right.
>
> >
> > The rationale for this clearly is that you want to continue the graph
> > walk to find all the loadable classes, but then don't want to use the
> > polluted value. But what happens in cases like this:
> >
> > class Aliased {
> > @RuntimeInitialized private static final Socket s = ...;
> > private static final Socket copy = s;
> > }
> >
> > Do you throw on reads of runtime-initialized fields from a <clinit>? Do
> > you walk the heap and find aliases to runtime-initialized values, and
> > replace them with something (if so, what?) Or is the Aliased class
> > above just "broken" according to this model, and I encounter a
> > stale/nonworking socket in `copy` at runtime, and one that is not
> > properly aliased to `s`? Once an object is initialized at build time,
> > its state can escape into all sorts of other places, and just zeroing
> > out the static root isn't enough to stamp it out.
>
> This is where the "soupy" nature of <clinit> becomes evident. <clinit>
> is a single method that has tremendous side effects, setting static
> fields, initializing other classes, starting threads, caching computed
> values, etc. It's very hard to automatically reason about what has
> happened in a <clinit> method and what the user intends for those side
> effects (if they're even aware of what they all may be!).
>
> What was the user's intent when they initialized 'copy'? To record
> what the original Socket connection - set up at build time - had been
> rather than separately storing the address/port? If they had a
> semantic meaning for `copy` even after `s` had been nulled out, then
> automatically resetting `copy` would violate their expectation.
>
> We need the user to tell us their intent. If they wanted both `s` &
> `copy` to be reset, then they need to be explicit about that and
> annotate both fields. We don't attempt to null all copies of the
> value of a @RuntimeInitialized field.
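
The aliasing point can be made concrete with a minimal Java sketch (this is not qbicc code). `RuntimeInitialized` is a locally declared stand-in annotation, `ImageWriter.zeroAnnotatedStatics` is a hypothetical helper that mimics zeroing annotated static fields at serialization time, and a plain `Object` stands in for the `Socket`. The fields are non-final only so the sketch can reset them reflectively.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Stand-in annotation declared for this sketch; not qbicc's real annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface RuntimeInitialized {}

class Aliased {
    @RuntimeInitialized static Object s = new Object(); // stands in for a Socket
    static Object copy = s;                              // unannotated alias of s
}

class ImageWriter {
    // Hypothetical helper: null out only the annotated static reference fields.
    static void zeroAnnotatedStatics(Class<?> c) {
        for (Field f : c.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())
                    && f.isAnnotationPresent(RuntimeInitialized.class)
                    && !f.getType().isPrimitive()) {
                try {
                    f.setAccessible(true);
                    f.set(null, null); // reset the annotated root only
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }
}
```

After `ImageWriter.zeroAnnotatedStatics(Aliased.class)`, `Aliased.s` is null while `Aliased.copy` still refers to the build-time object, which is why both fields would need the annotation if both are meant to be reset.
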
>
> >
> > Am I missing something?
>
> You seem to have grasped it correctly =)
>
> If that field had been a primitive, such as a long, we'd be unable to
> track down which other longs in the heap were copies of it or derived
> from it. We wouldn't reset some other location with the value 42
> because a @RuntimeInitialized field was set to 42 at build time. The
> programmer has to take responsibility for which fields need to be
> reset. With qbicc, that's annotations. With Leyden we may be able to
> give them a better way to group fields and express how & when they
> should be initialized.
>
And with CRaC we don't have to care about build-time initialization at all.
Instead we just have to make sure that "relevant" fields are being reset
before snapshot and correctly re-initialized on resume.
The question is which fields have to be considered "relevant" in the CRaC
context? Intuitively this will be a subset of the @RuntimeInitialized
fields. But for CRaC this question also depends on the snapshot mechanism.
If we're using CRIU to checkpoint a single process, sockets and file
descriptors will certainly be hot candidates for @RuntimeInitialized
fields. On the other hand, if we're snapshotting a complete virtual machine
(e.g. with Firecracker) there's no need to reset/re-init file descriptors
and even sockets might be handled transparently by the OS. Docker
checkpoint is another interesting snapshotting possibility somewhere
between single process and whole VM snapshotting.
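
As a rough illustration of "reset before snapshot, re-initialize on resume", here is a minimal Java sketch. The `SnapshotAware` interface, `SnapshotRegistry`, and `ConnectionHolder` are hypothetical names invented for this example and are not the CRaC API; a `String` stands in for a real socket.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for a checkpoint/restore callback; not the real CRaC API.
interface SnapshotAware {
    void beforeSnapshot();
    void afterResume();
}

// Hypothetical holder for one "relevant" resource.
class ConnectionHolder implements SnapshotAware {
    private final String endpoint;
    private String connection;

    ConnectionHolder(String endpoint) {
        this.endpoint = endpoint;
        this.connection = open(endpoint);
    }

    // Placeholder for opening a real connection.
    private static String open(String endpoint) {
        return "conn:" + endpoint;
    }

    String connection() {
        return connection;
    }

    @Override
    public void beforeSnapshot() {
        connection = null; // drop the resource so it is not captured in the snapshot
    }

    @Override
    public void afterResume() {
        connection = open(endpoint); // re-create the resource in the resumed process
    }
}

class SnapshotRegistry {
    private final List<SnapshotAware> resources = new ArrayList<>();

    void register(SnapshotAware r) {
        resources.add(r);
    }

    // Notify resources in registration order before the snapshot is taken.
    void checkpoint() {
        for (SnapshotAware r : resources) {
            r.beforeSnapshot();
        }
    }

    // Notify resources in reverse registration order after resume.
    void resume() {
        for (int i = resources.size() - 1; i >= 0; i--) {
            resources.get(i).afterResume();
        }
    }
}
```

Which holders need to be registered at all is exactly the "relevant fields" question: with a whole-VM snapshot, fewer of them may be needed than with a single-process checkpoint.
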
>
> --Dan
>
> >
> > Thanks,
> > -Brian
> >
> >
> > On 5/26/2022 4:22 PM, David P Grove wrote:
> > > Hi,
> > >
> > > I’ve appended the contents of the referenced wiki
> page in this email. Apologies in advance if the formatting doesn’t come
> through as intended.
> > >
> > > There is a full implementation of this (GPLv2 +
> Classpath exception) as part of the qbicc project on GitHub. There is also
> a GitHub discussion in the qbicc project that links to various GitHub
> issues that capture the history that led to the current design. I will not
> hyperlink to those here so that if people have any IP concerns, they can
> avoid seeing them. They are easily findable.
> > >
> > > Regards,
> > >
> > > --dave
> > >
> > > ## Overview
> > >
> > > One of the goals of the qbicc project is to explore technical
> approaches for adapting Java's specification of class initialization to
> fully support native image compilation. Enabling build-time evaluation of
> complex class initialization logic is essential for obtaining much of the
> benefits of native image compilation: reduced memory footprint and fast
> startup. However, both the core JDK and many frameworks will not
> primarily be used in native image scenarios. Therefore, it is essential
> that the approach taken for build-time initialization enables both the
> existing runtime class initialization and the new build-time class
> initialization logic to co-exist. Furthermore, for as many cases as
> possible, the class initialization code should be shared between the two
> usage scenarios and have non-surprising semantics in both.
> > >
> > > ## Build-time Initialization
> > >
> > > In qbicc, all classes are initialized at build-time. Class
> initialization at build time is performed according to the existing
> semantics of Java class initialization driven by build-time execution of
> the `<clinit>` methods of reachable classes. The set of reachable classes
> is determined iteratively, starting with the program entrypoints and adding
> the methods and classes they utilize until no further reachable classes are
> discovered (a fixed point is reached).
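
The iterative discovery described above can be sketched as a simple worklist loop. This is only an illustration under stated assumptions, not qbicc's implementation: the `Reachability` class and the `uses` map (class name to the classes its code references) are hypothetical, and real reachability also tracks methods.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Reachability {
    // Returns every class reachable from the entry points via the `uses` relation.
    static Set<String> reachable(List<String> entryPoints, Map<String, List<String>> uses) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>(entryPoints);
        while (!worklist.isEmpty()) {
            String c = worklist.poll();
            if (!seen.add(c)) {
                continue; // already processed
            }
            // In a real build, this is the point where the <clinit> of c would run.
            for (String dep : uses.getOrDefault(c, List.of())) {
                if (!seen.contains(dep)) {
                    worklist.add(dep);
                }
            }
        }
        return seen; // fixed point: the worklist is empty, nothing new was discovered
    }
}
```

Classes that are never referenced from an entry point never enter the worklist, so they are neither initialized nor included in the image.
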
> > >
> > > After build-time initialization has completed, a build-time heap has
> been constructed that contains the objects that were created during the
> build-time execution of the `<clinit>` methods. Using the reachable static
> fields of the reachable program as roots, this build-time heap is
> serialized into the native image. This set of objects will form the
> initial runtime heap of the program when it is executed.
> > >
> > > ## Runtime Initializers
> > >
> > > There are cases where one or more initialization actions of a class
> **must** be executed at program runtime. Most typically these involve the
> creation of native resources (open files, threads, etc) that cannot be
> successfully serialized into the build time heap.
> > >
> > > Qbicc supports runtime initialization by allowing static fields of a
> class to be declared as runtime initialized. These fields will be
> initialized lazily, at first access, by executing a runtime initializer
> (`<rtinit>`) associated with the accessed field. Runtime initialization is
> localized: accessing a particular static field will cause its runtime
> initializer to be executed but has no implications for other runtime
> initializers defined either in the field's defining class or any superclass
> or implemented interface of the field's defining class.
> > >
> > > When serialized from the build-time heap to the runtime heap, all
> runtime-initialized fields will be serialized with the zero (uninitialized)
> value appropriate for their type.
> > >
> > > Qbicc allows related static fields in the same class to share a common
> `<rtinit>` method. The first access to any of the fields will cause the
> execution of the associated `<rtinit>` method and the initialization of all
> the fields.
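
In plain Java, the effect of a shared `<rtinit>` can be approximated as follows. This is a hand-written sketch, not what qbicc generates: the `Config` class, its values, and the accessor methods are hypothetical, with the accessors standing in for the check that would be attached to each field access.

```java
class Config {
    // Both fields share one runtime initializer, mirroring a shared <rtinit>.
    private static boolean rtinitDone;
    private static String host; // holds the zero value (null) until first access
    private static int port;    // holds the zero value (0) until first access

    static int rtinitRuns;      // counts initializer executions, for demonstration only

    private static synchronized void ensureRtinit() {
        if (!rtinitDone) {
            host = "localhost"; // illustrative values
            port = 8080;
            rtinitRuns++;
            rtinitDone = true;
        }
    }

    // Accessing either field triggers the shared initializer at most once.
    static String host() {
        ensureRtinit();
        return host;
    }

    static int port() {
        ensureRtinit();
        return port;
    }
}
```

The first call to either `Config.host()` or `Config.port()` initializes both fields, and later calls find them already set.
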
> > >
> > > ## Adjusting Heap Serialization
> > >
> > > For some objects it is necessary to initialize them during build-time
> initialization, but "reset" them before they are used at runtime.
> > > Qbicc supports this by allowing fields to be annotated to be
> serialized as the type-appropriate zero value or as a primitive constant
> value. This value replacement happens as the build time heap is serialized.
> > >
> > > One common scenario is to invalidate objects that are wrapping native
> resources. For example, when a `FileDescriptor` is serialized its `fd` and
> `handle` instance fields are serialized as `-1` and its `closed` field is
> serialized as `true`. Thus, any attempt to use the build-time FileDescriptor
> at runtime will raise the appropriate exception.
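
A small sketch of the value-replacement idea follows. `FakeFileDescriptor` and its `toImageCopy` method are hypothetical stand-ins written for this example (not `java.io.FileDescriptor` and not qbicc's serializer), and an unchecked exception is used in place of whatever exception the real class would raise.

```java
// Local stand-in for illustration; not java.io.FileDescriptor.
class FakeFileDescriptor {
    final int fd;
    final long handle;
    final boolean closed;

    FakeFileDescriptor(int fd, long handle, boolean closed) {
        this.fd = fd;
        this.handle = handle;
        this.closed = closed;
    }

    // Hypothetical serialization step: the copy written to the image carries
    // the replacement constants instead of the build-time values.
    FakeFileDescriptor toImageCopy() {
        return new FakeFileDescriptor(-1, -1L, true);
    }

    int read() {
        if (closed) {
            throw new IllegalStateException("descriptor is closed");
        }
        return 0; // placeholder for a real read
    }
}
```

The build-time object remains usable during the build, while the serialized copy fails fast on first use at runtime.
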
> > >
> > > ## Patching: Migration for Existing Classes
> > >
> > > The runtime initialization mechanisms described above are currently
> enabled via a set of annotations. This allows qbicc to implement the
> desired semantics without requiring any changes to the Java compiler, class
> file format, or language specification. In the long term, we believe small
> modifications to the Java specification, for example defining a `rtinit {
> ... }` block similar to the existing `static { ... }` construct, could enable a
> simpler specification.
> > >
> > > The primary annotation for runtime initialization is `RuntimeAspect`.
> This annotation is defined on a class and is interpreted as meaning that
> the `<clinit>` method of the class should be interpreted as an `<rtinit>`
> method. This method will not be executed during build-time initialization
> and instead will be deferred until the first access of one of the static
> fields defined in the class.
> > >
> > > To allow us to "externally" modify JDK core classes for qbicc, we have
> developed an annotation-driven patcher infrastructure. The patcher allows
> the declaration of patch classes that add, remove, and modify the methods
> and fields of an existing class. This modification includes the
> replacement of the `<clinit>` method and the declaration of multiple
> `RuntimeAspect` patch classes.
> > >
> > > The best way to explore what is possible with the patcher is to
> examine the java.base/src directory in the qbicc-class-library project. It
> makes extensive use of the patcher annotations to adapt the core JDK
> classes to qbicc while still allowing us to consume the upstream OpenJDK
> code base via an unmodified git submodule.
> > >
> > > ## Design Alternatives
> > >
> > > A number of alternatives were considered before arriving at the final
> design documented here. The technical discussions and options considered
> can be explored starting in qbicc discussion #764 on GitHub.
> > >
> > >
> > > From: Brian Goetz<brian.goetz at oracle.com>
> > > Date: Thursday, May 26, 2022 at 2:21 PM
> > > To: David P Grove<groved at us.ibm.com>,"leyden-dev at openjdk.java.net" <
> leyden-dev at openjdk.java.net>
> > > Subject: [EXTERNAL] Re: Experimentation with build time and runtime
> class initialization in qbicc
> > >
> > > Hi David;
> > >
> > > Would like to understand more about this, but first, from an
> IP-hygiene perspective, documents linked from this list should be under the
> OpenJDK terms and conditions. Can you post the contents of that document
> here, so there are no issues there?
> > >
> > > Thanks,
> > > -Brian
> > > On 5/26/2022 12:35 PM, David P Grove wrote:
> > >
> > > Hi,
> > >
> > >
> > >
> > > In the qbicc project, we’ve been exploring options for adapting Java’s
> class initialization semantics for native images. In particular, we are
> trying to arrive at non-surprising semantics that, in native-image
> scenarios, allow most initialization to happen at build-time while still
> enabling runtime initialization of selected static fields.
> > >
> > >
> > >
> > > Our current design and experience are captured here:
> <https://github.com/qbicc/qbicc/wiki/Class-Initialization-in-qbicc>. In a
> nutshell, the idea is to initialize classes via build-time execution of
> existing <clinit> methods as per normal Java semantics while adding
> per-static-field <rtinit> methods to provide a capability for
> runtime-reinitialization of a field before its first access.
> > >
> > >
> > >
> > > --dave
> > >
> > >
> > >
> > >
> >
>
>