Re: Experimentation with build time and runtime class initialization in qbicc
Thanks for providing this.
Something about the qbicc approach here doesn't seem to add up to me. Maybe you can tell me what I'm missing.
From reading your notes, it seems that at build time, you start with the root class(es), execute their <clinit>, which will cause loading of more classes, more <clinits>, and you iterate until there are no new classes to initialize.
You then treat the statics as roots, and serialize those objects to the initial heap image. But before doing that, you exclude (zero out) any which are marked as "reinitialize at runtime."
The rationale for this clearly is that you want to continue the graph walk to find all the loadable classes, but then don't want to use the polluted value. But what happens in cases like this:
    class Aliased {
        @RuntimeInitialized
        private static final Socket s = ...;
        private static final Socket copy = s;
    }
Do you throw on reads of runtime-initialized fields from a <clinit>? Do you walk the heap and find aliases to runtime-initialized values, and replace them with something (if so, what?) Or is the Aliased class above just "broken" according to this model, and I encounter a stale/nonworking socket in `copy` at runtime, and one that is not properly aliased to `s`? Once an object is initialized at build time, its state can escape into all sorts of other places, and just zeroing out the static root isn't enough to stamp it out.
Am I missing something?
Thanks,
-Brian
On 5/26/2022 4:22 PM, David P Grove wrote:
Hi,
I’ve appended the contents of the referenced wiki page in this email. Apologies in advance if the formatting doesn’t come through as intended.
There is a full implementation of this (GPLv2 + Classpath exception) as part of the qbicc project on GitHub. There is also a GitHub discussion in the qbicc project that links to various GitHub issues that capture the history that led to the current design. I will not hyperlink to those here so that if people have any IP concerns, they can avoid seeing them. They are easily findable.
Regards,
--dave
## Overview
One of the goals of the qbicc project is to explore technical approaches for adapting Java's specification of class initialization to fully support native image compilation. Enabling build-time evaluation of complex class initialization logic is essential for obtaining many of the benefits of native image compilation: reduced memory footprint and fast startup. However, both the core JDK and many frameworks will not primarily be used in native image scenarios. Therefore, it is essential that the approach taken for build-time initialization enables both the existing runtime class initialization and the new build-time class initialization logic to co-exist. Furthermore, for as many cases as possible, the class initialization code should be shared between the two usage scenarios and have non-surprising semantics in both.
## Build-time Initialization
In qbicc, all classes are initialized at build time. Class initialization at build time is performed according to the existing semantics of Java class initialization, driven by build-time execution of the `<clinit>` methods of reachable classes. The set of reachable classes is determined iteratively, starting with the program entrypoints and adding the methods and classes they use until no further reachable classes are discovered (a fixed point is reached).
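The fixed-point iteration described above can be pictured as a worklist algorithm. The following is a minimal sketch, assuming a hypothetical model in which each class name maps to the list of classes its code references; `ReachabilitySketch` and the class names are invented for illustration and are not part of qbicc.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: class name -> names of the classes its code references.
final class ReachabilitySketch {
    static Set<String> reachable(Map<String, List<String>> references, List<String> entrypoints) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>(entrypoints);
        while (!worklist.isEmpty()) {
            String cls = worklist.removeFirst();
            if (!seen.add(cls)) {
                continue; // already processed
            }
            // A real compiler would run <clinit> and scan method bodies at this point.
            for (String ref : references.getOrDefault(cls, List.of())) {
                if (!seen.contains(ref)) {
                    worklist.addLast(ref);
                }
            }
        }
        // The worklist is empty: no further reachable classes exist (fixed point).
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
            "Main", List.of("Config", "Logger"),
            "Config", List.of("Logger"),
            "Logger", List.of(),
            "Unused", List.of("Logger"));
        System.out.println(reachable(refs, List.of("Main")));
    }
}
```

In this model `Unused` is never visited, so it would be neither initialized nor included in the image.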
After build-time initialization has completed, a build-time heap has been constructed that contains the objects that were created during the build-time execution of the `<clinit>` methods. Using the reachable static fields of the reachable program as roots, this build-time heap is serialized into the native image. This set of objects will form the initial runtime heap of the program when it is executed.
## Runtime Initializers
There are cases where one or more initialization actions of a class **must** be executed at program runtime. Most typically these involve the creation of native resources (open files, threads, etc.) that cannot be successfully serialized into the build-time heap.
Qbicc supports runtime initialization by allowing static fields of a class to be declared as runtime initialized. These fields will be initialized lazily, at first access, by executing a runtime initializer (`<rtinit>`) associated with the accessed field. Runtime initialization is localized: accessing a particular static field will cause its runtime initializer to be executed but has no implications for other runtime initializers defined either in the field's defining class or any superclass or implemented interface of the field's defining class.
When serialized from the build-time heap to the runtime heap, all runtime-initialized fields will be serialized with the zero (uninitialized) value appropriate for their type.
Qbicc allows related static fields in the same class to share a common `<rtinit>` method. The first access to any of the fields will cause the execution of the associated `<rtinit>` method and the initialization of all the fields.
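The observable behavior of a shared runtime initializer can be emulated in plain Java. This is only a sketch under stated assumptions: qbicc performs the equivalent check at the field access itself, whereas here the check is spelled out in accessor methods, and `RuntimeInitSketch` and its fields are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java emulation of two static fields sharing one runtime initializer.
final class RuntimeInitSketch {
    static final AtomicInteger rtinitRuns = new AtomicInteger();

    private static long startNanos;        // holds the zero value until first access
    private static String processName;     // holds null until first access
    private static boolean groupInitialized;

    private static synchronized void ensureGroupInitialized() {
        if (!groupInitialized) {
            // Stand-in for the body of the shared <rtinit> method.
            startNanos = System.nanoTime();
            processName = "pid-" + ProcessHandle.current().pid();
            rtinitRuns.incrementAndGet();
            groupInitialized = true;
        }
    }

    // Accessing either field initializes both, exactly once.
    static long startNanos() { ensureGroupInitialized(); return startNanos; }
    static String processName() { ensureGroupInitialized(); return processName; }

    public static void main(String[] args) {
        System.out.println(processName());
        System.out.println(startNanos());
        System.out.println("rtinit ran " + rtinitRuns.get() + " time(s)");
    }
}
```

Other static fields of the same class that are not part of the group would keep their build-time values and would not trigger this initializer.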
## Adjusting Heap Serialization
For some objects it is necessary to initialize them during build-time initialization, but "reset" them before they are used at runtime. Qbicc supports this by allowing fields to be annotated to be serialized as the type-appropriate zero value or as a primitive constant value. This value replacement happens as the build time heap is serialized.
One common scenario is to invalidate objects that are wrapping native resources. For example, when a `FileDescriptor` is serialized, its `fd` and `handle` instance fields are serialized as `-1` and its `closed` field is serialized as `true`. Thus, any attempt to use the build-time `FileDescriptor` at runtime will raise the appropriate exception.
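To make the value replacement concrete, here is a small reflection-based sketch. Everything in it is an assumption made for illustration: the `@SerializeAsConstant` annotation, the `FakeDescriptor` class, and the map-based "image" are invented here and do not reflect qbicc's actual annotations or image format.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation for this sketch only; not a qbicc API.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SerializeAsConstant {
    long value();
}

// Stand-in for a class that wraps a native resource.
final class FakeDescriptor {
    @SerializeAsConstant(-1) int fd = 7;
    @SerializeAsConstant(-1) long handle = 1234L;
    @SerializeAsConstant(1) boolean closed = false; // nonzero encodes true in this sketch
    String label = "build-time log";
}

final class HeapSerializerSketch {
    // Returns field name -> value as it would be written into the image.
    static Map<String, Object> serialize(Object obj) {
        Map<String, Object> image = new LinkedHashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) {
                continue; // only instance fields are modeled here
            }
            f.setAccessible(true);
            SerializeAsConstant c = f.getAnnotation(SerializeAsConstant.class);
            image.put(f.getName(), c == null ? read(f, obj) : replacement(f.getType(), c.value()));
        }
        return image;
    }

    private static Object read(Field f, Object obj) {
        try {
            return f.get(obj);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Converts the annotated constant to the field's primitive type.
    private static Object replacement(Class<?> type, long constant) {
        if (type == int.class) return (int) constant;
        if (type == long.class) return constant;
        if (type == boolean.class) return constant != 0;
        throw new IllegalArgumentException("unsupported field type: " + type);
    }

    public static void main(String[] args) {
        System.out.println(serialize(new FakeDescriptor()));
    }
}
```

The live build-time object is left untouched; only the serialized copy carries the replacement values, which matches the idea that the substitution happens as the heap is written out.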
## Patching: Migration for Existing Classes
The runtime initialization mechanisms described above are currently enabled via a set of annotations. This allows qbicc to implement the desired semantics without requiring any changes to the Java compiler, class file format, or language specification. In the long term, we believe small modifications to the Java specification, for example defining an `rtinit { ... }` construct similar to the existing `static { ... }` construct, could enable a simpler specification.
The primary annotation for runtime initialization is `RuntimeAspect`. This annotation is defined on a class and is interpreted as meaning that the `<clinit>` method of the class should be interpreted as an `<rtinit>` method. This method will not be executed during build-time initialization and instead will be deferred until the first access of one of the static fields defined in the class.
To allow us to "externally" modify JDK core classes for qbicc, we have developed an annotation-driven patcher infrastructure. The patcher allows the declaration of patch classes that add, remove, and modify the methods and fields of an existing class. This modification includes the replacement of the `<clinit>` method and the declaration of multiple `RuntimeAspect` patch classes.
The best way to explore what is possible with the patcher is to examine the java.base/src directory in the qbicc-class-library project. It makes extensive use of the patcher annotations to adapt the core JDK classes to qbicc while still allowing us to consume the upstream OpenJDK code base via an unmodified git submodule.
## Design Alternatives
A number of alternatives were considered before arriving at the final design documented here. The technical discussions and options considered can be explored starting in qbicc discussion #764 on GitHub.
From: Brian Goetz <brian.goetz@oracle.com>
Date: Thursday, May 26, 2022 at 2:21 PM
To: David P Grove <groved@us.ibm.com>, "leyden-dev@openjdk.java.net" <leyden-dev@openjdk.java.net>
Subject: [EXTERNAL] Re: Experimentation with build time and runtime class initialization in qbicc
Hi David;
Would like to understand more about this, but first, from an IP-hygiene perspective, documents linked from this list should be under the OpenJDK terms and conditions. Can you post the contents of that document here, so there are no issues there?
Thanks,
-Brian
On 5/26/2022 12:35 PM, David P Grove wrote:
Hi,
In the qbicc project, we’ve been exploring options for adapting Java’s class initialization semantics for native images. In particular, we are trying to arrive at non-surprising semantics that, in native-image scenarios, allow most initialization to happen at build time while still enabling runtime initialization of selected static fields.
Our current design and experience are captured here: https://github.com/qbicc/qbicc/wiki/Class-Initialization-in-qbicc. In a nutshell, the idea is to initialize classes via build-time execution of existing <clinit> methods as per normal Java semantics, while adding per-static-field <rtinit> methods to provide a capability for runtime reinitialization of a field before its first access.
--dave
On Thu, May 26, 2022 at 9:01 PM Brian Goetz <brian.goetz@oracle.com> wrote:
Thanks for providing this.
Something about the qbicc approach here doesn't seem to add up to me. Maybe you can tell me what I'm missing.
From reading your notes, it seems that at build time, you start with the root class(es), execute their <clinit>, which will cause loading of more classes, more <clinits>, and you iterate until there are no new classes to initialize.
With qbicc we embraced the closed-world constraint and mandated that all class initialization happens at build time. While we started with runtime class initialization to bootstrap being able to run more code, we quickly switched to being all-in on build time init (BTI) due to the virtuous cycle between BTI and dead code elimination.
You then treat the statics as roots, and serialize those objects to the initial heap image. But before doing that, you exclude (zero out) any which are marked as "reinitialize at runtime."
Right.
The rationale for this clearly is that you want to continue the graph walk to find all the loadable classes, but then don't want to use the polluted value. But what happens in cases like this:
    class Aliased {
        @RuntimeInitialized
        private static final Socket s = ...;
        private static final Socket copy = s;
    }
Do you throw on reads of runtime-initialized fields from a <clinit>? Do you walk the heap and find aliases to runtime-initialized values, and replace them with something (if so, what?) Or is the Aliased class above just "broken" according to this model, and I encounter a stale/nonworking socket in `copy` at runtime, and one that is not properly aliased to `s`? Once an object is initialized at build time, its state can escape into all sorts of other places, and just zeroing out the static root isn't enough to stamp it out.
This is where the "soupy" nature of <clinit> becomes evident. <clinit> is a single method that has tremendous side effects, setting static fields, initializing other classes, starting threads, caching computed values, etc. It's very hard to automatically reason about what has happened in a <clinit> method and what the user intends for those side effects (if they're even aware of what they all may be!).
What was the user's intent when they initialized 'copy'? To record what the original Socket connection - set up at build time - had been rather than separately storing the address/port? If they had a semantic meaning for `copy` even after `s` had been nulled out, then automatically resetting `copy` would violate their expectation.
We need the user to tell us their intent. If they wanted both `s` & `copy` to be reset, then they need to be explicit about that and annotate both fields. We don't attempt to null all copies of the value of a @RuntimeInitialized field.
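The aliasing behavior discussed here can be reproduced in ordinary Java. This is a sketch only: `Connection` is a hypothetical stand-in for `Socket`, the fields are left non-final so they can be cleared, and `simulateSerialization` merely models zeroing the one field that would carry the annotation.

```java
// Stand-in for a class that wraps a native resource such as a socket.
final class Connection {
    final String endpoint;

    Connection(String endpoint) {
        this.endpoint = endpoint;
    }
}

final class AliasedSketch {
    // Imagine `s` is marked runtime-initialized and `copy` is not.
    static Connection s = new Connection("build-host:1234");
    static Connection copy = s;

    // Models serialization of a runtime-initialized field: only that field is zeroed.
    static void simulateSerialization() {
        s = null;
    }

    public static void main(String[] args) {
        simulateSerialization();
        System.out.println(s);             // prints "null"
        System.out.println(copy.endpoint); // prints "build-host:1234"
    }
}
```

The object created at build time stays reachable through `copy`, which is why both fields would need to be annotated if both are meant to be reset.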
Am I missing something?
You seemed to have grasped it correctly =)
If that field had been a primitive, such as a long, we'd be unable to track down which other longs in the heap were copies of it or derived from it. We wouldn't reset some other location with the value 42 because a @RuntimeInitialized field was set to 42 at build time. The programmer has to take responsibility for which fields need to be reset. With qbicc, that's annotations. With Leyden we may be able to give them a better way to group fields and express how & when they should be initialized.
--Dan
Thanks, -Brian
Dan Heidinga <heidinga@redhat.com> wrote on Fri, May 27, 2022, 08:36:
If that field had been a primitive, such as a long, we'd be unable to track down which other longs in the heap were copies of it or derived from it. We wouldn't reset some other location with the value 42 because a @RuntimeInitialized field was set to 42 at build time. The programmer has to take responsibility for which fields need to be reset. With qbicc, that's annotations. With Leyden we may be able to give them a better way to group fields and express how & when they should be initialized.
And with CRaC we don't have to care about build-time initialization at all. Instead we just have to make sure that "relevant" fields are reset before the snapshot and correctly re-initialized on resume. The question is which fields have to be considered "relevant" in the CRaC context. Intuitively this will be a subset of the @RuntimeInitialized fields. But for CRaC the answer also depends on the snapshot mechanism. If we're using CRIU to checkpoint a single process, sockets and file descriptors will certainly be hot candidates for @RuntimeInitialized fields. On the other hand, if we're snapshotting a complete virtual machine (e.g. with Firecracker), there's no need to reset/re-init file descriptors, and even sockets might be handled transparently by the OS. Docker checkpoint is another interesting snapshotting possibility, somewhere between single-process and whole-VM snapshotting.
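The reset-before-snapshot and re-initialize-on-resume pattern can be sketched as follows. The `Snapshottable` interface is a hypothetical local stand-in, only loosely modelled on checkpoint/restore callbacks; it is not the CRaC API, and the string handle merely simulates a socket or file descriptor.

```java
public class CheckpointSketch {
    /** Hypothetical callback pair, declared locally for illustration. */
    interface Snapshottable {
        void beforeCheckpoint();
        void afterRestore();
    }

    /** Holds a handle that is only valid within one process lifetime. */
    static final class ConnectionHolder implements Snapshottable {
        private String handle = "conn-1"; // stand-in for a socket or descriptor
        private int generation = 1;

        @Override
        public void beforeCheckpoint() {
            handle = null; // drop state that must not survive the snapshot
        }

        @Override
        public void afterRestore() {
            generation++;
            handle = "conn-" + generation; // re-create the resource on resume
        }

        String handle() {
            return handle;
        }
    }

    public static void main(String[] args) {
        ConnectionHolder holder = new ConnectionHolder();
        holder.beforeCheckpoint();
        System.out.println(holder.handle()); // prints null
        holder.afterRestore();
        System.out.println(holder.handle()); // prints conn-2
    }
}
```

Whether a given field needs such callbacks depends on the snapshot mechanism, as noted above: under whole-VM snapshotting the holder might not need to do anything.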
--Dan
Thanks, -Brian
On 5/26/2022 4:22 PM, David P Grove wrote:
Hi,
I’ve appended the contents of the referenced wiki page in this email. Apologies in advance if the formatting doesn’t come through as intended.
There is a full implementation of this (GPLv2 + Classpath exception) as part of the qbicc project on GitHub. There is also a GitHub discussion in the qbicc project that links to various GitHub issues that capture the history that led to the current design. I will not hyperlink to those here so that if people have any IP concerns, they can avoid seeing them. They are easily findable.
Regards,
--dave
## Overview
One of the goals of the qbicc project is to explore technical approaches for adapting Java's specification of class initialization to fully support native image compilation. Enabling build-time evaluation of complex class initialization logic is essential for obtaining many of the benefits of native image compilation: reduced memory footprint and fast startup. However, both the core JDK and many frameworks will not primarily be used in native image scenarios. Therefore, it is essential that the approach taken for build-time initialization enables both the existing runtime class initialization and the new build-time class initialization logic to co-exist. Furthermore, for as many cases as possible, the class initialization code should be shared between the two usage scenarios and have non-surprising semantics in both.
## Build-time Initialization
In qbicc, all classes are initialized at build-time. Class initialization at build time is performed according to the existing semantics of Java class initialization driven by build-time execution of the `<clinit>` methods of reachable classes. The set of reachable classes is determined iteratively, starting with the program entrypoints and adding the methods and classes they utilize until no further reachable classes are discovered (a fixed point is reached).
After build-time initialization has completed, a build-time heap has been constructed that contains the objects that were created during the build-time execution of the `<clinit>` methods. Using the reachable static fields of the reachable program as roots, this build-time heap is serialized into the native image. This set of objects will form the initial runtime heap of the program when it is executed.
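The fixed-point iteration described above can be sketched with a simple worklist. This is a simplification under stated assumptions: the `uses` map (class name to referenced class names) is hypothetical and precomputed here, whereas a real build would discover references while analyzing and executing code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReachabilitySketch {
    /**
     * Computes the set of classes reachable from the entry points.
     * 'uses' maps a class name to the class names it references.
     */
    static Set<String> reachable(List<String> entryPoints, Map<String, List<String>> uses) {
        Set<String> seen = new LinkedHashSet<>(entryPoints);
        Deque<String> worklist = new ArrayDeque<>(seen);
        while (!worklist.isEmpty()) {
            String current = worklist.removeFirst();
            for (String used : uses.getOrDefault(current, List.of())) {
                if (seen.add(used)) {       // true only for newly discovered classes
                    worklist.addLast(used);
                }
            }
        }
        return seen; // fixed point: no new classes were discovered
    }

    public static void main(String[] args) {
        Map<String, List<String>> uses = Map.of(
                "Main", List.of("Config", "Logger"),
                "Config", List.of("Logger"),
                "Unused", List.of("Logger"));
        System.out.println(reachable(List.of("Main"), uses)); // prints [Main, Config, Logger]
    }
}
```

`Unused` never enters the result, so in this model its `<clinit>` would never run at build time and none of its objects would reach the serialized heap.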
## Runtime Initializers
There are cases where one or more initialization actions of a class **must** be executed at program runtime. Most typically these involve the creation of native resources (open files, threads, etc.) that cannot be successfully serialized into the build-time heap.
Qbicc supports runtime initialization by allowing static fields of a class to be declared as runtime initialized. These fields will be initialized lazily, at first access, by executing a runtime initializer (`<rtinit>`) associated with the accessed field. Runtime initialization is localized: accessing a particular static field will cause its runtime initializer to be executed but has no implications for other runtime initializers defined either in the field's defining class or any superclass or implemented interface of the field's defining class.
When serialized from the build-time heap to the runtime heap, all runtime-initialized fields will be serialized with the zero (uninitialized) value appropriate for their type.
Qbicc allows related static fields in the same class to share a common `<rtinit>` method. The first access to any of the fields will cause the execution of the associated `<rtinit>` method and the initialization of all the fields.
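The shared-initializer behaviour can be emulated in plain Java. This is only a sketch: standard Java has no per-field lazy initialization, so the accessor methods below stand in for the access check that a compiler would insert, and every name (`rtinit`, `hostName`, `port`, `rtinitRuns`) is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedRtinitSketch {
    /** Counts how often the shared initializer body actually ran. */
    static final AtomicInteger rtinitRuns = new AtomicInteger();

    // Two related static fields; both start with their zero values.
    private static String hostName;
    private static int port;
    private static boolean initialized; // guarded by the class lock

    /** Stand-in for a shared <rtinit>: sets every field in the group, once. */
    private static synchronized void rtinit() {
        if (!initialized) {
            hostName = "localhost";
            port = 8080;
            rtinitRuns.incrementAndGet();
            initialized = true;
        }
    }

    // Each access goes through the guard, emulating "initialize at first access".
    static String hostName() {
        rtinit();
        return hostName;
    }

    static int port() {
        rtinit();
        return port;
    }

    public static void main(String[] args) {
        System.out.println(port());           // prints 8080
        System.out.println(hostName());       // prints localhost
        System.out.println(rtinitRuns.get()); // prints 1
    }
}
```

Touching either field initializes both, and the initializer body runs once regardless of which field is accessed first.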
## Adjusting Heap Serialization
For some objects it is necessary to initialize them during build-time initialization, but "reset" them before they are used at runtime. Qbicc supports this by allowing fields to be annotated to be serialized as the type-appropriate zero value or as a primitive constant value. This value replacement happens as the build-time heap is serialized.
One common scenario is to invalidate objects that are wrapping native resources. For example, when a `FileDescriptor` is serialized, its `fd` and `handle` instance fields are serialized as `-1` and its `closed` field is serialized as `true`. Thus, any attempt to use the build-time `FileDescriptor` at runtime will raise the appropriate exception.
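A reflection-based sketch of value replacement during serialization follows. The annotations `IntegralAs` and `BooleanAs` and the `Descriptor` class are illustrative names invented for this example, not qbicc's actual annotations or the real `java.io.FileDescriptor`; the "image" is just a map of field names to the values that would be written.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

public class HeapSerializationSketch {
    // Hypothetical markers: serialize the field as a constant instead of its value.
    @Retention(RetentionPolicy.RUNTIME) @interface IntegralAs { long value(); }
    @Retention(RetentionPolicy.RUNTIME) @interface BooleanAs { boolean value(); }

    /** Stand-in for a class wrapping a native resource. */
    static final class Descriptor {
        @IntegralAs(-1) int fd = 7;
        @IntegralAs(-1) long handle = 1234L;
        @BooleanAs(true) boolean closed = false;
        String label = "build-time";
    }

    /** "Serializes" an object to name/value pairs, applying the substitutions. */
    static Map<String, Object> snapshot(Object obj) {
        Map<String, Object> image = new TreeMap<>();
        try {
            for (Field f : obj.getClass().getDeclaredFields()) {
                f.setAccessible(true);
                IntegralAs integral = f.getAnnotation(IntegralAs.class);
                BooleanAs bool = f.getAnnotation(BooleanAs.class);
                if (integral != null) {
                    image.put(f.getName(), integral.value()); // constant replaces live value
                } else if (bool != null) {
                    image.put(f.getName(), bool.value());
                } else {
                    image.put(f.getName(), f.get(obj));       // ordinary field: keep value
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return image;
    }

    public static void main(String[] args) {
        // prints {closed=true, fd=-1, handle=-1, label=build-time}
        System.out.println(snapshot(new Descriptor()));
    }
}
```

The live object is left untouched; only the serialized form carries the invalidated values, which matches the statement that replacement happens as the heap is serialized.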
## Patching: Migration for Existing Classes
The runtime initialization mechanisms described above are currently enabled via a set of annotations. This allows qbicc to implement the desired semantics without requiring any changes to the Java compiler, class file format, or language specification. In the long term, we believe small modifications to the Java specification, for example defining an `rtinit { ... }` construct similar to the existing `static { ... }` construct, could enable a simpler specification.
The primary annotation for runtime initialization is `RuntimeAspect`. This annotation is defined on a class and means that the `<clinit>` method of the class should be treated as an `<rtinit>` method. This method will not be executed during build-time initialization and instead will be deferred until the first access of one of the static fields defined in the class.
To allow us to "externally" modify JDK core classes for qbicc, we have developed an annotation-driven patcher infrastructure. The patcher allows the declaration of patch classes that add, remove, and modify the methods and fields of an existing class. This modification includes the replacement of the `<clinit>` method and the declaration of multiple `RuntimeAspect` patch classes.
The best way to explore what is possible with the patcher is to examine the java.base/src directory in the qbicc-class-library project. It makes extensive use of the patcher annotations to adapt the core JDK classes to qbicc while still allowing us to consume the upstream OpenJDK code base via an unmodified git submodule.
## Design Alternatives
A number of alternatives were considered before arriving at the final design documented here. The technical discussions and options considered can be explored starting in qbicc discussion #764 on GitHub.
From: Brian Goetz <brian.goetz@oracle.com>
Date: Thursday, May 26, 2022 at 2:21 PM
To: David P Grove <groved@us.ibm.com>, "leyden-dev@openjdk.java.net" <leyden-dev@openjdk.java.net>
Subject: [EXTERNAL] Re: Experimentation with build time and runtime class initialization in qbicc
Hi David;
Would like to understand more about this, but first, from an IP-hygiene perspective, documents linked from this list should be under the OpenJDK terms and conditions. Can you post the contents of that document here, so there are no issues there?
Thanks, -Brian
On 5/26/2022 12:35 PM, David P Grove wrote:
Hi,
In the qbicc project, we’ve been exploring options for adapting Java’s class initialization semantics for native images. In particular, we are trying to arrive at non-surprising semantics that, in native-image scenarios, allow most initialization to happen at build-time while still enabling runtime initialization of selected static fields.
Our current design and experience are captured here: https://github.com/qbicc/qbicc/wiki/Class-Initialization-in-qbicc. In a nutshell, the idea is to initialize classes via build-time execution of existing <clinit> methods as per normal Java semantics while adding per-static-field <rtinit> methods to provide a capability for runtime-reinitialization of a field before its first access.
--dave
If that field had been a primitive, such as a long, we'd be unable to track down which other longs in the heap were copies of it or derived from it. We wouldn't reset some other location with the value 42 because a @RuntimeInitialized field was set to 42 at build time. The programmer has to take responsibility for which fields need to be reset. With qbicc, that's annotations. With Leyden we may be able to give them a better way to group fields and express how & when they should be initialized.
I know Dan knows this, but for the broader audience, let me remind everyone that annotations are not likely to be a suitable mechanism for anything other than prototyping here. If something affects language semantics (and all of this does), it needs to be part of the language. But they're a fine tool for prototypes and proofs-of-concept.
Hi,
I agree with the "soupy nature" of <clinit> methods mentioned below. This makes it impossible in general to reverse-engineer which parts of <clinit> initialize which static field. One suggestion how that could be improved: Instead of emitting a single <clinit> method, javac can emit separate <clinit_XXX> methods for each static field that is initialized inline as part of the field declaration, as well as each static{} block. With a consistent naming scheme of these methods, it would be much easier to run some initializations at build time and some at run time. For compatibility, the <clinit> method could be a chain of invocations of the <clinit_XXX> methods (or maybe <clinit> itself is no longer necessary at all).
So for example a class
    class MyClass {
        static Object o1 = "abc";
        static { foo(); }
        static Object o2 = 42;
    }
the Java compiler would create the methods (written here with disassembled bytecode)
    <clinit_o1>() { o1 = "abc"; }
    <clinit_$0>() { foo(); }
    <clinit_o2>() { o2 = 42; }
    <clinit>() { <clinit_o1>(); <clinit_$0>(); <clinit_o2>(); }
Why such a scheme? It is much easier to prove here that the field o2 can be initialized at build time regardless of what foo() is doing, and then remove the run-time initialization of o2 by replacing <clinit_o2> with an empty method. All of that can be done without analyzing and modifying the bytecode soup of the current <clinit> method.
-Christian
On 5/27/22 08:35, Dan Heidinga wrote:
On Thu, May 26, 2022 at 9:01 PM Brian Goetz <brian.goetz@oracle.com> wrote:
Thanks for providing this.
Something about the qbicc approach here doesn't seem to add up to me. Maybe you can tell me what I'm missing.
From reading your notes, it seems that at build time, you start with the root class(es), execute their <clinit>, which will cause loading of more classes, more <clinits>, and you iterate until there are no new classes to initialize.
With qbicc we embraced the closed-world constraint and mandated that all class initialization happens at build time. While we started with runtime class initialization to bootstrap being able to run more code, we quickly switched to being all-in on build time init (BTI) due to the virtuous cycle between BTI and dead code elimination.
You then treat the statics as roots, and serialize those objects to the initial heap image. But before doing that, you exclude (zero out) any which are marked as "reinitialize at runtime."
Right.
The rationale for this clearly is that you want to continue the graph walk to find all the loadable classes, but then don't want to use the polluted value. But what happens in cases like this:
    class Aliased {
        @RuntimeInitialized private static final Socket s = ...;
        private static final Socket copy = s;
    }
Do you throw on reads of runtime-initialized fields from a <clinit>? Do you walk the heap and find aliases to runtime-initialized values, and replace them with something (if so, what?) Or is the Aliased class above just "broken" according to this model, and I encounter a stale/nonworking socket in `copy` at runtime, and one that is not properly aliased to `s`? Once an object is initialized at build time, its state can escape into all sorts of other places, and just zeroing out the static root isn't enough to stamp it out.
This is where the "soupy" nature of <clinit> becomes evident. <clinit> is a single method that has tremendous side effects, setting static fields, initializing other classes, starting threads, caching computed values, etc. It's very hard to automatically reason about what has happened in a <clinit> method and what the user intends for those side effects (if they're even aware of what they all may be!).
What was the user's intent when they initialized 'copy'? To record what the original Socket connection - set up at build time - had been rather than separately storing the address/port? If they had a semantic meaning for `copy` even after `s` had been nulled out, then automatically resetting `copy` would violate their expectation.
We need the user to tell us their intent. If they wanted both `s` & `copy` to be reset, then they need to be explicit about that and annotate both fields. We don't attempt to null all copies of the value of a @RuntimeInitialized field.
Am I missing something?
You seem to have grasped it correctly =)
If that field had been a primitive, such as a long, we'd be unable to track down which other longs in the heap were copies of it or derived from it. We wouldn't reset some other location with the value 42 because a @RuntimeInitialized field was set to 42 at build time. The programmer has to take responsibility for which fields need to be reset. With qbicc, that's annotations. With Leyden we may be able to give them a better way to group fields and express how & when they should be initialized.
--Dan
Thanks, -Brian
I too agree that the "soupy" nature of <clinit> makes reverse-engineering difficult, and that this alternate translation would make things easier for an after-the-fact analysis tool that is trying to reason about what computations could be safely shifted in time.
But, keep in mind that it's not a free lunch. To point out the obvious tradeoff: this turns into a startup hit for every dynamically executed Java program (larger classfiles, more bytecodes, more methods). This is a tradeoff we would have to consider carefully, since making Java startup slower in general is not a cost we should take on lightly, especially given the charter of this project.
So, something for the "could consider" list, but not a slam-dunk.
On 5/28/2022 12:39 PM, Christian Wimmer wrote:
Hi,
I agree with the "soupy nature" of <clinit> methods mentioned below. This makes it impossible in general to reverse-engineer which parts of <clinit> initialize which static field. One suggestion how that could be improved: Instead of emitting a single <clinit> method, javac can emit separate <clinit_XXX> methods for each static field that is initialized inline as part of the field declaration, as well as each static{} block. With a consistent naming scheme of these methods, it would be much easier to run some initializations at build time and some at run time. For compatibility, the <clinit> method could be a chain of invocations of the <clinit_XXX> methods (or maybe <clinit> itself is no longer necessary at all).
So for example a class
    class MyClass {
        static Object o1 = "abc";
        static { foo(); }
        static Object o2 = 42;
    }
the Java compiler would create the methods (written here with disassembled bytecode)
    <clinit_o1>() { o1 = "abc"; }
    <clinit_$0>() { foo(); }
    <clinit_o2>() { o2 = 42; }
    <clinit>() { <clinit_o1>(); <clinit_$0>(); <clinit_o2>(); }
Why such a scheme? It is much easier to prove here that the field o2 can be initialized at build time regardless of what foo() is doing, and then remove the run-time initialization of o2 by replacing <clinit_o2> with an empty method. All of that can be done without analyzing and modifying the bytecode soup of the current <clinit> method.
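The proposed translation can be emulated in ordinary Java source to see how the pieces compose. This is a sketch only: real `<clinit_XXX>` names are not legal Java identifiers, so the methods below use hypothetical names (`clinit_o1`, `clinit_0`, `clinit_o2`), and the `trace` list exists purely to make the execution order observable.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitClinitSketch {
    static Object o1;
    static Object o2;
    static final List<String> trace = new ArrayList<>();

    static void foo() {
        trace.add("foo"); // arbitrary side effect from the static block
    }

    // One method per field initializer or static block, in source order.
    static void clinit_o1() { o1 = "abc"; trace.add("o1"); }
    static void clinit_0()  { foo(); }
    static void clinit_o2() { o2 = 42; trace.add("o2"); }

    /** Emulates the compatibility <clinit>: a plain chain of the pieces. */
    static void clinit() {
        clinit_o1();
        clinit_0();
        clinit_o2();
    }

    public static void main(String[] args) {
        clinit();
        System.out.println(trace); // prints [o1, foo, o2]
    }
}
```

A build tool that has already computed `o2` could drop the call to `clinit_o2()` (or replace its body with an empty one) without inspecting what `foo()` does.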
-Christian
On 5/27/22 08:35, Dan Heidinga wrote:
On Thu, May 26, 2022 at 9:01 PM Brian Goetz <brian.goetz@oracle.com> wrote:
Thanks for providing this.
Something about the qbicc approach here doesn't seem to add up to me. Maybe you can tell me what I'm missing.
From reading your notes, it seems that at build time, you start with the root class(es), execute their <clinit>, which will cause loading of more classes, more <clinits>, and you iterate until there are no new classes to initialize. With qbicc we embraced the closed-world constraint and mandated that all class initialization happens at build time. While we started with runtime class initialization to bootstrap being able to run more code, we quickly switched to being all-in on build time init (BTI) due to the virtuous cycle between BTI and dead code elimination.
You then treat the statics as roots, and serialize those objects to the initial heap image. But before doing that, you exclude (zero out) any which are marked as "reinitialize at runtime." Right.
The rationale for this clearly is that you want to continue the graph walk to find all the loadable classes, but then don't want to use the polluted value. But what happens in cases like this:
class Aliased { @RuntimeInitialized private static final Socket s = ...; private static final Socket copy = s; }
Do you throw on reads of runtime-initialized fields from a <clinit>? Do you walk the heap and find aliases to runtime-initialized values, and replace them with something (if so, what?) Or is the Aliased class above just "broken" according to this model, and I encounter a stale/nonworking socket in `copy` at runtime, and one that is not properly aliased to `s`? Once an object is initialized at build time, its state can escape into all sorts of other places, and just zeroing out the static root isn't enough to stamp it out. This is where the "soupy" nature of <clinit> becomes evident. <clinit> is a single method that has tremendous side effects, setting static fields, initializing other classes, starting threads, caching computed values, etc. It's very hard to automatically reason about what has happened in a <clinit> method and what the user intends for those side effects (if they're even aware of what they all may be!).
What was the user's intent when they initialized 'copy'? To record what the original Socket connection - set up at build time - had been rather than separately storing the address/port? If they had a semantic meaning for `copy` even after `s` had been nulled out, then automatically resetting `copy` would violate their expectation.
We need the user to tell us their intent. If they wanted both `s` & `copy` to be reset, then they need to be explicit about that and annotate both fields. We don't attempt to null all copies of the value of a @RuntimeInitialized field.
Am I missing something? You seemed to have grasped it correctly =)
If that field had been a primitive, such as a long, we'd be unable to track down which other longs in the heap were copies of it or derived from it. We wouldn't reset some other location with the value 42 because a @RuntimeInitialized field was set to 42 at build time. The programmer has to take responsibility for which fields need to be reset. With qbicc, that's annotations. With Leyden we may be able to give them a better way to group fields and express how & when they should be initialized.
--Dan
Thanks, -Brian
On 5/26/2022 4:22 PM, David P Grove wrote:
Hi,
I’ve appended the contents of the referenced wiki page in this email. Apologies in advance if the formatting doesn’t come through as intended.
There is a full implementation of this (GPLv2 + Classpath exception) as part of the qbicc project on GitHub. There is also a GitHub discussion in the qbicc project that links to various GitHub issues that capture the history that led to the current design. I will not hyperlink to those here so that if people have any IP concerns, they can avoid seeing them. They are easily findable.
Regards,
--dave
## Overview
One of the goals of the qbicc project is to explore technical approaches for adapting Java's specification of class initialization to fully support native image compilation. Enabling build-time evaluation of complex class initialization logic is essential for obtaining much of the benefits of native image compilation: reduced memory footprint and fast startup. However, both the core JDK and many frameworks will not be primarily be used in native image scenarios. Therefore, it is essential that the approach taken for build-time initialization enables both the existing runtime class initialization and the new build-time class initialization logic to co-exist. Furthermore, for as many cases as possible, the class initialization code should be shared between the two usage scenarios and have non-surprising semantics in both.
## Build-time Initialization
In qbicc, all classes are initialized at build-time. Class initialization at build time is performed according to the existing semantics of Java class initialization driven by build-time execution of the `<clinit>` methods of reachable classes. The set of reachable classes is determined iteratively, starting with the program entrypoints and adding the methods and classes they utilize until no further reachable classes are discovered (a fixed point is reached).
After build-time initialization has completed, a build-time heap has been constructed that contains the objects that were created during the build-time execution of the `<clinit>` methods. Using the reachable static fields of the reachable program as roots, this build-time heap is serialized into the native image. This set of objects will form the initial runtime heap of the program when it is executed.
## Runtime Initializers
There are cases where one or more initialization actions of a class **must** be executed at program runtime. Most typically these involve the creation of native resources (open files, threads, etc) that cannot be successfully serialized into the build time heap.
Qbicc supports runtime initialization by allowing static fields of a classes to be declared as runtime initialized. These fields will be initialized lazily, at first access, by executing a runtime initializer (`<rtinit>`) associated with the accessed field. Runtime initialization is localized: accessing a particular static field will cause its runtime initializer to be executed but has no implications for other runtime initializers defined either in the field's defining class or any superclass or implemented interface of the field's defining class.
When serialized from the build-time heap to the runtime heap, all runtime-initialized fields will be serialized with the zero (uninitialized) value appropriate for their type.
Qbicc allows related static fields in the same class to share a common `<rtinit>` method. The first access to any of the fields will cause the execution of the associated `<rtinit>` method and the initialization of all the fields.
## Adjusting Heap Serialization
For some objects it is necessary to initialize them during build-time initialization, but "reset" them before they are used at runtime. Qbicc supports this by allowing fields to be annotated to be serialized as the type-appropriate zero value or as a primitive constant value. This value replacement happens as the build time heap is serialized.
One common scenario is to invalidate objects that are wrapping native resources. For example, when a `FileDescriptor` is serialized its `fd` and `handle` instance fields are serialized as `-1` and its `closed` field is serialize as `true`. Thus, any attempt to use the build-time FileDescriptor at runtime will raise the appropriate exception.
## Patching: Migration for Existing Classes
The runtime initialization mechanisms described above are currently enabled via a set of annotations. This allows qbicc to implement the desired semantics without requiring any changes to the Java compiler, class file format, or language specification. In the long term, we believe small modifications to the Java specification, for example defining an `rtinit { ... }` construct similar to the existing `static { ... }` construct, could enable a simpler specification.
The primary annotation for runtime initialization is `RuntimeAspect`. This annotation is defined on a class and is interpreted as meaning that the `<clinit>` method of the class should be interpreted as an `<rtinit>` method. This method will not be executed during build-time initialization and instead will be deferred until the first access of one of the static fields defined in the class.
To allow us to "externally" modify JDK core classes for qbicc, we have developed an annotation-driven patcher infrastructure. The patcher allows the declaration of patch classes that add, remove, and modify the methods and fields of an existing class. This modification includes the replacement of the `<clinit>` method and the declaration of multiple `RuntimeAspect` patch classes.
The best way to explore what is possible with the patcher is to examine the java.base/src directory in the qbicc-class-library project. It makes extensive use of the patcher annotations to adapt the core JDK classes to qbicc while still allowing us to consume the upstream OpenJDK code base via an unmodified git submodule.
## Design Alternatives
A number of alternatives were considered before arriving at the final design documented here. The technical discussions and options considered can be explored starting in qbicc discussion #764 on GitHub.
From: Brian Goetz <brian.goetz@oracle.com>
Date: Thursday, May 26, 2022 at 2:21 PM
To: David P Grove <groved@us.ibm.com>, "leyden-dev@openjdk.java.net" <leyden-dev@openjdk.java.net>
Subject: [EXTERNAL] Re: Experimentation with build time and runtime class initialization in qbicc
Hi David;
Would like to understand more about this, but first, from an IP-hygiene perspective, documents linked from this list should be under the OpenJDK terms and conditions. Can you post the contents of that document here, so there are no issues there?
Thanks,
-Brian
On 5/26/2022 12:35 PM, David P Grove wrote:
Hi,
In the qbicc project, we’ve been exploring options for adapting Java’s class initialization semantics for native images. In particular, we are trying to arrive at non-surprising semantics that, in native-image scenarios, allow most initialization to happen at build time while still enabling runtime initialization of selected static fields.
Our current design and experience are captured here: https://github.com/qbicc/qbicc/wiki/Class-Initialization-in-qbicc. In a nutshell, the idea is to initialize classes via build-time execution of existing <clinit> methods as per normal Java semantics while adding per-static-field <rtinit> methods to provide a capability for runtime reinitialization of a field before its first access.
--dave
Certainly everything comes with a tradeoff. But I would argue that the cost of the current workaround to influence static field initializations - make a separate static inner class for a static field that should be initialized separately - is even higher because it requires a full class data structure just to hold a single field. Even in the JDK, the number of inner classes named "Lazy" is growing. A more fine-grained initialization of fields within a class can help to reduce such overhead.
-Christian
On 5/28/22 09:58, Brian Goetz wrote:
I too agree that the "soupy" nature of <clinit> makes reverse-engineering difficult, and that this alternate translation would make things easier for an after-the-fact analysis tool that is trying to reason about what computations could be safely shifted in time.
But, keep in mind that it's not a free lunch. To point out the obvious tradeoff: this turns into a startup hit for every dynamically executed Java program (larger classfiles, more bytecodes, more methods). This is a tradeoff we would have to consider carefully, since making Java startup slower in general is not a cost we should take on lightly, especially given the charter of this project. So, something for the "could consider" list, but not a slam-dunk.
On 5/28/2022 12:39 PM, Christian Wimmer wrote:
Hi,
I agree with the "soupy nature" of <clinit> methods mentioned below. This makes it impossible in general to reverse-engineer which parts of <clinit> initialize which static field. One suggestion how that could be improved: Instead of emitting a single <clinit> method, javac can emit separate <clinit_XXX> methods for each static field that is initialized inline as part of the field declaration, as well as each static{} block. With a consistent naming scheme of these methods, it would be much easier to run some initializations at build time and some at run time. For compatibility, the <clinit> method could be a chain of invocations of the <clinit_XXX> methods (or maybe <clinit> itself is no longer necessary at all).
So for example a class
    class MyClass {
        static Object o1 = "abc";
        static { foo(); }
        static Object o2 = 42;
    }
the Java compiler would create the methods (written here with disassembled bytecode)
    <clinit_o1>() { o1 = "abc"; }
    <clinit_$0>() { foo(); }
    <clinit_o2>() { o2 = 42; }
    <clinit>() { <clinit_o1>(); <clinit_$0>(); <clinit_o2>(); }
Why such a scheme? It is much easier to prove here that the field o2 can be initialized at build time regardless of what foo() is doing, and then remove the run-time initialization of o2 by replacing <clinit_o2> with an empty method. All of that can be done without analyzing and modifying the bytecode soup of the current <clinit> method.
-Christian
On 5/27/22 08:35, Dan Heidinga wrote:
On Thu, May 26, 2022 at 9:01 PM Brian Goetz <brian.goetz@oracle.com> wrote:
Thanks for providing this.
Something about the qbicc approach here doesn't seem to add up to me. Maybe you can tell me what I'm missing.
From reading your notes, it seems that at build time, you start with the root class(es), execute their <clinit>, which will cause loading of more classes, more <clinits>, and you iterate until there are no new classes to initialize.
With qbicc we embraced the closed-world constraint and mandated that all class initialization happens at build time. While we started with runtime class initialization to bootstrap being able to run more code, we quickly switched to being all-in on build time init (BTI) due to the virtuous cycle between BTI and dead code elimination.
You then treat the statics as roots, and serialize those objects to the initial heap image. But before doing that, you exclude (zero out) any which are marked as "reinitialize at runtime."
Right.
The rationale for this clearly is that you want to continue the graph walk to find all the loadable classes, but then don't want to use the polluted value. But what happens in cases like this:
    class Aliased {
        @RuntimeInitialized
        private static final Socket s = ...;
        private static final Socket copy = s;
    }
Do you throw on reads of runtime-initialized fields from a <clinit>? Do you walk the heap and find aliases to runtime-initialized values, and replace them with something (if so, what?) Or is the Aliased class above just "broken" according to this model, and I encounter a stale/nonworking socket in `copy` at runtime, and one that is not properly aliased to `s`?
Once an object is initialized at build time, its state can escape into all sorts of other places, and just zeroing out the static root isn't enough to stamp it out. This is where the "soupy" nature of <clinit> becomes evident. <clinit> is a single method that has tremendous side effects, setting static fields, initializing other classes, starting threads, caching computed values, etc. It's very hard to automatically reason about what has happened in a <clinit> method and what the user intends for those side effects (if they're even aware of what they all may be!).
What was the user's intent when they initialized 'copy'? To record what the original Socket connection - set up at build time - had been rather than separately storing the address/port? If they had a semantic meaning for `copy` even after `s` had been nulled out, then automatically resetting `copy` would violate their expectation.
We need the user to tell us their intent. If they wanted both `s` & `copy` to be reset, then they need to be explicit about that and annotate both fields. We don't attempt to null all copies of the value of a @RuntimeInitialized field.
Am I missing something?
You seemed to have grasped it correctly =)
If that field had been a primitive, such as a long, we'd be unable to track down which other longs in the heap were copies of it or derived from it. We wouldn't reset some other location with the value 42 because a @RuntimeInitialized field was set to 42 at build time. The programmer has to take responsibility for which fields need to be reset. With qbicc, that's annotations. With Leyden we may be able to give them a better way to group fields and express how & when they should be initialized.
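The aliasing behaviour discussed above is ordinary Java reference semantics and can be reproduced without qbicc. In this small sketch, `Resource`, `AliasDemo`, and `resetRoot()` are hypothetical names; `resetRoot()` merely stands in for serializing an annotated field as its zero value.

```java
/** Hypothetical stand-in for a build-time object such as a Socket. */
final class Resource {
    final String origin;

    Resource(String origin) {
        this.origin = origin;
    }
}

final class AliasDemo {
    static Resource s = new Resource("build-time"); // imagine this field is annotated for reset
    static Resource copy = s;                        // unannotated alias of the same object

    /** Stand-in for writing the annotated field as null during heap serialization. */
    static void resetRoot() {
        s = null;
    }

    public static void main(String[] args) {
        resetRoot();
        System.out.println("s is null: " + (s == null));
        System.out.println("copy still refers to: " + copy.origin);
    }
}
```

Only the annotated root is cleared; the alias keeps the build-time object alive, which is why each field that must be reset has to be marked explicitly.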
--Dan
Thanks, -Brian
Hi David,
Thanks for the write-up.
One thing that isn't completely clear to me after reading this is why language changes (<rtinit>) are needed? It seems to me this could be entirely implemented via a standard API. Using ClassValue as the main inspiration you could have something like:
    abstract class RuntimeLocal<T> {
        protected RuntimeLocal() {
            checkBuildTime();
            VM.registerForRuntimeInitialization(this);
        }
        protected abstract T computeValue();
        public final T get(); // Calls to get are optimized by the vm
    }
Usage would be something similar to:
    class Usage {
        static final LocalDateTime BUILD_TIME = LocalDateTime.now();
        static final RuntimeLocal<LocalDateTime> RUNTIME_TIME = new RuntimeLocal<>() {
            protected LocalDateTime computeValue() {
                return LocalDateTime.now();
            }
        };
    }
I might be missing some details, but it seems to me that this approach would be strongly favorable to needing to change the language as well as adding new bytecodes.
/Kasper
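For readers who want to experiment with the `RuntimeLocal` idea on a stock JVM, here is a minimal runnable variant. It is a sketch under assumptions: the proposed `checkBuildTime()` and `VM.registerForRuntimeInitialization(...)` hooks do not exist in the JDK and are omitted, and `get()` is implemented as a plain synchronized lazy getter rather than anything VM-optimized.

```java
import java.time.LocalDateTime;

/** Plain-Java approximation of the proposed RuntimeLocal: computes its value on the first get(). */
abstract class RuntimeLocal<T> {
    private T value;
    private boolean computed;

    protected abstract T computeValue();

    public final synchronized T get() {
        if (!computed) {
            value = computeValue();
            computed = true;
        }
        return value;
    }
}

final class Usage {
    // Evaluated eagerly during class initialization.
    static final LocalDateTime BUILD_TIME = LocalDateTime.now();

    // Evaluated lazily, on the first call to get().
    static final RuntimeLocal<LocalDateTime> RUNTIME_TIME = new RuntimeLocal<LocalDateTime>() {
        @Override
        protected LocalDateTime computeValue() {
            return LocalDateTime.now();
        }
    };

    public static void main(String[] args) {
        System.out.println("eager: " + BUILD_TIME);
        System.out.println("lazy:  " + RUNTIME_TIME.get());
    }
}
```

On an ordinary JVM both values are of course computed at run time; the sketch only shows the deferral mechanism that a build-time-aware VM would need to preserve across serialization.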
On Fri, May 27, 2022 at 7:53 AM Kasper Nielsen <kasperni@gmail.com> wrote:
Hi David,
Thanks for the write-up.
One thing that isn't completely clear to me after reading this is why language changes (<rtinit>) are needed?
The <rtinit> model was a convenient way for us to explore a model that put all class initialization at build time, while allowing a small set of fields to be reinitialized at runtime. It also minimized the changes we had to make to the core JDK classes which makes maintaining the changes much easier given the rate of JDK updates. SubstrateVM uses a similar approach with their Substitutions for what I assume are similar reasons. Leyden will be able to update the JDK core classes directly and can take a more direct approach to indicating in which phase a static field should be initialized.
It seems to me this could be entirely implemented via a standard API. Using ClassValue as the main inspiration you could have something like:
    abstract class RuntimeLocal<T> {
        protected RuntimeLocal() {
            checkBuildTime();
            VM.registerForRuntimeInitialization(this);
        }
        protected abstract T computeValue();
        public final T get(); // Calls to get are optimized by the vm
    }
Usage would be something similar to:
    class Usage {
        static final LocalDateTime BUILD_TIME = LocalDateTime.now();
        static final RuntimeLocal<LocalDateTime> RUNTIME_TIME = new RuntimeLocal<>() {
            protected LocalDateTime computeValue() {
                return LocalDateTime.now();
            }
        };
    }
I might be missing some details, but it seems to me that this approach would be strongly favorable to needing to change the language as well as adding new bytecodes.
This is a good starting point. I went a fair ways looking at how to group static fields into different classes to decouple their lifetimes and found that I couldn't cleanly split them into two groups. I used the Initialization on demand holder pattern (IODH) rather than your RuntimeLocal but the idea is very similar.
The problem is that while it's clear that some fields can be initialized early (build time) and others must be initialized late (runtime), there is a third group that needs to be reinitialized. I list 3 buckets: early, late, and reinit, but that's a minimum number. There may be more than 3. And due to the "soupy" nature of <clinit>, it's not always easy to avoid depending on a field that's in a different bucket. And values in that 3rd bucket - the fields that need to be reinitialized - don't have a clear meaning when their value propagates around the program. Does it need to be cleared everywhere and force reinit of all consumers? Lots to figure out here.
We need a better model - whether that's library features or new language features - that makes it easier to express when (which phase) an operation should occur and some way to talk about the dependency chain of that value (all the classes that have to be initialized, values calculated, etc).
--Dan
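For reference, the initialization-on-demand holder idiom mentioned above looks roughly like this in standard Java. The class and field names (`Settings`, `Holder`, `CPUS`) are hypothetical, and the counter exists only to make the laziness observable; this is not code from qbicc or the JDK.

```java
final class Settings {
    static int holderInits; // instrumentation for this demo only

    private Settings() {
    }

    /** The JVM initializes Holder only when one of its static members is first used. */
    private static final class Holder {
        static final int CPUS = compute();

        private static int compute() {
            holderInits++;
            return Runtime.getRuntime().availableProcessors();
        }
    }

    static int cpus() {
        return Holder.CPUS; // first call triggers Holder's class initialization
    }

    public static void main(String[] args) {
        System.out.println("holder inits before access: " + holderInits);
        System.out.println("cpus: " + cpus());
        System.out.println("holder inits after access: " + holderInits);
    }
}
```

The idiom gives each holder class its own initialization point, which covers the early and late buckets; it offers no way to express the third bucket, a value that is computed at build time and must be recomputed later.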
I think Dan is homing in on one of the key questions, which is the nature of the third bucket (static finals that require reinitialization). It would be useful for everyone following the discussion if we had a more complete list of situations you've encountered where this seems essential, and their notable aspects.
As you point out, there are a host of potential "solutions"; while it is surely premature to try to propose a solution, it is never too early to come to a better understanding of the problem.
On Tue, May 31, 2022 at 12:17 PM Brian Goetz <brian.goetz@oracle.com> wrote:
I think Dan is homing in on one of the key questions, which is the nature of the third bucket (static finals that require reinitialization.) It would be useful for everyone following the discussion if we had a more complete list of situations you've encountered where this seems essential, and their notable aspects.
In qbicc, the places we've had to reinitialize static fields are captured in the qbicc/qbicc-class-library repo [0] using "$_runtime" source files [1]. Many of the cases have to do with capturing the build time vs the runtime environment.
The number of available CPUs is captured in several places:
* j.l.Runtime : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/ja...
* j.u.c.Exchanger : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/ja...
* j.u.c.Phaser : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/ja...
* j.u.c.a.Striped64 : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/ja...
The environment variables are captured:
* j.l.ProcessEnvironment : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/ja...
The in / out / err file descriptors need to be reinitialized:
* j.io.FileDescriptor : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/ja...
Prevent threads from being created in a static initializer:
* j.l.ref.Reference : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/ja...
* Likely more cases for this we just haven't hit yet
Unsafe pageSize needs to be configured at runtime. As do UnsafeConstants like ADDRESS_SIZE0:
* j.i.m.Unsafe : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/ja...
* j.i.m.UnsafeConstants : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/ja... & https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/ja...
Capturing the default directory:
* sun.nio.fs.UnixFileSystem : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/ja...
We're still working through detangling the "initPhase" process in j.l.System into a build time and runtime ("rtInitPhase") version: https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/ja...
We also did some investigation of how feasible it would be to rewrite SubstrateVM's Substitutions to use the IODH pattern and I can share that info as well but it'll take a bit for me to write it up in a clear state.
--Dan
[0] https://github.com/qbicc/qbicc-class-library
[1] https://github.com/qbicc/qbicc-class-library/search?q=%24_runtime
As you point out, there are a host of potential "solutions"; while it is surely premature to try to propose a solution, it is never too early to come to a better understanding of the problem.
On 5/31/2022 11:50 AM, Dan Heidinga wrote:
On Fri, May 27, 2022 at 7:53 AM Kasper Nielsen <kasperni@gmail.com> wrote:
Hi David,
Thanks for the write-up.
One thing that isn't completely clear to me after reading this is why language changes (<rtinit>) are needed?
The <rtinit> model was a convenient way for us to explore a model that put all class initialization at build time, while allowing a small set of fields to be reinitialized at runtime. It also minimized the changes we had to make to the core JDK classes which makes maintaining the changes much easier given the rate of JDK updates. SubstrateVM uses a similar approach with their Substitutions for what I assume are similar reasons.
Leyden will be able to update the JDK core classes directly and can take a more direct approach to indicating in which phase a static field should be initialized.
It seems to me this could be entirely implemented via a standard API. Using ClassValue as the main inspiration you could have something like:
abstract class RuntimeLocal<T> { protected RuntimeLocal() { checkBuildTime(); VM.registerForRuntimeInitialization(this); } protected abstract T computeValue(); public final T get(); // Calls to get are optimized by the vm }
Usage would be something similar to:
class Usage {
static final LocalDateTime BUILD_TIME = LocalDateTime.now();
static final RuntimeLocal<LocalDateTime> RUNTIME_TIME = new RuntimeLocal<>() { protected LocalDateTime computeValue() { return LocalDateTime.now(); } }; }
I might be missing some details, but it seems to me that this approach would be strongly favorable to needing to change the language as well as adding new bytecodes.
This is a good starting point. I went a fair ways looking at how to group static fields into different classes to decouple their lifetimes and found that I couldn't cleanly split them into two groups. I used the Initialization on demand holder pattern (IODH) rather than your RuntimeLocal but the idea is very similar.
The problem is that while it's clear that some fields can be initialized early (build time) and others must be initialized late (runtime), there is a third group that needs to be reinitialized. I list 3 buckets: early, late, and reinit, but that's a minimum number. There may be more than 3. And due to the "soupy" nature of <clinit>, it's not always easy to avoid depending on a field that's in a different bucket. And values in that 3rd bucket - the fields that need to be reinitialized - don't have a clear meaning when their value propagates around the program. Does it need to be cleared everywhere and force reinit of all consumers? Lots to figure out here.
We need a better model - whether that's library features or new language features - that makes it easier to express when (which phase) an operation should occur and some way to talk about the dependency chain of that value (all the classes that have to be initialized, values calculated, etc).
--Dan
/Kasper
Thanks, Dan, for the detailed information. The other investigation also seems interesting, so I hope some day you’ll find the time to write it up.

There’s lots to unpack here, but I want to focus on a specific aspect, related to the issue of “stale” or “aliased” compile-time values that I raised in my earlier mail. Taking the specific example of caching Runtime.availableProcessors(), let’s ask: WHY are these classes caching R.aP() in a static? There are two possible cases:

- Pure caching. Here, the author has made a choice (right or wrong) that calling R.aP() repeatedly will be too expensive, and so caches the value in a static for later use for, say, allocating arena arrays in the constructor of Striped64 or Exchanger. The instances created in the early phase are still valid in the later phase, and compatible with instances created in the later phase.
- Enforcement of invariant. Here, the author has captured the fact that they require the value to be stable, because (say) they’re going to create multiple arrays and expect them all to be of the same length. Here, early-phase and later-phase instances could not compatibly coexist.

In the first case, reinitializing the cached field at phase change points may be harmless; it’s essentially equivalent to replacing reads of fields with repeated evaluation of the initializer (assuming the initialization is pure). In the second, the runtime has broken an invariant the author had reason to believe is valid.

Without diving into solutions at this point, we can’t escape the following observations:

- This is what happens when you try to reinterpret old code with new semantics; code that had every reason to work properly when it was written becomes retroactively broken when the runtime reinterprets old code in a new way. New semantics require permission from the user.
- If there are N separate desirable (but incompatible) outcomes, such as the two cases cited above, their code has to be different from each other.
Right now, we can’t tell the difference between these cases. If, as in the “it’s an invariant” case, it would be unacceptable for the value to change (i.e., when the user said “static final”, they were serious), then one of the following has to happen:

- We must be prepared to keep the earlier-phase result in later phases, even if the underlying quantity has changed;
- We must defer evaluation until the later phase (potentially deferring all dependent early evaluations);
- We fail at early-eval time if someone attempts to evaluate the must-be-stable quantity in the early phase, and let the programmer sort it out.

In fact, to the extent we want early evaluation, I suspect that we may want to be able to express *all three* of these in the programming model.
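To make the three options concrete, here is a toy model. It is only a sketch under loud assumptions: Phase, StabilityPolicy and PhasedValue are hypothetical names, the phase is passed in by hand rather than known to a VM, and the "number of CPUs" is a plain array cell that the demo changes between phases. Nothing here corresponds to an existing or proposed API.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical phases and policies, named only for this sketch.
enum Phase { BUILD, RUN }

enum StabilityPolicy { KEEP_EARLY, DEFER_TO_RUNTIME, FAIL_IF_EARLY }

// A value that never changes once computed; the policy only decides what a
// read during the BUILD phase does. Not thread-safe, and the initializer
// must not return null.
final class PhasedValue<T> {
    private final StabilityPolicy policy;
    private final Supplier<T> initializer;
    private T value; // null until computed

    PhasedValue(StabilityPolicy policy, Supplier<T> initializer) {
        this.policy = policy;
        this.initializer = initializer;
    }

    Optional<T> read(Phase current) {
        if (value == null) {
            if (current == Phase.BUILD && policy == StabilityPolicy.DEFER_TO_RUNTIME) {
                // No value yet: the caller's own computation must be deferred too.
                return Optional.empty();
            }
            if (current == Phase.BUILD && policy == StabilityPolicy.FAIL_IF_EARLY) {
                throw new IllegalStateException("must-be-stable value read at build time");
            }
            value = initializer.get();
        }
        return Optional.of(value);
    }
}

class PhasedValueDemo {
    public static void main(String[] args) {
        int[] cpus = {4}; // simulated build host
        PhasedValue<Integer> keep = new PhasedValue<>(StabilityPolicy.KEEP_EARLY, () -> cpus[0]);
        PhasedValue<Integer> defer = new PhasedValue<>(StabilityPolicy.DEFER_TO_RUNTIME, () -> cpus[0]);

        System.out.println("build: keep=" + keep.read(Phase.BUILD) + " defer=" + defer.read(Phase.BUILD));
        cpus[0] = 16; // simulated run host
        System.out.println("run:   keep=" + keep.read(Phase.RUN) + " defer=" + defer.read(Phase.RUN));
    }
}
```

Under every policy the value is stable once it exists; the policies differ only in whether an early read succeeds, is postponed, or is rejected.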
On Jun 6, 2022, at 10:36 AM, Dan Heidinga <heidinga@redhat.com> wrote:
On Tue, May 31, 2022 at 12:17 PM Brian Goetz <brian.goetz@oracle.com> wrote:
I think Dan is homing in on one of the key questions, which is the nature of the third bucket (static finals that require reinitialization.) It would be useful for everyone following the discussion if we had a more complete list of situations you've encountered where this seems essential, and their notable aspects.
In qbicc, the places we've had to reinitialize static fields are captured in the qbicc/qbicc-class-library repo [0] using "$_runtime" source files [1]. Many of the cases have to do with capturing the build time vs the runtime environment.
The number of available CPUs is captured in several places:
* j.l.Runtime : https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/blo...
* j.u.c.Exchanger: https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/blo...
* j.u.c.Phaser : https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/blo...
* j.u.c.a.Striped64 : https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/blo...

The environment variables are captured:
* j.l.ProcessEnvironment : https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/blo...

The in / out / err file descriptors need to be reinitialized:
* j.io.FileDescriptor : https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/blo...

Prevent threads from being created in a static initializer:
* j.l.ref.Reference : https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/blo...
* Likely more cases for this we just haven't hit yet

Unsafe pageSize needs to be configured at runtime. As do UnsafeConstants like ADDRESS_SIZE0:
* j.i.m.Unsafe : https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/blo...
* j.i.m.UnsafeConstants: https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/blo... & https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/blo...

Capturing the default directory:
* sun.nio.fs.UnixFileSystem : https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/blo...

We're still working through detangling the "initPhase" process in j.l.System into a build time and runtime ("rtInitPhase") version: https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/blo...
We also did some investigation of how feasible it would be to rewrite SubstrateVM's Substitutions to use the IODH pattern and I can share that info as well but it'll take a bit for me to write it up in a clear state.
--Dan
[0] https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library__;!...
[1] https://urldefense.com/v3/__https://github.com/qbicc/qbicc-class-library/sea...
As you point out, there are a host of potential "solutions"; while it is surely premature to try to propose a solution, it is never too early to come to a better understanding of the problem.
On Tue, 31 May 2022 at 16:50, Dan Heidinga <heidinga@redhat.com> wrote:
On Fri, May 27, 2022 at 7:53 AM Kasper Nielsen <kasperni@gmail.com> wrote:
Hi David,
Thanks for the write-up.
One thing that isn't completely clear to me after reading this is why language changes (<rtinit>) are needed?
The <rtinit> model was a convenient way for us to explore a model that put all class initialization at build time, while allowing a small set of fields to be reinitialized at runtime. It also minimized the changes we had to make to the core JDK classes, which makes maintaining the changes much easier given the rate of JDK updates. SubstrateVM uses a similar approach with their Substitutions, for what I assume are similar reasons.
Leyden will be able to update the JDK core classes directly and can take a more direct approach to indicating in which phase a static field should be initialized.
It seems to me this could be entirely implemented via a standard API. Using ClassValue as the main inspiration you could have something like:
    abstract class RuntimeLocal<T> {
        protected RuntimeLocal() {
            checkBuildTime();
            VM.registerForRuntimeInitialization(this);
        }
        protected abstract T computeValue();
        public final T get(); // Calls to get are optimized by the vm
    }
Usage would be something similar to:
    class Usage {
        static final LocalDateTime BUILD_TIME = LocalDateTime.now();
        static final RuntimeLocal<LocalDateTime> RUNTIME_TIME = new RuntimeLocal<>() {
            protected LocalDateTime computeValue() {
                return LocalDateTime.now();
            }
        };
    }
I might be missing some details, but it seems to me that this approach would be strongly preferable to changing the language and adding new bytecodes.
This is a good starting point. I went a fair ways looking at how to group static fields into different classes to decouple their lifetimes and found that I couldn't cleanly split them into two groups.
I think there is an important distinction to make here between "phased class initialization" and "phased field initialization". Having used GraalVM's native image for some time, my experience is that it is very hard to reason about phased class initialization. A saner model, I would argue, would be one where all classes are initialized at image build time and never reinitialized. If a class needs laziness or reinitialization, this must be done explicitly using <rtinit>/RuntimeLocal. If you have groups of fields that need to be initialized together, this can be done by storing them in a record, which can then be stored in a reinit field. In this model, you would still need to think about the usage of reinit fields, but you would never need to spend cycles on figuring out what phase a class was initialized in. But this is all something that can be discussed further down the line.
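A rough illustration of the record-in-a-reinit-field idea, as a sketch only: HostInfo and ReinitField are hypothetical names, the choice of fields is arbitrary, and reset() merely simulates whatever a toolchain would do at the phase change. It needs a Java version with records (16 or later).

```java
import java.util.function.Supplier;

// Hypothetical group of values that must be (re)initialized together.
record HostInfo(int cpus, String lineSeparator, String workingDir) {
    static HostInfo probe() {
        return new HostInfo(
                Runtime.getRuntime().availableProcessors(),
                System.lineSeparator(),
                System.getProperty("user.dir"));
    }
}

// Hypothetical reinit field: computed on first use, recomputed after reset().
// The initializer must not return null.
final class ReinitField<T> {
    private final Supplier<T> initializer;
    private T value;

    ReinitField(Supplier<T> initializer) {
        this.initializer = initializer;
    }

    synchronized T get() {
        if (value == null) {
            value = initializer.get();
        }
        return value;
    }

    // Stand-in for the phase change: drop the value so the next get() recomputes.
    synchronized void reset() {
        value = null;
    }
}

class HostInfoDemo {
    static final ReinitField<HostInfo> HOST = new ReinitField<>(HostInfo::probe);

    public static void main(String[] args) {
        HostInfo early = HOST.get();
        HOST.reset();
        HostInfo late = HOST.get();
        System.out.println("early: " + early);
        System.out.println("late:  " + late);
        System.out.println("same instance? " + (early == late));
    }
}
```

Because the three values live in one immutable record, a reader can never observe a mix of early and late components, although a reference to the early record that escaped before reset() stays stale.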
The problem is that while it's clear that some fields can be initialized early (build time) and others must be initialized late (runtime), there is a third group that needs to be reinitialized. I list 3 buckets: early, late, and reinit, but that's a minimum number. There may be more than 3. And due to the "soupy" nature of <clinit>, it's not always easy to avoid depending on a field that's in a different bucket. And values in that 3rd bucket - the fields that need to be reinitialized - don't have a clear meaning when their value propagates around the program. Does it need to be cleared everywhere and force reinit of all consumers? Lots to figure out here.
We need a better model - whether that's library features or new language features - that makes it easier to express when (which phase) an operation should occur and some way to talk about the dependency chain of that value (all the classes that have to be initialized, values calculated, etc).
I must admit I'm a bit skeptical about something like dependency tracking. Take something like System.lineSeparator() and a platform-independent image. Is it really realistic that we track all strings that are created using this method during build time? But, as you said, lots to figure out :)
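The concern can be shown in a few lines. This is a simulation, not a real cross-platform build: hostSeparator is a made-up mutable stand-in for System.lineSeparator(), changed by hand to mimic moving from a build host to a different run host.

```java
class SeparatorAliasingDemo {
    // Stand-in for System.lineSeparator(); mutable only so one JVM can
    // simulate a build host and a different run host.
    static String hostSeparator = "\n";

    static String makeHeader(String title) {
        return title + hostSeparator;
    }

    public static void main(String[] args) {
        String header = makeHeader("Report"); // computed at "build time"
        String page = header + "body";        // derived value, origin forgotten

        hostSeparator = "\r\n";               // "runtime" on another platform

        // The derived string still carries the build-time separator.
        System.out.println("page has run-host separator: " + page.contains("\r\n"));
        System.out.println("fresh header has it: " + makeHeader("Report").endsWith("\r\n"));
    }
}
```

Nothing in page records that part of it came from the separator, so finding every such derived string would require tracking through ordinary string concatenation.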
participants (6)
- Brian Goetz
- Christian Wimmer
- Dan Heidinga
- David P Grove
- Kasper Nielsen
- Volker Simonis