Re: Project Leyden: Beginnings
Hi Mark,

Thanks for all your efforts to finally get Leyden started!
The ultimate goal of this Project, as stated in the Call for Discussion [1], is to address the long-term pain points of Java’s slow startup time, slow time to peak performance, and large footprint.
You're probably aware of the CRaC project [1] which as well addresses the first two of these pain points (slow startup & slow time to peak performance) by leveraging checkpointing (aka. snapshotting) and restoring (aka. resuming) of fully warmed up JVM instances. [1] https://openjdk.java.net/projects/crac/
In the Call for Discussion I proposed that we address these pain points by introducing a concept of _static run-time images_ to the Java Platform, and to the JDK.
- A static image is a standalone program, derived from an application and a JDK, which runs that application -- and no other.
- A static image is a _closed world_ with respect to the classes that it can load: At run time it cannot load classes from outside the image, nor can it create classes dynamically.
The closed-world constraint imposes strict limits on Java’s natural dynamism, particularly on the run-time reflection and class-loading features upon which so many existing Java libraries and frameworks depend. Not all applications are well suited to this constraint, and not all developers are willing to live with it.
So rather than adopt the closed-world constraint at the start, I propose that we instead pursue a gradual, incremental approach.
Now that the goal of exploring "static images" is not the main goal of Leyden anymore (you should probably update the Leyden project page [2] to reflect this), the goals of CRaC and Leyden seem to match even more. CRaC's new execution model doesn't impose any constraints on "Java’s natural dynamism" so it should naturally support most server-side applications out of the box. Instead, CRaC imposes a new constraint for Java applications which we call "snapsafety". A snapsafe application can operate correctly and securely after it has been restored from a previously checkpointed (and possibly cloned) state. The main challenge for CRaC is to first make the JVM and the core libraries snapsafe before it exposes hooks to libraries and application to give them a chance to become snapsafe as well. [2] https://openjdk.java.net/projects/leyden/
We will explore a spectrum of constraints, weaker than the closed-world constraint, and discover what optimizations they enable. The resulting optimizations will almost certainly be weaker than those enabled by the closed-world constraint. Because the constraints are weaker, however, the optimizations will likely be applicable to a broader range of existing code -- thus they will be more useful to more developers.
We will work incrementally along this spectrum of constraints, starting small and simple so that we can develop a firm understanding of the changes required to the Java Platform Specification. Along the way we will strive, of course, to preserve Java’s core values of readability, compatibility, and generality.
It seems to me that "snapsafety" could be such a constraint and I hope for a fruitful and successful cooperation between the two projects. Thank you and best regards, Volker
We will lean heavily on existing components of the JDK including the HotSpot JVM, the C2 compiler, application class-data sharing (CDS), and the `jlink` linking tool.
In the long run we will likely embrace the full closed-world constraint in order to produce fully-static images. Between now and then, however, we will develop and deliver incremental improvements which developers can use sooner rather than later.
Let us begin!
- Mark
[1] https://mail.openjdk.java.net/pipermail/discuss/2020-April/005429.html
// https://openjdk.java.net/projects/leyden/notes/01-beginnings
On 5/20/2022 9:16 AM, Volker Simonis wrote:
Hi Mark,
Thanks for all your efforts to finally get Leyden started!
The ultimate goal of this Project, as stated in the Call for Discussion [1], is to address the long-term pain points of Java’s slow startup time, slow time to peak performance, and large footprint.
You're probably aware of the CRaC project [1] which as well addresses the first two of these pain points (slow startup & slow time to peak performance) by leveraging checkpointing (aka. snapshotting) and restoring (aka. resuming) of fully warmed up JVM instances.
[1] https://openjdk.java.net/projects/crac/
In the Call for Discussion I proposed that we address these pain points by introducing a concept of _static run-time images_ to the Java Platform, and to the JDK.
- A static image is a standalone program, derived from an application and a JDK, which runs that application -- and no other.
- A static image is a _closed world_ with respect to the classes that it can load: At run time it cannot load classes from outside the image, nor can it create classes dynamically.
The closed-world constraint imposes strict limits on Java’s natural dynamism, particularly on the run-time reflection and class-loading features upon which so many existing Java libraries and frameworks depend. Not all applications are well suited to this constraint, and not all developers are willing to live with it.
So rather than adopt the closed-world constraint at the start, I propose that we instead pursue a gradual, incremental approach.
Now that the goal of exploring "static images" is not the main goal of Leyden anymore (you should probably update the Leyden project page [2] to reflect this), the goals of CRaC and Leyden seem to match even more. CRaC's new execution model doesn't impose any constraints on "Java’s natural dynamism" so it should naturally support most server-side applications out of the box. Instead, CRaC imposes a new constraint for Java applications which we call "snapsafety". A snapsafe application can operate correctly and securely after it has been restored from a previously checkpointed (and possibly cloned) state. The main challenge for CRaC is to first make the JVM and the core libraries snapsafe before it exposes hooks to libraries and application to give them a chance to become snapsafe as well.
[2] https://openjdk.java.net/projects/leyden/
We will explore a spectrum of constraints, weaker than the closed-world constraint, and discover what optimizations they enable. The resulting optimizations will almost certainly be weaker than those enabled by the closed-world constraint. Because the constraints are weaker, however, the optimizations will likely be applicable to a broader range of existing code -- thus they will be more useful to more developers.
We will work incrementally along this spectrum of constraints, starting small and simple so that we can develop a firm understanding of the changes required to the Java Platform Specification. Along the way we will strive, of course, to preserve Java’s core values of readability, compatibility, and generality.
It seems to me that "snapsafety" could be such a constraint and I hope for a fruitful and successful cooperation between the two projects.
I think we have an opportunity in Leyden to improve the language and platform to support such concepts. I don't know the details of "snapsafety", but in general we should have language support to indicate some sort of "immutable" constraints. These constraints can be validated (so that we can use pre-optimized snapshot (s)), or invalidated (so we will go back to the old slow-but-correct initialization).

Also, in addition to a single snapshot of an app, perhaps we can also consider multiple snapshots at a lower granularity.

One parallel to draw from is the "constexpr" keyword in C++. However, "constexpr" only deals with language-level constructs. For Java, perhaps we need something that includes a wider set of environmental dependencies. For example, many immutable tables in Java apps are created from external XML files. Do we want a way to snapshot such tables? Maybe we can do that if the XML files are statically stored inside a jlink image?

Again, I don't know what the answer is, but I am excited that we are able to look for solutions at all levels of the language and platform.

Thanks - Ioi
Thank you and best regards, Volker
We will lean heavily on existing components of the JDK including the HotSpot JVM, the C2 compiler, application class-data sharing (CDS), and the `jlink` linking tool.
In the long run we will likely embrace the full closed-world constraint in order to produce fully-static images. Between now and then, however, we will develop and deliver incremental improvements which developers can use sooner rather than later.
Let us begin!
- Mark
[1] https://mail.openjdk.java.net/pipermail/discuss/2020-April/005429.html
// https://openjdk.java.net/projects/leyden/notes/01-beginnings
On 23 May 2022, at 9:10, Ioi Lam wrote:
On 5/20/2022 9:16 AM, Volker Simonis wrote:
… It seems to me that "snapsafety" could be such a constraint and I hope for a fruitful and successful cooperation between the two projects.
A snappy term indeed! When applied to the existing Java platform, the concept (probably) leads to all sorts of complicated considerations about remote and hidden side effects and environmental queries. As Ioi points out, the big new thing here, not possible outside of Leyden, is the option to *modify* the Java language specification (and standard libraries), if we think it helps clarify or simplify the (suitably modified) definition of snapsafety.
I think we have an opportunity in Leyden to improve the language and platform to support such concepts. I don't know the details of "snapsafety", but in general we should have language support to indicate some sort of "immutable" constraints. These constraints can be validated (so that we can use pre-optimized snapshot (s)), or invalidated (so we will go back to the old slow-but-correct initialization).
The part of the language I like to think about changing is not so much assertions (maybe `assert`s) about past events (which are those “immutable constraints”?) but rather relaxation or modification for rules regarding order of evaluation, for suitably marked expressions and statements.

The small scale constant-folding rules which every JIT uses are really order of evaluation changes: An expression like `1+2+x` folding to `3+x` takes the expression `1+2` and moves it “back in time” to JIT time. This is safe because the JIT knows there is no way the program can give evidence of the difference (unless a debugger single-steps through bytecodes). But I think we should chase after constant-folding this sort of thing:

```
Object lookup(String x) {
  // hey, can someone please do this just once, at jlink time?
  var mydata = readHashTable(findResourceFile("mydata.xml"));
  // this depends on x, so cannot be moved back in time:
  return mydata.get(x);
}
```

The standard technique is to put `mydata` in a static final variable. And now that’s easy to do inline as well:

```
Object lookup(String x) {
  // like a C++ static, the initializer is executed on first use:
  class Static {
    static final HashMap<String,Object> mydata =
        readHashTable(findResourceFile("mydata.xml"));
    // but still, can someone please do it just once, at jlink time?
  }
  // this depends on x, so cannot be moved back in time:
  return Static.mydata.get(x);
}
```

(Side note: Reading files throws a checked exception. Does this mean that the above method should be amended to throw a possible checked exception, but marked as “somewhere in the past”? If so, then time-shifted expressions would need to have associated time-shifted exception checking rules.)

This is a kind of time-shifting currently under programmer control.
It suggests to me that we can and should double down on supporting static final state (and also lazy statics as in JDK-8209964), by focusing some effort on time-shifting not so much arbitrary expressions and statements, but the initialization of classes. If a programmer could mark a *whole class* as time-shiftable in its initialization, then the programmer could expect that jlink could make good provisioning decisions about that class, rather than the current standard policy of initializing a class on first use (of a static or of an instance creation).

One more bit of mental framework: A Java class is initialized no earlier and no later than its first initializing use (static or instance creation). Certainly there must be other events that the class initialization could be referred to. “jlink time” is a hazy concept, but program startup is not: A Java program starts just before its selected `main` entry point is run. If a class C could be marked (by the programmer) as being initialized no earlier than entry to `main`, then the programmer could certify that the class is a candidate for pre-initialization, regardless of the change of semantics (relative to Java’s current order of class initialization). And that would solve some (not all) of the problems around making valid jlink-time evaluations. I guess I’m suggesting that a language-level proxy for “jlink time” is main method entry.

I suspect that time-shifted class initialization probably needs a concept of time-shifted dependency (as well as time-shifted exceptions, see above?) so that if class C is marked as “can initialize around main entry” C can also be marked as “but no earlier than initialization of D”, for some other class D that C’s initialization depends on.

(The work on lazies JDK-8209964 is sort of a complementary image of what Leyden is after, since a lazy variable is time-shifted *after* its containing class is initialized, another change from standard Java rules. The two kinds of time shifting, backward and forward, probably deserve a combined treatment of some sort.)
Also, in addition to a single snapshot of an app, perhaps we can also consider multiple snapshots at a lower granularity.
One parallel to draw from is the "constexpr" keyword in C++. However, "constexpr" only deals with language-level constructs. For Java, perhaps we need something that includes a wider set of environmental dependencies. For example, many immutable tables in Java apps are created from external XML files. Do we want a way to snapshot such tables? Maybe we can do that if the XML files are statically stored inside a jlink image?
Again, I don't know what the answer is, but I am excited that we are able to look for solutions at all levels of the language and platform.
Thanks - Ioi
Thank you and best regards, Volker
We will lean heavily on existing components of the JDK including the HotSpot JVM, the C2 compiler, application class-data sharing (CDS), and the `jlink` linking tool.
In the long run we will likely embrace the full closed-world constraint in order to produce fully-static images. Between now and then, however, we will develop and deliver incremental improvements which developers can use sooner rather than later.
Let us begin!
- Mark
[1] https://mail.openjdk.java.net/pipermail/discuss/2020-April/005429.html
// https://openjdk.java.net/projects/leyden/notes/01-beginnings
----- Original Message -----
From: "John Rose" <john.r.rose@oracle.com>
To: "Ioi Lam" <ioi.lam@oracle.com>
Cc: leyden-dev@openjdk.java.net
Sent: Tuesday, May 24, 2022 8:00:21 PM
Subject: Re: Project Leyden: Beginnings

On 23 May 2022, at 9:10, Ioi Lam wrote:
On 5/20/2022 9:16 AM, Volker Simonis wrote:
… It seems to me that "snapsafety" could be such a constraint and I hope for a fruitful and successful cooperation between the two projects.
A snappy term indeed! When applied to the existing Java platform, the concept (probably) leads to all sorts of complicated considerations about remote and hidden side effects and environmental queries.
As Ioi points out, the big new thing here, not possible outside of Leyden, is the option to *modify* the Java language specification (and standard libraries), if we think it helps clarify or simplify the (suitably modified) definition of snapsafety.
I think we have an opportunity in Leyden to improve the language and platform to support such concepts. I don't know the details of "snapsafety", but in general we should have language support to indicate some sort of "immutable" constraints. These constraints can be validated (so that we can use pre-optimized snapshot (s)), or invalidated (so we will go back to the old slow-but-correct initialization).
The part of the language I like to think about changing is not so much assertions (maybe `assert`s) about past events (which are those “immutable constraints”?) but rather relaxation or modification for rules regarding order of evaluation, for suitably marked expressions and statements.
The small scale constant-folding rules which every JIT uses are really order of evaluation changes: An expression like `1+2+x` folding to `3+x` takes the expression `1+2` and moves it “back in time” to JIT time. This is safe because the JIT knows there is no way the program can give evidence of the difference (unless a debugger single-steps through bytecodes). But I think we should chase after constant-folding this sort of thing:
```
Object lookup(String x) {
  // hey, can someone please do this just once, at jlink time?
  var mydata = readHashTable(findResourceFile("mydata.xml"));
  // this depends on x, so cannot be moved back in time:
  return mydata.get(x);
}
```
The standard technique is to put `mydata` in a static final variable. And now that’s easy to do inline as well:
```
Object lookup(String x) {
  // like a C++ static, the initializer is executed on first use:
  class Static {
    static final HashMap<String,Object> mydata =
        readHashTable(findResourceFile("mydata.xml"));
    // but still, can someone please do it just once, at jlink time?
  }
  // this depends on x, so cannot be moved back in time:
  return Static.mydata.get(x);
}
```
(Side note: Reading files throws a checked exception. Does this mean that the above method should be amended to throw a possible checked exception, but marked as “somewhere in the past”? If so, then time-shifted expressions would need to have associated time-shifted exception checking rules.)
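For readers who want to try the local-class idiom quoted above, here is a minimal runnable sketch. It assumes Java 16 or later (static members in local classes are not permitted before that), and `loadTable` is a hypothetical stand-in for `readHashTable(findResourceFile(...))`, which the thread does not define. The counter exists only to show that the initializer runs once, on first use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class LookupDemo {
    // Counts how often the (simulated) expensive load actually runs.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Hypothetical stand-in for readHashTable(findResourceFile("mydata.xml")).
    static Map<String, Object> loadTable() {
        LOADS.incrementAndGet();
        Map<String, Object> table = new HashMap<>();
        table.put("answer", 42);
        return table;
    }

    static Object lookup(String x) {
        // The local class is initialized on the first call to lookup();
        // later calls reuse the already-built table.
        class Static {
            static final Map<String, Object> mydata = loadTable();
        }
        // This part depends on x, so it stays at run time.
        return Static.mydata.get(x);
    }

    public static void main(String[] args) {
        System.out.println(lookup("answer"));   // 42
        System.out.println(lookup("missing"));  // null
        System.out.println("loads = " + LOADS.get()); // loads = 1
    }
}
```

The load still happens at run time here; moving it to jlink time is exactly the part that would need new platform support.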
This is a kind of time-shifting currently under programmer control. It suggests to me that we can and should double down on supporting static final state (and also lazy statics as in JDK-8209964), by focusing some effort on time-shifting not so much arbitrary expressions and statements, but the initialization of classes. If a programmer could mark a *whole class* as time-shiftable in its initialization, then the programmer could expect that jlink could make good provisioning decisions about that class, rather than the current standard policy of initializing a class on first use (of a static or of an instance creation).
In my opinion, lazy static is enough; we may not need a class-wide keyword.
Lazy static means:
- it is not executed as part of the static initializer (not in <clinit>)
- the initialization expression has to be executed before the first access; it can be just before the first access or a long time before, offline.
- if an exception occurs during the execution of the initialization, the exception is wrapped into a (subclass of) LinkageError, and any attempt to access the static variable will throw that exception (the same way constant pool constants are resolved)

So perhaps lazy static is not the right term; perhaps "const" is a better term.
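The three rules above can be approximated today at the library level. The sketch below is only an emulation under stated assumptions: `Const` is a hypothetical helper class, not a JDK API and not the JDK-8209964 design, and it uses `ExceptionInInitializerError` as the `LinkageError` subclass. It evaluates the initializer at most once, caches either the value or the failure, and rethrows the same error on every later access.

```java
import java.util.function.Supplier;

// Hypothetical helper emulating the "lazy static / const" rules; not a JDK API.
final class Const<T> {
    private final Supplier<T> initializer;
    private boolean resolved;      // true once the initializer has run
    private T value;               // cached result on success
    private LinkageError failure;  // cached error on failure

    Const(Supplier<T> initializer) {
        this.initializer = initializer;
    }

    synchronized T get() {
        if (!resolved) {
            try {
                value = initializer.get();
            } catch (Throwable t) {
                // Wrap the failure in a subclass of LinkageError.
                failure = new ExceptionInInitializerError(t);
            }
            resolved = true;
        }
        if (failure != null) {
            throw failure; // the same error instance on every access
        }
        return value;
    }
}

class ConstDemo {
    static int runs = 0;

    static final Const<String> GREETING =
        new Const<>(() -> { runs++; return "hello"; });

    static final Const<String> BROKEN =
        new Const<>(() -> { runs++; throw new IllegalStateException("no data"); });

    public static void main(String[] args) {
        System.out.println(GREETING.get());
        System.out.println(GREETING.get());
        for (int i = 0; i < 2; i++) {
            try {
                BROKEN.get();
            } catch (LinkageError e) {
                System.out.println("failed: " + e.getCause());
            }
        }
        System.out.println("initializer runs = " + runs);
    }
}
```

What this emulation cannot express is the second rule's "a long time before, offline": the value is always computed in the running JVM, whereas the proposal would let jlink or the runtime choose the moment.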
One more bit of mental framework: A Java class is initialized no earlier and no later than its first initializing use (static or instance creation). Certainly there must be other events that the class initialization could be referred to. “jlink time” is a hazy concept, but program startup is not: A Java program starts just before its selected `main` entry point is run. If a class C could be marked (by the programmer) as being initialized no earlier than entry to `main`, then the programmer could certify that the class is a candidate for pre-initialization, regardless of the change of semantics (relative to Java’s current order of class initialization). And that would solve some (not all) of the problems around making valid jlink-time evaluations. I guess I’m suggesting that a language-level proxy for “jlink time” is main method entry.
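The current rule described above (initialization at the first initializing use) can be observed directly. The following is a small sketch with made-up class names; it records events in a list so the ordering is visible. Note that `VALUE` is assigned in a static block on purpose: a `static final` field with a constant initializer would be inlined by `javac`, and reading it would not trigger class initialization.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    static final List<String> EVENTS = new ArrayList<>();

    static class Table {
        static final String VALUE;
        static {
            // Runs when Table is first used, not when the program starts.
            EVENTS.add("Table initialized");
            VALUE = "data";
        }
    }

    static List<String> run() {
        EVENTS.add("before first use");
        String v = Table.VALUE; // first initializing use of Table
        EVENTS.add("after first use: " + v);
        return EVENTS;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

A class marked as initializable "around main entry" would, in this picture, be allowed to record its event before "before first use".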
Startup time may not be that well defined because of Project CRaC.
I suspect that time-shifted class initialization probably needs a concept of time-shifted dependency (as well as time-shifted exceptions, see above?) so that if class C is marked as “can initialize around main entry” C can also be marked as “but no earlier than initialization of D”, for some other class D that C’s initialization depends on.
(The work on lazies JDK-8209964 is sort of a complementary image of what Leyden is after, since a lazy variable is time-shifted *after* its containing class is initialized, another change from standard Java rules. The two kinds of time shifting, backward and forward, probably deserve a combined treatment of some sort.)
Or the exact time of the evaluation is not guaranteed. Dependencies are an issue. To be allowed to be pre-computed, a "const" variable should not depend transitively on the execution of a static init block.
Also, in addition to a single snapshot of an app, perhaps we can also consider multiple snapshots at a lower granularity.
One parallel to draw from is the "constexpr" keyword in C++. However, "constexpr" only deals with language-level constructs. For Java, perhaps we need something that includes a wider set of environmental dependencies. For example, many immutable tables in Java apps are created from external XML files. Do we want a way to snapshot such tables? Maybe we can do that if the XML files are statically stored inside a jlink image ?
Or if the XML file is read before run time. This is something Quarkus (and Micronaut) do: they use annotation processors or bytecode patching to inject constants in between compilation and run time.
Again, I don't know what the answer is, but I am excited that we are able to look for solutions at all levels of the language and platform.
If we knew the answer, Leyden would not be necessary :)

Rémi
Thanks - Ioi
Thank you and best regards, Volker
We will lean heavily on existing components of the JDK including the HotSpot JVM, the C2 compiler, application class-data sharing (CDS), and the `jlink` linking tool.
In the long run we will likely embrace the full closed-world constraint in order to produce fully-static images. Between now and then, however, we will develop and deliver incremental improvements which developers can use sooner rather than later.
Let us begin!
- Mark
[1] https://mail.openjdk.java.net/pipermail/discuss/2020-April/005429.html
// https://openjdk.java.net/projects/leyden/notes/01-beginnings
On 23 May 2022, at 9:10, Ioi Lam wrote: (more)
… One parallel to draw from is the "constexpr" keyword in C++.
Please take a look also at the ideas in D around pure and immutable computations. They allow time-shifting of very complicated D programs to compile time. I saw a demo (long ago) of compile-time data weaving in D which took an immutable bundle of strings, transformed them and ran the result as D expressions through the D compiler itself (as a subtask, at compile-time), and took the resulting output as further input to incorporate into the D program. Basically, it was as if the D preprocessor suddenly was a metaprogramming framework. And it worked, not as a special hack, but as a corollary of very cleanly worked out D-language rules for purity and immutability, plus the fact that much of the D standard libraries (including the D compiler) were pure enough to play these games with.

Reference: https://dlang.org/spec/function.html#pure-functions

(I see from dlang.org they have fancy templates now. They probably interoperate well with ad hoc compile-time computations. The C++ constexpr stuff is moving that way too, I guess.)

Also, unlike a compiled language, we have a virtual machine that can (in principle) be asked to verify purity of methods on the fly. This means we can (in principle) have pure functions which are separately linked, and do not need to be dumped into the current compilation. Of course native methods and Panama downcalls would have to either be rejected or manually certified, but that’s all part of the game.
However, "constexpr" only deals with language-level constructs. For Java, perhaps we need something that includes a wider set of environmental dependencies.
Yes. (I have thought for some years that our keyword `const` has been waiting to be used to annotate time-shifted computations. That’s just bikeshedding of course.) I think, given Java’s embrace of dynamic linking, it might make sense to define an idea of a time-shifted *value* as well as a time-shifted expression or statement. By that I mean a normal method could (perhaps) declare some but not all of its *parameters* as `static` (or `constexpr` or whatever) with the meaning that it is requesting that the corresponding actual arguments be time-shifted (if possible) at every invocation point of that method. Then a dynamically linked method could still partially play the time-shifting game, in some of its parameters. (And similarly for local variables. Sort of a “better static” with a dependency-driven initialization order.) If a method’s formal parameter is marked `static`, then expressions using that parameter inside the method are also candidates for early evaluation. (This means the JVM or somebody has to keep track of separate derived values for each method call site. Doable but tricky.) This might give a framework to thread through all the pre-evaluated values, through an application workload, without disrupting the logic by partitioning it into disjoint “before” and “after” phases.

And I think all of the above works about as well, not only for time-shifting back in time to pre-evaluation in jlink, but also for time-shifting forward in time to lazy evaluation. Java already has lots of lazy evaluation in it, notably on-demand class initialization and (under the covers) condy/indy. If we had a way to mark program portions as time-shiftable, that would naturally parlay out into a way to work with lazy computations, as well as pre-evaluated ones.

This is the way I would prefer to handle recurrent requests (from me and others) for APIs which help string templates to support syntax-specific validation (SQL, XML, etc.). Such validation, for a constant template, should happen as early as possible, ideally in the IDE, and certainly at jlink time. Time-shifting can be a foundation for static validation.
----- Original Message -----
From: "John Rose" <john.r.rose@oracle.com>
To: "Ioi Lam" <ioi.lam@oracle.com>
Cc: "leyden-dev" <leyden-dev@openjdk.java.net>
Sent: Tuesday, May 24, 2022 8:30:32 PM
Subject: Re: Project Leyden: Beginnings
On 23 May 2022, at 9:10, Ioi Lam wrote:
(more)
… One parallel to draw from is the "constexpr" keyword in C++.
Please take a look also at the ideas in D around pure and immutable computations. They allow time-shifting of very complicated D programs to compile time. I saw a demo (long ago) of compile-time data weaving in D which took an immutable bundle of strings, transformed them and ran the result as D expressions through the D compiler itself (as a subtask, at compile-time), and took the resulting output as further input to incorporate into the D program. Basically, it was as if the D preprocessor suddenly was a metaprogramming framework. And it worked, not as a special hack, but as a corollary of very cleanly worked out D-language rules for purity and immutability, plus the fact that much of the D standard libraries (including the D compiler) were pure enough to play these games with.
Reference: https://dlang.org/spec/function.html#pure-functions
(I see from dlang.org they have fancy templates now. They probably interoperate well with ad hoc compile-time computations. The C++ constexpr stuff is moving that way too, I guess.)
Also, unlike a compiled language, we have a virtual machine that can (in principle) be asked to verify purity of methods on the fly. This means we can (in principle) have pure functions which are separately linked, and do not need to be dumped into the current compilation. Of course native methods and Panama downcalls would have to either be rejected or manually certified, but that’s all part of the game.
Yes, Zig does something similar too. https://kristoff.it/blog/what-is-zig-comptime/
However, "constexpr" only deals with language-level constructs. For Java, perhaps we need something that includes a wider set of environmental dependencies.
Yes. (I have thought for some years that our keyword `const` has been waiting to be used to annotate time-shifted computations. That’s just bikeshedding of course.) I think, given Java’s embrace of dynamic linking, it might make sense to define an idea of a time-shifted *value* as well as a time-shifted expression or statement. By that I mean a normal method could (perhaps) declare some but not all of its *parameters* as `static` (or `constexpr` or whatever) with the meaning that it is requesting that the corresponding actual arguments be time-shifted (if possible) at every invocation point of that method. Then a dynamically linked method could still partially play the time-shifting game, in some of its parameters. (And similarly for local variables. Sort of a “better static” with a dependency-driven initialization order.) If a method’s formal parameter is marked `static`, then expressions using that parameter inside the method are also candidates for early evaluation. (This means the JVM or somebody has to keep track of separate derived values for each method call site. Doable but tricky.) This might give a framework to thread through all the pre-evaluated values, through an application workload, without disrupting the logic by partitioning it into disjoint “before” and “after” phases.
The problem with any keyword like constexpr is that it does not work well when you have libraries in the middle (like an XML parser) and you require everything to be transitively constexpr.
And I think all of the above works about as well, not only for time-shifting back in time to pre-evaluation in jlink, but also for time-shifting forward in time to lazy evaluation. Java already has lots of lazy evaluation in it, notably on-demand class initialization and (under the covers) condy/indy. If we had a way to mark program portions as time-shiftable, that would naturally parlay out into a way to work with lazy computations, as well as pre-evaluated ones.
I believe we can conflate the two by saying that an expression can be evaluated whenever it suits jlink or the runtime, because in both cases you want something that relaxes the initialization order.
This is the way I would prefer to handle recurrent requests (from me and others) for APIs which help string templates to support syntax-specific validation (SQL, XML, etc.). Such validation, for a constant template, should happen as early as possible, ideally in the IDE, and certainly at jlink time. Time-shifting can be a foundation for static validation.
Yes! With less syntactic sugar, please (I wonder if there is a rehab for syntactic sugar?).

Rémi
participants (4): Ioi Lam, John Rose, Remi Forax, Volker Simonis