From tanksherman27 at gmail.com Wed Jun 1 05:07:50 2022
From: tanksherman27 at gmail.com (Julian Waters)
Date: Wed, 1 Jun 2022 13:07:50 +0800
Subject: Can Ahead of Time code benefit regular Java applications too?
In-Reply-To: <9f70a2d5-5cb1-e615-b76b-957f95ac9928@oracle.com>
References: <9f70a2d5-5cb1-e615-b76b-957f95ac9928@oracle.com>
Message-ID:

The prospect of version-agnostic jars which any JVM version can use certainly sounds attractive, but I don't think it's a must-have, especially if the issues involved in supporting such a feature make pursuing the idea not worthwhile. To my knowledge, when it was still being actively developed, jaotc shared libraries were also specific to the OS and JVM they were compiled for. Likewise, perhaps working in a similar fashion to intrinsics, you could have certain sections of regularly compiled Java code within jars replaced by native code compiled by C1 (or C2?) if the JVM it was compiled by and the target OS/CPU match the currently running JVM and OS/CPU (indeed, this is how the Velocity project decides whether it should load its own shared libraries or fall back to a Java implementation, by checking whether the current platform is suitable for native code acceleration - https://github.com/PaperMC/Velocity/tree/dev/3.0.0/native). In a way, this might be similar to Anton's suggestion of "A closed world start image that is restored into an open world Java application" on another thread within this mailing list, and the involvement of CRaC within Leyden.
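To sketch the kind of platform gate I mean (the library and class names here are hypothetical; Velocity's real checks are more involved):

    final class NativeAcceleration {
        static boolean tryLoadPrecompiled() {
            String os = System.getProperty("os.name").toLowerCase();
            String arch = System.getProperty("os.arch");
            // Only attempt the native path when the precompiled code
            // was built for exactly this OS/CPU combination:
            if (os.contains("linux") && arch.equals("amd64")) {
                try {
                    System.loadLibrary("app-accel-linux-amd64");
                    return true;
                } catch (UnsatisfiedLinkError e) {
                    // fall through to the plain Java implementation
                }
            }
            return false;
        }
    }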
best regards,
Julian

On Wed, Jun 1, 2022 at 5:42 AM Ioi Lam wrote:
>
> On 5/30/2022 6:07 AM, Julian Waters wrote:
> > Hi all,
> >
> > Since Leyden's goal has shifted from originally exploring only binaries
> > compiled directly to native code, to "address the long-term pain points of
> > Java's slow startup time, slow time to peak performance, and large
> > footprint", would there be any merit in looking at allowing native code to
> > be embedded within jars to bypass the Interpreter at runtime? (Maybe have
> > Ahead of Time code that replaces the Interpreter be compiled by C1, and
> > treat it as part of the C1 pipeline so it can be profiled while being run)
> > Ideally it'd be similar to the now defunct jaotc, but more compact (within
> > the jar itself or perhaps the classfiles somehow) instead of compiling the
> > Ahead of Time code into an entirely separate file which then needs to be
> > explicitly passed to the JVM at runtime. This may or may not be a good
> > starting point before advancing to entirely standalone Java binaries, but I
> > digress. Perhaps the experience of the CRaC team would be of some help in
> > this area?
> >
> > best regards,
> > Julian
>
> What kind of interface and dependency between the JVM and the native
> code would be needed to support this?
>
> As far as I can tell, the Leyden discussions have been about producing
> artifacts (native code or heap dumps) that are tightly bound to a
> specific build of the JDK. If you want a (version agnostic) JAR file to
> contain native code that can be used by arbitrary JDKs, that would raise
> the complexity quite significantly.
>
> Thanks
> - Ioi

From tanksherman27 at gmail.com Wed Jun 1 05:25:12 2022
From: tanksherman27 at gmail.com (Julian Waters)
Date: Wed, 1 Jun 2022 13:25:12 +0800
Subject: Improve determinism in the Java language
In-Reply-To: References: Message-ID:

I'm leaning towards making certain parts of Java stricter if it's being compiled Ahead of Time, such as the compile-time linking you mention, much like what languages such as C and C++ require you to do when generating binaries (using the rough analogy of object files as compared to classfiles). Many of the dynamic features in the language typically only make sense if being run with a JVM anyway, such as using reflection to modify access to fields and methods, something which is significantly harder to do in a standalone executable. Not being able to optimize code based on a certain condition seems like a bit of a waste to me.

On Wed, Jun 1, 2022 at 5:21 AM Ioi Lam wrote:
> A lot of the recent Leyden discussion has been around "what
> optimizations can be done ahead of time" (e.g., static field
> initialization). However, I think we also need to look at a
> lower level.
>
> One reason that Java has been difficult to optimize ahead-of-time
> is the tremendous dynamism in the language.
>
> Here are a few things that I think we can do to make Java programs
> more deterministic so that ahead-of-time optimizations can
> be applied:
>
> 1 Deterministic Program Code
>
> A Java program can essentially rewrite itself and even
> the libraries it uses. Here's an example:
>
> class App {
>     static {
>         if (...) {
>             MethodHandles.lookup()
>                 .defineClass(.. hacked App$Bar ...);
>         }
>     }
>     static final Bar bar = new Bar();
>     static class Bar {
>         ....
>     }
> }
>
> - We can't effectively AOT-compile the program code because
>   the native code may not match the runtime generated
>   bytecodes.
>
> - We can't pre-initialize the App.bar field because its shape
>   may be different.
>
> One idea is to disallow such code patching when Leyden is enabled.
> For example, we can require that to use Leyden, an application
> must be "prelinked", which means that as soon as the application
> is loaded, the classes App and App$Bar are already loaded. The
> defineClass() call will fail with a LinkageError (duplicated class
> definition).
>
> 2 Decouple class namespaces from dynamic bytecode generation
>
> This is a corollary of the above item. Java uses
> ClassLoader.defineClass() for BOTH namespace and dynamic
> bytecode generation. I would stipulate that most users
> of Leyden want to do the former and not the latter.
>
> We should have a new API to load a fixed set of classes
> into a namespace.
>
> 3 <clinit> order
>
> Java allows <clinit>s that recursively depend on each other. The
> result depends on the reference order of these classes.
>
> class A { static int a = B.b++; }
> class B { static int b = A.a++; }
>
> We could have a problem if the application assumes that A is
> always initialized before B, but the Leyden optimizer
> initializes them in the opposite order.
>
> We could:
>
> - Refuse to optimize classes that have mutually recursive
>   <clinit>s, or
> - Change the language spec to give the JVM more freedom to
>   decide the initialization order.
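P.S. To make the <clinit> ordering hazard above concrete, here's a small runnable sketch (adapted from the A/B example, with the fields made non-final and ++ replaced by + 1 so it compiles cleanly):

    class A { static int a = B.b + 1; }
    class B { static int b = A.a + 1; }

    public class InitOrder {
        public static void main(String[] args) {
            // Touching A first starts A's <clinit>, which triggers B's
            // <clinit>; B reads A.a while A is still mid-initialization
            // and sees the default 0, so b becomes 1 and then a becomes 2.
            System.out.println(A.a + " " + B.b); // prints "2 1"
            // A run that touches B first would end with a = 1, b = 2:
            // the observable result depends on class reference order.
        }
    }

best regards,
Julian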
From aph at redhat.com Wed Jun 1 09:32:45 2022
From: aph at redhat.com (Andrew Haley)
Date: Wed, 1 Jun 2022 10:32:45 +0100
Subject: Can Ahead of Time code benefit regular Java applications too?
In-Reply-To: References: <9f70a2d5-5cb1-e615-b76b-957f95ac9928@oracle.com>
Message-ID: <0bd32c26-661e-3730-d93d-35e79d4823a5@redhat.com>

On 6/1/22 06:07, Julian Waters wrote:
> Likewise, perhaps working in a similar fashion to
> intrinsics, you could have certain sections of regularly compiled Java code
> within jars replaced by native code compiled by C1 (or C2?) if the JVM it
> was compiled by and the target OS/CPU match the currently running JVM and
> OS/CPU

The problem there would be that of jaotc: it worked, but because the precompiled code was not patchable, it had to use indirection for all accesses. So, every field offset, method reference, etc. went through a writable section. All of these had to be fixed up, and of course it bulked out the runtime. The whole process, in the end, wasn't much quicker than C1 compilation.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From mike at hydraulic.software Wed Jun 1 13:03:08 2022
From: mike at hydraulic.software (Mike Hearn)
Date: Wed, 1 Jun 2022 15:03:08 +0200
Subject: AppCDS / AOT thoughts based on CLI app experience
Message-ID:

Hi,

It feels like most of the interest in static Java comes from the microservices / functions-as-a-service community. My new company spent the last year creating a developer tool that runs on the JVM (which will be useful for Java developers actually, but what it does is irrelevant here). Internally it's a kind of build system and is thus a large(ish) CLI app in which startup time and throughput are what matter most. We also have a separate internal tool that uses Kotlin scripting to implement a bash-like scripting language, and which is sensitive in the same ways.

Today the JVM is often overlooked for writing CLI apps due to startup time, 'lightness' and packaging issues. I figured I'd write down some notes based on our experiences. They cover workflow, performance, implementation costs and security issues. Hopefully it's helpful.

1.

I really like AppCDS because:

a. It can't break the app, so switching it on/off is a no-brainer. Unlike native-image/static Java, no additional testing overhead is created by it.

b. It's effective even without heap snapshotting. We see a ~40% speedup for executing --help.

c. It's pay-as-you-go. We can use a small archive that's fast to create to accelerate just the most latency-sensitive startup paths, or we can use it for the whole app, but ultimately costs are controllable.

d. Archives are deterministic. Modern client-side packaging systems support delta updates, and CDS plays nicely with them. GraalVM native images are non-deterministic, so every update is going to replace the entire app, which isn't much fun from an update speed or bandwidth consumption perspective.

Startup time is dominated by PicoCLI, which is a common problem for Java CLI apps. Supposedly the slowest part is building the model of the CLI interface using reflection, so it's a perfect candidate for AppCDS heap snapshotting. I say supposedly, because I haven't seen concrete evidence that this is actually where the time goes, but it seems like a plausible belief. There's a long-standing bug filed to replace reflection with code generation, but it's a big job and so nobody did it.

Unfortunately the app will ship without using AppCDS. Some workflow issues remain. These can be solved in the app itself, but it'd be nice if the JVM did it.

The obvious way to use CDS is to ship an archive with the app.
We might do this as a first iteration, but longer term we don't want to, for two reasons:

a. The archive can get huge.
b. Signature verification penalties on macOS (see below).

For just making --help and similar short commands faster, size isn't so bad (~6-10 MB for us), but if it's used for a whole execution the archive size for a standard run is nearly the same as the total bytecode size of the app. As more stuff gets cached this will get worse. Download size might not matter much for this particular app, but as a general principle it does. So a nice improvement would be to generate it client side.

CDS files are caches, and different platforms have different conventions for where those go. The JVM doesn't know about those conventions but our app does, so we'd need our custom native code launcher (which exists anyway for other reasons) to set the right paths for CDS.

Then you have to pick the right flags depending on whether the CDS file exists or not. I follow CDS-related changes and believe this is fixed in the latest Java versions, but maybe (?) not released yet.

Even once that's fixed it's not quite obvious that we'd use it. The JVM runs much slower when dumping a dynamic CDS archive, and the first run is when first impressions are made. Whilst for cloud stuff this is a matter of (artificially?) expensive resources, for CLI apps it's about more subjective things like feeling snappy. One idea is to delay dumping a CDS archive until after the first run is exiting, so it doesn't get in the way. The first run wouldn't benefit from the archive, which is a pity (except on Linux, where the package managers make it easy to run code post-install), but it at least wouldn't be slowed down by creating it either. The native launcher can schedule this. Alternatively there could be a brief pause on first run when the user is told explicitly that the app is optimizing itself, but how feasible that is depends very much on dump speed. Finally we could ship a small archive that only covers startup, and then in parallel make a dump of a full run in the background.

Speaking of which, there's a need for some protocol to drive an app through a representative 'trial run'. Whether it's generating the class list or the archive itself, it could be as simple as an alternative static method that sits next to main. If it were standardized, the rest of the infrastructure becomes more re-usable; for instance build systems can take care of generating classlists, or the end-user packaging can take care of dynamic dumping.

CDS has two modes and it's not clear which is better. I'm unusually obsessive about this stuff, to the extent of reading the CDS source code, but despite that I have absolutely no idea if I should be trying to use static or dynamic archives. There used to be a performance difference between them, but maybe it's fixed now? There's a lack of end-to-end guidance on how to exploit this feature best.

The ideal would obviously be losing the dump/exec split and making dynamic dumping continuous, incremental and free of performance penalty. Then we could just supply a path to where the CDS file should go and things would magically warm up across executions. I have no idea how feasible that is.

Once AppCDS archives are in place and being created at the right times, a @Snapshotted annotation for fields (or similar) should be an easy win to eliminate the bulk of the rest of the PicoCLI time.
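Something like this is what I have in mind (@Snapshotted is entirely hypothetical, and the picocli setup is only for flavour; RootCommand stands in for our app's root command class):

    class Cli {
        // Hypothetical annotation asking the JVM to archive this object
        // graph at dump time, so the expensive reflective model build is
        // skipped on later runs:
        @Snapshotted
        static final picocli.CommandLine COMMAND_MODEL =
            new picocli.CommandLine(new RootCommand());
    }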
Dynamically loaded heaps would also be useful to eliminate the overhead of loading configs and instantiating the (build) task graph without a Gradle-style daemon.

2.

AppCDS archives can open a subtle security issue when distributing code to desktop platforms. Because they're full of vtables, anyone who can write to them can (we assume) take over any JVM that loads the archive and gain whatever privileges have been granted to that app. The archive file is fully trusted.

On Windows and Linux this doesn't matter. On Linux sensitive files can be packaged or created in postinst scripts. On Windows either an app comes with a legacy installer/MSI file and thus doesn't have any recognized package identity that can be granted extra permissions, or it uses the current-gen MSIX system. In the latter case Windows has a notion of app identity, and so you can request permissions to access e.g. keychain entries, the user's calendar etc, but in that case Windows also gives you a private directory that's protected from other apps where sensitive files can be stashed. AppCDS archives can go there and we're done.

macOS is a problem child. There are two situations that matter.

In the first case archives are shipped as data files with the app. Security is not an issue here, but there's a subtle performance footgun. On most platforms signatures of files shipped with an app are checked at install time, but on macOS they aren't. Thanks to its NeXT roots it doesn't really have an installation concept, and thus the kernel checks signatures of files on first use, then caches the signature check in the kernel vnode. By default the entire file is hashed in order to link it back to the root signature, which for large files can impose a small but noticeable delay before the app can open them. This first-run penalty is unfortunate given that AppCDS exists partly to improve startup time. You can argue it doesn't matter much due to the caching, but it's worth being aware of - very large AppCDS archives would get fully paged in and hashed before the app even gets to do anything. In turn that means people might enable AppCDS with a big classlist expecting it to speed things up, not noticing that for Mac users only it slowed things down instead. There are ways to fix this using supported Apple APIs. One is to supply a CodeDirectory structure stored in extended attributes: you should get incremental hashing and normal page fault behaviour (untested!). Another is to wrap the data in a Mach-O file.

In the second case the CDS archive is being generated client side. Mac apps don't have anywhere they can create tamperproof data, except for very small amounts in the keychain. Thus if a Mac app opens a malicious cache file that can take control of it, that's a security bug, because it'd allow one program to grab any special privileges the user granted to another. The fact that the grabbing program has passed Gatekeeper and notarization doesn't necessarily matter (Apple's guidance on this is unclear, but it seems plausible that this is their stance). In this case the keychain can be used as a root of trust, by storing a hash of the CDS archive in it and checking that after mmap/before use. Alternatively, again, Apple provides an API that lets you associate an on-disk (xattr) CodeDirectory structure with a file, which will then be checked incrementally at page fault time.
Extreme care must be taken to avoid race conditions, but in theory a CodeDirectory structure can be computed at dump time, written to disk as an xattr, and then stored again in the keychain (e.g. by pretending it's a "key" or "password"). After the security API is instructed to associate a CD with the file, it can be checked against the tamperproofed version stored in the keychain, and if they match, the archive can then be mmapped and used as normal.

Native images don't have these issues because the state snapshot is stored inside the Mach-O file and thus gets covered by the normal mechanisms. However, once the stock JVM adds support for persisted heaps, the same issue may arise.

Whether it's worth doing the extra work to solve this is unclear. Macs are guaranteed to come with very fast NVMe disks and CPUs. Still, it's worth being aware of the issue.

3.

Why not just use a native image then? Maybe we'll do that because the performance wins are really compelling, but again, v1 will ship without this for the following reasons:

a. Static minification can break things. Our integration tests currently invoke the entry point of the app directly, but that could be fixed to run the tool in an external process. For unit tests the situation is far murkier. It's a bit unclear how to run JUnit tests against the statically compiled version, and it may not even make sense (because the tests would pin a bunch of code that might get stripped in the real app, so what are you really testing?).

b. It'd break delta updates. Not the end of the world, but a factor.

c. I have no idea if we're using any libraries that spin bytecode dynamically. Even if we're not today, what if tomorrow we want to use such a library? Do we have to avoid using it and increase the cost of feature development, or roll back the native image and give our users a nasty performance downgrade? Neither option is attractive. Ideally SubstrateVM would contain a bytecode interpreter and use it when necessary. Lots of issues there, but it'd probably be OK if e.g. it's not a general classloader and the code dependencies have to be known AOT.

d. Similar to (c), fully AOT compilation can explode code size and thus download size, even though many codepaths are cold and only execute once. It'd be nice if a native image could include a mix of bytecode and AOT-compiled hotspots.

e. Once you're past the initial interactive stage the program is throughput sensitive. How much of a perf downgrade over HotSpot would we get, if any? With GraalVM EE we could use PGO and not lose any, but the ISV pricing is opaque. At any rate, to answer this we have to fix the compatibility issues first. The prospect of improving startup time and then discovering we slowed down the actual builds isn't really appealing (though I suspect in our case AOT wouldn't really hurt much).

f. What if we want to support in-process plugins? Maybe we can use Espresso, but this is a road less travelled (lack of tutorials, well-documented examples etc).

An interesting possibility is using a mix of approaches. For the bash competitor I mentioned earlier, dynamic code loading is needed because the script bytecode is loaded into the host JVM, but the Kotlin compiler itself could theoretically be statically compiled to a JNI or Panama-accessible library. We tried this before and hit compatibility errors, but didn't make any effort to resolve them.

4.

What about CRaC? It's Linux-only, so it isn't interesting to us, given that most devs are on Windows/macOS. The benefits for Linux servers are clear though.
Obvious question - can you make a snapshot on one machine/Linux distro and resume it on a totally different one, or does it require a homogeneous infrastructure?

5.

A big reason AppCDS is nice is that we get to keep the open world. This isn't only about compatibility; open worlds are just better. The most popular way to get software to desktop machines is Chrome, and the web is totally open-world. Apps are downloaded incrementally as the user navigates around, and companies exploit this fact aggressively. Large web sites can be far larger than would be considered practical to distribute to end-user machines, and can easily update 50 times a day. Web developers have to think about latency on specific interactions, but they don't have to think about the size of the entire app, and that allows them to scale up feature sets as fast as funding allows. In contrast, the closed-world mobile versions of their sites are a parade of horror stories, in which firms have to e.g. hotpatch Dalvik to work around method count limits (Facebook), or in which code size issues nearly wrecked the entire company (Uber):

https://twitter.com/StanTwinB/status/1336914412708405248

Right now code size isn't a particularly serious problem for us, but the ease of including open source libraries means footprint grows all the time. Especially for our shell scripting tool, there are tons of cool features that could be added, but if we did all of them we'd probably end up with 500 MB of bytecode. With an open world, features can be downloaded on the fly as they get used, and you can build a plugin ecosystem.

The new, more incremental direction of Leyden is thus welcomed and appreciated, because it feels like a lot of ground can be covered by "small" changes like upgrading AppCDS and caching compiled hotspots. Even if the results aren't as impressive as with native-image, the benefits of keeping an open world can probably make up for it, at least for our use cases.

From adinn at redhat.com Wed Jun 1 13:28:58 2022
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 1 Jun 2022 14:28:58 +0100
Subject: AppCDS / AOT thoughts based on CLI app experience
In-Reply-To: References: Message-ID: <603fcc35-03fe-54df-a47b-a659eaadf996@redhat.com>

Hi Mike,

Thanks very much for that extremely valuable input, in particular the very clear breakdown of the swings and roundabouts you have noted when it comes to using CDS/AppCDS or native Java vs the vanilla dynamic JVM. It is very important that project Leyden considers the whole development and deployment cycle, not just the size and startup time/footprint of the delivered static Java executable (indeed, Dan Heidinga and I just published an article about this topic on InfoQ that you might find relevant).

Your comment about AppCDS being "pay-as-you-go" resonated most strongly. I hope that one of the pay-offs of the incremental approach Mark has recommended for the project will be the ability to provide "pay-as-you-go" improvements in startup time and footprint, where a user can balance development benefits and costs against those arising at deployment time.

regards,

Andrew Dinn
-----------
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

On 01/06/2022 14:03, Mike Hearn wrote:
> [...]
From mike at hydraulic.software Wed Jun 1 13:41:49 2022
From: mike at hydraulic.software (Mike Hearn)
Date: Wed, 1 Jun 2022 15:41:49 +0200
Subject: AppCDS / AOT thoughts based on CLI app experience
In-Reply-To: <603fcc35-03fe-54df-a47b-a659eaadf996@redhat.com>
References: <603fcc35-03fe-54df-a47b-a659eaadf996@redhat.com>
Message-ID:

Thanks Andrew. Yes, I saw the InfoQ article, it's excellent. Actually it was reading that which prompted me to sign up and write out these notes.

From ioi.lam at oracle.com Wed Jun 1 16:23:36 2022
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 1 Jun 2022 09:23:36 -0700
Subject: Can Ahead of Time code benefit regular Java applications too?
In-Reply-To: <0bd32c26-661e-3730-d93d-35e79d4823a5@redhat.com>
References: <9f70a2d5-5cb1-e615-b76b-957f95ac9928@oracle.com> <0bd32c26-661e-3730-d93d-35e79d4823a5@redhat.com>
Message-ID:

On 6/1/2022 2:32 AM, Andrew Haley wrote:
> On 6/1/22 06:07, Julian Waters wrote:
>> Likewise, perhaps working in a similar fashion to
>> intrinsics, you could have certain sections of regularly compiled Java code
>> within jars replaced by native code compiled by C1 (or C2?) if the JVM it
>> was compiled by and the target OS/CPU match the currently running JVM and
>> OS/CPU
>
> The problem there would be that of jaotc: it worked, but because the
> precompiled code was not patchable, it had to use indirection for all
> accesses. So, every field offset, method reference, etc. went through a
> writable section. All of these had to be fixed up, and of course it
> bulked out the runtime. The whole process, in the end, wasn't much
> quicker than C1 compilation.

I think part of this can be fixed with my "prelinking" proposal - if the app cannot alter the classes that are in the AOT code, then many of the redirections can be eliminated.

Also, some of the indirection in the original jaotc had to deal with object references (e.g., String constants), because it didn't have the notion of a cached heap. Hopefully Leyden can have better integration between the AOT code and the cached heap to make this problem go away.
Thanks
- Ioi

From akozlov at azul.com Wed Jun 1 16:47:10 2022
From: akozlov at azul.com (Anton Kozlov)
Date: Wed, 1 Jun 2022 19:47:10 +0300
Subject: Project Leyden: Beginnings
In-Reply-To: References: <7c59af5c-9ede-19fb-7865-7bb854e93ca7@azul.com>
Message-ID: <7c8c36d5-a8e4-8b1c-08fd-77f30eaefea4@azul.com>

On 5/31/22 12:32, Andrew Dinn wrote:
> One has to bear in mind that a closed world as defined by full program analysis (possibly supplemented with user directives to embrace things like reflective targets) can exclude everything that is not marked as reachable during the analysis from its generated image, maybe whole classes in some cases, or maybe just static/instance fields and methods of some classes.

I didn't use this exact definition, but meant a closed-world image as the result of a whole-program analysis under a set of assumptions that are stricter than the Java language. For example, user directives are a meta-language describing the opaque areas of the program that cannot be analyzed by the compiler during the build. A meta-language may in theory be able to express a rich set of assumptions about unknown areas. E.g. for Class.forName, a meta-language may express not the exact set of possible target classes, but assumed properties of those classes: say, that the class does not use reflection itself, so it cannot access private fields, and that the subsequent checkcast should succeed, so the unknown class won't be able to access protected fields beyond its own class hierarchy.

So a point of the program that is impossible or hard to reason about (e.g. reflection) may specify no assumptions beyond minimal interference with the part that can be analyzed. E.g. a servlet should not access the internal details of the servlet container. The analyzable part may then be optimized almost as efficiently as a completely analyzable program. Java modules may indeed be useful to separate parts of the program to make the analysis easier.

Thanks,
Anton

From akozlov at azul.com Wed Jun 1 18:31:33 2022
From: akozlov at azul.com (Anton Kozlov)
Date: Wed, 1 Jun 2022 21:31:33 +0300
Subject: AppCDS / AOT thoughts based on CLI app experience
In-Reply-To: References: Message-ID: <159565a0-5c7d-84b1-a2cd-e30e0b509faa@azul.com>

Thank you for the excellent write-up! Although many problems you've mentioned are not solved (and are sometimes made worse) by CRaC, I can't resist mentioning a CRaC change for CLI apps [1]. But this is offtopic, so BCCing leyden-dev and CCing crac-dev.

On 6/1/22 16:03, Mike Hearn wrote:
> What about CRaC? It's Linux-only, so it isn't interesting to us, given
> that most devs are on Windows/macOS. The benefits for Linux servers
> are clear though. Obvious question - can you make a snapshot on one
> machine/Linux distro and resume it on a totally different one, or
> does it require a homogeneous infrastructure?

In the current implementation, we haven't started working on this. By the model, CRaC prevents file dependencies at the checkpoint and lets the VM coordinate the restore. So eventually we should deliver images that do not depend on the particular CPU or distribution.

The feasibility of the full implementation for macOS and Windows is unclear. But I think a reasonable effort will be required to provide an implementation for testing and developing programs on those OSes, one which will match the behavior of the Linux CRaC implementation.
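For illustration, the coordination API looks roughly like this (org.crac; a sketch only, see the project docs for the real contract):

    import org.crac.Context;
    import org.crac.Core;
    import org.crac.Resource;

    class FileHolder implements Resource {
        @Override
        public void beforeCheckpoint(Context<? extends Resource> ctx) throws Exception {
            // close descriptors that the image must not capture
        }

        @Override
        public void afterRestore(Context<? extends Resource> ctx) throws Exception {
            // reopen them, possibly on a different machine or distro
        }
    }

    // at startup: Core.getGlobalContext().register(new FileHolder());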
Thanks,
Anton

From ioi.lam at oracle.com Fri Jun 3 00:15:36 2022
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 2 Jun 2022 17:15:36 -0700
Subject: AppCDS / AOT thoughts based on CLI app experience
In-Reply-To: References: Message-ID: <12fc5517-78d7-dfc0-f9c1-cbcdba5b7ccd@oracle.com>

Hi Mike,

I am thrilled to hear that you're happy with CDS. Please see my responses below. If you have other questions or requests for CDS, please let me know :-)

On 6/1/2022 6:03 AM, Mike Hearn wrote:
> [...]
> CDS files are caches, and different platforms have different
> conventions for where those go. The JVM doesn't know about those
> conventions but our app does, so we'd need our custom native code
> launcher (which exists anyway for other reasons) to set the right
> paths for CDS.
>
> Then you have to pick the right flags depending on whether the CDS
> file exists or not. I follow CDS-related changes and believe this is
> fixed in the latest Java versions, but maybe (?) not released yet.

Which version of Java are you using? Since JDK 11, the default value of -Xshare is set to -Xshare:auto, so you can always do this:

$ java -XX:SharedArchiveFile=nosuch.jsa -version
java version "11" 2018-09-25
Java(TM) SE Runtime Environment 18.9 (build 11+28)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11+28, mixed mode)

If the file exists, it will be used automatically. Otherwise the VM will silently ignore the archive.

Since JDK 17, a default CDS archive is shipped with the JDK. So you will at least get some performance benefits of CDS for the built-in classes.

With the upcoming JDK 19, we have implemented a new feature (see JDK-8261455) to automatically create the CDS archive. Here's an example (I am using javac because it's convenient, but you need to prefix the JVM parameters with -J):

$ javac -J-XX:+AutoCreateSharedArchive -J-XX:SharedArchiveFile=javac.jsa HelloWorld.java

javac.jsa will be automatically created if it doesn't exist, or if it's not compatible with the JVM (e.g., if you have upgraded to a newer JDK). In this case, the total elapsed time is improved from about 522ms (with the default CDS archive) to 330ms (auto-generated archive).

> Even once that's fixed it's not quite obvious that we'd use it. The
> JVM runs much slower when dumping a dynamic CDS archive, and the first
> run is when first impressions are made. Whilst for cloud stuff this is
> a matter of (artificially?) expensive resources, for CLI apps it's
> about more subjective things like feeling snappy. One idea is to delay
> dumping a CDS archive until after the first run is exiting, so it
> doesn't get in the way. The first run wouldn't benefit from the
> archive, which is a pity (except on Linux, where the package managers
> make it easy to run code post-install), but it at least wouldn't be
> slowed down by creating it either. The native launcher can schedule
> this. Alternatively there could be a brief pause on first run when the
> user is told explicitly that the app is optimizing itself, but how
> feasible that is depends very much on dump speed. Finally we could
> ship a small archive that only covers startup, and then in parallel
> make a dump of a full run in the background.

The dynamic CDS dumping happens when the JVM exits. We could ... (just throwing out half-baked ideas) spawn a new daemon subprocess to do the dumping, while the main JVM process exits. So to the user there's no penalty.

> Speaking of which, there's a need for some protocol to drive an app
> through a representative 'trial run'. Whether it's generating the
> class list or the archive itself, it could be as simple as an
> alternative static method that sits next to main. If it were
> standardized, the rest of the infrastructure becomes more re-usable;
> for instance build systems can take care of generating classlists, or
> the end-user packaging can take care of dynamic dumping.

Maybe we could have some sort of daemon that collects profiling data in the background, and updates the archives when the application behavior is more understood.
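To sketch the launcher-driven variant of your "dump after the first run" idea (the entry point, trial-run flag and archive path are all hypothetical; -XX:ArchiveClassesAtExit is the real dynamic-dump flag, available since JDK 13):

    import java.nio.file.Files;
    import java.nio.file.Path;

    class BackgroundDump {
        // Re-launch the app in the background to train a dynamic archive,
        // so the user-visible first run pays no dumping penalty:
        static void scheduleIfMissing(Path archive) throws Exception {
            if (Files.exists(archive)) return; // already trained
            String java = System.getProperty("java.home") + "/bin/java";
            new ProcessBuilder(java,
                    "-XX:ArchiveClassesAtExit=" + archive,
                    "-cp", System.getProperty("java.class.path"),
                    "com.example.Main", "--trial-run")
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start(); // detached; the interactive run is not blocked
        }
    }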
> CDS has two modes and it's not clear which is better. I'm unusually
> obsessive about this stuff, to the extent of reading the CDS source
> code, but despite that I have absolutely no idea if I should be trying
> to use static or dynamic archives. There used to be a performance
> difference between them, but maybe it's fixed now? There's a lack of
> end-to-end guidance on how to exploit this feature best.

I agree our documentation is kind of lacking. We'll try to improve it.

Static and dynamic archives will be roughly the same speed (~10 ms faster with a static dump for the javac example above). The dynamic archive will be smaller, because it doesn't need to duplicate the built-in classes that are already in the static archive. Here's a size comparison for javac.jsa:

static:  20,217,856 bytes
dynamic: 10,153,984 bytes

> The ideal would obviously be losing the dump/exec split and making
> dynamic dumping continuous, incremental and free of performance
> penalty. Then we could just supply a path to where the CDS file should
> go and things would magically warm up across executions. I have no idea
> how feasible that is.
>
> Once AppCDS archives are in place and being created at the right
> times, a @Snapshotted annotation for fields (or similar) should be an
> easy win to eliminate the bulk of the rest of the PicoCLI time.
> Dynamically loaded heaps would also be useful to eliminate the
> overhead of loading configs and instantiating the (build) task graph
> without a Gradle-style daemon.
>
> 2.
>
> AppCDS archives can open a subtle security issue when distributing
> code to desktop platforms. Because they're full of vtables, anyone who
> can write to them can (we assume) take over any JVM that loads the
> archive and gain whatever privileges have been granted to that app.
> The archive file is fully trusted.

Will you have a similar problem if the JAR file of the application is maliciously modified?

Actually the vtables inside the CDS archive file contain all zeros, and are filled in by the VM after the archive is mapped. What could be modified is the vtptr of archived MetaData objects. They usually point to somewhere near 0x800000000 (where the vtables are), but the attacker could modify them to point to arbitrary locations. I am not sure if this type of attack is easier than modifying the JAR files, or not.

Thanks
- Ioi

> [...]
By default the entire file is > hashed in order to link it back to the root signature, which for large > files can impose a small but noticeable delay before the app can open > them. This first run penalty is unfortunate given that AppCDS exists > partly to improve startup time. You can argue it doesn't matter much > due to the caching, but it's worth being aware of - very large AppCDS > archives would get fully paged in and hashed before the app even gets > to do anything. In turn that means people might enable AppCDS with a > big classlist expecting it to speed things up, not noticing that for > Mac users only it slowed things down instead. There are ways to fix > this using supported Apple APIs. One is to supply a CodeDirectory > structure stored in extended attributes: you should get incremental > hashing and normal page fault behaviour (untested!). Another is to > wrap the data in a Mach-O file. > > In the second case the CDS archive is being generated client side. Mac > apps don't have anywhere they can create tamperproof data, except for > very small amounts in the keychain. Thus if a Mac app opens a > malicious cache file that can take control of it that's a security > bug, because it'd allow one program to grab any special privileges the > user granted to another. The fact that the grabbing program has passed > GateKeeper and notarization doesn't necessarily matter (Apple's > guidance on this is unclear, but it seems plausible that this is their > stance). In this case the key chain can be used as a root of trust by > storing a hash of the CDS archive in it and checking that after > mmap/before use. Alternatively, again, Apple provides an API that lets > you associate an on-disk (xattr) CodeDirectory structure with a file > which will then be checked incrementally at page fault time. Extreme > care must be taken to avoid race conditions, but in theory, a > CodeDirectory structure can be computed at dump time, written to disk > as an xattr, and then stored again in the key chain (e.g. by > pretending it's a "key" or "password"). After the security API is > instructed to associate a CD with the file, it can be checked against > the tamperproofed version stored in the key chain and if they match, > the archive can then be mmapped and used as normal. > > Native images don't have these issues because the state snapshot is > stored inside the Mach-O file and thus gets covered by the normal > mechanisms. However once it adds support for persisted heaps, the same > issue may arise. > > Whether it's worth doing the extra work to solve this is unclear. Macs > are guaranteed to come with very fast NVMe disks and CPUs. Still, it's > worth being aware of the issue. > > 3. > > Why not just use a native image then? Maybe we'll do that because the > performance wins are really compelling, but again, v1 will ship > without this for the following reasons: > > a. Static minification can break things. Our integration tests > currently invoke the entry point of the app directly, but that could > be fixed to run the tool in an external process. For unit tests the > situation is far murkier. It's a bit unclear how to run JUnit tests > against the statically compiled version and it may not even make sense > (because the tests would pin a bunch of code that might get stripped > in the real app so what are you really testing?). > > b. It'd break delta updates. Not the end of the world, but a factor. > > c. I have no idea if we're using any libraries that spin bytecode > dynamically. 
> Even if we're not today, what if tomorrow we want to use such a
> library? Do we have to avoid using it and increase the cost of feature
> development, or roll back the native image and give our users a nasty
> performance downgrade? Neither option is attractive. Ideally
> SubstrateVM would contain a bytecode interpreter and use it when
> necessary. Lots of issues there but e.g. it'd probably be OK if it's
> not a general classloader and the code dependencies have to be known
> AOT.
>
> d. Similar to (c), fully AOT compilation can explode code and thus
> download size even though many codepaths are cold and only execute
> once. It'd be nice if a native image could include a mix of bytecode
> and AOT-compiled hotspots.
>
> e. Once you're past the initial interactive stage the program is
> throughput sensitive. How much of a perf downgrade over HotSpot would
> we get, if any? With GraalVM EE we could use PGO and not lose any, but
> the ISV pricing is opaque. At any rate, to answer this we have to fix
> the compatibility issues first. The prospect of improving startup time
> and then discovering we slowed down the actual builds isn't really
> appealing (though I suspect in our case AOT wouldn't really hurt
> much).
>
> f. What if we want to support in-process plugins? Maybe we can use
> Espresso, but this is a road less travelled (lack of tutorials,
> well-documented examples, etc).
>
> An interesting possibility is using a mix of approaches. For the bash
> competitor I mentioned earlier, dynamic code loading is needed because
> the script bytecode is loaded into the host JVM, but the Kotlin
> compiler itself could theoretically be statically compiled to a JNI-
> or Panama-accessible library. We tried this before and hit
> compatibility errors, but didn't make any effort to resolve them.
>
> 4.
>
> What about CRaC? It's Linux-only so isn't interesting to us, given
> that most devs are on Windows/macOS. The benefits for Linux servers
> are clear though. Obvious question - can you make a snapshot on one
> machine/Linux distro, and resume it on a totally different one, or
> does it require a homogeneous infrastructure?
>
> 5.
>
> A big reason AppCDS is nice is we get to keep the open world. This
> isn't only about compatibility; open worlds are just better. The most
> popular way to get software to desktop machines is Chrome, and the web
> is totally open world. Apps are downloaded incrementally as the user
> navigates around, and companies exploit this fact aggressively. Large
> web sites can be far larger than would be considered practical to
> distribute to end user machines, and can easily update 50 times a day.
> Web developers have to think about latency on specific interactions,
> but they don't have to think about the size of the entire app, and
> that allows them to scale up feature sets as fast as funding allows.
> In contrast the closed-world mobile versions of their sites are a
> parade of horror stories in which firms have to e.g. hotpatch Dalvik
> to work around method count limits (Facebook), or in which code size
> issues nearly wrecked the entire company (Uber):
>
> https://twitter.com/StanTwinB/status/1336914412708405248
>
> Right now code size isn't a particularly serious problem for us, but
> the ease of including open source libraries means footprint grows all
> the time. Especially for our shell scripting tool, there are tons of
> cool features that could be added but if we did all of them we'd
> probably end up with 500 MB of bytecode.
> With an open world, features can be downloaded on the fly as they get
> used and you can build a plugin ecosystem.
>
> The new, more incremental direction of Leyden is thus welcomed and
> appreciated, because it feels like a lot of ground can be covered by
> "small" changes like upgrading AppCDS and caching compiled hotspots.
> Even if the results aren't as impressive as with native-image, the
> benefits of keeping an open world can probably make up for it, at
> least for our use cases.

From ioi.lam at oracle.com  Fri Jun  3 00:30:39 2022
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 2 Jun 2022 17:30:39 -0700
Subject: AppCDS / AOT thoughts based on CLI app experience
In-Reply-To: <12fc5517-78d7-dfc0-f9c1-cbcdba5b7ccd@oracle.com>
References: <12fc5517-78d7-dfc0-f9c1-cbcdba5b7ccd@oracle.com>
Message-ID: <21aaa7c0-5e85-923e-cbab-6b7af68ae913@oracle.com>

On 6/2/2022 5:15 PM, Ioi Lam wrote:
> Hi Mike,
>
> I am thrilled to hear that you're happy with CDS. Please see my
> responses below.
>
> If you have other questions or requests for CDS, please let me know :-)
>
> On 6/1/2022 6:03 AM, Mike Hearn wrote:
>> Hi,
>>
>> It feels like most of the interest in static Java comes from the
>> microservices / functions-as-a-service community. My new company spent
>> the last year creating a developer tool that runs on the JVM (which
>> will be useful for Java developers actually, but what it does is
>> irrelevant here). Internally it's a kind of build system and is thus a
>> large(ish) CLI app in which startup time and throughput are what
>> matter most. We also have a separate internal tool that uses Kotlin
>> scripting to implement a bash-like scripting language, and which is
>> sensitive in the same ways.
>>
>> Today the JVM is often overlooked for writing CLI apps due to startup
>> time, 'lightness' and packaging issues. I figured I'd write down some
>> notes based on our experiences. They cover workflow, performance,
>> implementation costs and security issues. Hopefully it's helpful.
>>
>> 1.
>>
>> I really like AppCDS because:
>>
>> a. It can't break the app, so switching it on/off is a no-brainer.
>> Unlike native-image/static Java, no additional testing overhead is
>> created by it.
>>
>> b. It's effective even without heap snapshotting. We see a ~40%
>> speedup for executing --help
>>
>> c. It's pay-as-you-go. We can use a small archive that's fast to
>> create to accelerate just the most latency-sensitive startup paths, or
>> we can use it for the whole app, but ultimately costs are
>> controllable.
>>
>> d. Archives are deterministic. Modern client-side packaging systems
>> support delta updates, and CDS plays nicely with them. GraalVM native
>> images are non-deterministic, so every update is going to replace the
>> entire app, which isn't much fun from an update speed or bandwidth
>> consumption perspective.
>>
>> Startup time is dominated by PicoCLI, which is a common problem for
>> Java CLI apps. Supposedly the slowest part is building the model of
>> the CLI interface using reflection, so it's a perfect candidate for
>> AppCDS heap snapshotting. I say supposedly, because I haven't seen
>> concrete evidence that this is actually where the time goes, but it
>> seems like a plausible belief. There's a long-standing bug filed to
>> replace reflection with code generation but it's a big job and so
>> nobody did it.
>>
>> Unfortunately the app will ship without using AppCDS. Some workflow
>> issues remain. These can be solved in the app itself, but it'd be nice
>> if the JVM does it.
>> >> The obvious way to use CDS is to ship an archive with the app. We >> might do this as a first iteration, but longer term don't want to for >> two reasons: >> >> a. The archive can get huge. >> b. Signature verification penalties on macOS (see below). >> >> For just making --help and similar short commands faster size isn't so >> bad (~6-10mb for us), but if it's used for a whole execution the >> archive size for a standard run is nearly the same as total bytecode >> size of the app. As more stuff gets cached this will get worse. >> Download size might not matter much for this particular app, but as a >> general principle it does. So a nice improvement would be to generate >> it client side. >> >> CDS files are caches and different platforms have different >> conventions for where those go. The JVM doesn't know about those >> conventions but our app does, so we'd need our custom native code >> launcher (which exists anyway for other reasons) to set the right >> paths for CDS. >> >> Then you have to pick the right flags depending on whether the CDS >> file exists or not. I follow CDS related changes and believe this is >> fixed in latest Java versions but maybe (?) not released yet. > > Which version of Java are you using? > > Since JDK 11, the default value of -Xshare is set to -Xshare:auto, so > you can always do this: > > $ java -XX:SharedArchiveFile=nosuch.jsa -version > java version "11" 2018-09-25 > Java(TM) SE Runtime Environment 18.9 (build 11+28) > Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11+28, mixed mode) > > If the file exists, it will be used automatically. Otherwise the VM > will silently ignore the archive. > > Since JDK 17, a default CDS archive is shipped with the JDK. So you > will at least get some performance benefits of CDS for the built-in > classes. > > With the upcoming JDK 19, we have implemented a new feature (See > JDK-8261455) to automatically create the CDS archive. Here's an > example (I am using Javac because it's convenient, but you need to > quote the JVM parameters with -J): > > $ javac -J-XX:+AutoCreateSharedArchive > -J-XX:SharedArchiveFile=javac.jsa HelloWorld.java > > javac.jsa will be automatically created if it doesn't exist, or if > it's not compatible with the JVM (e.g., if you have upgraded to a > newer JDK). > > In this case, the total elapsed time is improved from about 522ms > (with default CDS archive) to 330ms (auto-generated archive). > > >> Even once that's fixed it's not quite obvious that we'd use it. The >> JVM runs much slower when dumping a dynamic CDS archive and the first >> run is when first impressions are made. Whilst for cloud stuff this is >> a matter of (artificially?) expensive resources, for CLI apps it's >> about more subjective things like feeling snappy. One idea is to delay >> dumping a CDS archive until after the first run is exiting, so it >> doesn't get in the way. The first run wouldn't benefit from the >> archive which is a pity (except on Linux where the package managers >> make it easy to run code post-install), but it at least wouldn't be >> slowed down by creating it either. The native launcher can schedule >> this. Alternatively there could be a brief pause on first run when the >> user is told explicitly that the app is optimizing itself, but how >> feasible that is depends very much on dump speed. Finally we could >> ship a small archive that only covers startup, and then in parallel >> make a dump of a full run in the background. > > The dynamic CDS dumping happens when the JVM exits. We could ... 
> (just throwing half-baked ideas) spawn a new daemon subprocess to do
> the dumping, while the main JVM process exits. So to the user there's
> no penalty.

>> Speaking of which, there's a need for some protocol to drive an app
>> through a representative 'trial run'. Whether it's generating the
>> class list or the archive itself, it could be as simple as an
>> alternative static method that sits next to main.

One thing you *could* do with JDK 19 on Linux is:

java -XX:+AutoCreateSharedArchive -XX:SharedArchiveFile=app.jsa -jar MyApp

In your main method, check the /proc/self/maps file to see if app.jsa
is mapped. If not, the VM is dumping the dynamic CDS archive. In this
case, your app can run in a special "trial run" mode that exercises
different functionalities.

To make this easier to use, we could add a special system property,
something like "jdk.cds.is.dumping", that can be queried by the
application.
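A minimal sketch of that check (illustrative only - the archive path is
a placeholder, the line matching is simplistic, and /proc/self/maps is
of course Linux-specific):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class MyApp {
        // True if the CDS archive at archivePath is mapped into this
        // process; each /proc/self/maps line for a file-backed mapping
        // ends with the path of the mapped file.
        static boolean isArchiveMapped(String archivePath) throws IOException {
            return Files.readAllLines(Path.of("/proc/self/maps")).stream()
                        .anyMatch(line -> line.endsWith(archivePath));
        }

        public static void main(String[] args) throws IOException {
            if (isArchiveMapped("/path/to/app.jsa")) {
                runApplication(args);   // archive in use: normal run
            } else {
                // VM is presumably dumping the archive right now:
                // exercise representative code paths instead.
                runTrialRun();
            }
        }

        static void runApplication(String[] args) { /* real logic */ }
        static void runTrialRun() { /* representative warm-up paths */ }
    }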
Thanks
- Ioi

>> If it were to be standardized the rest of the infrastructure becomes
>> more re-usable, for instance build systems can take care of
>> generating classlists, or the end-user packaging can take care of
>> dynamic dumping.
>
> Maybe we could have some sort of daemon that collects profiling data
> in the background, and updates the archives when the application
> behavior is more understood.
>
>> CDS has two modes and it's not clear which is better. I'm unusually
>> obsessive about this stuff to the extent of reading the CDS source
>> code, but despite that I have absolutely no idea if I should be trying
>> to use static or dynamic archives. There used to be a performance
>> difference between them but maybe it's fixed now? There's a lack of
>> end-to-end guidance on how to exploit this feature best.
>
> I agree our documentation is kind of lacking. We'll try to improve it.
>
> Static and dynamic archives will be roughly the same speed (~10 ms
> faster with static dump for the javac example above).
>
> The dynamic archive will be smaller, because it doesn't need to
> duplicate the built-in classes that are already in the static archive.
> Here's a size comparison for javac.jsa
>
> static:  20,217,856 bytes
> dynamic: 10,153,984 bytes
>
>> The ideal would obviously be to lose the dump/exec split and make
>> dynamic dumping continuous, incremental, and free of any performance
>> penalty. Then we could just supply a path to where the CDS file should
>> go and things magically warm up across executions. I have no idea how
>> feasible that is.
>>
>> Once AppCDS archives are in place and being created at the right
>> times, a @Snapshotted annotation for fields (or similar) should be an
>> easy win to eliminate the bulk of the rest of the PicoCLI time.
>> Dynamically loaded heaps would also be useful to eliminate the
>> overhead of loading configs and instantiating the (build) task graph
>> without a Gradle-style daemon.
>>
>> 2.
>>
>> AppCDS archives can open a subtle security issue when distributing
>> code to desktop platforms. Because they're full of vtables, anyone who
>> can write to them can (we assume) take over any JVM that loads the
>> archive and gain whatever privileges have been granted to that app.
>> The archive file is fully trusted.
>
> Will you have a similar problem if the JAR file of the application is
> maliciously modified?
>
> Actually the vtables inside the CDS archive file contain all zeros,
> and are filled in by the VM after the archive is mapped.
>
> What could be modified is the vtptr of archived MetaData objects. They
> usually point to somewhere near 0x800000000 (where the vtables are)
> but the attacker could modify them to point to arbitrary locations. I
> am not sure if this type of attack is easier than modifying the JAR
> files, or not.
>
> Thanks
> - Ioi
>
>> On Windows and Linux this doesn't matter. On Linux sensitive files can
>> be packaged or created in postinst scripts. On Windows either an app
>> comes with a legacy installer/MSI file and thus doesn't have any
>> recognized package identity that can be granted extra permissions, or
>> it uses the current-gen MSIX system. In the latter case Windows has a
>> notion of app identity and so you can request permissions to access
>> e.g. keychain entries, the user's calendar etc, but in that case
>> Windows also gives you a private directory that's protected from other
>> apps where sensitive files can be stashed. AppCDS archives can go
>> there and we're done.
>>
>> macOS is a problem child. There are two situations that matter.
>>
>> In the first case archives are shipped as data files with the app.
>> Security is not an issue here, but there's a subtle performance
>> footgun. On most platforms signatures of files shipped with an app are
>> checked at install time but on macOS they aren't. Thanks to its NeXT
>> roots it doesn't really have an installation concept, and thus the
>> kernel checks signatures of files on first use, then caches the
>> signature check in the kernel vnode. By default the entire file is
>> hashed in order to link it back to the root signature, which for large
>> files can impose a small but noticeable delay before the app can open
>> them. This first-run penalty is unfortunate given that AppCDS exists
>> partly to improve startup time. You can argue it doesn't matter much
>> due to the caching, but it's worth being aware of - very large AppCDS
>> archives would get fully paged in and hashed before the app even gets
>> to do anything. In turn that means people might enable AppCDS with a
>> big classlist expecting it to speed things up, not noticing that for
>> Mac users only it slowed things down instead. There are ways to fix
>> this using supported Apple APIs. One is to supply a CodeDirectory
>> structure stored in extended attributes: you should get incremental
>> hashing and normal page fault behaviour (untested!). Another is to
>> wrap the data in a Mach-O file.
>>
>> In the second case the CDS archive is being generated client side. Mac
>> apps don't have anywhere they can create tamperproof data, except for
>> very small amounts in the keychain. Thus if a Mac app opens a
>> malicious cache file that can take control of it, that's a security
>> bug, because it'd allow one program to grab any special privileges the
>> user granted to another. The fact that the grabbing program has passed
>> Gatekeeper and notarization doesn't necessarily matter (Apple's
>> guidance on this is unclear, but it seems plausible that this is their
>> stance). In this case the keychain can be used as a root of trust by
>> storing a hash of the CDS archive in it and checking that after
>> mmap/before use. Alternatively, again, Apple provides an API that lets
>> you associate an on-disk (xattr) CodeDirectory structure with a file
>> which will then be checked incrementally at page fault time. Extreme
>> care must be taken to avoid race conditions, but in theory, a
>> CodeDirectory structure can be computed at dump time, written to disk
>> as an xattr, and then stored again in the keychain (e.g.
>> by pretending it's a "key" or "password"). After the security API is
>> instructed to associate a CD with the file, it can be checked against
>> the tamperproofed version stored in the keychain and if they match,
>> the archive can then be mmapped and used as normal.
>>
>> Native images don't have these issues because the state snapshot is
>> stored inside the Mach-O file and thus gets covered by the normal
>> mechanisms. However once it adds support for persisted heaps, the same
>> issue may arise.
>>
>> Whether it's worth doing the extra work to solve this is unclear. Macs
>> are guaranteed to come with very fast NVMe disks and CPUs. Still, it's
>> worth being aware of the issue.
>>
>> 3.
>>
>> Why not just use a native image then? Maybe we'll do that because the
>> performance wins are really compelling, but again, v1 will ship
>> without this for the following reasons:
>>
>> a. Static minification can break things. Our integration tests
>> currently invoke the entry point of the app directly, but that could
>> be fixed to run the tool in an external process. For unit tests the
>> situation is far murkier. It's a bit unclear how to run JUnit tests
>> against the statically compiled version and it may not even make sense
>> (because the tests would pin a bunch of code that might get stripped
>> in the real app so what are you really testing?).
>>
>> b. It'd break delta updates. Not the end of the world, but a factor.
>>
>> c. I have no idea if we're using any libraries that spin bytecode
>> dynamically. Even if we're not today, what if tomorrow we want to use
>> such a library? Do we have to avoid using it and increase the cost of
>> feature development, or roll back the native image and give our users
>> a nasty performance downgrade? Neither option is attractive. Ideally
>> SubstrateVM would contain a bytecode interpreter and use it when
>> necessary. Lots of issues there but e.g. it'd probably be OK if it's
>> not a general classloader and the code dependencies have to be known
>> AOT.
>>
>> d. Similar to (c), fully AOT compilation can explode code and thus
>> download size even though many codepaths are cold and only execute
>> once. It'd be nice if a native image could include a mix of bytecode
>> and AOT-compiled hotspots.
>>
>> e. Once you're past the initial interactive stage the program is
>> throughput sensitive. How much of a perf downgrade over HotSpot would
>> we get, if any? With GraalVM EE we could use PGO and not lose any, but
>> the ISV pricing is opaque. At any rate, to answer this we have to fix
>> the compatibility issues first. The prospect of improving startup time
>> and then discovering we slowed down the actual builds isn't really
>> appealing (though I suspect in our case AOT wouldn't really hurt
>> much).
>>
>> f. What if we want to support in-process plugins? Maybe we can use
>> Espresso, but this is a road less travelled (lack of tutorials,
>> well-documented examples, etc).
>>
>> An interesting possibility is using a mix of approaches. For the bash
>> competitor I mentioned earlier, dynamic code loading is needed because
>> the script bytecode is loaded into the host JVM, but the Kotlin
>> compiler itself could theoretically be statically compiled to a JNI-
>> or Panama-accessible library. We tried this before and hit
>> compatibility errors, but didn't make any effort to resolve them.
>>
>> 4.
>>
>> What about CRaC? It's Linux-only so isn't interesting to us, given
>> that most devs are on Windows/macOS. The benefits for Linux servers
>> are clear though.
>> Obvious question - can you make a snapshot on one machine/Linux
>> distro, and resume it on a totally different one, or does it require
>> a homogeneous infrastructure?
>>
>> 5.
>>
>> A big reason AppCDS is nice is we get to keep the open world. This
>> isn't only about compatibility; open worlds are just better. The most
>> popular way to get software to desktop machines is Chrome, and the web
>> is totally open world. Apps are downloaded incrementally as the user
>> navigates around, and companies exploit this fact aggressively. Large
>> web sites can be far larger than would be considered practical to
>> distribute to end user machines, and can easily update 50 times a day.
>> Web developers have to think about latency on specific interactions,
>> but they don't have to think about the size of the entire app, and
>> that allows them to scale up feature sets as fast as funding allows.
>> In contrast the closed-world mobile versions of their sites are a
>> parade of horror stories in which firms have to e.g. hotpatch Dalvik
>> to work around method count limits (Facebook), or in which code size
>> issues nearly wrecked the entire company (Uber):
>>
>> https://twitter.com/StanTwinB/status/1336914412708405248
>>
>> Right now code size isn't a particularly serious problem for us, but
>> the ease of including open source libraries means footprint grows all
>> the time. Especially for our shell scripting tool, there are tons of
>> cool features that could be added but if we did all of them we'd
>> probably end up with 500 MB of bytecode. With an open world, features
>> can be downloaded on the fly as they get used and you can build a
>> plugin ecosystem.
>>
>> The new, more incremental direction of Leyden is thus welcomed and
>> appreciated, because it feels like a lot of ground can be covered by
>> "small" changes like upgrading AppCDS and caching compiled hotspots.
>> Even if the results aren't as impressive as with native-image, the
>> benefits of keeping an open world can probably make up for it, at
>> least for our use cases.
>

From kasperni at gmail.com  Fri Jun  3 08:45:21 2022
From: kasperni at gmail.com (Kasper Nielsen)
Date: Fri, 3 Jun 2022 09:45:21 +0100
Subject: Experimentation with build time and runtime class initialization
 in qbicc
In-Reply-To: 
References: <0EE27016-2D6A-46A8-825A-1AFF788A5C67@us.ibm.com>
Message-ID: 

On Tue, 31 May 2022 at 16:50, Dan Heidinga wrote:
> On Fri, May 27, 2022 at 7:53 AM Kasper Nielsen wrote:
> >
> > Hi David,
> >
> > Thanks for the write-up.
> >
> > One thing that isn't completely clear to me after reading this is why
> > language changes (<clinit>) are needed?
>
> The <clinit> model was a convenient way for us to explore a model that
> put all class initialization at build time, while allowing a small set
> of fields to be reinitialized at runtime. It also minimized the
> changes we had to make to the core JDK classes, which makes maintaining
> the changes much easier given the rate of JDK updates. SubstrateVM
> uses a similar approach with their Substitutions for what I assume are
> similar reasons.
>
> Leyden will be able to update the JDK core classes directly and can
> take a more direct approach to indicating in which phase a static
> field should be initialized.
>
> > It seems to me this could be entirely
> > implemented via a standard API.
> > Using ClassValue as the main inspiration you could have something
> > like:
> >
> > abstract class RuntimeLocal<T> {
> >     protected RuntimeLocal() {
> >         checkBuildTime();
> >         VM.registerForRuntimeInitialization(this);
> >     }
> >     protected abstract T computeValue();
> >     public final T get(); // Calls to get are optimized by the vm
> > }
> >
> > Usage would be something similar to:
> >
> > class Usage {
> >
> >     static final LocalDateTime BUILD_TIME = LocalDateTime.now();
> >
> >     static final RuntimeLocal<LocalDateTime> RUNTIME_TIME = new
> >         RuntimeLocal<>() {
> >             protected LocalDateTime computeValue() {
> >                 return LocalDateTime.now();
> >             }
> >         };
> > }
> >
> > I might be missing some details, but it seems to me that this
> > approach would be strongly preferable to changing the language as
> > well as adding new bytecodes.
>
> This is a good starting point. I went a fair ways looking at how to
> group static fields into different classes to decouple their lifetimes
> and found that I couldn't cleanly split them into two groups.

I think there is an important distinction to make here between "phased
class initialization" and "phased field initialization". Having used
GraalVM's native image for some time, my experience is that it is very
hard to reason about phased class initialization. A saner model, I
would argue, would be one where all classes are initialized at image
build time and never reinitialized. If a class needs laziness or
reinitialization this must be done explicitly using /RuntimeLocal. If
you have groups of fields that need to be initialized together, this
can be done by storing them in a record which can then be stored in a
reinit field (a rough sketch follows at the end of this mail). In this
model, you would still need to think about the usage of reinit fields.
But you would never need to spend cycles on figuring out what phase a
class was initialized in. But this is all something that can be
discussed further down the line.

> The problem is that while it's clear that some fields can be
> initialized early (build time) and others must be initialized late
> (runtime), there is a third group that needs to be reinitialized. I
> list 3 buckets: early, late, and reinit, but that's a minimum number.
> There may be more than 3. And due to the "soupy" nature of <clinit>,
> it's not always easy to avoid depending on a field that's in a
> different bucket. And values in that 3rd bucket - the fields that
> need to be reinitialized - don't have a clear meaning when their value
> propagates around the program. Does it need to be cleared everywhere
> and force reinit of all consumers? Lots to figure out here.
>
> We need a better model - whether that's library features or new
> language features - that makes it easier to express when (which phase)
> an operation should occur and some way to talk about the dependency
> chain of that value (all the classes that have to be initialized,
> values calculated, etc).
>

I must admit I'm a bit skeptical about something like dependency
tracking. Take something like System.lineSeparator() and a
platform-independent image. Is it really realistic that we track all
strings that are created using this method during build time? But, as
you said, lots to figure out :)
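The promised sketch of the record idea (names are invented, and
RuntimeLocal is the hypothetical API from earlier in this thread, not
an existing class):

    class Striping {
        // All values that must stay mutually consistent are computed
        // together and swapped as a single unit when the field is
        // reinitialized at runtime.
        record CpuConfig(int ncpu, int stripes) {}

        static final RuntimeLocal<CpuConfig> CONFIG = new RuntimeLocal<>() {
            protected CpuConfig computeValue() {
                int ncpu = Runtime.getRuntime().availableProcessors();
                return new CpuConfig(ncpu, Integer.highestOneBit(ncpu) * 2);
            }
        };
    }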
From mike at hydraulic.software  Fri Jun  3 08:57:45 2022
From: mike at hydraulic.software (Mike Hearn)
Date: Fri, 3 Jun 2022 10:57:45 +0200
Subject: AppCDS / AOT thoughts based on CLI app experience
In-Reply-To: <12fc5517-78d7-dfc0-f9c1-cbcdba5b7ccd@oracle.com>
References: <12fc5517-78d7-dfc0-f9c1-cbcdba5b7ccd@oracle.com>
Message-ID: 

Hi Ioi,

We're using a JDK 17 with a few backports. Unfortunately the default
CDS archive goes missing during jlinking. It's an easy fix. Actually,
the product in question is a packaging tool; it's not only for the JVM
but it supports JVM apps quite well, and re-creating the CDS archive
post-jlink is on the list of features to add. It's packaged with itself
so that'll fix it for our apps too.

> $ javac -J-XX:+AutoCreateSharedArchive -J-XX:SharedArchiveFile=javac.jsa
> HelloWorld.java
>
> javac.jsa will be automatically created if it doesn't exist, or if it's
> not compatible with the JVM (e.g., if you have upgraded to a newer JDK).

Yes, that's a nice improvement in usability. By the way, don't forget
-Xlog:cds=off, because otherwise CDS likes to write lots of warnings to
the terminal (not a great look for a CLI app).

> The dynamic CDS dumping happens when the JVM exits.

Yes, but it seems to slow down execution before that time as well. Here
are some timings for our app to parse CLI options, read the build
config, compute the task graph, print the available tasks, and reach
the end of main():

- With CDS off: ~0.8 seconds
- With CDS dumping active: ~1.25 seconds
- With CDS active: ~0.6 seconds

So the app appears to run ~50% slower when dynamic dumping is active,
and that's not including the dump time itself. That's why I'm
suggesting doing it in the background as a totally separate
post-install step (with background forking required for platforms that
don't support or strongly discourage install scripts). I get the
impression this may not be expected? Is the JVM genuinely doing extra
work at runtime when dynamic dumping is active?

> Maybe we could have some sort of daemon that collects profiling data
> in the background, and updates the archives when the application
> behavior is more understood.

Sure, the ideal would be something like "always dumping" mode in which
there's no slowdown. So you just give the JVM a directory (or >1
directory) and it caches internal structures, JITd code and persistent
heap snapshots there. Fire and forget. Then if you want to trade off
bandwidth vs first-run time you can pre-populate the first directory in
the list with the results of a short run, like just getting to first
pixels for a desktop app or flag handling for a CLI app, and any
additional data generated goes into the second directory. Bonus points
if you find a way to share those directories over an NFS mount - then
you have a JIT server 'for free' in cloud deployments.

> The dynamic archive will be smaller, because it doesn't need to
> duplicate the built-in classes that are already in the static archive.

Right. That's true. I'd forgotten that you can combine them like that.
So we could ship a small static archive in the download that just
accelerates time-to-first-interaction, and generate a larger dump
client-side in the background that covers the whole execution (rough
commands are sketched below).

> Will you have a similar problem if the JAR file of the application is
> maliciously modified?

If they're downloaded and stored in the home directory, yes, but JARs
support code signing with per-file hashing so there's a way to fix that
built into the platform. If they're just shipped as data files in the
app then it doesn't matter because they're signed and tamperproofed
using OS-specific mechanisms.
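Spelling out that static-plus-dynamic combination (archive names, class
list and main class are placeholders; the flags are the standard AppCDS
static-dump and JDK 13+ dynamic-dump options):

At build time, create the small startup archive from a class list:

$ java -Xshare:dump -XX:SharedClassListFile=startup.classlist \
      -XX:SharedArchiveFile=startup.jsa -cp app.jar

On the user's machine, record a dynamic archive on top of it during a
background run:

$ java -XX:ArchiveClassesAtExit=full.jsa -XX:SharedArchiveFile=startup.jsa \
      -cp app.jar com.example.Main

Subsequent runs then use the dynamic archive, which remembers its base:

$ java -XX:SharedArchiveFile=full.jsa -cp app.jar com.example.Main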
All this security talk is a bit theoretical. IntelliJ downloads
unsigned JARs as plugins and nobody seems to care. It's possible that's
because it doesn't request any special privileges so there's nothing to
attack, but on macOS things as basic as access to ~/Downloads are a
permission these days. Also JetBrains are moving to code signing their
JARs anyway. So ... yeah. Like I said. Hard to know how much to really
care about this. It might be one of those things that doesn't matter
until the day it does.

> What could be modified is the vtptr of archived MetaData objects. They
> usually point to somewhere near 0x800000000 (where the vtables are)
> but the attacker could modify them to point to arbitrary locations. I
> am not sure if this type of attack is easier than modifying the JAR
> files, or not.

Well, the issue here is a combination of where the files are generated
and performance. Again it's all a bit theoretical because the
performance discussion is rooted in the "disk access is slow" world,
which isn't really true anymore. I've done some casual tests on my
laptop and did appear to see a real slowdown from this "hash whole file
on open" effect, but it was a while ago and it wasn't rigorous at all.
It's also a total PITA to reproduce because there's no explicit way to
flush the cache, so you have to constantly re-copy signed binaries over
and over to force kernel cache misses. If I explained how I measured
this, Aleksey Shipilev would yell at me :) so I'll just leave it here
as food for thought instead.

And yeah, it's also not clear how much the uncached times matter these
days. Years ago it mattered a lot because people rebooted their
machines often, but Macs hibernate all the time and reboot only rarely,
so the caches will remain warm.

I don't think treating AppCDS archives as hostile in the JVM itself
would be worth it. This is a Mac-specific issue and that would be a
major constraint, e.g. it'd mean you can't cache JITd native code in
the archives. Doesn't make sense. Better to tamperproof unbundled
archives in other ways, like computing ad-hoc signatures and stashing
the CodeDirectory in an xattr (if it ever matters).

From heidinga at redhat.com  Mon Jun  6 14:36:18 2022
From: heidinga at redhat.com (Dan Heidinga)
Date: Mon, 6 Jun 2022 10:36:18 -0400
Subject: Experimentation with build time and runtime class initialization
 in qbicc
In-Reply-To: 
References: <0EE27016-2D6A-46A8-825A-1AFF788A5C67@us.ibm.com>
Message-ID: 

On Tue, May 31, 2022 at 12:17 PM Brian Goetz wrote:
>
> I think Dan is homing in on one of the key questions, which is the
> nature of the third bucket (static finals that require
> reinitialization). It would be useful for everyone following the
> discussion if we had a more complete list of situations you've
> encountered where this seems essential, and their notable aspects.

In qbicc, the places we've had to reinitialize static fields are
captured in the qbicc/qbicc-class-library repo [0] using "$_runtime"
source files [1]. Many of the cases have to do with capturing the
build time vs the runtime environment.
The number of available CPUs is captured in several places:
* j.l.Runtime : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/Runtime%24_runtime.java
* j.u.c.Exchanger: https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/util/concurrent/Exchanger%24_runtime.java
* j.u.c.Phaser : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/util/concurrent/Exchanger%24_runtime.java
* j.u.c.a.Striped64 : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/util/concurrent/atomic/Striped64%24_runtime.java

The environment variables are captured:
* j.l.ProcessEnvironment : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/ProcessEnvironment%24_runtime.java

The in / out / err file descriptors need to be reinitialized:
* j.io.FileDescriptor : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/io/FileDescriptor%24_runtime.java

Prevent threads from being created in a static initializer:
* j.l.ref.Reference : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/ref/Reference%24_patch.java
* Likely more cases for this we just haven't hit yet

Unsafe pageSize needs to be configured at runtime. As do
UnsafeConstants like ADDRESS_SIZE0:
* j.i.m.Unsafe : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/jdk/internal/misc/Unsafe%24_patch.java
* j.i.m.UnsafeConstants: https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/jdk/internal/misc/UnsafeConstants%24_patch.java
  & https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/jdk/internal/misc/UnsafeConstants%24_runtime.java

Capturing the default directory:
* sun.nio.fs.UnixFileSystem : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/sun/nio/fs/UnixFileSystem%24_runtime.java

We're still working through detangling the "initPhase" process in
j.l.System into a build time and runtime ("rtInitPhase") version:
https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/System%24_patch.java

We also did some investigation of how feasible it would be to rewrite
SubstrateVM's Substitutions to use the IODH pattern and I can share
that info as well but it'll take a bit for me to write it up in a
clear state.
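For anyone who hasn't seen it, the IODH (initialization-on-demand
holder) pattern mentioned above parks a static field in a nested holder
class so that its <clinit> only runs on first use; a minimal sketch
(the field is purely illustrative):

    class CpuCount {
        // Holder is not initialized when CpuCount is; its <clinit>
        // runs only when get() first touches Holder.N, so the read
        // can be deferred past a build-time initialization of
        // CpuCount itself.
        private static class Holder {
            static final int N = Runtime.getRuntime().availableProcessors();
        }

        static int get() {
            return Holder.N;
        }
    }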
--Dan

[0] https://github.com/qbicc/qbicc-class-library
[1] https://github.com/qbicc/qbicc-class-library/search?q=%24_runtime

>
> As you point out, there are a host of potential "solutions"; while it
> is surely premature to try to propose a solution, it is never too
> early to come to a better understanding of the problem.
>
> On 5/31/2022 11:50 AM, Dan Heidinga wrote:
>
> On Fri, May 27, 2022 at 7:53 AM Kasper Nielsen wrote:
>
> Hi David,
>
> Thanks for the write-up.
>
> One thing that isn't completely clear to me after reading this is why
> language changes (<clinit>) are needed?
>
> The <clinit> model was a convenient way for us to explore a model that
> put all class initialization at build time, while allowing a small set
> of fields to be reinitialized at runtime. It also minimized the
> changes we had to make to the core JDK classes, which makes maintaining
> the changes much easier given the rate of JDK updates. SubstrateVM
> uses a similar approach with their Substitutions for what I assume are
> similar reasons.
>
> Leyden will be able to update the JDK core classes directly and can
> take a more direct approach to indicating in which phase a static
> field should be initialized.
>
> It seems to me this could be entirely
> implemented via a standard API. Using ClassValue as the main
> inspiration you could have something like:
>
> abstract class RuntimeLocal<T> {
>     protected RuntimeLocal() {
>         checkBuildTime();
>         VM.registerForRuntimeInitialization(this);
>     }
>     protected abstract T computeValue();
>     public final T get(); // Calls to get are optimized by the vm
> }
>
> Usage would be something similar to:
>
> class Usage {
>
>     static final LocalDateTime BUILD_TIME = LocalDateTime.now();
>
>     static final RuntimeLocal<LocalDateTime> RUNTIME_TIME = new
>         RuntimeLocal<>() {
>             protected LocalDateTime computeValue() {
>                 return LocalDateTime.now();
>             }
>         };
> }
>
> I might be missing some details, but it seems to me that this approach
> would be strongly preferable to changing the language as well as
> adding new bytecodes.
>
> This is a good starting point. I went a fair ways looking at how to
> group static fields into different classes to decouple their lifetimes
> and found that I couldn't cleanly split them into two groups. I used
> the Initialization on demand holder pattern (IODH) rather than your
> RuntimeLocal but the idea is very similar.
>
> The problem is that while it's clear that some fields can be
> initialized early (build time) and others must be initialized late
> (runtime), there is a third group that needs to be reinitialized. I
> list 3 buckets: early, late, and reinit, but that's a minimum number.
> There may be more than 3. And due to the "soupy" nature of <clinit>,
> it's not always easy to avoid depending on a field that's in a
> different bucket. And values in that 3rd bucket - the fields that
> need to be reinitialized - don't have a clear meaning when their value
> propagates around the program. Does it need to be cleared everywhere
> and force reinit of all consumers? Lots to figure out here.
>
> We need a better model - whether that's library features or new
> language features - that makes it easier to express when (which phase)
> an operation should occur and some way to talk about the dependency
> chain of that value (all the classes that have to be initialized,
> values calculated, etc).
>
> --Dan
>
> /Kasper
>
> On Thu, 26 May 2022 at 21:22, David P Grove wrote:
>
> Hi,
> I've appended the contents of the referenced wiki page in this email.
> Apologies in advance if the formatting doesn't come through as
> intended.
>
> There is a full implementation of this (GPLv2 + Classpath exception)
> as part of the qbicc project on GitHub. There is also a GitHub
> discussion in the qbicc project that links to various GitHub issues
> that capture the history that led to the current design. I will not
> hyperlink to those here so that if people have any IP concerns, they
> can avoid seeing them. They are easily findable.
>
> Regards,
>
> --dave

From brian.goetz at oracle.com  Mon Jun  6 17:45:10 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 6 Jun 2022 17:45:10 +0000
Subject: Experimentation with build time and runtime class initialization
 in qbicc
In-Reply-To: 
References: <0EE27016-2D6A-46A8-825A-1AFF788A5C67@us.ibm.com>
Message-ID: <0387C49D-8761-464D-A494-88529EFF9433@oracle.com>

Thanks, Dan, for the detailed information. The other investigation also
seems interesting, so I hope some day you'll find the time to write it
up.
There's lots to unpack here, but I want to focus on a specific aspect,
related to the issue of "stale" or "aliased" compile-time values that I
raised in my earlier mail.

Taking the specific example of caching Runtime.availableProcessors(),
let's ask: WHY are these classes caching R.aP() in a static? There are
two possible cases:

 - Pure caching. Here, the author has made a choice (right or wrong)
that calling R.aP() repeatedly will be too expensive, and so caches the
value in a static for later use for, say, allocating arena arrays in
the constructor of Striped64 or Exchanger -- but the instances created
in the early phase are still valid in the later phase, and compatible
with instances created in the later phase.

 - Enforcement of invariant. Here, the author has captured the fact
that they require the value to be stable, because (say) they're going
to create multiple arrays and expect them all to be of the same length.
Here, early-phase and later-phase instances could not compatibly
coexist.

In the first case, reinitializing the cached field at phase change
points may be harmless; it's essentially equivalent to replacing reads
of fields with repeated evaluation of the initializer (assuming the
initialization is pure); in the second, the runtime has broken an
invariant the author had reason to believe is valid.

Without diving into solutions at this point, we can't escape the
following observations:

 - This is what happens when you try to reinterpret old code with new
semantics; code that had every reason to work properly when it was
written, becomes retroactively broken when the runtime reinterprets old
code in a new way. New semantics require permission from the user.

 - If there are N separate desirable (but incompatible) outcomes, such
as the two cases cited above, their code has to be different from each
other.

Right now, we can't tell the difference between these cases. If, as in
the "it's an invariant" case, it would be unacceptable for the value to
change (i.e., when the user said "static final", they were serious),
then one of the following has to happen:

 - We must be prepared to keep the earlier-phase result in later
phases, even if the underlying quantity has changed;

 - We must defer evaluation until the later phase (potentially
deferring all dependent early evaluations);

 - We fail at early-eval time if someone attempts to evaluate the
must-be-stable quantity in the early phase, and let the programmer sort
it out.

In fact, to the extent we want early evaluation, I suspect that we may
want to be able to express *all three* of these in the programming
model.
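A tiny illustration of the two cases (class and field names are
invented; NCPU stands in for a cached R.aP() value):

    class PureCache {
        // Case 1: pure caching. Re-evaluating this at a phase change
        // is harmless as long as availableProcessors() is its only
        // input.
        static final int NCPU = Runtime.getRuntime().availableProcessors();
    }

    class InvariantHolder {
        // Case 2: invariant. Both arrays are sized from the same
        // snapshot. If the runtime silently reinitialized NCPU,
        // instances built in the early phase (with the old length)
        // could no longer coexist with instances built in the later
        // phase (with the new length).
        static final int NCPU = Runtime.getRuntime().availableProcessors();
        static final Object[] cells = new Object[NCPU];
        static final Thread[] owners = new Thread[NCPU];
    }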
> On Jun 6, 2022, at 10:36 AM, Dan Heidinga wrote:
>
> On Tue, May 31, 2022 at 12:17 PM Brian Goetz wrote:
>>
>> I think Dan is homing in on one of the key questions, which is the
>> nature of the third bucket (static finals that require
>> reinitialization). It would be useful for everyone following the
>> discussion if we had a more complete list of situations you've
>> encountered where this seems essential, and their notable aspects.
>
> In qbicc, the places we've had to reinitialize static fields are
> captured in the qbicc/qbicc-class-library repo [0] using "$_runtime"
> source files [1]. Many of the cases have to do with capturing the
> build time vs the runtime environment.
>
> The number of available CPUs is captured in several places:
> * j.l.Runtime : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/Runtime%24_runtime.java
> * j.u.c.Exchanger: https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/util/concurrent/Exchanger%24_runtime.java
> * j.u.c.Phaser : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/util/concurrent/Exchanger%24_runtime.java
> * j.u.c.a.Striped64 : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/util/concurrent/atomic/Striped64%24_runtime.java
>
> The environment variables are captured:
> * j.l.ProcessEnvironment : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/ProcessEnvironment%24_runtime.java
>
> The in / out / err file descriptors need to be reinitialized:
> * j.io.FileDescriptor : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/io/FileDescriptor%24_runtime.java
>
> Prevent threads from being created in a static initializer:
> * j.l.ref.Reference : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/ref/Reference%24_patch.java
> * Likely more cases for this we just haven't hit yet
>
> Unsafe pageSize needs to be configured at runtime.
> As do
> UnsafeConstants like ADDRESS_SIZE0:
> * j.i.m.Unsafe : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/jdk/internal/misc/Unsafe%24_patch.java
> * j.i.m.UnsafeConstants: https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/jdk/internal/misc/UnsafeConstants%24_patch.java
>   & https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/jdk/internal/misc/UnsafeConstants%24_runtime.java
>
> Capturing the default directory:
> * sun.nio.fs.UnixFileSystem : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/sun/nio/fs/UnixFileSystem%24_runtime.java
>
> We're still working through detangling the "initPhase" process in
> j.l.System into a build time and runtime ("rtInitPhase") version:
> https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/System%24_patch.java
>
> We also did some investigation of how feasible it would be to rewrite
> SubstrateVM's Substitutions to use the IODH pattern and I can share
> that info as well but it'll take a bit for me to write it up in a
> clear state.
>
> --Dan
>
> [0] https://github.com/qbicc/qbicc-class-library
> [1] https://github.com/qbicc/qbicc-class-library/search?q=%24_runtime
>
>> As you point out, there are a host of potential "solutions"; while it
>> is surely premature to try to propose a solution, it is never too
>> early to come to a better understanding of the problem.
>>
>> On 5/31/2022 11:50 AM, Dan Heidinga wrote:
>>
>> On Fri, May 27, 2022 at 7:53 AM Kasper Nielsen wrote:
>>
>> Hi David,
>>
>> Thanks for the write-up.
>>
>> One thing that isn't completely clear to me after reading this is why
>> language changes (<clinit>) are needed?
>>
>> The <clinit> model was a convenient way for us to explore a model
>> that put all class initialization at build time, while allowing a
>> small set of fields to be reinitialized at runtime. It also minimized
>> the changes we had to make to the core JDK classes, which makes
>> maintaining the changes much easier given the rate of JDK updates.
>> SubstrateVM uses a similar approach with their Substitutions for what
>> I assume are similar reasons.
>>
>> Leyden will be able to update the JDK core classes directly and can
>> take a more direct approach to indicating in which phase a static
>> field should be initialized.
>>
>> It seems to me this could be entirely
>> implemented via a standard API.
>> Using ClassValue as the main inspiration you could have something
>> like:
>>
>> abstract class RuntimeLocal<T> {
>>     protected RuntimeLocal() {
>>         checkBuildTime();
>>         VM.registerForRuntimeInitialization(this);
>>     }
>>     protected abstract T computeValue();
>>     public final T get(); // Calls to get are optimized by the vm
>> }
>>
>> Usage would be something similar to:
>>
>> class Usage {
>>
>>     static final LocalDateTime BUILD_TIME = LocalDateTime.now();
>>
>>     static final RuntimeLocal<LocalDateTime> RUNTIME_TIME = new
>>         RuntimeLocal<>() {
>>             protected LocalDateTime computeValue() {
>>                 return LocalDateTime.now();
>>             }
>>         };
>> }
>>
>> I might be missing some details, but it seems to me that this
>> approach would be strongly preferable to changing the language as
>> well as adding new bytecodes.
>>
>> This is a good starting point. I went a fair ways looking at how to
>> group static fields into different classes to decouple their
>> lifetimes and found that I couldn't cleanly split them into two
>> groups. I used the Initialization on demand holder pattern (IODH)
>> rather than your RuntimeLocal but the idea is very similar.
>>
>> The problem is that while it's clear that some fields can be
>> initialized early (build time) and others must be initialized late
>> (runtime), there is a third group that needs to be reinitialized. I
>> list 3 buckets: early, late, and reinit, but that's a minimum number.
>> There may be more than 3. And due to the "soupy" nature of <clinit>,
>> it's not always easy to avoid depending on a field that's in a
>> different bucket. And values in that 3rd bucket - the fields that
>> need to be reinitialized - don't have a clear meaning when their
>> value propagates around the program. Does it need to be cleared
>> everywhere and force reinit of all consumers? Lots to figure out
>> here.
>>
>> We need a better model - whether that's library features or new
>> language features - that makes it easier to express when (which
>> phase) an operation should occur and some way to talk about the
>> dependency chain of that value (all the classes that have to be
>> initialized, values calculated, etc).
>>
>> --Dan
>>
>> /Kasper
>>
>> On Thu, 26 May 2022 at 21:22, David P Grove wrote:
>>
>> Hi,
>> I've appended the contents of the referenced wiki page in this email.
>> Apologies in advance if the formatting doesn't come through as
>> intended.
>>
>> There is a full implementation of this (GPLv2 + Classpath exception)
>> as part of the qbicc project on GitHub. There is also a GitHub
>> discussion in the qbicc project that links to various GitHub issues
>> that capture the history that led to the current design. I will not
>> hyperlink to those here so that if people have any IP concerns, they
>> can avoid seeing them. They are easily findable.
>>
>> Regards,
>>
>> --dave
>