From tanksherman27 at gmail.com Wed Jun 1 05:07:50 2022
From: tanksherman27 at gmail.com (Julian Waters)
Date: Wed, 1 Jun 2022 13:07:50 +0800
Subject: Can Ahead of Time code benefit regular Java applications too?
In-Reply-To: <9f70a2d5-5cb1-e615-b76b-957f95ac9928@oracle.com>
References: <9f70a2d5-5cb1-e615-b76b-957f95ac9928@oracle.com>
Message-ID:

The prospect of version-agnostic jars which any JVM version can use certainly sounds attractive, but I don't think it's a must-have, especially if the issues involved in supporting such a feature make pursuing the idea not worthwhile. To my knowledge, when it was still being actively developed, jaotc shared libraries were also specific to the OS and JVM they were compiled for. Likewise, perhaps working in a similar fashion to intrinsics, you could have certain sections of regularly compiled Java code within jars replaced by native code compiled by C1 (or C2?) if the JVM it was compiled by and the target OS/CPU match the currently running JVM and OS/CPU (indeed, this is how the Velocity project decides whether it should load its own shared libraries or fall back to a Java implementation, by checking whether the current platform is suitable for native code acceleration - https://github.com/PaperMC/Velocity/tree/dev/3.0.0/native). In a way, this might be similar to Anton's suggestion of "A closed world start image that is restored into an open world Java application" on another thread within this mailing list, and the involvement of CRaC within Leyden.
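To sketch the kind of platform gate I mean (the library and class names here are hypothetical; Velocity's real checks are more involved):

    final class NativeAcceleration {
        static boolean tryLoadPrecompiled() {
            String os = System.getProperty("os.name").toLowerCase();
            String arch = System.getProperty("os.arch");
            // Only attempt the native path when the precompiled code
            // was built for exactly this OS/CPU combination:
            if (os.contains("linux") && arch.equals("amd64")) {
                try {
                    System.loadLibrary("app-accel-linux-amd64");
                    return true;
                } catch (UnsatisfiedLinkError e) {
                    // fall through to the plain Java implementation
                }
            }
            return false;
        }
    }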
best regards,
Julian

On Wed, Jun 1, 2022 at 5:42 AM Ioi Lam wrote:
>
> On 5/30/2022 6:07 AM, Julian Waters wrote:
> > Hi all,
> >
> > Since Leyden's goal has shifted from originally exploring only binaries
> > compiled directly to native code, to "address the long-term pain points of
> > Java's slow startup time, slow time to peak performance, and large
> > footprint", would there be any merit in looking at allowing native code to
> > be embedded within jars to bypass the Interpreter at runtime? (Maybe have
> > Ahead of Time code that replaces the Interpreter be compiled by C1, and
> > treat it as part of the C1 pipeline so it can be profiled while being run)
> > Ideally it'd be similar to the now defunct jaotc, but more compact (within
> > the jar itself or perhaps the classfiles somehow) instead of compiling the
> > Ahead of Time code into an entirely separate file which then needs to be
> > explicitly passed to the JVM at runtime. This may or may not be a good
> > starting point before advancing to entirely standalone Java binaries, but I
> > digress. Perhaps the experience of the CRaC team would be of some help in
> > this area?
> >
> > best regards,
> > Julian
>
> What kind of interface and dependency between the JVM and the native
> code would be needed to support this?
>
> As far as I can tell, the Leyden discussions have been about producing
> artifacts (native code or heap dumps) that are tightly bound to a
> specific build of the JDK. If you want a (version agnostic) JAR file to
> contain native code that can be used by arbitrary JDKs, that would raise
> the complexity quite significantly.
>
> Thanks
> - Ioi

From tanksherman27 at gmail.com Wed Jun 1 05:25:12 2022
From: tanksherman27 at gmail.com (Julian Waters)
Date: Wed, 1 Jun 2022 13:25:12 +0800
Subject: Improve determinism in the Java language
In-Reply-To: References: Message-ID:

I'm leaning towards making certain parts of Java stricter if it's being compiled Ahead of Time, such as the compile-time linking you mention, much like what languages such as C and C++ require you to do when generating binaries (using the rough analogy of object files as compared to classfiles). Many of the dynamic features in the language typically only make sense if being run with a JVM anyway, such as using reflection to modify access to fields and methods, something which is significantly harder to do in a standalone executable. Not being able to optimize code based on a certain condition seems like a bit of a waste to me.

On Wed, Jun 1, 2022 at 5:21 AM Ioi Lam wrote:
> A lot of the recent Leyden discussion has been around "what
> optimizations can be done ahead of time" (e.g., static field
> initialization). However, I think we also need to look at a
> lower level.
>
> One reason that Java has been difficult to optimize ahead-of-time
> is the tremendous dynamism in the language.
>
> Here are a few things that I think we can do to make Java programs
> more deterministic so that ahead-of-time optimizations can
> be applied:
>
> 1 Deterministic Program Code
>
> A Java program can essentially rewrite itself and even
> the libraries it uses. Here's an example:
>
> class App {
>     static {
>         if (...) {
>             MethodHandles.lookup()
>                 .defineClass(.. hacked App$Bar ...);
>         }
>     }
>     static final Bar bar = new Bar();
>     static class Bar {
>         ....
>     }
> }
>
> - We can't effectively AOT-compile the program code because
>   the native code may not match the runtime generated
>   bytecodes.
>
> - We can't pre-initialize the App.bar field because its shape
>   may be different.
>
> One idea is to disallow such code patching when Leyden is enabled.
> For example, we can require that to use Leyden, an application
> must be "prelinked", which means that as soon as the application
> is loaded, the classes App and App$Bar are already loaded. The
> defineClass() call will fail with a LinkageError (duplicated class
> definition).
>
> 2 Decouple class namespaces from dynamic bytecode generation
>
> This is a corollary of the above item. Java uses
> ClassLoader.defineClass() for BOTH namespace and dynamic
> bytecode generation. I would stipulate that most users
> of Leyden want to do the former and not the latter.
>
> We should have a new API to load a fixed set of classes
> into a namespace.
>
> 3 <clinit> order
>
> Java allows <clinit>s that recursively depend on each other. The
> result depends on the reference order of these classes.
>
> class A { static int a = B.b++; }
> class B { static int b = A.a++; }
>
> We could have a problem if the application assumes that A is
> always initialized before B, but the Leyden optimizer
> initializes them in the opposite order.
>
> We could:
>
> - Refuse to optimize classes that have mutually recursive
>   <clinit>s, or
> - Change the language spec to give the JVM more freedom to
>   decide the initialization order.
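P.S. To make the <clinit> ordering hazard above concrete, here's a small runnable sketch (adapted from the A/B example, with the fields made non-final and ++ replaced by + 1 so it compiles cleanly):

    class A { static int a = B.b + 1; }
    class B { static int b = A.a + 1; }

    public class InitOrder {
        public static void main(String[] args) {
            // Touching A first starts A's <clinit>, which triggers B's
            // <clinit>; B reads A.a while A is still mid-initialization
            // and sees the default 0, so b becomes 1 and then a becomes 2.
            System.out.println(A.a + " " + B.b); // prints "2 1"
            // A run that touches B first would end with a = 1, b = 2:
            // the observable result depends on class reference order.
        }
    }

best regards,
Julian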
From aph at redhat.com Wed Jun 1 09:32:45 2022
From: aph at redhat.com (Andrew Haley)
Date: Wed, 1 Jun 2022 10:32:45 +0100
Subject: Can Ahead of Time code benefit regular Java applications too?
In-Reply-To: References: <9f70a2d5-5cb1-e615-b76b-957f95ac9928@oracle.com>
Message-ID: <0bd32c26-661e-3730-d93d-35e79d4823a5@redhat.com>

On 6/1/22 06:07, Julian Waters wrote:
> Likewise, perhaps working in a similar fashion to
> intrinsics, you could have certain sections of regularly compiled Java code
> within jars replaced by native code compiled by C1 (or C2?) if the JVM it
> was compiled by and the target OS/CPU match the currently running JVM and
> OS/CPU

The problem there would be that of jaotc: it worked, but because the precompiled code was not patchable, it had to use indirection for all accesses. So, every field offset, method reference, etc. went through a writable section. All of these had to be fixed up, and of course it bulked out the runtime. The whole process, in the end, wasn't much quicker than C1 compilation.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From mike at hydraulic.software Wed Jun 1 13:03:08 2022
From: mike at hydraulic.software (Mike Hearn)
Date: Wed, 1 Jun 2022 15:03:08 +0200
Subject: AppCDS / AOT thoughts based on CLI app experience
Message-ID:

Hi,

It feels like most of the interest in static Java comes from the microservices / functions-as-a-service community. My new company spent the last year creating a developer tool that runs on the JVM (which will be useful for Java developers actually, but what it does is irrelevant here). Internally it's a kind of build system and is thus a large(ish) CLI app in which startup time and throughput are what matter most. We also have a separate internal tool that uses Kotlin scripting to implement a bash-like scripting language, and which is sensitive in the same ways.

Today the JVM is often overlooked for writing CLI apps due to startup time, 'lightness' and packaging issues. I figured I'd write down some notes based on our experiences. They cover workflow, performance, implementation costs and security issues. Hopefully it's helpful.

1.

I really like AppCDS because:

a. It can't break the app, so switching it on/off is a no-brainer. Unlike native-image/static Java, no additional testing overhead is created by it.

b. It's effective even without heap snapshotting. We see a ~40% speedup for executing --help.

c. It's pay-as-you-go. We can use a small archive that's fast to create to accelerate just the most latency-sensitive startup paths, or we can use it for the whole app, but ultimately costs are controllable.

d. Archives are deterministic. Modern client-side packaging systems support delta updates, and CDS plays nicely with them. GraalVM native images are non-deterministic, so every update is going to replace the entire app, which isn't much fun from an update speed or bandwidth consumption perspective.

Startup time is dominated by PicoCLI, which is a common problem for Java CLI apps. Supposedly the slowest part is building the model of the CLI interface using reflection, so it's a perfect candidate for AppCDS heap snapshotting. I say supposedly, because I haven't seen concrete evidence that this is actually where the time goes, but it seems like a plausible belief. There's a long-standing bug filed to replace reflection with code generation, but it's a big job and so nobody did it.

Unfortunately the app will ship without using AppCDS. Some workflow issues remain. These can be solved in the app itself, but it'd be nice if the JVM did it.

The obvious way to use CDS is to ship an archive with the app.
We might do this as a first iteration, but longer term we don't want to, for two reasons:

a. The archive can get huge.
b. Signature verification penalties on macOS (see below).

For just making --help and similar short commands faster, size isn't so bad (~6-10 MB for us), but if it's used for a whole execution the archive size for a standard run is nearly the same as the total bytecode size of the app. As more stuff gets cached this will get worse. Download size might not matter much for this particular app, but as a general principle it does. So a nice improvement would be to generate it client side.

CDS files are caches, and different platforms have different conventions for where those go. The JVM doesn't know about those conventions but our app does, so we'd need our custom native code launcher (which exists anyway for other reasons) to set the right paths for CDS.

Then you have to pick the right flags depending on whether the CDS file exists or not. I follow CDS-related changes and believe this is fixed in the latest Java versions, but maybe (?) not released yet.

Even once that's fixed it's not quite obvious that we'd use it. The JVM runs much slower when dumping a dynamic CDS archive, and the first run is when first impressions are made. Whilst for cloud stuff this is a matter of (artificially?) expensive resources, for CLI apps it's about more subjective things like feeling snappy. One idea is to delay dumping a CDS archive until after the first run is exiting, so it doesn't get in the way. The first run wouldn't benefit from the archive, which is a pity (except on Linux, where the package managers make it easy to run code post-install), but it at least wouldn't be slowed down by creating it either. The native launcher can schedule this. Alternatively there could be a brief pause on first run when the user is told explicitly that the app is optimizing itself, but how feasible that is depends very much on dump speed. Finally we could ship a small archive that only covers startup, and then in parallel make a dump of a full run in the background.

Speaking of which, there's a need for some protocol to drive an app through a representative 'trial run'. Whether it's generating the class list or the archive itself, it could be as simple as an alternative static method that sits next to main. If it were standardized, the rest of the infrastructure becomes more re-usable; for instance build systems can take care of generating classlists, or the end-user packaging can take care of dynamic dumping.

CDS has two modes and it's not clear which is better. I'm unusually obsessive about this stuff, to the extent of reading the CDS source code, but despite that I have absolutely no idea if I should be trying to use static or dynamic archives. There used to be a performance difference between them, but maybe it's fixed now? There's a lack of end-to-end guidance on how to exploit this feature best.

The ideal would obviously be losing the dump/exec split and making dynamic dumping continuous, incremental and free of performance penalty. Then we could just supply a path to where the CDS file should go and things would magically warm up across executions. I have no idea how feasible that is.

Once AppCDS archives are in place and being created at the right times, a @Snapshotted annotation for fields (or similar) should be an easy win to eliminate the bulk of the rest of the PicoCLI time.
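Something like this is what I have in mind (@Snapshotted is entirely hypothetical, and the picocli setup is only for flavour; RootCommand stands in for our app's root command class):

    class Cli {
        // Hypothetical annotation asking the JVM to archive this object
        // graph at dump time, so the expensive reflective model build is
        // skipped on later runs:
        @Snapshotted
        static final picocli.CommandLine COMMAND_MODEL =
            new picocli.CommandLine(new RootCommand());
    }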
Dynamically loaded heaps would also be useful to eliminate the overhead of loading configs and instantiating the (build) task graph without a Gradle-style daemon.

2.

AppCDS archives can open a subtle security issue when distributing code to desktop platforms. Because they're full of vtables, anyone who can write to them can (we assume) take over any JVM that loads the archive and gain whatever privileges have been granted to that app. The archive file is fully trusted.

On Windows and Linux this doesn't matter. On Linux sensitive files can be packaged or created in postinst scripts. On Windows either an app comes with a legacy installer/MSI file and thus doesn't have any recognized package identity that can be granted extra permissions, or it uses the current-gen MSIX system. In the latter case Windows has a notion of app identity, and so you can request permissions to access e.g. keychain entries, the user's calendar etc, but in that case Windows also gives you a private directory that's protected from other apps where sensitive files can be stashed. AppCDS archives can go there and we're done.

macOS is a problem child. There are two situations that matter.

In the first case archives are shipped as data files with the app. Security is not an issue here, but there's a subtle performance footgun. On most platforms signatures of files shipped with an app are checked at install time, but on macOS they aren't. Thanks to its NeXT roots it doesn't really have an installation concept, and thus the kernel checks signatures of files on first use, then caches the signature check in the kernel vnode. By default the entire file is hashed in order to link it back to the root signature, which for large files can impose a small but noticeable delay before the app can open them. This first-run penalty is unfortunate given that AppCDS exists partly to improve startup time. You can argue it doesn't matter much due to the caching, but it's worth being aware of - very large AppCDS archives would get fully paged in and hashed before the app even gets to do anything. In turn that means people might enable AppCDS with a big classlist expecting it to speed things up, not noticing that for Mac users only it slowed things down instead. There are ways to fix this using supported Apple APIs. One is to supply a CodeDirectory structure stored in extended attributes: you should get incremental hashing and normal page fault behaviour (untested!). Another is to wrap the data in a Mach-O file.

In the second case the CDS archive is being generated client side. Mac apps don't have anywhere they can create tamperproof data, except for very small amounts in the keychain. Thus if a Mac app opens a malicious cache file that can take control of it, that's a security bug, because it'd allow one program to grab any special privileges the user granted to another. The fact that the grabbing program has passed Gatekeeper and notarization doesn't necessarily matter (Apple's guidance on this is unclear, but it seems plausible that this is their stance). In this case the keychain can be used as a root of trust, by storing a hash of the CDS archive in it and checking that after mmap/before use. Alternatively, again, Apple provides an API that lets you associate an on-disk (xattr) CodeDirectory structure with a file, which will then be checked incrementally at page fault time.
Extreme care must be taken to avoid race conditions, but in theory a CodeDirectory structure can be computed at dump time, written to disk as an xattr, and then stored again in the keychain (e.g. by pretending it's a "key" or "password"). After the security API is instructed to associate a CD with the file, it can be checked against the tamperproofed version stored in the keychain, and if they match, the archive can then be mmapped and used as normal.

Native images don't have these issues because the state snapshot is stored inside the Mach-O file and thus gets covered by the normal mechanisms. However, once the stock JVM adds support for persisted heaps, the same issue may arise.

Whether it's worth doing the extra work to solve this is unclear. Macs are guaranteed to come with very fast NVMe disks and CPUs. Still, it's worth being aware of the issue.

3.

Why not just use a native image then? Maybe we'll do that because the performance wins are really compelling, but again, v1 will ship without this for the following reasons:

a. Static minification can break things. Our integration tests currently invoke the entry point of the app directly, but that could be fixed to run the tool in an external process. For unit tests the situation is far murkier. It's a bit unclear how to run JUnit tests against the statically compiled version, and it may not even make sense (because the tests would pin a bunch of code that might get stripped in the real app, so what are you really testing?).

b. It'd break delta updates. Not the end of the world, but a factor.

c. I have no idea if we're using any libraries that spin bytecode dynamically. Even if we're not today, what if tomorrow we want to use such a library? Do we have to avoid using it and increase the cost of feature development, or roll back the native image and give our users a nasty performance downgrade? Neither option is attractive. Ideally SubstrateVM would contain a bytecode interpreter and use it when necessary. Lots of issues there, but it'd probably be OK if e.g. it's not a general classloader and the code dependencies have to be known AOT.

d. Similar to (c), fully AOT compilation can explode code size and thus download size, even though many codepaths are cold and only execute once. It'd be nice if a native image could include a mix of bytecode and AOT-compiled hotspots.

e. Once you're past the initial interactive stage the program is throughput sensitive. How much of a perf downgrade over HotSpot would we get, if any? With GraalVM EE we could use PGO and not lose any, but the ISV pricing is opaque. At any rate, to answer this we have to fix the compatibility issues first. The prospect of improving startup time and then discovering we slowed down the actual builds isn't really appealing (though I suspect in our case AOT wouldn't really hurt much).

f. What if we want to support in-process plugins? Maybe we can use Espresso, but this is a road less travelled (lack of tutorials, well-documented examples etc).

An interesting possibility is using a mix of approaches. For the bash competitor I mentioned earlier, dynamic code loading is needed because the script bytecode is loaded into the host JVM, but the Kotlin compiler itself could theoretically be statically compiled to a JNI or Panama-accessible library. We tried this before and hit compatibility errors, but didn't make any effort to resolve them.

4.

What about CRaC? It's Linux-only, so it isn't interesting to us, given that most devs are on Windows/macOS. The benefits for Linux servers are clear though.
Obvious question - can you make a snapshot on one machine/Linux distro and resume it on a totally different one, or does it require a homogeneous infrastructure?

5.

A big reason AppCDS is nice is that we get to keep the open world. This isn't only about compatibility; open worlds are just better. The most popular way to get software to desktop machines is Chrome, and the web is totally open-world. Apps are downloaded incrementally as the user navigates around, and companies exploit this fact aggressively. Large web sites can be far larger than would be considered practical to distribute to end-user machines, and can easily update 50 times a day. Web developers have to think about latency on specific interactions, but they don't have to think about the size of the entire app, and that allows them to scale up feature sets as fast as funding allows. In contrast, the closed-world mobile versions of their sites are a parade of horror stories, in which firms have to e.g. hotpatch Dalvik to work around method count limits (Facebook), or in which code size issues nearly wrecked the entire company (Uber):

https://twitter.com/StanTwinB/status/1336914412708405248

Right now code size isn't a particularly serious problem for us, but the ease of including open source libraries means footprint grows all the time. Especially for our shell scripting tool, there are tons of cool features that could be added, but if we did all of them we'd probably end up with 500 MB of bytecode. With an open world, features can be downloaded on the fly as they get used, and you can build a plugin ecosystem.

The new, more incremental direction of Leyden is thus welcomed and appreciated, because it feels like a lot of ground can be covered by "small" changes like upgrading AppCDS and caching compiled hotspots. Even if the results aren't as impressive as with native-image, the benefits of keeping an open world can probably make up for it, at least for our use cases.

From adinn at redhat.com Wed Jun 1 13:28:58 2022
From: adinn at redhat.com (Andrew Dinn)
Date: Wed, 1 Jun 2022 14:28:58 +0100
Subject: AppCDS / AOT thoughts based on CLI app experience
In-Reply-To: References: Message-ID: <603fcc35-03fe-54df-a47b-a659eaadf996@redhat.com>

Hi Mike,

Thanks very much for that extremely valuable input, in particular the very clear breakdown of the swings and roundabouts you have noted when it comes to using CDS/AppCDS or native Java vs the vanilla dynamic JVM. It is very important that project Leyden considers the whole development and deployment cycle, not just the size and startup time/footprint of the delivered static Java executable (indeed, Dan Heidinga and I just published an article about this topic on InfoQ that you might find relevant).

Your comment about AppCDS being "pay-as-you-go" resonated most strongly. I hope that one of the pay-offs of the incremental approach Mark has recommended for the project will be the ability to provide "pay-as-you-go" improvements in startup time and footprint, where a user can balance development benefits and costs against those arising at deployment time.

regards,

Andrew Dinn
-----------
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

On 01/06/2022 14:03, Mike Hearn wrote:
> [...]
From mike at hydraulic.software Wed Jun 1 13:41:49 2022
From: mike at hydraulic.software (Mike Hearn)
Date: Wed, 1 Jun 2022 15:41:49 +0200
Subject: AppCDS / AOT thoughts based on CLI app experience
In-Reply-To: <603fcc35-03fe-54df-a47b-a659eaadf996@redhat.com>
References: <603fcc35-03fe-54df-a47b-a659eaadf996@redhat.com>
Message-ID:

Thanks Andrew. Yes, I saw the InfoQ article, it's excellent. Actually it was reading that which prompted me to sign up and write out these notes.

From ioi.lam at oracle.com Wed Jun 1 16:23:36 2022
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 1 Jun 2022 09:23:36 -0700
Subject: Can Ahead of Time code benefit regular Java applications too?
In-Reply-To: <0bd32c26-661e-3730-d93d-35e79d4823a5@redhat.com>
References: <9f70a2d5-5cb1-e615-b76b-957f95ac9928@oracle.com> <0bd32c26-661e-3730-d93d-35e79d4823a5@redhat.com>
Message-ID:

On 6/1/2022 2:32 AM, Andrew Haley wrote:
> On 6/1/22 06:07, Julian Waters wrote:
>> Likewise, perhaps working in a similar fashion to
>> intrinsics, you could have certain sections of regularly compiled Java code
>> within jars replaced by native code compiled by C1 (or C2?) if the JVM it
>> was compiled by and the target OS/CPU match the currently running JVM and
>> OS/CPU
>
> The problem there would be that of jaotc: it worked, but because the
> precompiled code was not patchable, it had to use indirection for all
> accesses. So, every field offset, method reference, etc. went through a
> writable section. All of these had to be fixed up, and of course it
> bulked out the runtime. The whole process, in the end, wasn't much
> quicker than C1 compilation.

I think part of this can be fixed with my "prelinking" proposal - if the app cannot alter the classes that are in the AOT code, then many of the redirections can be eliminated.

Also, some of the indirection in the original jaotc had to deal with object references (e.g., String constants), because it didn't have the notion of a cached heap. Hopefully Leyden can have better integration between the AOT code and the cached heap to make this problem go away.
Thanks
- Ioi

From akozlov at azul.com Wed Jun 1 16:47:10 2022
From: akozlov at azul.com (Anton Kozlov)
Date: Wed, 1 Jun 2022 19:47:10 +0300
Subject: Project Leyden: Beginnings
In-Reply-To: References: <7c59af5c-9ede-19fb-7865-7bb854e93ca7@azul.com>
Message-ID: <7c8c36d5-a8e4-8b1c-08fd-77f30eaefea4@azul.com>

On 5/31/22 12:32, Andrew Dinn wrote:
> One has to bear in mind that a closed world as defined by full program analysis (possibly supplemented with user directives to embrace things like reflective targets) can exclude everything that is not marked as reachable during the analysis from its generated image, maybe whole classes in some cases, or maybe just static/instance fields and methods of some classes.

I didn't use this exact definition, but meant a closed-world image as the result of a whole-program analysis under a set of assumptions that are stricter than the Java language. For example, user directives are a meta-language describing the opaque areas of the program that cannot be analyzed by the compiler during the build. A meta-language may in theory be able to express a rich set of assumptions about unknown areas. E.g. for Class.forName, a meta-language may express not the exact set of possible target classes, but assumed properties of those classes: say, that the class does not use reflection itself, so it cannot access private fields, and that the subsequent checkcast should succeed, so the unknown class won't be able to access protected fields beyond its own class hierarchy.

So a point of the program that is impossible or hard to reason about (e.g. reflection) may specify no assumptions beyond minimal interference with the part that can be analyzed. E.g. a servlet should not access the internal details of the servlet container. The analyzable part may then be optimized almost as efficiently as a completely analyzable program. Java modules may indeed be useful to separate parts of the program to make the analysis easier.

Thanks,
Anton

From akozlov at azul.com Wed Jun 1 18:31:33 2022
From: akozlov at azul.com (Anton Kozlov)
Date: Wed, 1 Jun 2022 21:31:33 +0300
Subject: AppCDS / AOT thoughts based on CLI app experience
In-Reply-To: References: Message-ID: <159565a0-5c7d-84b1-a2cd-e30e0b509faa@azul.com>

Thank you for the excellent write-up! Although many problems you've mentioned are not solved (and are sometimes made worse) by CRaC, I can't resist mentioning a CRaC change for CLI apps [1]. But this is offtopic, so BCCing leyden-dev and CCing crac-dev.

On 6/1/22 16:03, Mike Hearn wrote:
> What about CRaC? It's Linux-only, so it isn't interesting to us, given
> that most devs are on Windows/macOS. The benefits for Linux servers
> are clear though. Obvious question - can you make a snapshot on one
> machine/Linux distro and resume it on a totally different one, or
> does it require a homogeneous infrastructure?

In the current implementation, we haven't started working on this. By the model, CRaC prevents file dependencies at the checkpoint and lets the VM coordinate the restore. So eventually we should deliver images that do not depend on the particular CPU or distribution.

The feasibility of the full implementation for macOS and Windows is unclear. But I think a reasonable effort will be required to provide an implementation for testing and developing programs on those OSes, one which will match the behavior of the Linux CRaC implementation.
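For illustration, the coordination API looks roughly like this (org.crac; a sketch only, see the project docs for the real contract):

    import org.crac.Context;
    import org.crac.Core;
    import org.crac.Resource;

    class FileHolder implements Resource {
        @Override
        public void beforeCheckpoint(Context<? extends Resource> ctx) throws Exception {
            // close descriptors that the image must not capture
        }

        @Override
        public void afterRestore(Context<? extends Resource> ctx) throws Exception {
            // reopen them, possibly on a different machine or distro
        }
    }

    // at startup: Core.getGlobalContext().register(new FileHolder());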
Thanks,
Anton

From ioi.lam at oracle.com Fri Jun 3 00:15:36 2022
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 2 Jun 2022 17:15:36 -0700
Subject: AppCDS / AOT thoughts based on CLI app experience
In-Reply-To: References: Message-ID: <12fc5517-78d7-dfc0-f9c1-cbcdba5b7ccd@oracle.com>

Hi Mike,

I am thrilled to hear that you're happy with CDS. Please see my responses below. If you have other questions or requests for CDS, please let me know :-)

On 6/1/2022 6:03 AM, Mike Hearn wrote:
> [...]
> CDS files are caches, and different platforms have different
> conventions for where those go. The JVM doesn't know about those
> conventions but our app does, so we'd need our custom native code
> launcher (which exists anyway for other reasons) to set the right
> paths for CDS.
>
> Then you have to pick the right flags depending on whether the CDS
> file exists or not. I follow CDS-related changes and believe this is
> fixed in the latest Java versions, but maybe (?) not released yet.

Which version of Java are you using? Since JDK 11, the default value of -Xshare is set to -Xshare:auto, so you can always do this:

$ java -XX:SharedArchiveFile=nosuch.jsa -version
java version "11" 2018-09-25
Java(TM) SE Runtime Environment 18.9 (build 11+28)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11+28, mixed mode)

If the file exists, it will be used automatically. Otherwise the VM will silently ignore the archive.

Since JDK 17, a default CDS archive is shipped with the JDK. So you will at least get some performance benefits of CDS for the built-in classes.

With the upcoming JDK 19, we have implemented a new feature (see JDK-8261455) to automatically create the CDS archive. Here's an example (I am using javac because it's convenient, but you need to prefix the JVM parameters with -J):

$ javac -J-XX:+AutoCreateSharedArchive -J-XX:SharedArchiveFile=javac.jsa HelloWorld.java

javac.jsa will be automatically created if it doesn't exist, or if it's not compatible with the JVM (e.g., if you have upgraded to a newer JDK). In this case, the total elapsed time is improved from about 522ms (with the default CDS archive) to 330ms (auto-generated archive).

> Even once that's fixed it's not quite obvious that we'd use it. The
> JVM runs much slower when dumping a dynamic CDS archive, and the first
> run is when first impressions are made. Whilst for cloud stuff this is
> a matter of (artificially?) expensive resources, for CLI apps it's
> about more subjective things like feeling snappy. One idea is to delay
> dumping a CDS archive until after the first run is exiting, so it
> doesn't get in the way. The first run wouldn't benefit from the
> archive, which is a pity (except on Linux, where the package managers
> make it easy to run code post-install), but it at least wouldn't be
> slowed down by creating it either. The native launcher can schedule
> this. Alternatively there could be a brief pause on first run when the
> user is told explicitly that the app is optimizing itself, but how
> feasible that is depends very much on dump speed. Finally we could
> ship a small archive that only covers startup, and then in parallel
> make a dump of a full run in the background.

The dynamic CDS dumping happens when the JVM exits. We could ... (just throwing out half-baked ideas) spawn a new daemon subprocess to do the dumping, while the main JVM process exits. So to the user there's no penalty.

> Speaking of which, there's a need for some protocol to drive an app
> through a representative 'trial run'. Whether it's generating the
> class list or the archive itself, it could be as simple as an
> alternative static method that sits next to main. If it were
> standardized, the rest of the infrastructure becomes more re-usable;
> for instance build systems can take care of generating classlists, or
> the end-user packaging can take care of dynamic dumping.

Maybe we could have some sort of daemon that collects profiling data in the background, and updates the archives when the application behavior is more understood.
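To sketch the launcher-driven variant of your "dump after the first run" idea (the entry point, trial-run flag and archive path are all hypothetical; -XX:ArchiveClassesAtExit is the real dynamic-dump flag, available since JDK 13):

    import java.nio.file.Files;
    import java.nio.file.Path;

    class BackgroundDump {
        // Re-launch the app in the background to train a dynamic archive,
        // so the user-visible first run pays no dumping penalty:
        static void scheduleIfMissing(Path archive) throws Exception {
            if (Files.exists(archive)) return; // already trained
            String java = System.getProperty("java.home") + "/bin/java";
            new ProcessBuilder(java,
                    "-XX:ArchiveClassesAtExit=" + archive,
                    "-cp", System.getProperty("java.class.path"),
                    "com.example.Main", "--trial-run")
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start(); // detached; the interactive run is not blocked
        }
    }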
> CDS has two modes and it's not clear which is better. I'm unusually
> obsessive about this stuff, to the extent of reading the CDS source
> code, but despite that I have absolutely no idea if I should be trying
> to use static or dynamic archives. There used to be a performance
> difference between them, but maybe it's fixed now? There's a lack of
> end-to-end guidance on how to exploit this feature best.

I agree our documentation is kind of lacking. We'll try to improve it.

Static and dynamic archives will be roughly the same speed (~10 ms faster with a static dump for the javac example above). The dynamic archive will be smaller, because it doesn't need to duplicate the built-in classes that are already in the static archive. Here's a size comparison for javac.jsa:

static:  20,217,856 bytes
dynamic: 10,153,984 bytes

> The ideal would obviously be losing the dump/exec split and making
> dynamic dumping continuous, incremental and free of performance
> penalty. Then we could just supply a path to where the CDS file should
> go and things would magically warm up across executions. I have no idea
> how feasible that is.
>
> Once AppCDS archives are in place and being created at the right
> times, a @Snapshotted annotation for fields (or similar) should be an
> easy win to eliminate the bulk of the rest of the PicoCLI time.
> Dynamically loaded heaps would also be useful to eliminate the
> overhead of loading configs and instantiating the (build) task graph
> without a Gradle-style daemon.
>
> 2.
>
> AppCDS archives can open a subtle security issue when distributing
> code to desktop platforms. Because they're full of vtables, anyone who
> can write to them can (we assume) take over any JVM that loads the
> archive and gain whatever privileges have been granted to that app.
> The archive file is fully trusted.

Will you have a similar problem if the JAR file of the application is maliciously modified?

Actually the vtables inside the CDS archive file contain all zeros, and are filled in by the VM after the archive is mapped. What could be modified is the vtptr of archived MetaData objects. They usually point to somewhere near 0x800000000 (where the vtables are), but the attacker could modify them to point to arbitrary locations. I am not sure if this type of attack is easier than modifying the JAR files, or not.

Thanks
- Ioi

> [...]
By default the entire file is > hashed in order to link it back to the root signature, which for large > files can impose a small but noticeable delay before the app can open > them. This first run penalty is unfortunate given that AppCDS exists > partly to improve startup time. You can argue it doesn't matter much > due to the caching, but it's worth being aware of - very large AppCDS > archives would get fully paged in and hashed before the app even gets > to do anything. In turn that means people might enable AppCDS with a > big classlist expecting it to speed things up, not noticing that for > Mac users only it slowed things down instead. There are ways to fix > this using supported Apple APIs. One is to supply a CodeDirectory > structure stored in extended attributes: you should get incremental > hashing and normal page fault behaviour (untested!). Another is to > wrap the data in a Mach-O file. > > In the second case the CDS archive is being generated client side. Mac > apps don't have anywhere they can create tamperproof data, except for > very small amounts in the keychain. Thus if a Mac app opens a > malicious cache file that can take control of it that's a security > bug, because it'd allow one program to grab any special privileges the > user granted to another. The fact that the grabbing program has passed > GateKeeper and notarization doesn't necessarily matter (Apple's > guidance on this is unclear, but it seems plausible that this is their > stance). In this case the key chain can be used as a root of trust by > storing a hash of the CDS archive in it and checking that after > mmap/before use. Alternatively, again, Apple provides an API that lets > you associate an on-disk (xattr) CodeDirectory structure with a file > which will then be checked incrementally at page fault time. Extreme > care must be taken to avoid race conditions, but in theory, a > CodeDirectory structure can be computed at dump time, written to disk > as an xattr, and then stored again in the key chain (e.g. by > pretending it's a "key" or "password"). After the security API is > instructed to associate a CD with the file, it can be checked against > the tamperproofed version stored in the key chain and if they match, > the archive can then be mmapped and used as normal. > > Native images don't have these issues because the state snapshot is > stored inside the Mach-O file and thus gets covered by the normal > mechanisms. However once it adds support for persisted heaps, the same > issue may arise. > > Whether it's worth doing the extra work to solve this is unclear. Macs > are guaranteed to come with very fast NVMe disks and CPUs. Still, it's > worth being aware of the issue. > > 3. > > Why not just use a native image then? Maybe we'll do that because the > performance wins are really compelling, but again, v1 will ship > without this for the following reasons: > > a. Static minification can break things. Our integration tests > currently invoke the entry point of the app directly, but that could > be fixed to run the tool in an external process. For unit tests the > situation is far murkier. It's a bit unclear how to run JUnit tests > against the statically compiled version and it may not even make sense > (because the tests would pin a bunch of code that might get stripped > in the real app so what are you really testing?). > > b. It'd break delta updates. Not the end of the world, but a factor. > > c. I have no idea if we're using any libraries that spin bytecode > dynamically. 
> Even if we're not today, what if tomorrow we want to use such a
> library? Do we have to avoid using it and increase the cost of feature
> development, or roll back the native image and give our users a nasty
> performance downgrade? Neither option is attractive. Ideally
> SubstrateVM would contain a bytecode interpreter and use it when
> necessary. Lots of issues there but e.g. it'd probably be OK if it's
> not a general classloader and the code dependencies have to be known
> AOT.
>
> d. Similar to (c), fully AOT compilation can explode code and thus
> download size even though many codepaths are cold and only execute
> once. It'd be nice if a native image could include a mix of bytecode
> and AOT-compiled hotspots.
>
> e. Once you're past the initial interactive stage the program is
> throughput sensitive. How much of a perf downgrade over HotSpot would
> we get, if any? With GraalVM EE we could use PGO and not lose any, but
> the ISV pricing is opaque. At any rate, to answer this we have to fix
> the compatibility issues first. The prospect of improving startup time
> and then discovering we slowed down the actual builds isn't really
> appealing (though I suspect in our case AOT wouldn't really hurt
> much).
>
> f. What if we want to support in-process plugins? Maybe we can use
> Espresso, but this is a road less travelled (lack of tutorials,
> well-documented examples, etc).
>
> An interesting possibility is using a mix of approaches. For the bash
> competitor I mentioned earlier, dynamic code loading is needed because
> the script bytecode is loaded into the host JVM, but the Kotlin
> compiler itself could theoretically be statically compiled to a JNI-
> or Panama-accessible library. We tried this before and hit
> compatibility errors, but didn't make any effort to resolve them.
>
> 4.
>
> What about CRaC? It's Linux-only so isn't interesting to us, given
> that most devs are on Windows/macOS. The benefits for Linux servers
> are clear though. Obvious question - can you make a snapshot on one
> machine/Linux distro, and resume it on a totally different one, or
> does it require a homogeneous infrastructure?
>
> 5.
>
> A big reason AppCDS is nice is we get to keep the open world. This
> isn't only about compatibility; open worlds are just better. The most
> popular way to get software to desktop machines is Chrome, and the web
> is totally open world. Apps are downloaded incrementally as the user
> navigates around, and companies exploit this fact aggressively. Large
> web sites can be far larger than would be considered practical to
> distribute to end user machines, and can easily update 50 times a day.
> Web developers have to think about latency on specific interactions,
> but they don't have to think about the size of the entire app, and
> that allows them to scale up feature sets as fast as funding allows.
> In contrast the closed-world mobile versions of their sites are a
> parade of horror stories in which firms have to e.g. hotpatch Dalvik
> to work around method count limits (Facebook), or in which code size
> issues nearly wrecked the entire company (Uber):
>
> https://twitter.com/StanTwinB/status/1336914412708405248
>
> Right now code size isn't a particularly serious problem for us, but
> the ease of including open source libraries means footprint grows all
> the time. Especially for our shell scripting tool, there are tons of
> cool features that could be added but if we did all of them we'd
> probably end up with 500 MB of bytecode.
> With an open world, features can be downloaded on the fly as they get
> used and you can build a plugin ecosystem.
>
> The new, more incremental direction of Leyden is thus welcomed and
> appreciated, because it feels like a lot of ground can be covered by
> "small" changes like upgrading AppCDS and caching compiled hotspots.
> Even if the results aren't as impressive as with native-image, the
> benefits of keeping an open world can probably make up for it, at
> least for our use cases.

From ioi.lam at oracle.com  Fri Jun  3 00:30:39 2022
From: ioi.lam at oracle.com (Ioi Lam)
Date: Thu, 2 Jun 2022 17:30:39 -0700
Subject: AppCDS / AOT thoughts based on CLI app experience
In-Reply-To: <12fc5517-78d7-dfc0-f9c1-cbcdba5b7ccd@oracle.com>
References: <12fc5517-78d7-dfc0-f9c1-cbcdba5b7ccd@oracle.com>
Message-ID: <21aaa7c0-5e85-923e-cbab-6b7af68ae913@oracle.com>

On 6/2/2022 5:15 PM, Ioi Lam wrote:
> Hi Mike,
>
> I am thrilled to hear that you're happy with CDS. Please see my
> responses below.
>
> If you have other questions or requests for CDS, please let me know :-)
>
> On 6/1/2022 6:03 AM, Mike Hearn wrote:
>> Hi,
>>
>> It feels like most of the interest in static Java comes from the
>> microservices / functions-as-a-service community. My new company spent
>> the last year creating a developer tool that runs on the JVM (which
>> will be useful for Java developers actually, but what it does is
>> irrelevant here). Internally it's a kind of build system and is thus a
>> large(ish) CLI app in which startup time and throughput are what
>> matter most. We also have a separate internal tool that uses Kotlin
>> scripting to implement a bash-like scripting language, and which is
>> sensitive in the same ways.
>>
>> Today the JVM is often overlooked for writing CLI apps due to startup
>> time, 'lightness' and packaging issues. I figured I'd write down some
>> notes based on our experiences. They cover workflow, performance,
>> implementation costs and security issues. Hopefully it's helpful.
>>
>> 1.
>>
>> I really like AppCDS because:
>>
>> a. It can't break the app, so switching it on/off is a no-brainer.
>> Unlike native-image/static Java, no additional testing overhead is
>> created by it.
>>
>> b. It's effective even without heap snapshotting. We see a ~40%
>> speedup for executing --help
>>
>> c. It's pay-as-you-go. We can use a small archive that's fast to
>> create to accelerate just the most latency-sensitive startup paths, or
>> we can use it for the whole app, but ultimately costs are
>> controllable.
>>
>> d. Archives are deterministic. Modern client-side packaging systems
>> support delta updates, and CDS plays nicely with them. GraalVM native
>> images are non-deterministic, so every update is going to replace the
>> entire app, which isn't much fun from an update speed or bandwidth
>> consumption perspective.
>>
>> Startup time is dominated by PicoCLI, which is a common problem for
>> Java CLI apps. Supposedly the slowest part is building the model of
>> the CLI interface using reflection, so it's a perfect candidate for
>> AppCDS heap snapshotting. I say supposedly, because I haven't seen
>> concrete evidence that this is actually where the time goes, but it
>> seems like a plausible belief. There's a long-standing bug filed to
>> replace reflection with code generation but it's a big job and so
>> nobody did it.
>>
>> Unfortunately the app will ship without using AppCDS. Some workflow
>> issues remain. These can be solved in the app itself, but it'd be nice
>> if the JVM does it.
>> >> The obvious way to use CDS is to ship an archive with the app. We >> might do this as a first iteration, but longer term don't want to for >> two reasons: >> >> a. The archive can get huge. >> b. Signature verification penalties on macOS (see below). >> >> For just making --help and similar short commands faster size isn't so >> bad (~6-10mb for us), but if it's used for a whole execution the >> archive size for a standard run is nearly the same as total bytecode >> size of the app. As more stuff gets cached this will get worse. >> Download size might not matter much for this particular app, but as a >> general principle it does. So a nice improvement would be to generate >> it client side. >> >> CDS files are caches and different platforms have different >> conventions for where those go. The JVM doesn't know about those >> conventions but our app does, so we'd need our custom native code >> launcher (which exists anyway for other reasons) to set the right >> paths for CDS. >> >> Then you have to pick the right flags depending on whether the CDS >> file exists or not. I follow CDS related changes and believe this is >> fixed in latest Java versions but maybe (?) not released yet. > > Which version of Java are you using? > > Since JDK 11, the default value of -Xshare is set to -Xshare:auto, so > you can always do this: > > $ java -XX:SharedArchiveFile=nosuch.jsa -version > java version "11" 2018-09-25 > Java(TM) SE Runtime Environment 18.9 (build 11+28) > Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11+28, mixed mode) > > If the file exists, it will be used automatically. Otherwise the VM > will silently ignore the archive. > > Since JDK 17, a default CDS archive is shipped with the JDK. So you > will at least get some performance benefits of CDS for the built-in > classes. > > With the upcoming JDK 19, we have implemented a new feature (See > JDK-8261455) to automatically create the CDS archive. Here's an > example (I am using Javac because it's convenient, but you need to > quote the JVM parameters with -J): > > $ javac -J-XX:+AutoCreateSharedArchive > -J-XX:SharedArchiveFile=javac.jsa HelloWorld.java > > javac.jsa will be automatically created if it doesn't exist, or if > it's not compatible with the JVM (e.g., if you have upgraded to a > newer JDK). > > In this case, the total elapsed time is improved from about 522ms > (with default CDS archive) to 330ms (auto-generated archive). > > >> Even once that's fixed it's not quite obvious that we'd use it. The >> JVM runs much slower when dumping a dynamic CDS archive and the first >> run is when first impressions are made. Whilst for cloud stuff this is >> a matter of (artificially?) expensive resources, for CLI apps it's >> about more subjective things like feeling snappy. One idea is to delay >> dumping a CDS archive until after the first run is exiting, so it >> doesn't get in the way. The first run wouldn't benefit from the >> archive which is a pity (except on Linux where the package managers >> make it easy to run code post-install), but it at least wouldn't be >> slowed down by creating it either. The native launcher can schedule >> this. Alternatively there could be a brief pause on first run when the >> user is told explicitly that the app is optimizing itself, but how >> feasible that is depends very much on dump speed. Finally we could >> ship a small archive that only covers startup, and then in parallel >> make a dump of a full run in the background. > > The dynamic CDS dumping happens when the JVM exits. We could ... 
> (just throwing half-baked ideas) spawn a new daemon subprocess to do
> the dumping, while the main JVM process exits. So to the user there's
> no penalty.

>> Speaking of which, there's a need for some protocol to drive an app
>> through a representative 'trial run'. Whether it's generating the
>> class list or the archive itself, it could be as simple as an
>> alternative static method that sits next to main.

One thing you *could* do with JDK 19 on Linux is:

java -XX:+AutoCreateSharedArchive -XX:SharedArchiveFile=app.jsa -jar MyApp

In your main method, check the /proc/self/maps file to see if app.jsa
is mapped. If not, the VM is dumping the dynamic CDS archive. In this
case, your app can run in a special "trial run" mode that exercises
different functionalities.

To make this easier to use, we could add a special system property,
something like "jdk.cds.is.dumping", that can be queried by the
application.
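A minimal sketch of that check (illustrative only - the archive path is
a placeholder, the line matching is simplistic, and /proc/self/maps is
of course Linux-specific):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class MyApp {
        // True if the CDS archive at archivePath is mapped into this
        // process; each /proc/self/maps line for a file-backed mapping
        // ends with the path of the mapped file.
        static boolean isArchiveMapped(String archivePath) throws IOException {
            return Files.readAllLines(Path.of("/proc/self/maps")).stream()
                        .anyMatch(line -> line.endsWith(archivePath));
        }

        public static void main(String[] args) throws IOException {
            if (isArchiveMapped("/path/to/app.jsa")) {
                runApplication(args);   // archive in use: normal run
            } else {
                // VM is presumably dumping the archive right now:
                // exercise representative code paths instead.
                runTrialRun();
            }
        }

        static void runApplication(String[] args) { /* real logic */ }
        static void runTrialRun() { /* representative warm-up paths */ }
    }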
Thanks
- Ioi

>> If it were to be standardized the rest of the infrastructure becomes
>> more re-usable, for instance build systems can take care of
>> generating classlists, or the end-user packaging can take care of
>> dynamic dumping.
>
> Maybe we could have some sort of daemon that collects profiling data
> in the background, and updates the archives when the application
> behavior is more understood.
>
>> CDS has two modes and it's not clear which is better. I'm unusually
>> obsessive about this stuff to the extent of reading the CDS source
>> code, but despite that I have absolutely no idea if I should be trying
>> to use static or dynamic archives. There used to be a performance
>> difference between them but maybe it's fixed now? There's a lack of
>> end-to-end guidance on how to exploit this feature best.
>
> I agree our documentation is kind of lacking. We'll try to improve it.
>
> Static and dynamic archives will be roughly the same speed (~10 ms
> faster with static dump for the javac example above).
>
> The dynamic archive will be smaller, because it doesn't need to
> duplicate the built-in classes that are already in the static archive.
> Here's a size comparison for javac.jsa
>
> static:  20,217,856 bytes
> dynamic: 10,153,984 bytes
>
>> The ideal would obviously be to lose the dump/exec split and make
>> dynamic dumping continuous, incremental, and free of any performance
>> penalty. Then we could just supply a path to where the CDS file should
>> go and things magically warm up across executions. I have no idea how
>> feasible that is.
>>
>> Once AppCDS archives are in place and being created at the right
>> times, a @Snapshotted annotation for fields (or similar) should be an
>> easy win to eliminate the bulk of the rest of the PicoCLI time.
>> Dynamically loaded heaps would also be useful to eliminate the
>> overhead of loading configs and instantiating the (build) task graph
>> without a Gradle-style daemon.
>>
>> 2.
>>
>> AppCDS archives can open a subtle security issue when distributing
>> code to desktop platforms. Because they're full of vtables, anyone who
>> can write to them can (we assume) take over any JVM that loads the
>> archive and gain whatever privileges have been granted to that app.
>> The archive file is fully trusted.
>
> Will you have a similar problem if the JAR file of the application is
> maliciously modified?
>
> Actually the vtables inside the CDS archive file contain all zeros,
> and are filled in by the VM after the archive is mapped.
>
> What could be modified is the vtptr of archived MetaData objects. They
> usually point to somewhere near 0x800000000 (where the vtables are)
> but the attacker could modify them to point to arbitrary locations. I
> am not sure if this type of attack is easier than modifying the JAR
> files, or not.
>
> Thanks
> - Ioi
>
>> On Windows and Linux this doesn't matter. On Linux sensitive files can
>> be packaged or created in postinst scripts. On Windows either an app
>> comes with a legacy installer/MSI file and thus doesn't have any
>> recognized package identity that can be granted extra permissions, or
>> it uses the current-gen MSIX system. In the latter case Windows has a
>> notion of app identity and so you can request permissions to access
>> e.g. keychain entries, the user's calendar etc, but in that case
>> Windows also gives you a private directory that's protected from other
>> apps where sensitive files can be stashed. AppCDS archives can go
>> there and we're done.
>>
>> macOS is a problem child. There are two situations that matter.
>>
>> In the first case archives are shipped as data files with the app.
>> Security is not an issue here, but there's a subtle performance
>> footgun. On most platforms signatures of files shipped with an app are
>> checked at install time but on macOS they aren't. Thanks to its NeXT
>> roots it doesn't really have an installation concept, and thus the
>> kernel checks signatures of files on first use, then caches the
>> signature check in the kernel vnode. By default the entire file is
>> hashed in order to link it back to the root signature, which for large
>> files can impose a small but noticeable delay before the app can open
>> them. This first-run penalty is unfortunate given that AppCDS exists
>> partly to improve startup time. You can argue it doesn't matter much
>> due to the caching, but it's worth being aware of - very large AppCDS
>> archives would get fully paged in and hashed before the app even gets
>> to do anything. In turn that means people might enable AppCDS with a
>> big classlist expecting it to speed things up, not noticing that for
>> Mac users only it slowed things down instead. There are ways to fix
>> this using supported Apple APIs. One is to supply a CodeDirectory
>> structure stored in extended attributes: you should get incremental
>> hashing and normal page fault behaviour (untested!). Another is to
>> wrap the data in a Mach-O file.
>>
>> In the second case the CDS archive is being generated client side. Mac
>> apps don't have anywhere they can create tamperproof data, except for
>> very small amounts in the keychain. Thus if a Mac app opens a
>> malicious cache file that can take control of it, that's a security
>> bug, because it'd allow one program to grab any special privileges the
>> user granted to another. The fact that the grabbing program has passed
>> Gatekeeper and notarization doesn't necessarily matter (Apple's
>> guidance on this is unclear, but it seems plausible that this is their
>> stance). In this case the keychain can be used as a root of trust by
>> storing a hash of the CDS archive in it and checking that after
>> mmap/before use. Alternatively, again, Apple provides an API that lets
>> you associate an on-disk (xattr) CodeDirectory structure with a file
>> which will then be checked incrementally at page fault time. Extreme
>> care must be taken to avoid race conditions, but in theory, a
>> CodeDirectory structure can be computed at dump time, written to disk
>> as an xattr, and then stored again in the keychain (e.g.
>> by pretending it's a "key" or "password"). After the security API is
>> instructed to associate a CD with the file, it can be checked against
>> the tamperproofed version stored in the keychain and if they match,
>> the archive can then be mmapped and used as normal.
>>
>> Native images don't have these issues because the state snapshot is
>> stored inside the Mach-O file and thus gets covered by the normal
>> mechanisms. However once it adds support for persisted heaps, the same
>> issue may arise.
>>
>> Whether it's worth doing the extra work to solve this is unclear. Macs
>> are guaranteed to come with very fast NVMe disks and CPUs. Still, it's
>> worth being aware of the issue.
>>
>> 3.
>>
>> Why not just use a native image then? Maybe we'll do that because the
>> performance wins are really compelling, but again, v1 will ship
>> without this for the following reasons:
>>
>> a. Static minification can break things. Our integration tests
>> currently invoke the entry point of the app directly, but that could
>> be fixed to run the tool in an external process. For unit tests the
>> situation is far murkier. It's a bit unclear how to run JUnit tests
>> against the statically compiled version and it may not even make sense
>> (because the tests would pin a bunch of code that might get stripped
>> in the real app so what are you really testing?).
>>
>> b. It'd break delta updates. Not the end of the world, but a factor.
>>
>> c. I have no idea if we're using any libraries that spin bytecode
>> dynamically. Even if we're not today, what if tomorrow we want to use
>> such a library? Do we have to avoid using it and increase the cost of
>> feature development, or roll back the native image and give our users
>> a nasty performance downgrade? Neither option is attractive. Ideally
>> SubstrateVM would contain a bytecode interpreter and use it when
>> necessary. Lots of issues there but e.g. it'd probably be OK if it's
>> not a general classloader and the code dependencies have to be known
>> AOT.
>>
>> d. Similar to (c), fully AOT compilation can explode code and thus
>> download size even though many codepaths are cold and only execute
>> once. It'd be nice if a native image could include a mix of bytecode
>> and AOT-compiled hotspots.
>>
>> e. Once you're past the initial interactive stage the program is
>> throughput sensitive. How much of a perf downgrade over HotSpot would
>> we get, if any? With GraalVM EE we could use PGO and not lose any, but
>> the ISV pricing is opaque. At any rate, to answer this we have to fix
>> the compatibility issues first. The prospect of improving startup time
>> and then discovering we slowed down the actual builds isn't really
>> appealing (though I suspect in our case AOT wouldn't really hurt
>> much).
>>
>> f. What if we want to support in-process plugins? Maybe we can use
>> Espresso, but this is a road less travelled (lack of tutorials,
>> well-documented examples, etc).
>>
>> An interesting possibility is using a mix of approaches. For the bash
>> competitor I mentioned earlier, dynamic code loading is needed because
>> the script bytecode is loaded into the host JVM, but the Kotlin
>> compiler itself could theoretically be statically compiled to a JNI-
>> or Panama-accessible library. We tried this before and hit
>> compatibility errors, but didn't make any effort to resolve them.
>>
>> 4.
>>
>> What about CRaC? It's Linux-only so isn't interesting to us, given
>> that most devs are on Windows/macOS. The benefits for Linux servers
>> are clear though.
>> Obvious question - can you make a snapshot on one machine/Linux
>> distro, and resume it on a totally different one, or does it require
>> a homogeneous infrastructure?
>>
>> 5.
>>
>> A big reason AppCDS is nice is we get to keep the open world. This
>> isn't only about compatibility; open worlds are just better. The most
>> popular way to get software to desktop machines is Chrome, and the web
>> is totally open world. Apps are downloaded incrementally as the user
>> navigates around, and companies exploit this fact aggressively. Large
>> web sites can be far larger than would be considered practical to
>> distribute to end user machines, and can easily update 50 times a day.
>> Web developers have to think about latency on specific interactions,
>> but they don't have to think about the size of the entire app, and
>> that allows them to scale up feature sets as fast as funding allows.
>> In contrast the closed-world mobile versions of their sites are a
>> parade of horror stories in which firms have to e.g. hotpatch Dalvik
>> to work around method count limits (Facebook), or in which code size
>> issues nearly wrecked the entire company (Uber):
>>
>> https://twitter.com/StanTwinB/status/1336914412708405248
>>
>> Right now code size isn't a particularly serious problem for us, but
>> the ease of including open source libraries means footprint grows all
>> the time. Especially for our shell scripting tool, there are tons of
>> cool features that could be added but if we did all of them we'd
>> probably end up with 500 MB of bytecode. With an open world, features
>> can be downloaded on the fly as they get used and you can build a
>> plugin ecosystem.
>>
>> The new, more incremental direction of Leyden is thus welcomed and
>> appreciated, because it feels like a lot of ground can be covered by
>> "small" changes like upgrading AppCDS and caching compiled hotspots.
>> Even if the results aren't as impressive as with native-image, the
>> benefits of keeping an open world can probably make up for it, at
>> least for our use cases.
>

From kasperni at gmail.com  Fri Jun  3 08:45:21 2022
From: kasperni at gmail.com (Kasper Nielsen)
Date: Fri, 3 Jun 2022 09:45:21 +0100
Subject: Experimentation with build time and runtime class initialization
 in qbicc
In-Reply-To: 
References: <0EE27016-2D6A-46A8-825A-1AFF788A5C67@us.ibm.com>
Message-ID: 

On Tue, 31 May 2022 at 16:50, Dan Heidinga wrote:
> On Fri, May 27, 2022 at 7:53 AM Kasper Nielsen wrote:
> >
> > Hi David,
> >
> > Thanks for the write-up.
> >
> > One thing that isn't completely clear to me after reading this is why
> > language changes (<clinit>) are needed?
>
> The <clinit> model was a convenient way for us to explore a model that
> put all class initialization at build time, while allowing a small set
> of fields to be reinitialized at runtime. It also minimized the
> changes we had to make to the core JDK classes, which makes maintaining
> the changes much easier given the rate of JDK updates. SubstrateVM
> uses a similar approach with their Substitutions for what I assume are
> similar reasons.
>
> Leyden will be able to update the JDK core classes directly and can
> take a more direct approach to indicating in which phase a static
> field should be initialized.
>
> > It seems to me this could be entirely
> > implemented via a standard API.
> > Using ClassValue as the main inspiration you could have something
> > like:
> >
> > abstract class RuntimeLocal<T> {
> >     protected RuntimeLocal() {
> >         checkBuildTime();
> >         VM.registerForRuntimeInitialization(this);
> >     }
> >     protected abstract T computeValue();
> >     public final T get(); // Calls to get are optimized by the vm
> > }
> >
> > Usage would be something similar to:
> >
> > class Usage {
> >
> >     static final LocalDateTime BUILD_TIME = LocalDateTime.now();
> >
> >     static final RuntimeLocal<LocalDateTime> RUNTIME_TIME = new
> >         RuntimeLocal<>() {
> >             protected LocalDateTime computeValue() {
> >                 return LocalDateTime.now();
> >             }
> >         };
> > }
> >
> > I might be missing some details, but it seems to me that this
> > approach would be strongly preferable to changing the language as
> > well as adding new bytecodes.
>
> This is a good starting point. I went a fair ways looking at how to
> group static fields into different classes to decouple their lifetimes
> and found that I couldn't cleanly split them into two groups.

I think there is an important distinction to make here between "phased
class initialization" and "phased field initialization". Having used
GraalVM's native image for some time, my experience is that it is very
hard to reason about phased class initialization. A saner model, I
would argue, would be one where all classes are initialized at image
build time and never reinitialized. If a class needs laziness or
reinitialization this must be done explicitly using /RuntimeLocal. If
you have groups of fields that need to be initialized together, this
can be done by storing them in a record which can then be stored in a
reinit field (a rough sketch follows at the end of this mail). In this
model, you would still need to think about the usage of reinit fields.
But you would never need to spend cycles on figuring out what phase a
class was initialized in. But this is all something that can be
discussed further down the line.

> The problem is that while it's clear that some fields can be
> initialized early (build time) and others must be initialized late
> (runtime), there is a third group that needs to be reinitialized. I
> list 3 buckets: early, late, and reinit, but that's a minimum number.
> There may be more than 3. And due to the "soupy" nature of <clinit>,
> it's not always easy to avoid depending on a field that's in a
> different bucket. And values in that 3rd bucket - the fields that
> need to be reinitialized - don't have a clear meaning when their value
> propagates around the program. Does it need to be cleared everywhere
> and force reinit of all consumers? Lots to figure out here.
>
> We need a better model - whether that's library features or new
> language features - that makes it easier to express when (which phase)
> an operation should occur and some way to talk about the dependency
> chain of that value (all the classes that have to be initialized,
> values calculated, etc).
>

I must admit I'm a bit skeptical about something like dependency
tracking. Take something like System.lineSeparator() and a
platform-independent image. Is it really realistic that we track all
strings that are created using this method during build time? But, as
you said, lots to figure out :)
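The promised sketch of the record idea (names are invented, and
RuntimeLocal is the hypothetical API from earlier in this thread, not
an existing class):

    class Striping {
        // All values that must stay mutually consistent are computed
        // together and swapped as a single unit when the field is
        // reinitialized at runtime.
        record CpuConfig(int ncpu, int stripes) {}

        static final RuntimeLocal<CpuConfig> CONFIG = new RuntimeLocal<>() {
            protected CpuConfig computeValue() {
                int ncpu = Runtime.getRuntime().availableProcessors();
                return new CpuConfig(ncpu, Integer.highestOneBit(ncpu) * 2);
            }
        };
    }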
From mike at hydraulic.software  Fri Jun  3 08:57:45 2022
From: mike at hydraulic.software (Mike Hearn)
Date: Fri, 3 Jun 2022 10:57:45 +0200
Subject: AppCDS / AOT thoughts based on CLI app experience
In-Reply-To: <12fc5517-78d7-dfc0-f9c1-cbcdba5b7ccd@oracle.com>
References: <12fc5517-78d7-dfc0-f9c1-cbcdba5b7ccd@oracle.com>
Message-ID: 

Hi Ioi,

We're using a JDK 17 with a few backports. Unfortunately the default
CDS archive goes missing during jlinking. It's an easy fix. Actually,
the product in question is a packaging tool; it's not only for the JVM
but it supports JVM apps quite well, and re-creating the CDS archive
post-jlink is on the list of features to add. It's packaged with itself
so that'll fix it for our apps too.

> $ javac -J-XX:+AutoCreateSharedArchive -J-XX:SharedArchiveFile=javac.jsa
> HelloWorld.java
>
> javac.jsa will be automatically created if it doesn't exist, or if it's
> not compatible with the JVM (e.g., if you have upgraded to a newer JDK).

Yes, that's a nice improvement in usability. By the way, don't forget
-Xlog:cds=off, because otherwise CDS likes to write lots of warnings to
the terminal (not a great look for a CLI app).

> The dynamic CDS dumping happens when the JVM exits.

Yes, but it seems to slow down execution before that time as well. Here
are some timings for our app to parse CLI options, read the build
config, compute the task graph, print the available tasks, and reach
the end of main():

- With CDS off: ~0.8 seconds
- With CDS dumping active: ~1.25 seconds
- With CDS active: ~0.6 seconds

So the app appears to run ~50% slower when dynamic dumping is active,
and that's not including the dump time itself. That's why I'm
suggesting doing it in the background as a totally separate
post-install step (with background forking required for platforms that
don't support or strongly discourage install scripts). I get the
impression this may not be expected? Is the JVM genuinely doing extra
work at runtime when dynamic dumping is active?

> Maybe we could have some sort of daemon that collects profiling data
> in the background, and updates the archives when the application
> behavior is more understood.

Sure, the ideal would be something like "always dumping" mode in which
there's no slowdown. So you just give the JVM a directory (or >1
directory) and it caches internal structures, JITd code and persistent
heap snapshots there. Fire and forget. Then if you want to trade off
bandwidth vs first-run time you can pre-populate the first directory in
the list with the results of a short run, like just getting to first
pixels for a desktop app or flag handling for a CLI app, and any
additional data generated goes into the second directory. Bonus points
if you find a way to share those directories over an NFS mount - then
you have a JIT server 'for free' in cloud deployments.

> The dynamic archive will be smaller, because it doesn't need to
> duplicate the built-in classes that are already in the static archive.

Right. That's true. I'd forgotten that you can combine them like that.
So we could ship a small static archive in the download that just
accelerates time-to-first-interaction, and generate a larger dump
client-side in the background that covers the whole execution (rough
commands are sketched below).

> Will you have a similar problem if the JAR file of the application is
> maliciously modified?

If they're downloaded and stored in the home directory, yes, but JARs
support code signing with per-file hashing so there's a way to fix that
built into the platform. If they're just shipped as data files in the
app then it doesn't matter because they're signed and tamperproofed
using OS-specific mechanisms.
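Spelling out that static-plus-dynamic combination (archive names, class
list and main class are placeholders; the flags are the standard AppCDS
static-dump and JDK 13+ dynamic-dump options):

At build time, create the small startup archive from a class list:

$ java -Xshare:dump -XX:SharedClassListFile=startup.classlist \
      -XX:SharedArchiveFile=startup.jsa -cp app.jar

On the user's machine, record a dynamic archive on top of it during a
background run:

$ java -XX:ArchiveClassesAtExit=full.jsa -XX:SharedArchiveFile=startup.jsa \
      -cp app.jar com.example.Main

Subsequent runs then use the dynamic archive, which remembers its base:

$ java -XX:SharedArchiveFile=full.jsa -cp app.jar com.example.Main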
All this security talk is a bit theoretical. IntelliJ downloads
unsigned JARs as plugins and nobody seems to care. It's possible that's
because it doesn't request any special privileges so there's nothing to
attack, but on macOS things as basic as access to ~/Downloads are a
permission these days. Also JetBrains are moving to code signing their
JARs anyway. So ... yeah. Like I said. Hard to know how much to really
care about this. It might be one of those things that doesn't matter
until the day it does.

> What could be modified is the vtptr of archived MetaData objects. They
> usually point to somewhere near 0x800000000 (where the vtables are)
> but the attacker could modify them to point to arbitrary locations. I
> am not sure if this type of attack is easier than modifying the JAR
> files, or not.

Well, the issue here is a combination of where the files are generated
and performance. Again it's all a bit theoretical because the
performance discussion is rooted in the "disk access is slow" world,
which isn't really true anymore. I've done some casual tests on my
laptop and did appear to see a real slowdown from this "hash whole file
on open" effect, but it was a while ago and it wasn't rigorous at all.
It's also a total PITA to reproduce because there's no explicit way to
flush the cache, so you have to constantly re-copy signed binaries over
and over to force kernel cache misses. If I explained how I measured
this, Aleksey Shipilev would yell at me :) so I'll just leave it here
as food for thought instead.

And yeah, it's also not clear how much the uncached times matter these
days. Years ago it mattered a lot because people rebooted their
machines often, but Macs hibernate all the time and reboot only rarely,
so the caches will remain warm.

I don't think treating AppCDS archives as hostile in the JVM itself
would be worth it. This is a Mac-specific issue and that would be a
major constraint, e.g. it'd mean you can't cache JITd native code in
the archives. Doesn't make sense. Better to tamperproof unbundled
archives in other ways, like computing ad-hoc signatures and stashing
the CodeDirectory in an xattr (if it ever matters).

From heidinga at redhat.com  Mon Jun  6 14:36:18 2022
From: heidinga at redhat.com (Dan Heidinga)
Date: Mon, 6 Jun 2022 10:36:18 -0400
Subject: Experimentation with build time and runtime class initialization
 in qbicc
In-Reply-To: 
References: <0EE27016-2D6A-46A8-825A-1AFF788A5C67@us.ibm.com>
Message-ID: 

On Tue, May 31, 2022 at 12:17 PM Brian Goetz wrote:
>
> I think Dan is homing in on one of the key questions, which is the
> nature of the third bucket (static finals that require
> reinitialization). It would be useful for everyone following the
> discussion if we had a more complete list of situations you've
> encountered where this seems essential, and their notable aspects.

In qbicc, the places we've had to reinitialize static fields are
captured in the qbicc/qbicc-class-library repo [0] using "$_runtime"
source files [1]. Many of the cases have to do with capturing the
build time vs the runtime environment.
The number of available CPUs is captured in several places:
* j.l.Runtime : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/Runtime%24_runtime.java
* j.u.c.Exchanger: https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/util/concurrent/Exchanger%24_runtime.java
* j.u.c.Phaser : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/util/concurrent/Exchanger%24_runtime.java
* j.u.c.a.Striped64 : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/util/concurrent/atomic/Striped64%24_runtime.java

The environment variables are captured:
* j.l.ProcessEnvironment : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/ProcessEnvironment%24_runtime.java

The in / out / err file descriptors need to be reinitialized:
* j.io.FileDescriptor : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/io/FileDescriptor%24_runtime.java

Prevent threads from being created in a static initializer:
* j.l.ref.Reference : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/ref/Reference%24_patch.java
* Likely more cases for this we just haven't hit yet

Unsafe pageSize needs to be configured at runtime. As do
UnsafeConstants like ADDRESS_SIZE0:
* j.i.m.Unsafe : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/jdk/internal/misc/Unsafe%24_patch.java
* j.i.m.UnsafeConstants: https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/jdk/internal/misc/UnsafeConstants%24_patch.java
  & https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/jdk/internal/misc/UnsafeConstants%24_runtime.java

Capturing the default directory:
* sun.nio.fs.UnixFileSystem : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/sun/nio/fs/UnixFileSystem%24_runtime.java

We're still working through detangling the "initPhase" process in
j.l.System into a build time and runtime ("rtInitPhase") version:
https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/System%24_patch.java

We also did some investigation of how feasible it would be to rewrite
SubstrateVM's Substitutions to use the IODH pattern and I can share
that info as well but it'll take a bit for me to write it up in a
clear state.
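For anyone who hasn't seen it, the IODH (initialization-on-demand
holder) pattern mentioned above parks a static field in a nested holder
class so that its <clinit> only runs on first use; a minimal sketch
(the field is purely illustrative):

    class CpuCount {
        // Holder is not initialized when CpuCount is; its <clinit>
        // runs only when get() first touches Holder.N, so the read
        // can be deferred past a build-time initialization of
        // CpuCount itself.
        private static class Holder {
            static final int N = Runtime.getRuntime().availableProcessors();
        }

        static int get() {
            return Holder.N;
        }
    }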
--Dan

[0] https://github.com/qbicc/qbicc-class-library
[1] https://github.com/qbicc/qbicc-class-library/search?q=%24_runtime

>
> As you point out, there are a host of potential "solutions"; while it
> is surely premature to try to propose a solution, it is never too
> early to come to a better understanding of the problem.
>
> On 5/31/2022 11:50 AM, Dan Heidinga wrote:
>
> On Fri, May 27, 2022 at 7:53 AM Kasper Nielsen wrote:
>
> Hi David,
>
> Thanks for the write-up.
>
> One thing that isn't completely clear to me after reading this is why
> language changes (<clinit>) are needed?
>
> The <clinit> model was a convenient way for us to explore a model that
> put all class initialization at build time, while allowing a small set
> of fields to be reinitialized at runtime. It also minimized the
> changes we had to make to the core JDK classes, which makes maintaining
> the changes much easier given the rate of JDK updates. SubstrateVM
> uses a similar approach with their Substitutions for what I assume are
> similar reasons.
>
> Leyden will be able to update the JDK core classes directly and can
> take a more direct approach to indicating in which phase a static
> field should be initialized.
>
> It seems to me this could be entirely
> implemented via a standard API. Using ClassValue as the main
> inspiration you could have something like:
>
> abstract class RuntimeLocal<T> {
>     protected RuntimeLocal() {
>         checkBuildTime();
>         VM.registerForRuntimeInitialization(this);
>     }
>     protected abstract T computeValue();
>     public final T get(); // Calls to get are optimized by the vm
> }
>
> Usage would be something similar to:
>
> class Usage {
>
>     static final LocalDateTime BUILD_TIME = LocalDateTime.now();
>
>     static final RuntimeLocal<LocalDateTime> RUNTIME_TIME = new
>         RuntimeLocal<>() {
>             protected LocalDateTime computeValue() {
>                 return LocalDateTime.now();
>             }
>         };
> }
>
> I might be missing some details, but it seems to me that this approach
> would be strongly preferable to changing the language as well as
> adding new bytecodes.
>
> This is a good starting point. I went a fair ways looking at how to
> group static fields into different classes to decouple their lifetimes
> and found that I couldn't cleanly split them into two groups. I used
> the Initialization on demand holder pattern (IODH) rather than your
> RuntimeLocal but the idea is very similar.
>
> The problem is that while it's clear that some fields can be
> initialized early (build time) and others must be initialized late
> (runtime), there is a third group that needs to be reinitialized. I
> list 3 buckets: early, late, and reinit, but that's a minimum number.
> There may be more than 3. And due to the "soupy" nature of <clinit>,
> it's not always easy to avoid depending on a field that's in a
> different bucket. And values in that 3rd bucket - the fields that
> need to be reinitialized - don't have a clear meaning when their value
> propagates around the program. Does it need to be cleared everywhere
> and force reinit of all consumers? Lots to figure out here.
>
> We need a better model - whether that's library features or new
> language features - that makes it easier to express when (which phase)
> an operation should occur and some way to talk about the dependency
> chain of that value (all the classes that have to be initialized,
> values calculated, etc).
>
> --Dan
>
> /Kasper
>
> On Thu, 26 May 2022 at 21:22, David P Grove wrote:
>
> Hi,
> I've appended the contents of the referenced wiki page in this email.
> Apologies in advance if the formatting doesn't come through as
> intended.
>
> There is a full implementation of this (GPLv2 + Classpath exception)
> as part of the qbicc project on GitHub. There is also a GitHub
> discussion in the qbicc project that links to various GitHub issues
> that capture the history that led to the current design. I will not
> hyperlink to those here so that if people have any IP concerns, they
> can avoid seeing them. They are easily findable.
>
> Regards,
>
> --dave

From brian.goetz at oracle.com  Mon Jun  6 17:45:10 2022
From: brian.goetz at oracle.com (Brian Goetz)
Date: Mon, 6 Jun 2022 17:45:10 +0000
Subject: Experimentation with build time and runtime class initialization
 in qbicc
In-Reply-To: 
References: <0EE27016-2D6A-46A8-825A-1AFF788A5C67@us.ibm.com>
Message-ID: <0387C49D-8761-464D-A494-88529EFF9433@oracle.com>

Thanks, Dan, for the detailed information. The other investigation also
seems interesting, so I hope some day you'll find the time to write it
up.
There's lots to unpack here, but I want to focus on a specific aspect,
related to the issue of "stale" or "aliased" compile-time values that I
raised in my earlier mail.

Taking the specific example of caching Runtime.availableProcessors(),
let's ask: WHY are these classes caching R.aP() in a static? There are
two possible cases:

 - Pure caching. Here, the author has made a choice (right or wrong)
that calling R.aP() repeatedly will be too expensive, and so caches the
value in a static for later use for, say, allocating arena arrays in
the constructor of Striped64 or Exchanger -- but the instances created
in the early phase are still valid in the later phase, and compatible
with instances created in the later phase.

 - Enforcement of invariant. Here, the author has captured the fact
that they require the value to be stable, because (say) they're going
to create multiple arrays and expect them all to be of the same length.
Here, early-phase and later-phase instances could not compatibly
coexist.

In the first case, reinitializing the cached field at phase change
points may be harmless; it's essentially equivalent to replacing reads
of fields with repeated evaluation of the initializer (assuming the
initialization is pure); in the second, the runtime has broken an
invariant the author had reason to believe is valid.

Without diving into solutions at this point, we can't escape the
following observations:

 - This is what happens when you try to reinterpret old code with new
semantics; code that had every reason to work properly when it was
written, becomes retroactively broken when the runtime reinterprets old
code in a new way. New semantics require permission from the user.

 - If there are N separate desirable (but incompatible) outcomes, such
as the two cases cited above, their code has to be different from each
other.

Right now, we can't tell the difference between these cases. If, as in
the "it's an invariant" case, it would be unacceptable for the value to
change (i.e., when the user said "static final", they were serious),
then one of the following has to happen:

 - We must be prepared to keep the earlier-phase result in later
phases, even if the underlying quantity has changed;

 - We must defer evaluation until the later phase (potentially
deferring all dependent early evaluations);

 - We fail at early-eval time if someone attempts to evaluate the
must-be-stable quantity in the early phase, and let the programmer sort
it out.

In fact, to the extent we want early evaluation, I suspect that we may
want to be able to express *all three* of these in the programming
model.
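A tiny illustration of the two cases (class and field names are
invented; NCPU stands in for a cached R.aP() value):

    class PureCache {
        // Case 1: pure caching. Re-evaluating this at a phase change
        // is harmless as long as availableProcessors() is its only
        // input.
        static final int NCPU = Runtime.getRuntime().availableProcessors();
    }

    class InvariantHolder {
        // Case 2: invariant. Both arrays are sized from the same
        // snapshot. If the runtime silently reinitialized NCPU,
        // instances built in the early phase (with the old length)
        // could no longer coexist with instances built in the later
        // phase (with the new length).
        static final int NCPU = Runtime.getRuntime().availableProcessors();
        static final Object[] cells = new Object[NCPU];
        static final Thread[] owners = new Thread[NCPU];
    }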
> On Jun 6, 2022, at 10:36 AM, Dan Heidinga wrote:
>
> On Tue, May 31, 2022 at 12:17 PM Brian Goetz wrote:
>>
>> I think Dan is homing in on one of the key questions, which is the
>> nature of the third bucket (static finals that require
>> reinitialization). It would be useful for everyone following the
>> discussion if we had a more complete list of situations you've
>> encountered where this seems essential, and their notable aspects.
>
> In qbicc, the places we've had to reinitialize static fields are
> captured in the qbicc/qbicc-class-library repo [0] using "$_runtime"
> source files [1]. Many of the cases have to do with capturing the
> build time vs the runtime environment.
>
> The number of available CPUs is captured in several places:
> * j.l.Runtime : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/Runtime%24_runtime.java
> * j.u.c.Exchanger: https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/util/concurrent/Exchanger%24_runtime.java
> * j.u.c.Phaser : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/util/concurrent/Exchanger%24_runtime.java
> * j.u.c.a.Striped64 : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/util/concurrent/atomic/Striped64%24_runtime.java
>
> The environment variables are captured:
> * j.l.ProcessEnvironment : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/ProcessEnvironment%24_runtime.java
>
> The in / out / err file descriptors need to be reinitialized:
> * j.io.FileDescriptor : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/io/FileDescriptor%24_runtime.java
>
> Prevent threads from being created in a static initializer:
> * j.l.ref.Reference : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/ref/Reference%24_patch.java
> * Likely more cases for this we just haven't hit yet
>
> Unsafe pageSize needs to be configured at runtime.
> As do
> UnsafeConstants like ADDRESS_SIZE0:
> * j.i.m.Unsafe : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/jdk/internal/misc/Unsafe%24_patch.java
> * j.i.m.UnsafeConstants: https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/jdk/internal/misc/UnsafeConstants%24_patch.java
>   & https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/jdk/internal/misc/UnsafeConstants%24_runtime.java
>
> Capturing the default directory:
> * sun.nio.fs.UnixFileSystem : https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/sun/nio/fs/UnixFileSystem%24_runtime.java
>
> We're still working through detangling the "initPhase" process in
> j.l.System into a build time and runtime ("rtInitPhase") version:
> https://github.com/qbicc/qbicc-class-library/blob/17.x/java.base/src/main/java/java/lang/System%24_patch.java
>
> We also did some investigation of how feasible it would be to rewrite
> SubstrateVM's Substitutions to use the IODH pattern and I can share
> that info as well but it'll take a bit for me to write it up in a
> clear state.
>
> --Dan
>
> [0] https://github.com/qbicc/qbicc-class-library
> [1] https://github.com/qbicc/qbicc-class-library/search?q=%24_runtime
>
>> As you point out, there are a host of potential "solutions"; while it
>> is surely premature to try to propose a solution, it is never too
>> early to come to a better understanding of the problem.
>>
>> On 5/31/2022 11:50 AM, Dan Heidinga wrote:
>>
>> On Fri, May 27, 2022 at 7:53 AM Kasper Nielsen wrote:
>>
>> Hi David,
>>
>> Thanks for the write-up.
>>
>> One thing that isn't completely clear to me after reading this is why
>> language changes (<clinit>) are needed?
>>
>> The <clinit> model was a convenient way for us to explore a model
>> that put all class initialization at build time, while allowing a
>> small set of fields to be reinitialized at runtime. It also minimized
>> the changes we had to make to the core JDK classes, which makes
>> maintaining the changes much easier given the rate of JDK updates.
>> SubstrateVM uses a similar approach with their Substitutions for what
>> I assume are similar reasons.
>>
>> Leyden will be able to update the JDK core classes directly and can
>> take a more direct approach to indicating in which phase a static
>> field should be initialized.
>>
>> It seems to me this could be entirely
>> implemented via a standard API.
>> Using ClassValue as the main inspiration you could have something
>> like:
>>
>> abstract class RuntimeLocal<T> {
>>     protected RuntimeLocal() {
>>         checkBuildTime();
>>         VM.registerForRuntimeInitialization(this);
>>     }
>>     protected abstract T computeValue();
>>     public final T get(); // Calls to get are optimized by the vm
>> }
>>
>> Usage would be something similar to:
>>
>> class Usage {
>>
>>     static final LocalDateTime BUILD_TIME = LocalDateTime.now();
>>
>>     static final RuntimeLocal<LocalDateTime> RUNTIME_TIME = new
>>         RuntimeLocal<>() {
>>             protected LocalDateTime computeValue() {
>>                 return LocalDateTime.now();
>>             }
>>         };
>> }
>>
>> I might be missing some details, but it seems to me that this
>> approach would be strongly preferable to changing the language as
>> well as adding new bytecodes.
>>
>> This is a good starting point. I went a fair ways looking at how to
>> group static fields into different classes to decouple their
>> lifetimes and found that I couldn't cleanly split them into two
>> groups. I used the Initialization on demand holder pattern (IODH)
>> rather than your RuntimeLocal but the idea is very similar.
>>
>> The problem is that while it's clear that some fields can be
>> initialized early (build time) and others must be initialized late
>> (runtime), there is a third group that needs to be reinitialized. I
>> list 3 buckets: early, late, and reinit, but that's a minimum number.
>> There may be more than 3. And due to the "soupy" nature of <clinit>,
>> it's not always easy to avoid depending on a field that's in a
>> different bucket. And values in that 3rd bucket - the fields that
>> need to be reinitialized - don't have a clear meaning when their
>> value propagates around the program. Does it need to be cleared
>> everywhere and force reinit of all consumers? Lots to figure out
>> here.
>>
>> We need a better model - whether that's library features or new
>> language features - that makes it easier to express when (which
>> phase) an operation should occur and some way to talk about the
>> dependency chain of that value (all the classes that have to be
>> initialized, values calculated, etc).
>>
>> --Dan
>>
>> /Kasper
>>
>> On Thu, 26 May 2022 at 21:22, David P Grove wrote:
>>
>> Hi,
>> I've appended the contents of the referenced wiki page in this email.
>> Apologies in advance if the formatting doesn't come through as
>> intended.
>>
>> There is a full implementation of this (GPLv2 + Classpath exception)
>> as part of the qbicc project on GitHub. There is also a GitHub
>> discussion in the qbicc project that links to various GitHub issues
>> that capture the history that led to the current design. I will not
>> hyperlink to those here so that if people have any IP concerns, they
>> can avoid seeing them. They are easily findable.
>>
>> Regards,
>>
>> --dave
>