Call for Discussion: New Project: Leyden
ioi.lam at oracle.com
Thu Jun 4 18:03:09 UTC 2020
On 6/4/20 2:17 AM, Andrew Dinn wrote:
> I would like to throw in a general point here based on my experience of
> working with Graal and the SubstrateVM for some time, perhaps at the
> risk of hijacking this current direction of travel (for which, apologies).
> In my experience the notion of a 'static' image which provides the sort
> of footprint and startup improvements that Graal has shown to be
> possible while still retaining the dynamic load/link capabilities that
> the JVM provides (i.e. discarding the closed world assumption) appears
> to reside in the same ontological category as the King of France or the
> unicorn horn. Indeed, I suspect it may even be a square, round cupola.
My hypothesis is: you can have fast start-up speed, small image size,
and correctness -- just pick any two.
Sure, for trivial programs that can be fully analyzed, you can get all 3
of them. However, once you're using (legacy) libraries that use
reflection, it becomes extremely difficult.
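As an illustration of why reflection defeats full analysis (an invented example, not code from the thread): when the class name only arrives at run time, a closed-world analyzer must either retain every class or risk breaking the program.

```java
// Hypothetical example: the class name arrives at run time, so no
// build-time analysis can determine which classes are actually needed.
public class ReflectiveLoad {
    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        Class<?> c = Class.forName(name);            // target unknown at build time
        Object o = c.getDeclaredConstructor().newInstance();
        System.out.println(o.getClass().getName());
    }
}
```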
> Graal derives almost all of its space/time reductions relative to the
> dynamic JVM from being able to discard great swathes of class metadata
> and class initialization. A closed world assumption is critical to the
> validity and comprehensive applicability of both those dietary regimes.
> Under that assumption almost all of the details of the loaded class base
> are simply not needed at runtime. All decisions which depend on what
> classes are present and how they are structured are baked in at compile
> time. A great deal of initialization code and resulting static data can
> be compressed out of VM startup by performing the necessary init at
> build-time and embedding only the (runtime-) referenced static field
> values and objects in a pre-populated heap, some of that data read-only,
> some of it read-write. n.b. that's just as much an option for app data
> as it is for JDK data. With a dynamic runtime it is much harder to
> determine 1) whether or not a static field might be referenced and
> (worse) 2) whether its value might get computed differently from the
> value derivable at build-time, due to intermediate loading of dynamic code.
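To make the build-time-initialization hazard concrete (an invented sketch, not code from either project): a deterministic lookup table is safe to compute at image build time and embed in the pre-populated heap, while an environment-dependent field is not.

```java
// Invented sketch: which static initializers are safe to run at build time?
public class BuildTimeInit {
    // Safe: deterministic, depends on nothing in the environment, so the
    // resulting array could be embedded read-only in an archived heap.
    static final int[] CRC_TABLE = makeCrcTable();

    // NOT safe: the value depends on when and where the VM starts, so it
    // would be wrong if captured at build time.
    static final long BOOT_TIME = System.currentTimeMillis();

    static int[] makeCrcTable() {
        int[] t = new int[256];
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320 ^ (c >>> 1) : c >>> 1;
            t[n] = c;
        }
        return t;
    }
}
```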
> The Graal 'static' image compiler also contributes significantly to
> improvement of image code size and speed when compared to the OpenJDK
> AOT compiler (also, at present, the Graal compiler in a very different
> configuration). The former can almost always employ direct linkage, the
> latter has to use copious amounts of indirection to allow for later
> update to the class base.
> The Graal 'static' image compiler obviously cannot rely on runtime
> feedback to improve code performance but neither can an OpenJDK AOT
> compiler when generating the initial AOT code. However, the former
> compiler can profit from the closed world assumption to perform a few
> AOT optimizations that can only be performed speculatively on OpenJDK if
> dynamic loading is to be allowed. Not all such speculative opportunities
> are currently taken, mainly because not all application classes are
> normally included in the compiled AppCDS suite.
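One such speculative optimization is devirtualization. In the invented sketch below, a closed-world compiler that can prove there is exactly one Codec implementation may bind (or inline) the interface call directly; a dynamic JVM can do the same only speculatively, with a guard against a second implementation being loaded later.

```java
// Invented sketch illustrating closed-world devirtualization.
interface Codec { String encode(String s); }

final class Base64Codec implements Codec {
    public String encode(String s) {
        return java.util.Base64.getEncoder()
                .encodeToString(s.getBytes(java.nio.charset.StandardCharsets.UTF_8));
    }
}

public class Devirt {
    public static void main(String[] args) {
        Codec c = pick();
        // Under a closed-world assumption the compiler can prove Base64Codec
        // is the only Codec and devirtualize this call. A dynamic JVM must
        // guard it: a later-loaded class could add another implementation.
        System.out.println(c.encode("hi"));
    }
    static Codec pick() { return new Base64Codec(); }
}
```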
> Of course, if OpenJDK AOT implemented a more comprehensive reachability
> analysis then it might be able to embrace many of the same speculative
> optimization opportunities over a more full, known application class
> suite. However, it would still need to employ more metadata and a more
> indirect linkage model in order to allow for dynamic loading.
We have one problem in AppCDS and AOT that costs us dearly in speed and
size -- we have lots of checks for class shape changes. That's because a
class named "Foo" can have completely different bytecodes and class
hierarchy at run time, so we must invalidate the caches when that happens.
If we can have a way to guarantee that the classes will not change, we
can optimize much better. For example, we could run with a special flag
that will cause any attempt to redefine such a class
to throw an UnsupportedOperationException.
Or, we start the VM in such a way that a known version of Foo is already
"loaded" before any bytecode is executed, so there's no way for the app
to plug in a different Foo.
Such restrictions can help with image size as well -- if Foo is rarely
used, we can actually exclude it from the CDS image, and decode it from
compressed classfile only when needed.
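The lazy-decode idea can be mimicked in user space (a sketch only -- the real CDS mechanism lives inside the VM, and LazyLoader is an invented name): hold rarely used classes as compressed bytes and define them only on first request.

```java
// LazyLoader: invented user-space sketch of the lazy-decode idea --
// keep rarely used classes as deflated bytes, define them on first use.
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class LazyLoader extends ClassLoader {
    private final Map<String, byte[]> compressed; // class name -> deflated classfile

    public LazyLoader(Map<String, byte[]> compressed) {
        this.compressed = compressed;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] deflated = compressed.get(name);
        if (deflated == null) throw new ClassNotFoundException(name);
        byte[] bytes = inflate(deflated);       // pay the decode cost only on demand
        return defineClass(name, bytes, 0, bytes.length);
    }

    static byte[] inflate(byte[] data) {
        try {
            Inflater inf = new Inflater();
            inf.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inf.finished()) out.write(buf, 0, inf.inflate(buf));
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        }
    }
}
```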
> On 04/06/2020 07:10, Ioi Lam wrote:
>> On 6/3/20 7:22 AM, Mike Hearn wrote:
>>> I didn't quite follow the question about seconds or minutes. When
>>> iterating on code that's easily unit tested the JVM manages to provide
>>> a turnaround time of a few seconds, at least it does for me when
>>> incremental compilation is available. It gets more painful with app
>>> servers of course, hence the popularity of JRebel.
>>> Yes partitioning the app sounds like a great balance. It's the only
>>> way I can see for AppCDS/AOT type optimisations to be applicable
>>> during development, where otherwise JITC will continue to dominate
>>> even though developers probably start their app more than anyone
>>> else! Usually only a tiny part of the app is changing and all the
>>> libraries are static, so build systems could figure out a semi-static
>>> set of modules pretty easily. Just assume anything pulled from a
>>> repository is static and eligible for optimisations.
>>> Heap archiving can be seen as a serialisation scheme, just one with an
>>> unusually unstable data format. I wonder if it helps to look at it
>>> like that - perhaps a lot of the logic can be moved into Java? For
>>> instance, do the AppCDS archives /have/ to be generated by C++ looking
>>> for magic annotations/iterating the heap, or could some privileged
>>> Java code just walk the graph, doing whatever checks are considered
>>> useful and building a simple Object which is then passed into the VM
>>> for the GC to format a heap region and return it as a byte buffer?
>>> Then build system authors, app containers etc can take over management
>>> of AppCDS files, like deciding when to [re]generate them and for which
>>> modules it's worth doing.
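A toy version of the "walk the graph in Java" idea above (invented code; real heap archiving must also handle arrays, cycles through VM internals, hidden classes, and module access rules): collect the reachable closure of a root object while rejecting types that must not be archived.

```java
// GraphWalk: invented sketch of a reachability walk done in Java code,
// with an archivability check (Thread stands in for "ephemeral resource").
// Arrays and java.* internals are deliberately not traversed here.
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class GraphWalk {
    public static final class Node { public Node next; public String label; }

    public static Set<Object> closure(Object root) throws IllegalAccessException {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        ArrayDeque<Object> work = new ArrayDeque<>();
        work.add(root);
        while (!work.isEmpty()) {
            Object o = work.pop();
            if (o == null || !seen.add(o)) continue;
            if (o instanceof Thread)   // refuse ephemeral objects in the graph
                throw new IllegalArgumentException("ephemeral object in graph");
            for (Class<?> c = o.getClass(); c != null; c = c.getSuperclass()) {
                if (c.getName().startsWith("java."))
                    break;             // java.base is not reflectable from here
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) continue;
                    if (f.getType().isPrimitive()) continue;
                    f.setAccessible(true);
                    work.add(f.get(o));
                }
            }
        }
        return seen;
    }
}
```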
>> The current way AppCDS archives heap objects is certainly too
>> restrictive. I think we'll probably allow a subset of classes to be
>> initialized ahead of time, and the static fields of those classes will
>> be archived into the heap. There are challenges -- picking the
>> correct/optimal subset may require substantial work; we also need to
>> make sure we don't archive the wrong things (ephemeral resources,
>> environment dependencies, etc).
>> - Ioi
>>> I've tried using the DCEVM in the past, which has better edit and
>>> continue. It sounds good but I found that I very rarely edited /only/
>>> post-initialisation code and so editing classes in place just ended up
>>> being kind of useless, as there was no way to force re-construction
>>> and no obvious semantics for how it could work. Especially problematic
>>> for GUI apps, which is the kind of app where you really benefit from
>>> fast iteration times but most of the iteration is in the code that
>>> constructs the GUI.
>>> I looked a bit at how to resolve that. I toyed a bit with adding a
>>> notification queue to classloaders, that lets app code learn when a
>>> class has been redefined, and some APIs on top that would
>>> automatically track dependencies and trigger re-construction of object
>>> graphs at various user-defined points. Because DCEVM uses a rather
>>> large and complex patch I ended up looking more into whether it'd be
>>> possible to do this with just classloaders, perhaps in combination
>>> with a much smaller JVM patch, as if you're rebuilding object graphs
>>> anyway then most of the magic class redefinition does isn't actually
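The notification-queue idea described above might be sketched, purely hypothetically, like this (no such API exists in the JDK or DCEVM; the class, interface, and method names are invented):

```java
// RedefinitionBus: invented sketch of a redefinition-notification queue.
// An agent or patched VM would publish after each class redefinition;
// user code subscribes and rebuilds affected object graphs.
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public final class RedefinitionBus {
    public interface Listener { void classRedefined(String className); }

    private static final List<Listener> listeners = new CopyOnWriteArrayList<>();

    public static void subscribe(Listener l) { listeners.add(l); }

    public static void publish(String className) {
        for (Listener l : listeners) l.classRedefined(className);
    }
}
```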