Call for Discussion: New Project: Leyden

Mon Jun 22 20:26:42 UTC 2020

I think at this point, it's too early for the "what to do, how to do it" 
type of discussions. Let's focus on the fundamental issues first and 
consider the possible programming models.

One fundamental issue with the current CDS archived heap implementation, 
which Jiangli's proposal is based on, is its inability to handle changes 
in program execution order:

https://bugs.openjdk.java.net/browse/JDK-8248046

The fundamental problem is that when running with CDS, we start with an 
empty VM with no loaded classes. Classes are incrementally loaded and 
initialized only when they are referenced. When each class is 
initialized, we will try to load its static fields from the CDS archived 
heap.

However, the values of the static fields can vary depending on program 
execution order:

    class A {static int x = 0; }
    class B {static int y = A.x; }
    class Test {
        public static void main(String[] args) {
            System.out.println("A.x = " + A.x);
            if (args.length > 0) {
                A.x ++;
            }
            System.out.println("B.y = " + B.y);
        }
    }

    # Output (without CDS heap archiving)
    $ java -cp . Test
    A.x = 0
    B.y = 0
    $ java -cp . Test foo
    A.x = 0
    B.y = 1

The problem is that we can only save the results of ONE particular 
execution order. So if we save B.y into the CDS archive, no matter what 
value is stored, at least one of the two above program invocations will 
produce an unexpected result.
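
To make this concrete, here is a toy Java model of the example. It is only 
a sketch, not HotSpot or CDS code, and the names ArchiveOrderDemo and 
observedBy are made up for illustration. It treats an archived B.y as a 
value that replaces running B's initializer, and compares each invocation 
against what the program prints without an archive:

```java
import java.util.OptionalInt;

// Toy model of the A/B/Test example; not HotSpot or CDS code.
final class ArchiveOrderDemo {

    // Returns the value Test.main would print for B.y.
    // archivedBy, if present, stands in for a B.y restored from the archive.
    static int observedBy(String[] args, OptionalInt archivedBy) {
        int ax = 0;                    // A.<clinit>: A.x = 0
        if (args.length > 0) {
            ax++;                      // A.x++ runs before B is first used
        }
        // B is initialized on first use: either restore the archived value
        // or run the initializer "y = A.x" now.
        return archivedBy.isPresent() ? archivedBy.getAsInt() : ax;
    }

    public static void main(String[] unused) {
        String[][] invocations = { {}, {"foo"} };
        for (int archived = 0; archived <= 1; archived++) {
            for (String[] args : invocations) {
                int expected = observedBy(args, OptionalInt.empty());
                int actual = observedBy(args, OptionalInt.of(archived));
                System.out.println("archived B.y=" + archived
                        + " argc=" + args.length
                        + " expected=" + expected + " actual=" + actual
                        + (expected == actual ? "" : "  <-- mismatch"));
            }
        }
    }
}
```

Whichever of the two values is archived, exactly one of the two invocations 
is flagged as a mismatch.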

=====

I think Graal/NativeImage provides a better/simpler model -- a selected 
set of class initializers are executed during the "image build" stage. 
At the end of this stage, the entire VM's state is saved into a snapshot.

At program "run time", the VM resumes at the snapshot. All classes that 
are already initialized during "image build" cannot be initialized 
again, so there's no way for the program to force a different execution 
order that would disagree with the values stored in the snapshot.

Conceptually, the program's entry point is rewritten as:

        public static void main(String[] args) {
            init(A.class); // @synthetic
            init(B.class); // @synthetic
            // ... original bytecodes
        }

Now, running the app in Graal will produce these results:

    $ java -cp . Test
    A.x = 0
    B.y = 0
    $ java -cp . Test foo
    A.x = 0
    B.y = 0

The result will still be "unexpected" to programs that don't know anything 
about Graal (or snapshots in general). However, I think at least we have a 
programming model that's more understandable than what we have in CDS today.
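
The pinned B.y can be approximated on a stock JVM by forcing initialization 
before the original main logic runs. This is a minimal sketch only: it 
assumes Class.forName(name, true, loader) as a stand-in for the synthetic 
init(...) calls, it does not snapshot anything, and SnapshotOrderDemo is a 
made-up name.

```java
// Sketch only: eager initialization in place of a real snapshot.
final class SnapshotOrderDemo {
    static class A { static int x = 0; }
    static class B { static int y = A.x; }

    // Hypothetical stand-in for the synthetic init(...): initialize the class now.
    static void init(Class<?> c) {
        try {
            Class.forName(c.getName(), true, c.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Same logic as Test.main, with A and B initialized up front.
    static int run(String[] args) {
        init(A.class);
        init(B.class);               // B.y is fixed here, before any A.x++
        System.out.println("A.x = " + A.x);
        if (args.length > 0) {
            A.x++;
        }
        System.out.println("B.y = " + B.y);
        return B.y;
    }

    public static void main(String[] args) {
        run(args);
    }
}
```

Because B is initialized before the increment can run, the argument no 
longer influences B.y.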

Also, today CDS performs a lot of checks to ensure "things are happening 
exactly the same way as during archive dump time". If we switch to the 
snapshot model, these checks can be eliminated and start-up will be more 
efficient.

======

I think we have several options:

[1] Introduce a snapshot model into the Java Language Spec and clearly 
define the behavior

-- or, in general, some sort of "early binding" that combines multiple 
classes into a single unit, and restrict what can happen with this 
single unit. I.e., "a jlink that actually links classes ....".

[2] Introduce a new construct for order-independent initialization.

For [2], maybe we can *finally* have a *const* keyword :-)

    class A {static const int x = 0; }
    class B {static const int y = A.x; }

This way we can safely store B.y == 0 without worrying that its value 
may change. E.g.,

    class A {static int x = 0; }
    class B {static const int y = A.x; }

    Error: const B.y depends on non-const value A.x

    ---

    class A {static const int x = 0; }
    class B {static const int y = A.x ++; }

    Error: const A.x cannot be modified

I'd say [1] is easier for uptake, but messier. It can also be more 
error-prone when bolted on top of existing code. [2] is cleaner, but 
will probably have slow uptake and narrower use cases.
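
For comparison, today's Java already has a narrow form of order-independent 
value: a static final field of primitive or String type whose initializer is 
a constant expression. javac folds such a value into its use sites, and 
reading it does not trigger class initialization. The sketch below only 
illustrates that existing behavior; it is not the proposed const, and 
ConstDemo and aInitialized are made-up names.

```java
// Sketch only: existing "static final" constants, not the proposed const keyword.
final class ConstDemo {
    static boolean aInitialized = false;

    static class A {
        static final int X = 0;              // constant variable
        static { aInitialized = true; }      // runs only if A is actually initialized
    }

    static class B {
        static final int Y = A.X;            // still a constant expression
        // static final int Z = A.X++;       // rejected by javac: X is final
    }

    static int readY() {
        return B.Y;                          // javac inlines the constant here
    }

    public static void main(String[] args) {
        System.out.println("B.Y = " + readY());
        System.out.println("A initialized: " + aInitialized);
    }
}
```

This covers far less than [2] would need (no objects, no computed values), 
which is part of why a new construct comes up at all.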

Thanks
- Ioi

On 6/8/20 4:59 PM, Jiangli Zhou wrote:
> Hi Esteban,
>
> Thank you for the feedback! Very happy to see the involvement from you
> and others from GraalVM (based on Mark's CFV email)!
>
> Freeing developers from having to do any manual configuration (or
> having to write any scripts to semi-automate the configuration) in
> order to use the image during deployment would be very important. The
> adoption of a static image solution will likely be higher when that
> barrier is gone. Developers may be more willing to do source level
> change in the following case:
>
> - anticipated benefit is clear
> - easy to use/no manual configuration
>
> Would it be possible to do an incremental and divide & conquer
> approach? JDK library, app library, and application developers/experts
> may add the 'marker' in their own domain, and the savings would be
> higher with more Java code adopting this new feature.
>
> Another possible approach is using the 'marker' solution together with
> static analysis (brought up by another member during an internal
> discussion). It may be possible to provide two different usage modes,
> power mode and marker-only mode. If static analysis can sufficiently
> tell us which are the good candidates for pre-initialization at build
> time, we can pre-initialize more classes without relying on the
> 'marker'. That would be a 'power' mode. Static analysis does cost CPU
> cycles at build time and may require more memory. In many cases,
> developers may opt-in for the 'marker-only' mode to avoid higher
> CPU/memory cost at build time, when sufficient Java sources are marked
> for class pre-initialization...
>
> Looking forward to seeing more inputs/advice.
>
> Best,
>
> Jiangli
>
>
> On Mon, Jun 8, 2020 at 10:00 AM Esteban Ginez <esteban.ginez at oracle.com> wrote:
>> Hi Jiangli
>>
>> I took some time to review some of the docs you sent about class initialization, and I would like to highlight one key thing we have been observing in the GraalVM team.
>>
>> The usage of annotations to mark classes as 'pre-initializable' is insufficient.
>> The Java community relies on well-known, time-tested Java libraries. It is often the case that the application owner relies on dependencies he does not even know (or care to know) about. Restricting the capacity to pre-initialize an application via annotations will, at minimum, leave behind code that cannot be annotated due to lack of ownership. Further, as we have been experiencing in the GraalVM team, it has been a challenge for lower-level libraries to decide a priori the set of code that can be pre-initialized, without taking into consideration the particular usage of the library.
>>
>> That being said, it is great to see other teams in the community working on similar problems, so thanks a lot for sharing your work.
>> Warmly
>> E.
>>
>> On May 21, 2020, at 5:23 PM, Jiangli Zhou <jianglizhou at google.com> wrote:
>>
>> On Thu, May 21, 2020 at 3:48 PM <mark.reinhold at oracle.com> wrote:
>>
>> 2020/5/18 15:43:34 -0700, jianglizhou at google.com:
>>
>> Didn't see this discussion until today. I think this will be welcomed by
>> Java developers! How do we participate in the project and coordinate?
>>
>>
>> I’ll shortly start the CFV to create the Project.  Once the Project
>> exists, your participation will be most welcome!
>>
>>
>> Sounds great!
>>
>>
>>
>> I have been experimenting with preserving pre-initialized classes (at
>> build time) as part of the CDS archive image. Class pre-initialization can help
>> field and method pre-resolution and generate better AOT code. Here is a
>> class pre-resolution & pre-initialization proposal
>> <https://docs.google.com/document/d/17RV8W6iqJj9HhjGSO9k1IXfs_9lvHDNgxxr8pFoeDys/edit?usp=sharing>
>> that enhances
>> the existing Hotspot heap archiving and provides more general support to
>> pre-initialize classes and preserve static field values for both JDK
>> classes and application classes (loaded by system class loader).
>>
>>
>> Sorry, but for IP clarity could you please either post that document to
>> cr.openjdk.java.net or enter it into a JBS issue?  (Likewise for the
>> slides that you mention later on.)
>>
>>
>> I've posted the docs to cr.openjdk.java.net:
>>
>> Design doc:
>> http://cr.openjdk.java.net/~jiangli/Leyden/Java%20Class%20Pre-resolution%20and%20Pre-initialization%20(OpenJDK).pdf
>> Slides:
>> http://cr.openjdk.java.net/~jiangli/Leyden/Selectively%20Pre-initializing%20and%20Preserving%20Java%20Classes%20(OpenJDK).pdf
>>
>> Will also attach them to JBS as suggested.
>>
>>
>> I adopted the annotation approach in the proposal, after comparing it to other
>> alternatives, please see details in the design doc.
>>
>>
>> In a prototype, using an annotation to identify pre-initializable
>> classes is fine.  In the long run, however, we’ll likely wind up with a
>> keyword since annotations must not be used to change language semantics.
>>
>>
>> That sounds workable. The underlying heap object archiving mechanism can
>> support different approaches (list, interface, annotation, keyword, etc) as
>> an 'indicator' for pre-initializing & preserving classes and static fields.
>>
>> Really excited about the project!
>>
>> Best regards,
>> Jiangli
>>
>>
>> - Mark
>>
>>