Preload attribute

Dan Heidinga heidinga at redhat.com
Thu Jun 15 13:31:34 UTC 2023


Following on from our discussion during the EG meeting, I think our current
error handling approach (drop the errors) unfortunately violates the spec.

Here's the example:

===== CPReuse.java

public class CPReuse {

  public static void main(String[] args) {
    V.callme();
  }

  public static void forcePreload(V v) { }
}

===== V.java

value class V {
  final int i;

  public V() { i = 5; }

  public static void callme() { }
}

======================



which generates a classfile with:

Constant pool:
   ...
   #7 = Methodref          #8.#9          // V.callme:()V
   #8 = Class              #10            // V

  public static void main(java.lang.String[]);
    ...
    Code:
      stack=0, locals=1, args_size=1
         0: invokestatic  #7                  // Method V.callme:()V

Classes to be preloaded:
  #8;                                     // value class V


The important thing to note here is that CONSTANT_Class #8 is used by both
the preload attribute and the methodref for the invokestatic.


Why is this a problem?


JVMS 5.4 Linking says:

Linking also involves *resolution of symbolic references in the class or
interface*, though not necessarily at the same time as the class or
interface is verified and prepared.


This specification allows an implementation flexibility as to when linking
activities (and, because of recursion, loading) take place, provided that
all of the following properties are maintained:

...

* *Errors* detected during linkage *are thrown at a point in the program
where some action is taken by the program that might, directly or
indirectly, require linkage* to the class or interface involved in the
error.


And JVMS 5.4.3 Resolution says:

...
Subsequent attempts to resolve the symbolic reference always fail with the
same error that was thrown as a result of the initial resolution attempt.


So using the existing flexibility in the resolution of symbolic references
means we still have to report errors where they may occur in the program.
Looking back at the example classfile, we're within the spec to preload
CONSTANT_Class #8 using the existing rules, but if it fails during preload,
we need to poison CONSTANT_Class #8 so that reuse of it fails with the same
exception even if the load of class "V" would succeed later.
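
To make the poisoning requirement concrete, here is a minimal Java sketch of
a constant pool entry that remembers its first resolution outcome. It is a
toy model under stated assumptions, not how any JVM implements resolution:
ClassEntry, Lookup and PoisonDemo are hypothetical names invented for
illustration, and Lookup stands in for whatever class loading the VM would
actually perform.

```java
/** Hypothetical stand-in for the class loading a VM would do for a name. */
interface Lookup {
    Class<?> load(String name) throws ClassNotFoundException;
}

/** Toy model of a CONSTANT_Class entry shared by Preload and a Methodref. */
final class ClassEntry {
    private final String name;
    private Class<?> resolved;     // cached result of the first successful resolution
    private LinkageError failure;  // cached error of the first failed resolution

    ClassEntry(String name) { this.name = name; }

    Class<?> resolve(Lookup lookup) {
        if (resolved != null) return resolved;
        if (failure != null) throw failure;   // rethrow the same error every time
        try {
            resolved = lookup.load(name);
            return resolved;
        } catch (ClassNotFoundException e) {
            failure = new NoClassDefFoundError(name);
            failure.initCause(e);
            throw failure;
        }
    }
}

final class PoisonDemo {
    public static void main(String[] args) {
        ClassEntry entry = new ClassEntry("V");
        try {
            // Preload attempt: the class is not available yet.
            entry.resolve(n -> { throw new ClassNotFoundException(n); });
        } catch (LinkageError first) {
            try {
                // Later invokestatic: the class would load now, but the entry is poisoned.
                entry.resolve(n -> String.class);
            } catch (LinkageError second) {
                System.out.println("same error rethrown: " + (first == second));
            }
        }
    }
}
```

In this model the second resolve call never consults its lookup; it rethrows
the cached error, which is the behaviour the 5.4.3 text quoted above appears
to require once the preload attempt has failed through the shared entry.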

This is not the outcome I wanted when digging through the spec - I liked
our decision to ignore errors from preload.

--Dan

On Wed, Jun 14, 2023 at 8:46 AM Dan Heidinga <heidinga at redhat.com> wrote:

>
>
> On Wed, Jun 14, 2023 at 2:56 AM <forax at univ-mlv.fr> wrote:
>
>>
>>
>> ------------------------------
>>
>> *From: *"Dan Heidinga" <heidinga at redhat.com>
>> *To: *"Remi Forax" <forax at univ-mlv.fr>
>> *Cc: *"Brian Goetz" <brian.goetz at oracle.com>, "John Rose" <
>> john.r.rose at oracle.com>, "daniel smith" <daniel.smith at oracle.com>,
>> "valhalla-spec-experts" <valhalla-spec-experts at openjdk.java.net>
>> *Sent: *Tuesday, June 13, 2023 5:10:07 PM
>> *Subject: *Re: Preload attribute
>>
>>
>>
>> On Tue, Jun 13, 2023 at 10:13 AM Remi Forax <forax at univ-mlv.fr> wrote:
>>
>>>
>>>
>>> ------------------------------
>>>
>>> *From: *"Dan Heidinga" <heidinga at redhat.com>
>>> *To: *"Brian Goetz" <brian.goetz at oracle.com>
>>> *Cc: *"John Rose" <john.r.rose at oracle.com>, "daniel smith" <
>>> daniel.smith at oracle.com>, "valhalla-spec-experts" <
>>> valhalla-spec-experts at openjdk.java.net>
>>> *Sent: *Tuesday, June 13, 2023 3:31:24 PM
>>> *Subject: *Re: Preload attribute
>>>
>>>
>>>
>>> On Mon, Jun 12, 2023 at 10:44 AM Brian Goetz <brian.goetz at oracle.com>
>>> wrote:
>>>
>>>> As a reminder, Leyden will give us a more general tool for "moving
>>>> stuff around" at build time than CDS does, and that the current CDS
>>>> behavior may well be folded into a set of condensers.
>>>>
>>>> We are trying to find the "perfect" place to put preload information,
>>>> but we have (as usual) an overconstrained notion of perfection; what makes
>>>> perfect sense for semantics or non-duplication may not make perfect sense
>>>> for runtime behavior.
>>>>
>>>> Leyden will let us cut this knot by letting us put the information in
>>>> the classfile in the semantically sensible place, and let tooling boil it
>>>> down later at pre-deployment time to a representation that is more
>>>> efficient for runtime.
>>>>
>>>> So what I suggest is focusing on capturing the source data, which IMO
>>>> seems to still be some flavor of "class/method X needs to know more about
>>>> value class V before making certain decisions".  Preloading is the
>>>> mechanism of how we find out that "more", and aggregated representations
>>>> such as per-module / CDS archives are a rearranging of the source data to
>>>> achieve a more runtime friendly representation _for a particular
>>>> configuration of classes_.
>>>>
>>>> tl;dr: Let's design what captures the semantics we need, and treat
>>>> computing e.g. optimal load order as a downstream transformation.
>>>>
>>>
>>> That sounds reasonable.
>>>
>>> I think my original question about how the JVMS treats preload still
>>> needs to be addressed though.  What guarantees / requirements should we
>>> impose on the JVM's handling of preload?  The current spec is not clear
>>> enough for users to understand what they get from it and is too clever in
>>> handing off loading rules to JVMS 5.4's flexibility.
>>>
>>> My current position is we need to specify the behaviour and the point in
>>> the loading process where the preload attempts will occur so users can
>>> depend on the behaviour.  From John's emails, I think he would prefer to
>>> see preload become strictly an optimization and be outside the spec (John
>>> correct me if I've misstated).
>>>
>>>
>>> I'm on John's side: if the VM never reports an error that occurs when the
>>> Preload attribute is read, the user has no side effect to observe when the
>>> attribute is read, so there is no need to specify the exact point where
>>> this attribute is read.
>>>
>>
>> Preload attempts to load the class, which does cause user-visible side
>> effects: ClassLoader::loadClass is called, for one, which users can observe
>> in a number of ways.  JVMTI can expose this info, as can
>> j.l.instrument.Instrumentation::getInitiatedClasses(ClassLoader) and
>> ::getAllLoadedClasses().  I'm sure there are other ways as well.
>>
>> My point being it is observable so we should specify it clearly.
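
As a rough illustration of that observability, a user-defined loader can
simply record every name it is asked for. This is only a sketch:
RecordingLoader is a hypothetical class written for this example, and it
shows requests that reach loadClass, not the JVMTI or Instrumentation views
mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical loader that records every class name it is asked to load. */
final class RecordingLoader extends ClassLoader {
    private final List<String> requested =
            Collections.synchronizedList(new ArrayList<>());

    RecordingLoader(ClassLoader parent) { super(parent); }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        requested.add(name);                    // the user-visible side effect
        return super.loadClass(name, resolve);  // normal parent-first delegation
    }

    /** Snapshot of the names requested so far, in request order. */
    List<String> requested() {
        synchronized (requested) {
            return new ArrayList<>(requested);
        }
    }
}
```

Any early load that is routed through such a loader would show up in
requested() before the program itself first touches the class, which is why
the timing of preload is not invisible to users.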
>>
>>
>> Observability of classloading is an issue that Leyden has to handle; the
>> Preload attribute is just an instance of that issue.
>>
>
> Remi, we (the Valhalla EG) don't get to design Leyden's solutions.  We
> need to work within the JVMS or extend it in ways that are appropriate for
> supporting our efforts.  Let's let the Leyden folks solve Leyden problems =)
>
>
>
>> For me, the VM is free to initiate class loading at any point before a
>> class must be initialized, provided any exception is delayed so that it
>> only appears at that point.
>>
>
> The spec gives us a lot of leeway on when classes are loaded provided
> errors are reported at the correct time.  One of the major strengths of
> Java is the specifications and the guarantees they provide to our users.
> Those guarantees constrain what we JVM implementers can do but they also
> provide guide rails for framework authors and app developers to know what
> behaviour they can rely on from the JVM.  When we get too "cute" or clever
> in our application of the freedoms in the spec, we undermine those
> guarantees our users need.  And we undermine the value of what we're
> providing.
>
> Preload as an optimization has been a great model for us to get to where
> we are today - Q's removed while still getting the calling convention
> optimizations for values.  Who would have thought we'd get here given where
> we started?  But it's a model that we need to thank for its service and
> wish it well Marie Kondo-style.
>
> Now we need to spec the behaviour so users can rely on it, or they will try
> to reverse engineer rules from today's behaviour that constrain us in the
> future.  Better to author the rules we want than to be constrained by
> the past's that's-how-it-happened-to-work behaviour.
>
> --Dan
>
>
>
>>
>> Rémi
>>
>>
>> --Dan
>>
>>>
>>>
>>> --Dan
>>>
>>>
>>> Rémi
>>>
>>>
>>>
>>>>
>>>>
>>>> On 6/12/2023 9:26 AM, Dan Heidinga wrote:
>>>>
>>>> The top-line goal for the preload efforts is to trigger the
>>>> necessary "go and look" behaviour to support calling convention flattening
>>>> for values.  We want the broadest, most reliable mechanism to ensure that
>>>> we routinely get flattening in the calling convention for value types so
>>>> that the flattening horizon can extend beyond a single compiled body
>>>> (i.e., a method and its inlines).
>>>>
>>>> Summarizing the options presented so far:
>>>>
>>>> A) Value classes should be put into the CDS archive to ensure they are
>>>> loaded early enough, in a group, and in a form that the VM can quickly
>>>> discover whether calling convention optimizations apply to them.  This
>>>> involves either a class list to create a static archive (allows jdk
>>>> classes) or using a dynamic archive with AppCDS.  Both cases require a
>>>> "cold run" to generate the data needed for CDS and only capture classes
>>>> that have been loaded during that run (I think that's correct?).
>>>>
>>>> B) Use a "Watch List" to list class names that should be looked for.
>>>> When the name appears, trigger loading early enough to allow calling
>>>> convention optimizations to apply.  Name conflicts are "safe" as the worst
>>>> case is a class is loaded early in multiple loaders but is only a value in
>>>> one loader.  The watch list can be: global or per-module.  It's possible a
>>>> tool like jlink or jmod could be used to generate the watch list by
>>>> scanning all the classes included in the jimage/jmod file.
>>>>
>>>> C) The per-class preload attribute.  Each class lists the value classes
>>>> it may reference to ensure they are loaded early enough.  Potentially a lot
>>>> of duplication as each class in an application would list many of the same
>>>> value classes.
>>>>
>>>> Did I miss any?
>>>>
>>>> There's also another dimension we've touched on: how eager is eager
>>>> loading.  Current preload behaviour is to batch load all the listed
>>>> classes.  Alternatively, loading could wait until one of the classes was
>>>> observed in method signature / field signature and load on an as-needed
>>>> basis.
>>>>
>>>> We've mostly concentrated on preload as an optimization for calling
>>>> conventions but there may be other uses of the mechanism as well.  A user
>>>> may want to ensure that classes are loaded early to prevent optimizations
>>>> that need to be walked back later based on their knowledge of application
>>>> behaviour.  For example, ensuring there is always more than a single
>>>> implementor of an interface loaded to prevent CHA optimizations on some
>>>> critical path where the second implementation is normally loaded late.  Or
>>>> to ensure an entire sealed hierarchy is loaded together.  I haven't put
>>>> much thought into this yet but expect users will find interesting ways to
>>>> use "preload" if it's reliable enough for them.  (And of course, some will
>>>> abuse it in ways that hurt their performance as well).
>>>>
>>>> Which of these options meets the goal ("reliable, routine calling
>>>> convention optimization for values") best?
>>>>
>>>> --Dan
>>>>
>>>> On Fri, Jun 9, 2023 at 9:38 PM John Rose <john.r.rose at oracle.com>
>>>> wrote:
>>>>
>>>>> On 9 Jun 2023, at 12:41, Dan Heidinga wrote:
>>>>>
>>>>> On Thu, Jun 8, 2023 at 4:51 PM John Rose <john.r.rose at oracle.com>
>>>>> wrote:
>>>>>
>>>>> On 8 Jun 2023, at 9:52, Dan Heidinga wrote:
>>>>>
>>>>> On Thu, Jun 8, 2023 at 12:44 PM John Rose <john.r.rose at oracle.com>
>>>>> wrote:
>>>>>
>>>>> On 8 Jun 2023, at 9:01, Dan Heidinga wrote:
>>>>>
>>>>> If we decouple the list of preloadable classes from the classfile, how
>>>>> would non-jdk classes be handled?
>>>>>
>>>>> What if instead of ditching the attribute, or treating it like an
>>>>> optimization, we firmed up the contract and treated it as a guarantee…
>>>>>
>>>>> If we go down this route, let’s consider putting the control information
>>>>> into a module file (only) for starters. (Maybe class file later if
>>>>> needed.) There would be fewer states to document and test, since (by
>>>>> definition) class files could not get out of sync.
>>>>>
>>>>> A module would document, in one place, which types it would “prefer” to
>>>>> preload in order to optimize its APIs (internal or external).
>>>>>
>>>>> This might lead to more class loading than intended. The current approach
>>>>> has each classfile register the list of classes it wants preloaded to get
>>>>> the best linkage, which means we only have to load those classes if we
>>>>> link the original class. There's a natural trigger for the preload and a
>>>>> limited set of classes to load.
>>>>>
>>>>> There’s a spectrum of tradeoffs here: We could put preload attributes on
>>>>> every method and field, to get the maximum amount of fine-grained lazy
>>>>> (pre-)loading, or put them in a global file per JVM instance. The more
>>>>> fine-grained, the harder it will be to write compliance testing, I think.
>>>>>
>>>>> Agreed. There's a sweet spot between expressiveness and overheads
>>>>> (testing, metadata, etc). Classfiles have historically been the place
>>>>> where the JVM tracks this kind of information, as that fits well with
>>>>> separate compilation and avoids the "external metadata" problems of,
>>>>> e.g., GraalVM's extra-linguistic configuration files.
>>>>>
>>>>> When compiling the current class, javac already requires directly
>>>>> referenced classes to be findable and thus has the info required to write
>>>>> a preload attribute. Does javac necessarily have the same info when
>>>>> compiling the module-info classfile? Maybe when finding the non-exported
>>>>> packages for the module, javac (or jlink? or jmod?) could also find the
>>>>> value classes that need preloading?
>>>>>
>>>>> That is what I am assuming. The module file would be edited by those
>>>>> guys. Or (maybe better) a plain flat textual list is put somewhere the JVM
>>>>> can find it.
>>>>>
>>>>> Moving it into a separate pass like this doesn't feel like quite the right
>>>>> fit though, as it excludes the classpath and complicates the other tools'
>>>>> processing of the modules.
>>>>>
>>>>> I think it’s better than that. When we are assembling a program (jlink
>>>>> or a Leyden condenser), the responsibility of publicizing value classes
>>>>> (for Preload) surely belongs to the declaration, not collectively on all
>>>>> the uses.
>>>>>
>>>>> So every module (jmod or whatever) that declares 1 or more value
>>>>> classes (if they are exported, at least) should list them on a publicized
>>>>> watch list.
>>>>>
>>>>> There is no need to replicate these watch lists across all potential
>>>>> API clients of a value class. There are reasons *not* to do this,
>>>>> since the clients have only partial, provisional information about the
>>>>> values.
>>>>>
>>>>> Moving to a single per-module list loses the natural trigger and may
>>>>> pre-load more classes than the application will use. If Module A has
>>>>> classes {A, B, C} and each one preloads 5 separate classes, with a
>>>>> per-module list that's forcing the loading of 15 additional classes (plus
>>>>> supers, etc). With a per-class list, we only preload the classes on a
>>>>> per-use basis. More of a pay-for-what-you-use model.
>>>>>
>>>>> Is there a natural trigger or way to limit the preloads to what I might
>>>>> use with the per-module file?
>>>>>
>>>>> That’s a very good question. I think what Preload *really is* is a list
>>>>> of “names that may require special handling before using in APIs”. They
>>>>> don’t need to be loaded when the preload attribute is parsed; they are
>>>>> simply put in a “watch list” to trigger additional loading *when
>>>>> necessary*. (This is already true.) So I think if we move the preload
>>>>> list to (say) the module level (if not a global file), then the JVM will
>>>>> have its watch list. (And, in fewer chunks than if we put all the stuff
>>>>> all the time redundantly in all class files that might need them: That
>>>>> requires frequent repetition.) The JVM can use its watch list as it does
>>>>> today, with watch lists populated separately for each class file.
>>>>>
>>>>> I initially thought a global list would lead to issues if two different
>>>>> classloaders defined classes of the same name, but since this is a "go
>>>>> and look" signal, early loading based on name should be fine even in that
>>>>> case, as each loader that mentions the name would be asked to load its
>>>>> version of the named class. So I think a per-JVM list would be OK from
>>>>> that perspective (though I still don't like it).
>>>>>
>>>>> Agreed.
>>>>>
>>>>> To emphasize: A watch list does not require loading. It means, “if you
>>>>> see this name at a point where you could use extra class info, then I
>>>>> encourage you to load sooner rather than later”. The only reason it is “a
>>>>> thing” at all is that the default behavior (of loading either as late as
>>>>> possible, or as part of a CDS-like thingy) should be changed only on an
>>>>> explicit signal.
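
The watch-list idea described above can be sketched in a few lines of Java.
This is a hedged illustration only: WatchList is a hypothetical class, and
scanning a method descriptor is just one assumed example of "a point where
you could use extra class info"; the thread does not fix where a VM would
actually consult the list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

/** Hypothetical watch list: names that are worth loading early when they are seen. */
final class WatchList {
    private final Set<String> names;

    WatchList(Collection<String> names) {
        this.names = Set.copyOf(names);
    }

    /**
     * Returns the watched class names mentioned in a descriptor such as "(LV;I)V".
     * Nothing is loaded here; the caller decides whether to load early.
     */
    List<String> mentionedIn(String descriptor) {
        List<String> hits = new ArrayList<>();
        int i = 0;
        // Outside a class name, 'L' can only start a class type, which ends at ';'.
        while ((i = descriptor.indexOf('L', i)) >= 0) {
            int end = descriptor.indexOf(';', i);
            if (end < 0) break;   // malformed descriptor: stop scanning
            String name = descriptor.substring(i + 1, end);
            if (names.contains(name)) hits.add(name);
            i = end + 1;
        }
        return hits;
    }
}
```

A list containing "V" reports a hit for the descriptor "(LV;I)V" and none
for "(Ljava/lang/String;)V". Merely being on the list triggers nothing, which
matches the "does not require loading" point made above.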
>>>>>
>>>>> While true for what the JVM needs, this is hard behaviour to explain to
>>>>> users and challenging for compliance test writers (or maybe not if we
>>>>> continue to treat preload as an optimization).
>>>>>
>>>>> I’m trying to reduce this to a pure optimization. In that case, “watch
>>>>> lists” are just helpers, which are allowed to fail, and allowed to be
>>>>> garbage.
>>>>>
>>>>> Is this where we want to
>>>>> spend our complexity budget?
>>>>>
>>>>> (No, hence it should be an optimization.)
>>>>>
>>>>> Part of why I'm circling back to treating preload as a per-classfile
>>>>> attribute that forms a requirement on the VM rather than as an
>>>>> optimization is that the model becomes clearer for users, developers and
>>>>> testers.
>>>>>
>>>>> I think it’s still going to be murky. Why is putting the watch list on
>>>>> the API clients better than putting it on (or near) the value class
>>>>> definitions?
>>>>>
>>>>> And, hey, maybe CDS is all the primitive we need here: Just run -Xdump
>>>>> with all of your class path loaded. Et voila, no Preload at all.
>>>>>
>>>>> Users may find this behaviour surprising - I ran with a CDS archive and
>>>>> my JVM loaded classes earlier than it would have otherwise?
>>>>>
>>>>> CDS has the effect of making class loading happen in a more timely
>>>>> fashion, and (under Leyden) will almost certainly trigger reordering of
>>>>> loading as well. So promulgating a “watch list” has goals which align
>>>>> with CDS.
>>>>>
>>>>> I’m starting to think that the right “lever” to pull for optimizing
>>>>> value-based APIs is to put the value classes in a CDS archive. That is a
>>>>> de facto watch list. The jlink guy should just make a table of all value
>>>>> classes. That’s the best form of Preload I can imagine, frankly.
>>>>>
>>>>
>>>>
>>