Toward Condensers
Dan Heidinga
heidinga at redhat.com
Wed Aug 2 13:55:58 UTC 2023
On Tue, Aug 1, 2023 at 5:00 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>
>
> On 8/1/2023 4:34 PM, Dan Heidinga wrote:
> > Thanks for sharing this document, and for the work you, Brian and
> > Paul have been putting into this.
> >
> > A couple of questions / comments about the document based on my reading:
> >
> > * In the "The condenser pipeline" section, the model shows an
> > Extractor -> Condenser 1 -> Condenser 2 -> Distiller as an example of
> > the pipeline which is linear: A, then B, then C,.... Often when
> > applying optimizations (e.g., in a compiler), there's a virtuous circle
> > where one optimization exposes new opportunities for another, which
> > triggers more opportunities for the first. This leads to running
> > through optimization passes until a fixpoint or some limit is reached.
> > Dead code elimination is often a pass that benefits from being
> > repeatedly run. This is still early days but I'll ask anyway: Have
> > you thought about how the condenser pipeline would benefit from
> > repeated application? Or how condensers could opt into repeated
> > application?
>
> Yes, some :) Suppose we want to apply the "replace Foo with Bar"
> condenser and the "inject Baz" condenser, but Baz might use Foo. So
> you'll want to run the first again after the second. While the diagram
> shows linearization, the simple Condenser interface allows for much more
> complicated composition. Additionally, we anticipate condensers will
> have factory methods that take configuration data, such as which
> classes/containers to transform. So the "inject Baz" condenser might
> scan for suitable Baz injection points, do the needful, and then turn
> around and
>
> Condenser fooAgain = new FooToBarCondenser(listOfContainersITouched);
> return fooAgain.condense(condensedResult);
>
> So the second run of FooToBar can happen inside the InjectBaz
> condenser; from the outside it looks like there are just two, but the
> reality is more complicated. (Another way to say this is that composing
> condensers is so simple that maybe the runner need not really accept a
> list of condensers, but one condenser that does the composition
> itself, just as functions in Haskell take exactly one argument.)
>
> How it ultimately gets surfaced may be a matter of some bikeshedding,
> but I think all the degrees of freedom needed are there already.
>
Thanks, Brian. Glad to hear this is already on the table, as trying to
retrofit it would be ugly. Looking forward to bikeshedding this in the
future.
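A minimal sketch of the composition described above, assuming a hypothetical single-method Condenser interface whose condense method takes and returns a model (the document's real interface may differ). AppModel, FOO_TO_BAR, INJECT_BAZ, andThen and toFixpoint are illustrative names, not a published API. It shows sequential composition plus a combinator that reruns a composed condenser until the model stops changing or a pass limit is hit, which is one way a runner that accepts a single condenser could still get repeated application.

```java
import java.util.ArrayList;
import java.util.List;

public class CondenserComposition {

    // Hypothetical stand-in for the application model: an ordered list of class names.
    record AppModel(List<String> classes) {}

    // Hypothetical single-method condenser shape; the real interface may differ.
    @FunctionalInterface
    interface Condenser {
        AppModel condense(AppModel model);

        // Sequential composition: run this condenser, then 'next' on the result.
        default Condenser andThen(Condenser next) {
            return model -> next.condense(this.condense(model));
        }

        // Rerun this condenser until the model stops changing or maxPasses runs are done.
        default Condenser toFixpoint(int maxPasses) {
            return model -> {
                AppModel current = model;
                for (int pass = 0; pass < maxPasses; pass++) {
                    AppModel next = this.condense(current);
                    if (next.equals(current)) {
                        return next; // fixpoint reached
                    }
                    current = next;
                }
                return current; // pass limit reached
            };
        }
    }

    // Illustrative "replace Foo with Bar" condenser.
    static final Condenser FOO_TO_BAR = model -> new AppModel(
            model.classes().stream()
                    .map(name -> name.equals("Foo") ? "Bar" : name)
                    .distinct()
                    .toList());

    // Illustrative "inject Baz" condenser; Baz is assumed to depend on Foo,
    // so injecting it reintroduces Foo.
    static final Condenser INJECT_BAZ = model -> {
        if (model.classes().contains("Baz")) {
            return model;
        }
        List<String> classes = new ArrayList<>(model.classes());
        classes.add("Baz");
        classes.add("Foo");
        return new AppModel(List.copyOf(classes));
    };

    public static void main(String[] args) {
        AppModel start = new AppModel(List.of("Foo", "App"));

        // One linear pass leaves the reintroduced Foo behind.
        System.out.println(FOO_TO_BAR.andThen(INJECT_BAZ).condense(start).classes());
        // prints [Bar, App, Baz, Foo]

        // Iterating the composed pipeline to a fixpoint removes it.
        Condenser pipeline = FOO_TO_BAR.andThen(INJECT_BAZ).toFixpoint(10);
        System.out.println(pipeline.condense(start).classes());
        // prints [Bar, App, Baz]
    }
}
```

The nested form Brian describes, where InjectBaz invokes FooToBar itself on the containers it touched, is equally expressible; a combinator like toFixpoint just makes the repetition visible from the outside.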
>
> >
> > * In "The application model", the first goal states:
> > > Abstracted away from the representation — Condensers should not
> > > directly read and write files as they do their work; they should
> > > express their behavior in terms of changes to the model, and let
> > > the tooling handle the representation.
> >
> > While I agree with the goal of having the model mediate access to the
> > classfiles / jars / modules / resource files, I think this goal may
> > overstate the requirement of "not directly read and write files" as
> > condensers that have offline training runs will want to use the
> > filesystem to access their training data. Would phrasing this as
> > "Condensers should only access classfiles and resources through the
> > data model; they should ....." express the intent more clearly?
>
> I think you've got the spirit of it. In a phased execution such as you
> describe, the config files or training data also represent shifting
> data across phases. Maybe that is embedded directly in the application
> configuration as resources, or maybe we have to support the notion of a
> "filesystem mount point" where we are asserting that a part of the
> runtime file system has been shifted in time to a place in the
> training-time file system. Details TBD.
>
That's an interesting idea but it hints at a bigger problem space than I
was thinking of. My concern had been that this acts as a prohibition
against Condensers reading Condenser-specific files from the filesystem, and
given that's a prohibition we can't enforce, it would be better to avoid
making it. To make this more concrete, the LFCondenser example could read
the TRACE_RESOLVE data from a pre-generated file rather than scraping the
output from invoking the JVM directly. Let's not suggest prohibitions we
don't intend to enforce.
And I'm definitely interested in details on the larger problem of shifting
file system mount points across phases.
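To make the distinction concrete, here is a minimal sketch assuming a hypothetical single-method Condenser interface and a toy AppModel. The hypothetical factory fromTrainingFile reads condenser-specific training data straight from the filesystem when the condenser is configured, while the condenser itself changes the application only through the model. The file format (one entry per line, blank lines and '#' comments ignored) is an assumption for illustration, not the actual TRACE_RESOLVE output format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TrainingDataCondenser {

    // Hypothetical, super-simplified model: names of forms to pregenerate.
    record AppModel(List<String> pregenerated) {}

    // Hypothetical single-method condenser shape; the real interface may differ.
    @FunctionalInterface
    interface Condenser {
        AppModel condense(AppModel model);
    }

    // Hypothetical factory: reads condenser-specific training data from the
    // filesystem once, at configuration time. Assumed format: one entry per
    // line, with blank lines and '#' comment lines ignored.
    static Condenser fromTrainingFile(Path trainingData) throws IOException {
        List<String> entries = Files.readAllLines(trainingData).stream()
                .map(String::strip)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .toList();

        // The condenser itself changes the application only through the model.
        return model -> {
            List<String> updated = new ArrayList<>(model.pregenerated());
            for (String entry : entries) {
                if (!updated.contains(entry)) {
                    updated.add(entry);
                }
            }
            return new AppModel(List.copyOf(updated));
        };
    }

    public static void main(String[] args) throws IOException {
        Path training = Files.createTempFile("training", ".txt");
        Files.writeString(training, "# sample\nform-a\n\nform-b\nform-a\n");
        Condenser condenser = fromTrainingFile(training);
        System.out.println(condenser.condense(new AppModel(List.of())).pregenerated());
        // prints [form-a, form-b]
        Files.delete(training);
    }
}
```

Reading the file in the factory rather than inside condense keeps the I/O at configuration time, so the condense step stays a pure model-to-model transformation.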
> > And the last goal states:
> > > Scrutable — It should be possible to answer the question,
> > > “what did that condenser do?”
> >
> > Which I don't see explicitly addressed in the rest of the document.
> > It's a good goal. Is the intention here that the ModelUpdater can be
> > interrogated to analyse the changes? Are you envisioning a logging
> > mechanism of some sort here or something else?
>
> I was thinking that the act of applying a ModelUpdater would optionally
> produce a log in a standardized format, so you could see what actually
> got done by a condenser. I also imagine that we will want to add Log
> data to the data model eventually, so condensers can dump out analysis
> data and have it show up in the log in the right place. The data model
> as it stands is clearly super-simplified, and will evolve a lot, but
> even this super-simplified version is enough to write condensers like
> "turn lambda capture into inner classes."
>
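A minimal sketch of how applying a ModelUpdater could optionally produce a log, assuming the updater is simply a list of recorded changes that the tooling applies on the condenser's behalf. ModelUpdater, Change, AddClass, RemoveClass and the log line format are all hypothetical; the message does not describe the actual API or log format. Because condensers express their work as changes rather than file writes, the tooling can record each change as it applies it, which is one way to answer "what did that condenser do?"

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LoggingUpdater {

    // Hypothetical change vocabulary; a real data model would be much richer.
    interface Change {}
    record AddClass(String name) implements Change {}
    record RemoveClass(String name) implements Change {}

    // Hypothetical, super-simplified application model.
    record AppModel(Set<String> classes) {}

    // Hypothetical updater: the changes a condenser asked for, tagged with its name.
    record ModelUpdater(String condenserName, List<Change> changes) {

        // Applies each change in order and appends one log line per change.
        AppModel apply(AppModel model, List<String> log) {
            Set<String> classes = new LinkedHashSet<>(model.classes());
            for (Change change : changes) {
                if (change instanceof AddClass add) {
                    boolean changed = classes.add(add.name());
                    log.add(condenserName + ": add " + add.name()
                            + (changed ? "" : " (already present)"));
                } else if (change instanceof RemoveClass remove) {
                    boolean changed = classes.remove(remove.name());
                    log.add(condenserName + ": remove " + remove.name()
                            + (changed ? "" : " (not present)"));
                } else {
                    throw new IllegalArgumentException("unknown change: " + change);
                }
            }
            return new AppModel(Collections.unmodifiableSet(classes));
        }
    }

    public static void main(String[] args) {
        AppModel model = new AppModel(new LinkedHashSet<>(List.of("Foo", "App")));
        ModelUpdater updater = new ModelUpdater("FooToBar",
                List.of(new RemoveClass("Foo"), new AddClass("Bar")));
        List<String> log = new ArrayList<>();
        AppModel updated = updater.apply(model, log);
        log.forEach(System.out::println);      // FooToBar: remove Foo, then FooToBar: add Bar
        System.out.println(updated.classes()); // [App, Bar]
    }
}
```

Recording no-op changes (such as adding a class that is already present) is a deliberate choice here: the log then reflects what the condenser asked for as well as what actually changed.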
> > * In the "Data model" section, how are duplicate classes on the
> > classpath handled? Are Containers representing JARs on the classpath
> > explicitly ordered so as to linearize them to ensure the earliest
> > definition of a class wins? Does the model need to expose the
> > classpath ordering?
>
> The application model gives us a list of modules and a list of classpath
> entries. Since multi-valued attributes preserve their order, this
> should be enough to preserve the existing story: that modules represent
> a partition of packages, and we resolve conflicts on the classpath with
> "first wins". (We can also have condensers that assert properties like
> "no duplicates".)
>
You're right that all the pieces are in place to be able to do "first wins"
processing. I had been thinking about this from the perspective of
processing the Stream<ClassKey> from a container and thinking that it would
be hard to tell if I was looking at the first definition of a class or
not. And it would be hard. But it's also the wrong question, as I don't
want to have to ask it when processing every class and will regularly
forget to do so (as will others). The "no duplicates" condenser shifts the
problem and ensures each downstream condenser doesn't have to care. Ok, so
I agree that the data's all there and that adding anything additional to
support this would be the wrong thing to do.
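To illustrate the "first wins" rule and a "no duplicates" condenser, here is a minimal sketch over a hypothetical Container record holding a name and an ordered list of class names (not the document's actual Container or ClassKey types). dropShadowed walks the classpath in order and keeps each class only in the first container that defines it, reporting the shadowed copies; assertNoDuplicates is the assertion-style variant. Running something like this early means downstream condensers can process each container's classes without asking whether a definition is the first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FirstWins {

    // Hypothetical classpath container: a name plus the class names it defines, in order.
    record Container(String name, List<String> classNames) {}

    // Returns a copy of the classpath in which each class name appears only in the
    // first container that defines it ("first wins"). Later copies are dropped and
    // reported in 'shadowed' as "className (in containerName)".
    static List<Container> dropShadowed(List<Container> classpath, List<String> shadowed) {
        Set<String> seen = new HashSet<>();
        List<Container> result = new ArrayList<>();
        for (Container container : classpath) {
            List<String> kept = new ArrayList<>();
            for (String className : container.classNames()) {
                if (seen.add(className)) {
                    kept.add(className);
                } else {
                    shadowed.add(className + " (in " + container.name() + ")");
                }
            }
            result.add(new Container(container.name(), List.copyOf(kept)));
        }
        return result;
    }

    // Assertion-style variant: fail if any class is defined more than once.
    static void assertNoDuplicates(List<Container> classpath) {
        List<String> shadowed = new ArrayList<>();
        dropShadowed(classpath, shadowed);
        if (!shadowed.isEmpty()) {
            throw new IllegalStateException("duplicate classes: " + shadowed);
        }
    }

    public static void main(String[] args) {
        List<Container> classpath = List.of(
                new Container("a.jar", List.of("p.A", "p.B")),
                new Container("b.jar", List.of("p.B", "p.C")));
        List<String> shadowed = new ArrayList<>();
        System.out.println(dropShadowed(classpath, shadowed));
        System.out.println(shadowed); // [p.B (in b.jar)]
    }
}
```

Whether a real condenser should silently drop shadowed classes or refuse to proceed is a policy question; the sketch offers both only to show that the ordered list of containers already carries enough information for either.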
> > Is ContainerKind missing a "directory" type as well? It might be
> > possible to pretend a filesystem directory on the CP is a JAR for
> > model purposes, but that doesn't feel quite right. Did you consider
> > directories when making the model? If they were deliberately excluded
> > it might help to expand on why in the document.
>
> I'll defer this one to Paul.
>
> > Should the Data model include "classloader" as a member? With modules
> > we can map which module will be loaded by which classloader and can
> > guess for most classpath entries. For more complicated classloading
> > schemes (including self-first), it might be beneficial to model the
> > classloader network in the model as well. This may be something that
> > a non-standard condenser could augment the model / analysis with.
>
> Good question! Not sure yet.
>
> > * The "Example: Lambda forms" section contains the following sentence:
> > > After condensing we can update the java.base module on the file
> > system from the updated application model.
> >
> > which should probably delegate the updating of the file system to the
> > Distiller. Given the prohibition against filesystem access in the
> > "The application model" section, talking about the filesystem here
> > seems odd.
>
> This seems like an error in the doc. Will investigate.
>
> > I'm looking forward to seeing the prototype and trying to port my
> > pregenerate lambdas jlink plugin to be a condenser. I think it should
> > be a fairly smooth process.
>
> Indeed, I think that will be a good validation test, and I expect it
> will go smoothly.
>
>
More information about the leyden-dev mailing list