Toward Condensers
Brian Goetz
brian.goetz at oracle.com
Tue Aug 1 21:00:33 UTC 2023
On 8/1/2023 4:34 PM, Dan Heidinga wrote:
> Thanks for sharing this document, and for the work you, Brian, and
> Paul have been putting into this.
>
> A couple of questions / comments about the document based on my reading:
>
> * In the "The condenser pipeline" section, the model shows an
> Extractor -> Condenser 1 -> Condenser 2 -> Distiller as an example of
> the pipeline which is linear: A, then B, then C,.... Often when
> applying optimizations (e.g., in a compiler), there's a virtuous circle
> where one optimization exposes new opportunities for another, which
> triggers more opportunities for the first. This leads to running
> through optimization passes until a fixpoint or some limit occurs.
> Dead code elimination is often a pass that benefits from being
> repeatedly run. This is still early days but I'll ask anyway: Have
> you thought about how the condenser pipeline would benefit from
> repeated application? Or how condensers could opt into repeated
> application?
Yes, some :) Suppose we want to apply the "replace Foo with Bar"
condenser and the "inject Baz" condenser, but Bar might use Foo. So
you'll want to run the first again after the second. While the diagram
shows linearization, the simple Condenser interface allows for much more
complicated composition. Additionally, we anticipate condensers will
have factory methods that take configuration data, such as which
classes/containers to transform. So the "inject Baz" condenser might
scan for suitable Baz injection points, do the needful, and then turn
around and
    Condenser fooAgain = new FooToBarCondenser(listOfContainersITouched);
    return fooAgain.condense(condensedResult);
So the second run of FooToBar can happen inside the InjectBaz
condenser. From the outside it looks like there are just two condensers,
but the reality is more complicated. (Another way to say this is that
composing condensers is so simple that maybe the runner need not really
accept a list of condensers, but one condenser, that will do the
composition itself, just as functions in Haskell take exactly one argument.)
How it ultimately gets surfaced may be a matter of some bikeshedding,
but I think all the degrees of freedom needed are there already.
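
As a minimal sketch of the composition described above (not the actual Leyden API): the `Model` record, the single-method `Condenser` shape, and the `andThen` and `fixpoint` helpers below are all assumptions made for illustration. The model is reduced to a set of class names, and the "inject Baz" condenser re-runs "Foo to Bar" internally, so a runner only ever sees two condensers. Requires Java 16 or later for records.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class CondenserComposition {
    /** Hypothetical stand-in for the application model: just a set of class names. */
    record Model(Set<String> classes) {}

    /** Hypothetical condenser shape: a pure function from model to model. */
    interface Condenser {
        Model condense(Model input);

        /** Sequential composition: run this condenser, then the next one. */
        default Condenser andThen(Condenser next) {
            return m -> next.condense(this.condense(m));
        }

        /** Re-apply a condenser until the model stops changing or the round limit is hit. */
        static Condenser fixpoint(Condenser c, int maxRounds) {
            return m -> {
                Model current = m;
                for (int i = 0; i < maxRounds; i++) {
                    Model next = c.condense(current);
                    if (next.equals(current)) {
                        break; // nothing changed, so we have reached a fixpoint
                    }
                    current = next;
                }
                return current;
            };
        }
    }

    /** Toy "replace Foo with Bar" condenser. */
    static final Condenser FOO_TO_BAR = m -> new Model(
            m.classes().stream()
                    .map(c -> c.equals("Foo") ? "Bar" : c)
                    .collect(Collectors.toCollection(TreeSet::new)));

    /** Toy "inject Baz" condenser that reintroduces Foo, then re-runs FOO_TO_BAR itself. */
    static final Condenser INJECT_BAZ = m -> {
        Set<String> out = new TreeSet<>(m.classes());
        out.add("Baz");
        out.add("Foo"); // assumption for the demo: the injected code uses Foo again
        return FOO_TO_BAR.condense(new Model(out));
    };

    public static void main(String[] args) {
        Model start = new Model(new TreeSet<>(Set.of("App", "Foo")));
        Condenser pipeline = FOO_TO_BAR.andThen(INJECT_BAZ);
        System.out.println(pipeline.condense(start).classes()); // prints [App, Bar, Baz]
    }
}
```

Because `andThen` and `fixpoint` both return a `Condenser`, a runner that accepts exactly one condenser would lose no expressive power in this sketch.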
>
> * In "The application model", the first goal states:
> > Abstracted away from the representation — Condensers should not
> > directly read and write files as they do their work; they should express
> > their behavior in terms of changes to the model, and let the tooling
> > handle the representation.
>
> I agree with the goal of having the model mediate access to the
> classfiles / jars / modules / resource files, but I think this goal may
> overstate the requirement of "not directly read and write files" as
> condensers that have offline training runs will want to use the
> filesystem to access their training data. Would phrasing this as
> "Condensers should only access classfiles and resources through the
> data model; they should ....." express the intent more clearly?
I think you've got the spirit of it. In a phased execution such as you
describe, the config files or training data also represent shifting
data across phases. Maybe that is embedded directly in the application
configuration as resources, or maybe we have to support the notion of a
"filesystem mount point" where we are asserting that a part of the
runtime file system has been shifted in time to a place in the
training-time file system. Details TBD.
> And the last goal states:
> > Scrutable — It should be possible to answer the question,
> > “what did that condenser do?”
>
> Which I don't see explicitly addressed in the rest of the document.
> It's a good goal. Is the intention here that the ModelUpdater can be
> interrogated to analyse the changes? Are you envisioning a logging
> mechanism of some sort here or something else?
I was thinking that the act of applying a ModelUpdater would optionally
produce a log in a standardized format, so you could see what actually
got done by a condenser. I also imagine that we will want to add Log
data to the data model eventually, so condensers can dump out analysis
data and have it show up in the log in the right place. The data model
as it stands is clearly super-simplified, and will evolve a lot, but
even this super-simplified version is enough to write condensers like
"turn lambda capture into inner classes."
> * In the "Data model" section, how are duplicate classes on the
> classpath handled? Are Containers representing JARs on the classpath
> explicitly ordered so as to linearize them to ensure the earliest
> definition of a class wins? Does the model need to expose the
> classpath ordering?
The application model gives us a list of modules and a list of classpath
entries. Since multi-valued attributes preserve their order, this
should be enough to preserve the existing story: that modules represent
a partition of packages, and we resolve conflicts on the classpath with
"first wins". (We can also have condensers that assert properties like
"no duplicates".)
> Is ContainerKind missing a "directory" type as well? It might be
> possible to pretend a filesystem directory on the CP is a JAR for
> model purposes, but that doesn't feel quite right. Did you consider
> directories when making the model? If they were deliberately excluded
> it might help to expand on why in the document.
I'll defer this one to Paul.
> Should the Data model include "classloader" as a member? With modules
> we can map which module will be loaded by which classloader and can
> guess for most classpath entries. For more complicated classloading
> schemes (including self-first), it might be beneficial to model the
> classloader network in the model as well. This may be something that
> a non-standard condenser could augment the model / analysis with.
Good question! Not sure yet.
> * "Example: Lambda forms" contains the following sentence:
> > After condensing we can update the java.base module on the file
> > system from the updated application model.
>
> which should probably delegate the updating of the file system to the
> Distiller. Given the prohibition against filesystem access in the
> "The application model" section, talking about the filesystem here
> seems odd.
This seems like an error in the doc. Will investigate.
> I'm looking forward to seeing the prototype and trying to port my
> pregenerate lambdas jlink plugin to be a condenser. I think it should
> be a fairly smooth process.
Indeed, I think that will be a good validation test, and I expect it
will go smoothly.