deduplicating lambda methods

Mon Mar 26 23:14:34 UTC 2018

Hi Vicente,

I pushed an updated webrev to:
http://cr.openjdk.java.net/~cushon/lambdadedup/webrev.04/

The main changes are:
* I incorporated your patch for handling constant values.
* The change now includes a test harness for lambda deduplication.
* Deduplication is now skipped if debug info is enabled.

The test asserts different groups of lambdas in the test input are
deduplicated as expected, and has targeted coverage of diffing
and hashing for all pairwise combinations of lambdas in the test input.

I like Maurizio's idea of applying fuzzing here, but haven't yet
made progress on that. As an aside, there might be a separate larger
project here to investigate ways to fuzz javac APIs. I noticed there
was some work done to fuzz the class reader [1].

[1]
https://bugs.openjdk.java.net/browse/JDK-8054284?jql=labels%20%3D%20class_fuzzing

Re: deduplicating lambdas that declare locals,
e.g. : `() -> { int y = 42; return y; }`. I think that comparing locals by
name is correct, assuming that the diff is not used on trees
that can refer to locals declared in enclosing scopes (this is true for
desugared lambda bodies, but not in general). Another option is to have
the diff record local variable declarations it sees, and maintain a map
of symbols that can be treated as equivalent. The current lambda parameter
handling could be folded in to this--the lambda parameters would be initial
values in the map. This could also be generalized to other types of
declarations, but that becomes problematic for declarations whose identity
could escape the lambda. E.g. deduplicating two instances of
`x -> { class Foo {}; baz(Foo.class); }` could change behaviour.

Finally, I agree with Brian and Louis that we should have stronger
evidence that more sophisticated handling of no-ops would be a win before
adding complexity to support that. Inter-file handling of very common
lambdas
might be a more beneficial future improvement.

On Mon, Mar 26, 2018 at 6:21 AM Vicente Romero <vicente.romero at oracle.com>
wrote:

> Hi all,
>
> Although there are still areas under discussion I think that we should
> go for a 'final', pushable version of the patch, which I think should be
> based on the last iteration sent by Liam, plus code based in the changes
> I proposed to don't go deeper in the tree comparison as soon as a
> constant is found, plus regression tests. We can continue discussing
> about the no-op filtering proposed by Bernard and if we have data
> supporting that change, we can include it in a next iteration. I think
> that we all mostly agree on the consistency of the state I'm describing,
> so I think we should go for it,
>
> Thanks,
> Vicente
>
> On 03/02/2018 08:03 PM, Liam Miller-Cushon wrote:
> > Hello,
> >
> > I'm interested in adding support for deduplicating lambda methods to
> > javac. The idea is that if a compilation unit contains two lambdas
> > that are identical (including any captured state and the functional
> > interface they implement) we could re-use the same implementation
> > method for both.
> >
> > I understand there might have been some prior discussion about this.
> > Is there interest in investigating the feature? What sort of technical
> > considerations have been identified so far?
> >
> > I have been thinking about a couple of questions:
> >
> > 1) How to identifying duplicates: I have a prototype that runs during
> > lambda desugaring and identifies duplicates by diffing ASTs. Is that
> > the best place for deduplication, or it worth considering comparing
> > generated code instead of ASTs?
> >
> > 2) Debug info: the optimization is safe if line numbers are not being
> > emitted. If they are, is there a way to deduplicate the methods
> > without breaking debug info?
> >
> > Thanks,
> > Liam
>
>