deduplicating lambda methods

Sat Mar 17 01:48:32 UTC 2018

Hi,

An updated version is at:
http://cr.openjdk.java.net/~cushon/lambdadedup/webrev.02/

On Fri, Mar 16, 2018 at 11:46 AM Maurizio Cimadamore <
maurizio.cimadamore at oracle.com> wrote:

> Having a tree scanner accepting a visitor parameter would be good -
> although can be done outside the scope of this patch.
>

Sounds good, I'll add it to my list to investigate later.

Note that, in the meantime, it's rather easy to mimic the visitor
> parameter idiom - just define a 'scan' method like this:
>

Thanks! I switched to com.sun.tools.javac.comp.TreeScanner using your
approach.

On Fri, Mar 16, 2018 at 12:20 PM Vicente Romero <vicente.romero at oracle.com>
wrote:

> TreeHasher should skip
> parenthesis of expressions, to obtain the same hash for, for example:
> return (i + j); and return i + j;
>

Done, thanks.

> Also, as a bonus, consider constant expressions, we probably want:
> return 5; and return 2 + 3; to be equal.
>
> I think that these two use cases will need a "normalization" step to be
> applied to the tree and store in the map, set, only a normalized version
> of the tree. I understand that at least considering constants could be
> out of the scope of this patch, so I can see it as a follow up development.
>

Are you envisioning this as something that would happen as part of tree
comparision, or something that would fall out of other work around constant
folding? If support for comprehensive compile-time constant folding happens,
and it's performed prior to lambda deduplication, then your examples would
be supported for free without the need for a more sophisticated diff.

On Fri, Mar 16, 2018 at 1:07 PM Vicente Romero <vicente.romero at oracle.com>
wrote:

> I could be wrong but it seems to me that TreeHasher generates too many
> false positives as the hash number obtained won't have any topological
> information about the tree. In other words, it seems to me that the hash
> generated will be the same regardless of the way the tree is traversed.
> I think that we are interested in obtaining a hash that is a topological
> descriptor of the tree. This will probably imply making TreeHasher a
> full fledged visitor.
>

The hash function needs more work. I am still planning to collect data
on how often collisions are happening as part of quantifying the
compile-time performance overhead.

However, the collisions I am seeing right now are mostly from identifiers
and literals, and not from differences in the topology. I'm having trouble
thinking of uncontrived examples where the topology of two ASTs is
different, but the result of traversing all nodes and recording their tags
is the same. Do you think that's going to be common, and did you have
examples in mind?