deduplicating lambda methods

Louis Wasserman lowasser at google.com
Mon Mar 5 20:24:51 UTC 2018


Apparently PDF attachments don't work.  Here's a link:
https://drive.google.com/file/d/1abAR_bueU0Zxy4e9XfLVzVe2nwy_POQm/view?usp=sharing


On Mon, Mar 5, 2018 at 11:55 AM Louis Wasserman <lowasser at google.com> wrote:

> So, here's a start on some of the data questions that have been asked in
> this thread.  There's an attached PDF with a bunch of nice graphs and
> tables discussing all lambdas in Google's codebase.  This is a pretty
> simplistic analysis: two lambdas count as duplicates when they have the
> same target type (including generics) and the same source text modulo
> parameter names.  A more subtle analysis might merge some more things --
> e.g. Function.identity().  Some other interesting data points not in the
> PDF:
>
> Among nongenerated files having any lambdas at all,
>
>    - 16.5% have at least one syntactic duplicate in them.
>    - The average number of lambdas with at least one syntactic duplicate
>    is 0.24 per file.
>    - The average number of synthetic methods you'd eliminate by
>    deduplicating within a file is 0.47.
>
> The six most common (target type including generics, syntactic method body
> modulo parameter naming) pairs across our entire codebase were:
> (Runnable) () -> {}  // 674
> (Predicate<String>) str -> !str.isEmpty() // 640
> (Function<String, String>) str -> str // 492
> (Callable<Void>) () -> null // 259
> (Predicate<String>) str -> !Strings.isNullOrEmpty(str) // 204
> (Function<Long, Long>) x -> x // 177
>
> (Worth mentioning explicitly: x -> x + 1 was a ways down, with only 56
> occurrences for UnaryOperator<Integer> as the most common type.)
>
> Liam and I are still working on collecting information on method
> references and on statefulness and serializability.
>
>
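
To make "syntactic duplicate modulo parameter naming" concrete, here is a
minimal sketch of one way such a comparison could work. It is not the tooling
used for the numbers above: the class name LambdaNormalizer and the canonical
name p0 are made up for illustration, and it only handles a lambda with a
single untyped parameter. It works on raw text, so it would be fooled by
string literals or member names that happen to match the parameter. A real
analysis would compare syntax trees.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class LambdaNormalizer {

    // Rewrites "name -> body" so that the parameter is always called p0.
    // Anything that is not a single bare identifier before the arrow
    // (e.g. "() -> {}" or "(a, b) -> a + b") is returned trimmed but unchanged.
    static String normalize(String lambda) {
        int arrow = lambda.indexOf("->");
        if (arrow < 0) {
            return lambda.trim();
        }
        String param = lambda.substring(0, arrow).trim();
        String body = lambda.substring(arrow + 2).trim();
        if (!param.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            return lambda.trim();
        }
        // Naive whole-word replacement of the parameter inside the body.
        String renamed = body.replaceAll("\\b" + Pattern.quote(param) + "\\b", "p0");
        return "p0 -> " + renamed;
    }

    public static void main(String[] args) {
        System.out.println(normalize("str -> !str.isEmpty()"));
        System.out.println(normalize("s -> !s.isEmpty()"));
    }
}
```

Under this scheme, str -> str and x -> x normalize to the same text, so they
only stay distinct in the table above because their target types differ.
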
>> ---------- Forwarded message ----------
>>> From: Brian Goetz <brian.goetz at oracle.com>
>>> Date: Fri, Mar 2, 2018 at 5:17 PM
>>> Subject: Re: deduplicating lambda methods
>>> To: Liam Miller-Cushon <cushon at google.com>, amber-dev at openjdk.java.net,
>>> Vicente-Arturo Romero-Zaldivar <vicente.romero at oracle.com>
>>>
>>
>>
>>> I know you have some good tools at Google for codebase statistics.  Maybe
>>> you could pull together data on how often lambdas are duplicated within a
>>> source file, and of the duplicated lambdas, what percentage are stateless
>>> and non-serializable?
>>
>>
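
For the per-file question quoted above, the following sketch shows one
possible way to count duplicates once each lambda has been reduced to a
(target type, normalized body) pair. The class LambdaDedupStats, its method
names, and the sample file contents are all hypothetical, and
"duplicatedForms" is only one plausible reading of "lambdas with at least one
syntactic duplicate". The sample data is invented and is not taken from the
codebase analysis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the tooling behind the quoted numbers.
final class LambdaDedupStats {

    // Dedup key: (target type including generics, body with parameters already renamed).
    static List<String> key(String targetType, String normalizedBody) {
        return Arrays.asList(targetType, normalizedBody);
    }

    // Counts how many lambdas in one file share each key.
    static Map<List<String>, Integer> countByKey(List<List<String>> lambdasInFile) {
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (List<String> k : lambdasInFile) {
            counts.merge(k, 1, Integer::sum);
        }
        return counts;
    }

    // Number of distinct keys that occur more than once in the file.
    static int duplicatedForms(List<List<String>> lambdasInFile) {
        int n = 0;
        for (int c : countByKey(lambdasInFile).values()) {
            if (c > 1) {
                n++;
            }
        }
        return n;
    }

    // Synthetic methods saved if every group of duplicates shared one method.
    static int methodsEliminated(List<List<String>> lambdasInFile) {
        return lambdasInFile.size() - countByKey(lambdasInFile).size();
    }

    // Invented sample: seven lambdas in one imaginary source file.
    static List<List<String>> exampleFile() {
        List<List<String>> file = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            file.add(key("Runnable", "() -> {}"));
        }
        for (int i = 0; i < 2; i++) {
            file.add(key("Predicate<String>", "p0 -> !p0.isEmpty()"));
        }
        file.add(key("Function<String, String>", "p0 -> p0"));
        file.add(key("Function<Long, Long>", "p0 -> p0"));
        return file;
    }

    public static void main(String[] args) {
        List<List<String>> file = exampleFile();
        System.out.println("duplicated forms: " + duplicatedForms(file));
        System.out.println("methods eliminated: " + methodsEliminated(file));
    }
}
```

Because the target type is part of the key, the two identity lambdas in the
sample are not merged. That is the kind of case a more subtle analysis might
catch.
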

