deduplicating lambda methods

Louis Wasserman lowasser at google.com
Mon Mar 5 19:55:19 UTC 2018


So, here's a start on some of the data questions that have been asked in
this thread.  There's an attached PDF with a bunch of nice graphs and
tables discussing all lambdas in Google's codebase.  This is a pretty
simplistic analysis: lambdas count as duplicates when they match
syntactically, modulo parameter names, and have the same target type
including generics.  A more subtle analysis might merge some more things
-- e.g. Function.identity().  Some other interesting data points not in
the PDF:

Among nongenerated files having any lambdas at all,

   - 16.5% have at least one syntactic duplicate in them.
   - The average number of lambdas per file with at least one syntactic
   duplicate is 0.24.
   - The average number of synthetic methods per file you'd eliminate by
   deduplicating within that file is 0.47.
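
To make the two averages above concrete, here is a minimal Java sketch of the per-file counting.  It is not the tool used for this analysis; the class name, method names, and string keys are hypothetical.  It assumes each lambda in a file has already been reduced to a key (target type plus normalized body), and it reads "lambdas with at least one syntactic duplicate" as the number of distinct keys that occur more than once, which is one interpretation consistent with 0.24 being smaller than 0.47.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the analysis tool described in this message.
class LambdaDedupCount {
    /** Counts how many times each lambda key occurs in one file. */
    static Map<String, Integer> groupCounts(List<String> lambdaKeys) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : lambdaKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** Number of distinct keys that occur more than once (assumed reading). */
    static int duplicatedKeys(List<String> lambdaKeys) {
        int duplicated = 0;
        for (int n : groupCounts(lambdaKeys).values()) {
            if (n > 1) {
                duplicated++;
            }
        }
        return duplicated;
    }

    /** Each group of n identical lambdas needs one method, so n - 1 go away. */
    static int methodsEliminated(List<String> lambdaKeys) {
        int eliminated = 0;
        for (int n : groupCounts(lambdaKeys).values()) {
            eliminated += n - 1;
        }
        return eliminated;
    }

    public static void main(String[] args) {
        // One made-up file: key A appears 3 times, B once, C twice.
        List<String> keys = List.of("A", "A", "A", "B", "C", "C");
        System.out.println(duplicatedKeys(keys));    // 2
        System.out.println(methodsEliminated(keys)); // 3
    }
}
```

Under this reading, the ratio of the two averages (0.47 / 0.24) suggests that a duplicated lambda has roughly two extra copies in its file on average.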

The six most common (target type including generics, syntactic method body
modulo parameter naming) pairs across our entire codebase were:
(Runnable) () -> {}  // 674
(Predicate<String>) str -> !str.isEmpty() // 640
(Function<String, String>) str -> str // 492
(Callable<Void>) () -> null // 259
(Predicate<String>) str -> !Strings.isNullOrEmpty(str) // 204
(Function<Long, Long>) x -> x // 177

(Worth mentioning explicitly: x -> x + 1 was a ways down, with only 56
occurrences for UnaryOperator<Integer> as the most common type.)
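
As an illustration of what "modulo parameter naming" could mean for the pairs above, here is a toy Java sketch that rewrites parameter names to positional placeholders and pairs the result with the target type.  The class and method names are hypothetical, it only handles untyped parameters, and it works on raw source text with a regex rather than a real parser, so it is a sketch of the idea and not the analysis we ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, text-based normalizer; a real tool would work on the AST.
class LambdaKey {
    /** Renames parameters to $0, $1, ... so "str -> str" and "x -> x" agree. */
    static String normalize(String lambdaSource) {
        int arrow = lambdaSource.indexOf("->");
        if (arrow < 0) {
            throw new IllegalArgumentException("not a lambda: " + lambdaSource);
        }
        String params = lambdaSource.substring(0, arrow).trim();
        String body = lambdaSource.substring(arrow + 2).trim();
        // Strip optional parentheses around the parameter list.
        if (params.startsWith("(") && params.endsWith(")")) {
            params = params.substring(1, params.length() - 1).trim();
        }
        List<String> names = new ArrayList<>();
        if (!params.isEmpty()) {
            for (String p : params.split(",")) {
                names.add(p.trim());
            }
        }
        List<String> placeholders = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            String placeholder = "$" + i;
            // Replace whole-word occurrences of the parameter name in the body.
            body = body.replaceAll("\\b" + Pattern.quote(names.get(i)) + "\\b",
                    Matcher.quoteReplacement(placeholder));
            placeholders.add(placeholder);
        }
        return "(" + String.join(", ", placeholders) + ") -> " + body;
    }

    /** The (target type, normalized body) pair, rendered as one string. */
    static String key(String targetType, String lambdaSource) {
        return "(" + targetType + ") " + normalize(lambdaSource);
    }

    public static void main(String[] args) {
        // Same body after renaming, so these two would be merged.
        System.out.println(key("Predicate<String>", "str -> !str.isEmpty()"));
        System.out.println(key("Predicate<String>", "s -> !s.isEmpty()"));
        // Same normalized body but different target types, so they stay separate.
        System.out.println(key("Function<String, String>", "str -> str"));
        System.out.println(key("Function<Long, Long>", "x -> x"));
    }
}
```

This is why str -> str and x -> x show up as separate entries in the list: their normalized bodies match, but their target types do not.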

Liam and I are still working on collecting information on method references
and on statefulness and serializability.


> ---------- Forwarded message ----------
> From: Brian Goetz <brian.goetz at oracle.com>
> Date: Fri, Mar 2, 2018 at 5:17 PM
> Subject: Re: deduplicating lambda methods
> To: Liam Miller-Cushon <cushon at google.com>, amber-dev at openjdk.java.net,
> Vicente-Arturo Romero-Zaldivar <vicente.romero at oracle.com>
>
> I know you have some good tools at Google for codebase statistics.  Maybe
> you could pull together data on how often lambdas are duplicated within a
> source file, and of the duplicated lambdas, what percentage are stateless
> and non-serializable?

