Post-transform and the standard Collectors
Brian Goetz
brian.goetz at oracle.com
Wed Jun 12 10:15:06 PDT 2013
I've done a pass on the standard Collectors to adapt them to the
post-transform. Significant changes:
- All factory methods that returned Collector<T,R> now return
Collector<T,?,R>. (It is good that no factory method leaks its internal
type.) We can continue to discuss mitigation plans on this, if
necessary, in a separate thread.
- The accumulator function in collector is now back to a BiConsumer
rather than a BiFunction. This simplified a number of implementations.
The STRICTLY_MUTATIVE characteristic goes away entirely.
- toList is now back to strict ArrayList, as Remi requested.
- toStringBuilder can now hide its StringBuilder, and collect to a
String instead. So I renamed it "concatenating" (and also extended it
to collect CharSequence instead of String.)
- toStringJoiner can similarly hide the internal StringJoiner, so was
renamed to "joining(delimiter)". (Confusion with database joins is
possible; I'm open to a better name.) Also on the to-do list: Paul
suggested a way to support the full form of StringJoiner (with prefix
and postfix) so I'll add an overload for that.
- The various reducing collectors can now use a mutable internal box
class, and hide that as an implementation detail, eliminating the
internal boxing in sumBy().
- It would be nice to overload sumBy(mapper) with int, long, and
double versions, but unfortunately we have crossed the boundary of what
type inference can disambiguate. We have some choices here:
    - Have a single sumBy(ToLongFunction<T>)
    - Rename to summingXxx, allowing summingInt(ToIntFunction),
      summingLong(ToLongFunction), ...
- I want to add averaging() collectors (and now can), which would have
to follow whatever naming choice we select above.
- Related, we have separately named toXxxSummaryStatistics which
follow the same pattern. If we go with summingInt/averagingInt, maybe
this becomes summarizingInt? We also have the opportunity now to make
the resulting statistics immutable on completion -- do we want to do that?
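
To make the summingXxx naming option concrete, here is a minimal sketch of how call sites would read. It assumes the method names as they appear in the released Java 8 java.util.stream.Collectors (summingInt, averagingInt, summarizingInt, joining), which is not necessarily the state of the API at the time of this message. The class name and sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SummingNamesSketch {
    // Hypothetical sample data, for illustration only.
    static final List<String> WORDS = Arrays.asList("a", "bb", "ccc");

    // One name per primitive type, so the mapper's type is never ambiguous.
    static int totalLength() {
        return WORDS.stream().collect(Collectors.summingInt(String::length));
    }

    // Averaging follows the same naming pattern as summing.
    static double averageLength() {
        return WORDS.stream().collect(Collectors.averagingInt(String::length));
    }

    // The summary-statistics collector named in the same style.
    static IntSummaryStatistics lengthStats() {
        return WORDS.stream().collect(Collectors.summarizingInt(String::length));
    }

    // joining(delimiter) returns a String; no StringJoiner is visible to the caller.
    static String joined() {
        return WORDS.stream().collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(totalLength());   // prints 6
        System.out.println(averageLength()); // prints 2.0
        System.out.println(joined());        // prints a, bb, ccc
    }
}
```

With distinct names, a lambda such as s -> s.length() needs no cast or explicit parameter type at the call site, which is the problem the overloaded sumBy(mapper) form would have run into.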
To put it all in one place, here are the advantages of this additional
feature:
- It is the first thing that nearly every user asks for when they see
Collector; its lack is a significant gap. We had wanted this from the
beginning; earlier versions of Collector made it impossible, but
later evolutions made it possible again.
- It makes possible Collectors like averaging(), which people want and
which were previously not practical.
- It enables Collectors to enforce invariants in the final result that
cannot be enforced in the intermediate accumulation, such as tree
balancing, immutability, etc.
- It enables Collectors like "toStringBuilder" to not leak their
internal state (StringBuilder) into the user code, but instead provide
the result type that the user actually wants (String).
- It eliminates the complexity of STRICTLY_MUTATIVE.
- It eliminates the performance overhead of boxing during reduction.
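
Several of the points above (the hidden intermediate type, the BiConsumer accumulator, the mutable box, and the final transform) can be sketched in one small example. This is a sketch only, written against Collector.of as it exists in released Java 8; it is not the implementation in the lambda repository. The names AveragingSketch, Box, averagingLength, and concatenating are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;
import java.util.stream.Collector;

class AveragingSketch {
    // Hypothetical mutable box. It is private, so it cannot leak to callers.
    private static final class Box {
        long sum;
        long count;
    }

    // Callers see Collector<T, ?, Double>; the intermediate type is hidden.
    static <T> Collector<T, ?, Double> averagingLength(ToIntFunction<? super T> mapper) {
        return Collector.<T, Box, Double>of(
            Box::new,
            // Accumulator is a BiConsumer: it mutates the box and returns nothing.
            (box, t) -> { box.sum += mapper.applyAsInt(t); box.count++; },
            // Combiner merges two partial results (used for parallel streams).
            (left, right) -> { left.sum += right.sum; left.count += right.count; return left; },
            // Post-transform: turn the box into the result the user wants.
            box -> box.count == 0 ? 0.0 : (double) box.sum / box.count);
    }

    // Accumulates into a StringBuilder but hands back a String.
    static Collector<CharSequence, ?, String> concatenating() {
        return Collector.<CharSequence, StringBuilder, String>of(
            StringBuilder::new,
            (sb, cs) -> sb.append(cs),
            (left, right) -> left.append(right),
            StringBuilder::toString);
    }

    static double demoAverage(List<String> words) {
        return words.stream().collect(averagingLength(String::length));
    }

    static String demoConcat(List<String> words) {
        return words.stream().collect(concatenating());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(demoAverage(words)); // prints 2.0
        System.out.println(demoConcat(words));  // prints abbccc
    }
}
```

Because the box holds primitive fields, no boxing happens per element; the only boxed value is the final Double produced by the post-transform.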
In totality, I see these benefits as a huge step forward. I realize
there are some rough edges and we can continue to discuss how to file
them down, or whether we wish to live with them.
I'll be checking these into lambda shortly and posting a link to the
docs for more detailed review.
On 5/28/2013 6:23 PM, Brian Goetz wrote:
> Adding the ability to have a post-transform function raises some
> questions about how the standard collectors should change to
> accommodate it. These fall into two categories: "Should we?" and
> "How?"
>
> For collectors like toStringBuilder, we can now collect to a String
> and not expose the intermediate StringBuilder type. This is both
> closer to what the user wants and allows for better implementation
> hiding:
>
> static Collector<String, ?, String> toStringBuilder() { ... }
>
> Of course, now the name is wrong. So it would need a new name.
> (Ditto for toStringJoiner.)
>
> It also makes sense to have a new combinator that can attach a
> post-transform to an existing Collector (name is just a
> placeholder):
>
> <T, I, R> Collector<T, I, R> transforming(Function<I, R>,
> Collector<T, ?, I>)
>
> A harder question is how much to introduce immutability. For
> example, one negative of the current toList() collector is that the
> returned list is sometimes, but not always, immutable. It would be
> nice to be able to commit to something. We could easily make it
> immutable with a post-transform of Collections::unmodifiableList. At
> first, this seems a no-brainer. But after more thought, it's
> definitely a "should we?"
>
> Consider how this plays as a downstream collector. The simplest form
> of groupingBy -- groupingBy(f) -- expands to groupingBy(f, toList()).
> If we made toList always return an immutable List, then we would have
> to apply the post-transform to every value of the resulting map,
> likely via a (sequential) Map.replaceAll on the simplest groupingBy
> operation, even when the user didn't care about immutability. Making
> every groupingBy user pay for this seems like a lot. (Alternately,
> the default toList() could still return an immutable list, but the
> default groupingBy could use a different downstream collector.)
>
> One option is to have mutable and immutable versions of every
> Collection/Map-bearing Collector. But this is a 2x explosion of
> Collectors, after we did so much work to pare back the size of the
> Collector set. Another is to have combinators for adding
> immutability to Collection, List, Set, and Map. Then an immutable
> groupingBy would be:
>
> collect(asImmutableMap(groupingBy(f, asImmutableList(toList()))));
>
> Wordy, but not terrible, and probably better than imposing the costs
> on everyone?
>
>
>
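
For reference, the opt-in approach described in the quoted message can be sketched as follows. asImmutableMap and asImmutableList above are placeholder names; this sketch substitutes Collectors.collectingAndThen from the released Java 8 API together with the unmodifiable wrappers in java.util.Collections. The class and helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ImmutableGroupingSketch {
    // Groups words by length; both the map and its list values reject mutation.
    static Map<Integer, List<String>> groupByLength(List<String> words) {
        // Downstream collector: toList() followed by an unmodifiable wrapper.
        Collector<String, ?, List<String>> unmodifiableList =
            Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList);

        // The post-transform on the downstream is applied to each map value.
        Collector<String, ?, Map<Integer, List<String>>> grouping =
            Collectors.groupingBy(String::length, unmodifiableList);

        // Wrap the resulting map as well.
        return words.stream()
            .collect(Collectors.collectingAndThen(grouping, Collections::unmodifiableMap));
    }

    // Hypothetical helper: true if the mutation attempt is rejected.
    static boolean rejects(Runnable mutation) {
        try {
            mutation.run();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> byLength = groupByLength(Arrays.asList("a", "b", "cc"));
        System.out.println(byLength.get(1));                          // prints [a, b]
        System.out.println(rejects(() -> byLength.get(1).add("z")));  // prints true
    }
}
```

Only callers who ask for the wrappers pay for them; a plain groupingBy(f) is unaffected, which is the trade-off the quoted message argues for.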