Post-transform and the standard Collectors
Remi Forax
forax at univ-mlv.fr
Wed Jun 12 15:58:08 PDT 2013
On 06/13/2013 12:53 AM, Brian Goetz wrote:
> Easy:
>
> private static <T,A,R> Collector<T,A,R> wrapHelper(Collector<T,A,R> c) {
>     return new CollectorImpl<>(
>         () -> { print("hello"); return c.resultContainer(); },
>         ...
>     );
> }
>
> static <T,R> Collector<T,?,R> wrap(Collector<T,?,R> c) {
>     return wrapHelper(c);
> }
>
> Now, what was the point of this exercise?
You cheat :)
The exercise is to take a Collector<T,?,R>, not a Collector<T,A,R>, as
parameter and return value.
Rémi
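Brian's answer uses the usual wildcard-capture idiom. The public method keeps the ? in its signature, and a private generic helper gives the captured accumulation type a name (A), so the delegating functions compose without casts. Below is a minimal runnable sketch. It is written against the java.util.stream.Collector interface as eventually released in Java 8 (supplier(), accumulator(), combiner(), finisher() and Collector.of), not the snapshot API used in this thread. The class name WrapDemo is hypothetical.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WrapDemo {

    // Public signature: the accumulation type stays hidden behind '?'.
    static <T, R> Collector<T, ?, R> wrap(Collector<T, ?, R> c) {
        // Passing c to a generic method captures '?' as a named type A.
        return wrapHelper(c);
    }

    // With A named, the delegating functions compose without casts or @SuppressWarnings.
    private static <T, A, R> Collector<T, A, R> wrapHelper(Collector<T, A, R> c) {
        Supplier<A> supplier = c.supplier();
        BiConsumer<A, T> accumulator = c.accumulator();
        BinaryOperator<A> combiner = c.combiner();
        Function<A, R> finisher = c.finisher();

        // Drop IDENTITY_FINISH so the stream always calls the printing finisher.
        Set<Collector.Characteristics> chars = EnumSet.noneOf(Collector.Characteristics.class);
        chars.addAll(c.characteristics());
        chars.remove(Collector.Characteristics.IDENTITY_FINISH);

        return Collector.of(
                () -> { System.out.println("hello"); return supplier.get(); },
                (a, t) -> { System.out.println("hello"); accumulator.accept(a, t); },
                (a1, a2) -> { System.out.println("hello"); return combiner.apply(a1, a2); },
                a -> { System.out.println("hello"); return finisher.apply(a); },
                chars.toArray(new Collector.Characteristics[0]));
    }

    public static void main(String[] args) {
        Collector<String, ?, List<String>> wrapped = wrap(Collectors.toList());
        List<String> result = Stream.of("a", "b", "c").collect(wrapped);
        System.out.println(result);
    }
}
```

Removing IDENTITY_FINISH is a deliberate choice. When that characteristic is present, a stream is allowed to skip the finisher, so the printing finisher would otherwise never run for collectors like toList().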
>
> On 6/12/2013 6:17 PM, Remi Forax wrote:
>> On 06/12/2013 10:59 PM, Brian Goetz wrote:
>>> I've posted a doc snapshot here:
>>> http://cr.openjdk.java.net/~briangoetz/doctmp/doc/
>>>
>>> As to the ? issue: looking at declarations like:
>>>
>>> static <T,K,D,A,M extends java.util.Map<K,D>>
>>> Collector<T,?,M> groupingBy(...)
>>>
>>> there's enough generics noise there that the additional question mark
>>> seems not the worst problem...
>>
>> Here's an exercise for you: let's say I want to write a static method
>> that takes a Collector<T,?,M> as parameter and returns a new one that,
>> for each method of the collector, prints hello and delegates to the
>> collector taken as parameter.
>>
>> Most of my students fail to write that code in the proper way (i.e.
>> without @SuppressWarnings everywhere).
>>
>> Rémi
>>
>>>
>>> On 6/12/2013 3:13 PM, Brian Goetz wrote:
>>>> A question this raises: it is now possible (it wasn't before) for
>>>> Collectors like minBy to return Optional, like their stream
>>>> counterparts. However, it is far less likely that such a Collector
>>>> will be invoked on an empty stream than that Stream.minBy() will.
>>>> Here's why:
>>>>
>>>> If all you're doing is getting the minima of a stream, you're more
>>>> likely to do
>>>>
>>>> stream.minBy(c)
>>>>
>>>> than
>>>>
>>>> stream.collect(Collectors.minBy(c))
>>>>
>>>> The more common case where Collectors.minBy will be used is as the
>>>> downstream of a groupingBy:
>>>>
>>>> Map<Person, Txn> largestTxnBySeller =
>>>>     txns.collect(groupingBy(Txn::seller,
>>>>                             maxBy(comparing(Txn::amount))));
>>>>
>>>> Here, groupingBy won't create a map key unless there is already at
>>>> least one value for it, so the downstream maxBy never sees an empty
>>>> input.
>>>>
>>>> So there are arguments both for and against having these collectors
>>>> collect to Optional. (If we don't, we should document the value
>>>> associated with no results, which is almost certainly null for minBy,
>>>> maxBy, and reducing(op)).
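The empty-input point can be illustrated with the API as released in Java 8, where Collectors.maxBy does return Optional. In the sketch below, the Txn class and its String seller are hypothetical stand-ins for the types in the example above. Every Optional in the first map is non-empty, and collectingAndThen(..., Optional::get) recovers the Map<Person, Txn> shape.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

import static java.util.Comparator.comparing;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.maxBy;

public class LargestTxn {

    // Hypothetical transaction type standing in for the one in the email.
    static final class Txn {
        private final String seller;
        private final long amount;

        Txn(String seller, long amount) {
            this.seller = seller;
            this.amount = amount;
        }

        String seller() { return seller; }
        long amount() { return amount; }

        @Override
        public String toString() { return seller + ":" + amount; }
    }

    // maxBy yields Optional<Txn>, but groupingBy only creates a key once it
    // has a value, so none of these Optionals is ever empty.
    static Map<String, Optional<Txn>> maxBySeller(List<Txn> txns) {
        return txns.stream()
                .collect(groupingBy(Txn::seller, maxBy(comparing(Txn::amount))));
    }

    // A post-transform unwraps each Optional, giving a map from seller to Txn.
    static Map<String, Txn> largestTxnBySeller(List<Txn> txns) {
        return txns.stream()
                .collect(groupingBy(Txn::seller,
                        collectingAndThen(maxBy(comparing(Txn::amount)), Optional::get)));
    }

    public static void main(String[] args) {
        List<Txn> txns = Arrays.asList(
                new Txn("alice", 10), new Txn("bob", 5), new Txn("alice", 30));
        System.out.println(maxBySeller(txns));
        System.out.println(largestTxnBySeller(txns));
    }
}
```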
>>>>
>>>> On 6/12/2013 1:15 PM, Brian Goetz wrote:
>>>>> I've done a pass on the standard Collectors to adapt them to the
>>>>> post-transform. Significant changes:
>>>>>
>>>>> - All factory methods that returned Collector<T,R> now return
>>>>> Collector<T,?,R>. (It is good that no factory method leaks its
>>>>> internal type.) We can continue to discuss mitigation plans on this,
>>>>> if necessary, in a separate thread.
>>>>>
>>>>> - The accumulator function in Collector is now back to a BiConsumer
>>>>> rather than a BiFunction. This simplified a number of
>>>>> implementations. The STRICTLY_MUTATIVE characteristic goes away
>>>>> entirely.
>>>>>
>>>>> - toList is now back to strict ArrayList, as Remi requested.
>>>>>
>>>>> - toStringBuilder can now hide its StringBuilder and collect to a
>>>>> String instead, so I renamed it "concatenating" (and also extended
>>>>> it to collect CharSequence instead of String).
>>>>>
>>>>> - toStringJoiner can similarly hide the internal StringJoiner, so it
>>>>> was renamed to "joining(delimiter)". (Confusion with database joins
>>>>> is possible; open to a better name.) Also on the to-do list: Paul
>>>>> suggested a way to support the full form of StringJoiner (with
>>>>> prefix and postfix), so I'll add an overload for that.
>>>>>
>>>>> - The various reducing collectors can now use a mutable internal box
>>>>> class and hide that as an implementation detail, eliminating the
>>>>> internal boxing in sumBy().
>>>>>
>>>>> - It would be nice to overload sumBy(mapper) with int, long, and
>>>>> double versions, but unfortunately we have crossed the boundary of
>>>>> what type inference can disambiguate. We have some choices here:
>>>>> - Have a single sumBy(ToLongFunction<T>)
>>>>> - Rename to summingXxx, allowing summingInt(ToIntFunction),
>>>>> summingLong(ToLongFunction), ...
>>>>>
>>>>> - I want to add averaging() collectors (and now can), which would
>>>>> have to follow whatever naming choice we select above.
>>>>>
>>>>> - Relatedly, we have the separately named toXxxSummaryStatistics
>>>>> collectors, which follow the same pattern. If we go with
>>>>> summingInt/averagingInt, maybe this becomes summarizingInt? We also
>>>>> have the opportunity now to make the resulting statistics immutable
>>>>> on completion -- do we want to do that?
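The following sketch shows how a post-transform lets a collector keep its StringBuilder private. It uses Collector.of from the released Java 8 API. The concatenating() method here is a hypothetical reimplementation, not the JDK's code. The released JDK ships this functionality as Collectors.joining(), joining(delimiter), and joining(delimiter, prefix, suffix), the last of which covers the prefix/postfix overload mentioned above.

```java
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConcatDemo {

    // Hypothetical "concatenating" collector: the StringBuilder is the hidden
    // intermediate container, and the finisher exposes only the String.
    static <T extends CharSequence> Collector<T, ?, String> concatenating() {
        return Collector.<T, StringBuilder, String>of(
                StringBuilder::new,                                     // create the private container
                StringBuilder::append,                                  // accumulator is a BiConsumer
                (left, right) -> { left.append(right); return left; }, // merge partial results
                StringBuilder::toString);                               // post-transform to the result type
    }

    public static void main(String[] args) {
        String concatenated = Stream.of("a", "b", "c").collect(concatenating());
        String joined = Stream.of("a", "b", "c").collect(Collectors.joining(", ", "[", "]"));
        System.out.println(concatenated); // abc
        System.out.println(joined);       // [a, b, c]
    }
}
```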
>>>>>
>>>>> To put it all in one place, here are the advantages of this
>>>>> additional feature:
>>>>>
>>>>> - It is the first thing that nearly every user asks for when they
>>>>> see Collector; its lack is a significant gap. We had wanted this
>>>>> from the beginning; earlier versions of Collector made it
>>>>> impossible, but later evolutions made it possible again.
>>>>> - It makes possible Collectors like averaging(), which people want
>>>>> and which were previously not practical.
>>>>> - It enables Collectors to enforce invariants in the final result
>>>>> that cannot be enforced in the intermediate accumulation, such as
>>>>> tree balancing, immutability, etc.
>>>>> - It enables Collectors like "toStringBuilder" to not leak their
>>>>> internal state (StringBuilder) into the user code, but instead
>>>>> provide the result type that the user actually wants (String).
>>>>> - It eliminates the complexity of STRICTLY_MUTATIVE.
>>>>> - It eliminates the performance overhead of boxing during reduction.
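The averaging() and boxing points can be seen together in the sketch below. It keeps a mutable long[] {sum, count} box as its hidden intermediate state and converts that box to a Double only in the finisher. The averaging name is hypothetical. The released JDK offers Collectors.averagingInt, averagingLong, and averagingDouble, and the sketch is compared against averagingInt. Like averagingInt, it returns 0.0 for an empty stream.

```java
import java.util.function.ToIntFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AveragingDemo {

    // Hypothetical averaging collector. The intermediate state is a private,
    // mutable long[] {sum, count}, updated in place with no per-element boxing;
    // the finisher turns it into the Double the caller actually wants.
    static <T> Collector<T, ?, Double> averaging(ToIntFunction<? super T> mapper) {
        return Collector.<T, long[], Double>of(
                () -> new long[2],
                (box, t) -> { box[0] += mapper.applyAsInt(t); box[1]++; },
                (left, right) -> { left[0] += right[0]; left[1] += right[1]; return left; },
                box -> box[1] == 0 ? 0.0 : (double) box[0] / box[1]);
    }

    public static void main(String[] args) {
        Double mine = Stream.of("a", "bb", "cccccc").collect(averaging(String::length));
        Double jdk = Stream.of("a", "bb", "cccccc").collect(Collectors.averagingInt(String::length));
        System.out.println(mine + " " + jdk); // 3.0 3.0
    }
}
```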
>>>>>
>>>>> In totality, I see these benefits as a huge step forward. I realize
>>>>> there are some rough edges and we can continue to discuss how to file
>>>>> them down, or whether we wish to live with them.
>>>>>
>>>>> I'll be checking these into lambda shortly and posting a link to the
>>>>> docs for more detailed review.
>>>>>
>>>>> On 5/28/2013 6:23 PM, Brian Goetz wrote:
>>>>>> Adding the ability to have a post-transform function raises some
>>>>>> questions about how the standard collectors should change to
>>>>>> accommodate it. These fall into two categories: "Should we?" and
>>>>>> "How?"
>>>>>>
>>>>>> For collectors like toStringBuilder, we can now collect to a String
>>>>>> and not expose the intermediate StringBuilder type. This is both
>>>>>> closer to what the user wants and allows for better implementation
>>>>>> hiding:
>>>>>>
>>>>>> static Collector<String, ?, String> toStringBuilder() { ... }
>>>>>>
>>>>>> Of course, now the name is wrong. So it would need a new name.
>>>>>> (Ditto for toStringJoiner.)
>>>>>>
>>>>>> It also makes sense to have a new combinator that can attach a
>>>>>> post-transform to an existing Collector (name is just a
>>>>>> placeholder):
>>>>>>
>>>>>> <T, I, R> Collector<T, ?, R> transforming(Function<I, R> post,
>>>>>> Collector<T, ?, I> downstream)
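Here is one way the proposed transforming combinator could be written, assuming the released Java 8 Collector interface. The downstream's accumulation type is a wildcard, so the result's accumulation type is written as ? as well, and a private helper captures it. The JDK eventually shipped this combinator as Collectors.collectingAndThen(downstream, finisher), with the arguments in the opposite order.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TransformingDemo {

    // Hypothetical combinator: the downstream's accumulation type stays hidden as '?'.
    static <T, I, R> Collector<T, ?, R> transforming(Function<I, R> post,
                                                     Collector<T, ?, I> downstream) {
        return transformingHelper(post, downstream); // captures '?' as A
    }

    private static <T, A, I, R> Collector<T, A, R> transformingHelper(
            Function<I, R> post, Collector<T, A, I> downstream) {
        // The combined finisher is no longer an identity, so drop IDENTITY_FINISH.
        Set<Collector.Characteristics> chars = EnumSet.noneOf(Collector.Characteristics.class);
        chars.addAll(downstream.characteristics());
        chars.remove(Collector.Characteristics.IDENTITY_FINISH);
        return Collector.of(
                downstream.supplier(),
                downstream.accumulator(),
                downstream.combiner(),
                downstream.finisher().andThen(post), // original finisher, then the post-transform
                chars.toArray(new Collector.Characteristics[0]));
    }

    public static void main(String[] args) {
        Function<String, Integer> length = String::length;
        Collector<CharSequence, ?, Integer> totalLength =
                transforming(length, Collectors.joining());
        System.out.println(Stream.of("ab", "cde").collect(totalLength)); // 5
    }
}
```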
>>>>>>
>>>>>> A harder question is how much immutability to introduce. For
>>>>>> example, one negative of the current toList() collector is that the
>>>>>> returned list is sometimes, but not always, immutable. It would be
>>>>>> nice to be able to commit to something. We could easily make it
>>>>>> immutable with a post-transform of Collections::unmodifiableList. At
>>>>>> first, this seems a no-brainer. But after more thought, it's
>>>>>> definitely a "should we?"
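The post-transform described above can be sketched with the released JDK's Collectors.collectingAndThen and Collections.unmodifiableList. The toImmutableList name is hypothetical. The result is an unmodifiable view, but the backing ArrayList never escapes the collector, so callers cannot mutate it.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ImmutableListDemo {

    // Hypothetical name: toList() plus a post-transform that wraps the
    // accumulated list in an unmodifiable view.
    static <T> Collector<T, ?, List<T>> toImmutableList() {
        return Collectors.collectingAndThen(Collectors.<T>toList(), Collections::unmodifiableList);
    }

    public static void main(String[] args) {
        List<String> list = Stream.of("a", "b").collect(ImmutableListDemo.<String>toImmutableList());
        try {
            list.add("c");
            System.out.println("mutable: " + list);
        } catch (UnsupportedOperationException e) {
            System.out.println("unmodifiable: " + list); // this branch runs
        }
    }
}
```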
>>>>>>
>>>>>> Consider how this plays as a downstream collector. The simplest form
>>>>>> of groupingBy, groupingBy(f), expands to groupingBy(f, toList()).
>>>>>> If we made toList always return an immutable List, then we would
>>>>>> have to apply the post-transform to every value of the resulting
>>>>>> map, likely via a (sequential) Map.replaceAll, on the simplest
>>>>>> groupingBy operation, even when the user didn't care about
>>>>>> immutability. Making every groupingBy user pay for this seems like
>>>>>> a lot. (Alternatively, the default toList() could still return an
>>>>>> immutable list, but the default groupingBy could use a different
>>>>>> downstream collector.)
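The cost being weighed here can be seen in two equivalent sketches. The class and method names are hypothetical. The first post-processes the finished map with Map.replaceAll, and the second attaches the post-transform to the downstream collector. In OpenJDK 8's Collectors.groupingBy, a downstream with a non-identity finisher is handled by a replaceAll pass over the intermediate map. The second form therefore does not avoid the extra pass; it only moves that pass inside groupingBy.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingImmutability {

    // groupingBy(f) means groupingBy(f, toList()), so each value is a mutable list;
    // making the values unmodifiable afterwards is an extra pass over the map.
    static Map<Integer, List<String>> replaceAfterwards(List<String> words) {
        Map<Integer, List<String>> byLength =
                words.stream().collect(Collectors.groupingBy(String::length));
        byLength.replaceAll((length, group) -> Collections.unmodifiableList(group));
        return byLength;
    }

    // Same result with the post-transform on the downstream collector instead;
    // groupingBy still has to apply it to every value when it finishes.
    static Map<Integer, List<String>> transformDownstream(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length,
                Collectors.collectingAndThen(Collectors.<String>toList(),
                        Collections::unmodifiableList)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "b");
        System.out.println(replaceAfterwards(words));
        System.out.println(transformDownstream(words));
    }
}
```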
>>>>>>
>>>>>> One option is to have mutable and immutable versions of every
>>>>>> Collection/Map-bearing Collector. But this is a 2x explosion of
>>>>>> Collectors, after we did so much work to pare back the size of the
>>>>>> Collector set. Another is to have combinators for adding
>>>>>> immutability to Collection, List, Set, and Map. Then an immutable
>>>>>> groupingBy would be:
>>>>>>
>>>>>> collect(asImmutableMap(groupingBy(f, asImmutableList(toList()))));
>>>>>>
>>>>>> Wordy, but not terrible, and probably better than imposing the costs
>>>>>> on everyone?
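The proposed asImmutableList and asImmutableMap combinators could be sketched on top of the released Collectors.collectingAndThen. Both names are hypothetical, as in the proposal, and the results are unmodifiable views. The intermediate variables only keep type inference simple.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ImmutableCombinators {

    // Hypothetical combinator: post-transform a List-bearing collector so its
    // result is an unmodifiable view.
    static <T> Collector<T, ?, List<T>> asImmutableList(Collector<T, ?, List<T>> c) {
        return Collectors.collectingAndThen(c, Collections::unmodifiableList);
    }

    // Hypothetical combinator for Map-bearing collectors such as groupingBy.
    static <T, K, V> Collector<T, ?, Map<K, V>> asImmutableMap(Collector<T, ?, Map<K, V>> c) {
        return Collectors.collectingAndThen(c, Collections::unmodifiableMap);
    }

    public static void main(String[] args) {
        // Mirrors collect(asImmutableMap(groupingBy(f, asImmutableList(toList())))).
        Collector<String, ?, List<String>> values = asImmutableList(Collectors.toList());
        Collector<String, ?, Map<Integer, List<String>>> grouped =
                asImmutableMap(Collectors.groupingBy(String::length, values));
        Map<Integer, List<String>> byLength = Stream.of("a", "bb", "b").collect(grouped);
        System.out.println(byLength);
    }
}
```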
>>>>>>
>>>>>>
>>>>>>
>>