Post-transform and the standard Collectors

Wed Jun 12 15:58:08 PDT 2013

On 06/13/2013 12:53 AM, Brian Goetz wrote:
> Easy:
>
> private static<T,A,R> Collector<T,A,R> wrapHelper(Collector<T,A,R> c) {
>     return new CollectorImpl<>(
>         () -> { print("hello"); return c.resultContainer(); },
>         ...
>     );
> }
>
> static<T,R> Collector<T,?,R> wrap(Collector<T,?,R> c) { return wrapHelper(c); }
>
> Now, what was the point of this exercise?

You cheat :)
The exercise is to take a Collector<T,?,R>, not a Collector<T,A,R>, as both
the parameter and the return value.

Rémi
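
For readers who want to try the exercise against the java.util.stream API as it shipped in Java 8, here is a minimal sketch of the capture-helper idiom discussed in this thread. It is an assumption-laden illustration, not code from either author: it uses Collector.of and the released accessor names (supplier(), accumulator(), combiner(), finisher()) instead of the CollectorImpl and resultContainer() that appear in the quoted snippet, and the class name HelloCollectors and the CALLS counter are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class HelloCollectors {
    // Hypothetical counter: records how often any wrapped function ran.
    static final AtomicInteger CALLS = new AtomicInteger();

    // The public signature takes and returns Collector<T,?,R>.
    static <T, R> Collector<T, ?, R> wrap(Collector<T, ?, R> c) {
        // Calling the helper lets the compiler capture '?' as the type variable A.
        return wrapHelper(c);
    }

    // Inside the helper the accumulation type has a name, so no casts
    // and no @SuppressWarnings are needed.
    private static <T, A, R> Collector<T, A, R> wrapHelper(Collector<T, A, R> c) {
        return Collector.of(
            () -> { hello(); return c.supplier().get(); },
            (acc, t) -> { hello(); c.accumulator().accept(acc, t); },
            (left, right) -> { hello(); return c.combiner().apply(left, right); },
            acc -> { hello(); return c.finisher().apply(acc); },
            c.characteristics().toArray(new Collector.Characteristics[0]));
    }

    private static void hello() {
        CALLS.incrementAndGet();
        System.out.println("hello");
    }

    public static void main(String[] args) {
        Collector<String, ?, List<String>> wrapped = wrap(Collectors.toList());
        List<String> out = Stream.of("a", "b").collect(wrapped);
        System.out.println(out);
    }
}
```

The caller never sees A: the wildcard is captured at the call to wrapHelper and turned back into ? when the result is returned from wrap.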

>
> On 6/12/2013 6:17 PM, Remi Forax wrote:
>> On 06/12/2013 10:59 PM, Brian Goetz wrote:
>>> I've posted a doc snapshot here:
>>>   http://cr.openjdk.java.net/~briangoetz/doctmp/doc/
>>>
>>> As to the ? issue: looking at declarations like:
>>>
>>> static <T,K,D,A,M extends java.util.Map<K,D>>
>>> Collector<T,?,M> groupingBy(...)
>>>
>>> there's enough generics noise there that the additional question mark
>>> seems not the worst problem...
>>
>> I propose an exercise: let's say I want to write a static method that
>> takes a Collector<T,?,M> as a parameter
>> and returns a new one that, for each method of the collector,
>> prints hello and delegates to the collector taken as a parameter.
>>
>> Most of my students fail to write that code in the proper way (i.e.
>> without @SuppressWarnings everywhere).
>>
>> Rémi
>>
>>>
>>> On 6/12/2013 3:13 PM, Brian Goetz wrote:
>>>> A question this raises: it is now possible (wasn't before) for
>>>> Collectors like minBy to return Optional, like their stream
>>>> counterparts.  However, it is far less likely that such a Collector will
>>>> be invoked on an empty stream than Stream.minBy() will. Here's why:
>>>>
>>>> If all you're doing is getting the minimum of a stream, you're more
>>>> likely to do
>>>>
>>>>    stream.minBy(c)
>>>>
>>>> than
>>>>
>>>>    stream.collect(Collectors.minBy(c))
>>>>
>>>> The more common case where Collectors.minBy will be used is in the
>>>> downstream of a groupingBy:
>>>>
>>>>    Map<Person, Txn> largestTxnBySeller =
>>>>      txns.collect(groupingBy(Txn::seller,
>>>>                              maxBy(comparing(Txn::amount))));
>>>>
>>>> Here, we won't create a map key unless there is at least one value for it.
>>>>
>>>> So there are arguments both for and against having these collectors
>>>> collect to Optional.  (If we don't, we should document the value
>>>> associated with no results, which is almost certainly null for minBy,
>>>> maxBy, and reducing(op)).
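
As a point of comparison, in the java.util.stream.Collectors class that shipped in Java 8, minBy and maxBy do collect to Optional, so the grouped map holds Optional values even though each one is always present. The sketch below shows that shape, plus one way to unwrap it with a post-transform (collectingAndThen in the released API). The Txn class, its String seller, and the method names here are hypothetical stand-ins for the types in the quoted example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class LargestTxnBySeller {
    // Hypothetical stand-in for the Txn type mentioned in the thread.
    static final class Txn {
        private final String seller;
        private final int amount;
        Txn(String seller, int amount) { this.seller = seller; this.amount = amount; }
        String seller() { return seller; }
        int amount() { return amount; }
    }

    // Downstream maxBy yields Optional<Txn>; a key only exists once a value
    // has been seen, so every Optional in the map is non-empty.
    static Map<String, Optional<Txn>> largestOptional(List<Txn> txns) {
        return txns.stream().collect(
            Collectors.groupingBy(Txn::seller,
                Collectors.maxBy(Comparator.comparingInt(Txn::amount))));
    }

    // A post-transform removes the Optional wrapper from each group's result.
    static Map<String, Txn> largestUnwrapped(List<Txn> txns) {
        return txns.stream().collect(
            Collectors.groupingBy(Txn::seller,
                Collectors.collectingAndThen(
                    Collectors.maxBy(Comparator.comparingInt(Txn::amount)),
                    Optional::get)));
    }
}
```

Optional::get is safe here only because of the groupingBy guarantee described above; it would throw if the same collector were applied directly to an empty stream.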
>>>>
>>>> On 6/12/2013 1:15 PM, Brian Goetz wrote:
>>>>> I've done a pass on the standard Collectors to adapt them to the
>>>>> post-transform.  Significant changes:
>>>>>
>>>>>   - All factory methods that returned Collector<T,R> now return
>>>>> Collector<T,?,R>.  (It is good that no factory method leaks its
>>>>> internal type.)  We can continue to discuss mitigation plans on this,
>>>>> if necessary, in a separate thread.
>>>>>
>>>>>   - The accumulator function in Collector is now back to a BiConsumer
>>>>> rather than a BiFunction.  This simplified a number of implementations.
>>>>>   The STRICTLY_MUTATIVE characteristic goes away entirely.
>>>>>
>>>>>   - toList is now back to strict ArrayList, as Remi requested.
>>>>>
>>>>>   - toStringBuilder can now hide its StringBuilder, and collect to a
>>>>> String instead.  So I renamed it "concatenating" (and also extended it
>>>>> to collect CharSequence instead of String.)
>>>>>
>>>>>   - toStringJoiner can similarly hide the internal StringJoiner, so was
>>>>> renamed to "joining(delimiter)".  (Confusion with database joins is
>>>>> possible, open to a better name.)  Also on the to-do list: Paul
>>>>> suggested a way to support the full form of StringJoiner (with prefix
>>>>> and postfix) so I'll add an overload for that.
>>>>>
>>>>>   - The various reducing collectors can now use a mutable internal box
>>>>> class, and hide that as an implementation detail, eliminating the
>>>>> internal boxing in sumBy().
>>>>>
>>>>>   - It would be nice to overload sumBy(mapper) with int, long, and
>>>>> double versions, but unfortunately we have crossed the boundary of what
>>>>> type inference can disambiguate.  We have some choices here:
>>>>>     - Have a single sumBy(ToLongFunction<T>)
>>>>>     - Rename to summingXxx, allowing summingInt(ToIntFunction),
>>>>> summingLong(ToLongFunction), ...
>>>>>
>>>>>   - I want to add averaging() collectors (and now can), which would
>>>>> have to follow whatever naming choice we select above.
>>>>>
>>>>>   - Related, we have separately named toXxxSummaryStatistics which
>>>>> follow the same pattern.  If we go with summingInt/averagingInt, maybe
>>>>> this becomes summarizingInt?  We also have the opportunity now to make
>>>>> the resulting statistics immutable on completion -- do we want to do
>>>>> that?
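
For reference, the naming questions in the list above can be tried out against the java.util.stream.Collectors class that shipped in Java 8, which has joining (with delimiter and with prefix/suffix overloads) and the summingInt, averagingInt, and summarizingInt family. The sketch below is only an illustration of those released methods; the class name NamedCollectors and the sample WORDS list are hypothetical, and nothing here reflects the state of the API at the time of this message.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class NamedCollectors {
    // Hypothetical sample data.
    static final List<String> WORDS = Arrays.asList("a", "bb", "ccc");

    // joining(delimiter): the StringJoiner stays hidden, a String comes out.
    static String joined() {
        return WORDS.stream().collect(Collectors.joining(", "));
    }

    // The "full form" overload with prefix and suffix.
    static String bracketed() {
        return WORDS.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    // Type-specific names sidestep the overload ambiguity on the mapper.
    static int totalLength() {
        return WORDS.stream().collect(Collectors.summingInt(String::length));
    }

    static double averageLength() {
        return WORDS.stream().collect(Collectors.averagingInt(String::length));
    }

    static IntSummaryStatistics stats() {
        return WORDS.stream().collect(Collectors.summarizingInt(String::length));
    }
}
```

Because each method name carries the primitive type, the compiler never has to choose between ToIntFunction, ToLongFunction, and ToDoubleFunction overloads for a lambda argument.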
>>>>>
>>>>> To put it all in one place, here are the advantages of this additional
>>>>> feature:
>>>>>
>>>>>   - It is the first thing that nearly every user asks for when they
>>>>> see Collector; its lack is a significant gap.  We had wanted this from
>>>>> the beginning; earlier versions of Collector made it impossible, but
>>>>> later evolutions made it possible again.
>>>>>   - It makes possible Collectors like averaging(), which people want
>>>>> and which were previously not practical.
>>>>>   - It enables Collectors to enforce invariants in the final result
>>>>> that cannot be enforced in the intermediate accumulation, such as tree
>>>>> balancing, immutability, etc.
>>>>>   - It enables Collectors like "toStringBuilder" to not leak their
>>>>> internal state (StringBuilder) into the user code, but instead provide
>>>>> the result type that the user actually wants (String).
>>>>>   - It eliminates the complexity of STRICTLY_MUTATIVE.
>>>>>   - It eliminates the performance overhead of boxing during reduction.
>>>>>
>>>>> Taken together, I see these benefits as a huge step forward. I realize
>>>>> there are some rough edges and we can continue to discuss how to file
>>>>> them down, or whether we wish to live with them.
>>>>>
>>>>> I'll be checking these into lambda shortly and posting a link to the
>>>>> docs for more detailed review.
>>>>>
>>>>> On 5/28/2013 6:23 PM, Brian Goetz wrote:
>>>>>> Adding the ability to have a post-transform function raises some
>>>>>> questions about how the standard collectors should change to
>>>>>> accommodate it.  These fall into two categories: "Should we?" and
>>>>>> "How?"
>>>>>>
>>>>>> For collectors like toStringBuilder, we can now collect to a String
>>>>>> and not expose the intermediate StringBuilder type. This is both
>>>>>> closer to what the user wants and allows for better implementation
>>>>>> hiding:
>>>>>>
>>>>>> static Collector<String, ?, String> toStringBuilder() { ... }
>>>>>>
>>>>>> Of course, now the name is wrong.  So it would need a new name.
>>>>>> (Ditto for toStringJoiner.)
>>>>>>
>>>>>> It also makes sense to have a new combinator that can attach a
>>>>>> post-transform to an existing Collector (name is just a
>>>>>> placeholder):
>>>>>>
>>>>>> <T, I, R> Collector<T, ?, R> transforming(Function<I, R>,
>>>>>> Collector<T, ?, I>)
>>>>>>
>>>>>> A harder question is how much to introduce immutability. For
>>>>>> example, one negative of the current toList() collector is that the
>>>>>> returned list is sometimes, but not always, immutable. It would be
>>>>>> nice to be able to commit to something.  We could easily make it
>>>>>> immutable with a post-transform of Collections::unmodifiableList.  At
>>>>>> first, this seems a no-brainer.  But after more thought, it's
>>>>>> definitely a "should we?"
>>>>>>
>>>>>> Consider how this plays as a downstream collector.  The simplest form
>>>>>> of groupingBy -- groupingBy(f) -- expands to groupingBy(f, toList()).
>>>>>> If we made toList always return an immutable List, then we would have
>>>>>> to apply the post-transform to every value of the resulting map,
>>>>>> likely via a (sequential) Map.replaceAll on the simplest groupingBy
>>>>>> operation, even when the user didn't care about immutability.  Making
>>>>>> every groupingBy user pay for this seems like a lot. (Alternately,
>>>>>> the default toList() could still return an immutable list, but the
>>>>>> default groupingBy could use a different downstream collector.)
>>>>>>
>>>>>> One option is to have mutable and immutable versions of every
>>>>>> Collection/Map-bearing Collector.  But this is a 2x explosion of
>>>>>> Collectors, after we did so much work to pare back the size of the
>>>>>> Collector set.   Another is to have combinators for adding
>>>>>> immutability to Collection, List, Set, and Map.   Then an immutable
>>>>>> groupingBy would be:
>>>>>>
>>>>>> collect(asImmutableMap(groupingBy(f, asImmutableList(toList()))));
>>>>>>
>>>>>> Wordy, but not terrible, and probably better than imposing the costs
>>>>>> on everyone?
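
The combinator approach described above can be approximated with the java.util.stream API as it shipped in Java 8, where the post-transform combinator is called collectingAndThen. The sketch below is one possible reading of the asImmutableMap/asImmutableList idea, not the design proposed in this message: it wraps results in unmodifiable views via Collections::unmodifiableList and Collections::unmodifiableMap, and the class name ImmutableGrouping and the grouping-by-length example are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ImmutableGrouping {
    static Map<Integer, List<String>> groupByLength(List<String> words) {
        // Post-transform on the downstream collector: each group's list is wrapped.
        Collector<String, ?, List<String>> unmodifiableList =
            Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList);

        // Ordinary grouping, using the wrapped downstream collector.
        Collector<String, ?, Map<Integer, List<String>>> grouped =
            Collectors.groupingBy(String::length, unmodifiableList);

        // Post-transform on the outer collector: the map itself is wrapped.
        Collector<String, ?, Map<Integer, List<String>>> unmodifiableMap =
            Collectors.collectingAndThen(grouped, Collections::unmodifiableMap);

        return words.stream().collect(unmodifiableMap);
    }
}
```

Only callers who ask for the wrapping pay for it; a plain groupingBy(f) still collects into ordinary lists with no per-value post-transform.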
>>>>>>
>>>>>>
>>>>>>
>>