Post-transform and the standard Collectors

Brian Goetz brian.goetz at oracle.com
Wed Jun 12 15:53:02 PDT 2013


Easy:

private static<T,A,R> Collector<T,A,R> wrapHelper(Collector<T,A,R> c) {
     return new CollectorImpl<>(
         () -> { print("hello"); return c.resultContainer(); },
         ...
     );
}

static<T,R> Collector<T,?,R> wrap(Collector<T,?,R> c) { return wrapHelper(c); }
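
For reference, here is a sketch of the same wrapper written against the names that eventually shipped in java.util.stream (supplier(), accumulator(), combiner(), finisher(), characteristics(), and the Collector.of factory) rather than the draft CollectorImpl and resultContainer() used above. The class name WrapDemo and the LOG list are hypothetical; the list only stands in for print("hello") so the delegated calls can be observed, and it is not thread-safe, so this is meant for sequential streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class WrapDemo {
    // Hypothetical stand-in for print("hello"): one entry per delegated call.
    static final List<String> LOG = new ArrayList<>();

    // Naming the accumulation type A lets us call c's functions without casts.
    private static <T, A, R> Collector<T, A, R> wrapHelper(Collector<T, A, R> c) {
        return Collector.of(
            () -> { LOG.add("supplier"); return c.supplier().get(); },
            (a, t) -> { LOG.add("accumulator"); c.accumulator().accept(a, t); },
            (a, b) -> { LOG.add("combiner"); return c.combiner().apply(a, b); },
            a -> { LOG.add("finisher"); return c.finisher().apply(a); },
            c.characteristics().toArray(new Collector.Characteristics[0]));
    }

    // Wildcard capture: the ? in the argument type is captured as A in wrapHelper.
    static <T, R> Collector<T, ?, R> wrap(Collector<T, ?, R> c) {
        return wrapHelper(c);
    }

    public static void main(String[] args) {
        String joined = Stream.of("a", "b", "c").collect(wrap(Collectors.joining(",")));
        System.out.println(joined + " " + LOG);
    }
}
```

The public method keeps the ? in its signature, and the private helper is the only place where the hidden type gets a name, so no @SuppressWarnings is needed.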

Now, what was the point of this exercise?

On 6/12/2013 6:17 PM, Remi Forax wrote:
> On 06/12/2013 10:59 PM, Brian Goetz wrote:
>> I've posted a doc snapshot here:
>>   http://cr.openjdk.java.net/~briangoetz/doctmp/doc/
>>
>> As to the ? issue: looking at declarations like:
>>
>> static <T,K,D,A,M extends java.util.Map<K,D>>
>> Collector<T,?,M> groupingBy(...)
>>
>> there's enough generics noise there that the additional question mark
>> seems not the worst problem...
>
> I propose an exercise: let's say I want to write a static method that
> takes a Collector<T,?,M> as a parameter
> and returns a new one that, for each method of the collector,
> prints hello and delegates to the collector taken as a parameter.
>
> Most of my students fail to write that code in the proper way (i.e.
> without @SuppressWarnings everywhere).
>
> Rémi
>
>>
>> On 6/12/2013 3:13 PM, Brian Goetz wrote:
>>> A question this raises: it is now possible (wasn't before) for
>>> Collectors like minBy to return Optional, like their stream
>>> counterparts.  However, it is far less likely that such a Collector will
>>> be invoked on an empty stream than Stream.minBy() will.  Here's why:
>>>
>>> If all you're doing is getting the minima of a stream, you're more
>>> likely to do
>>>
>>>    stream.minBy(c)
>>>
>>> than
>>>
>>>    stream.collect(Collectors.minBy(c))
>>>
>>> The more common case where Collectors.minBy will be used is in the
>>> downstream of a groupingBy:
>>>
>>>    Map<Person, Txn> largestTxnBySeller =
>>>      txns.collect(groupingBy(Txn::seller,
>>>                              maxBy(comparing(Txn::amount))));
>>>
>>> Here, we won't create a map key unless there is already one value.
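
A sketch of that downstream case, using the API as it eventually shipped in Java 8, where Collectors.maxBy collects to an Optional. The Txn class, its fields, the String seller key, and the sample data are all hypothetical. Every key in the resulting map exists only because at least one element mapped to it, so each Optional value is non-empty.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

final class MaxByDemo {
    // Hypothetical transaction type, for illustration only.
    static final class Txn {
        private final String seller;
        private final int amount;
        Txn(String seller, int amount) { this.seller = seller; this.amount = amount; }
        String seller() { return seller; }
        int amount() { return amount; }
    }

    // Groups by seller and keeps the largest transaction in each group.
    static Map<String, Optional<Txn>> largestTxnBySeller(List<Txn> txns) {
        return txns.stream().collect(
            Collectors.groupingBy(Txn::seller,
                Collectors.maxBy(Comparator.comparingInt(Txn::amount))));
    }

    public static void main(String[] args) {
        List<Txn> txns = Arrays.asList(
            new Txn("ann", 10), new Txn("bob", 5), new Txn("ann", 30));
        largestTxnBySeller(txns).forEach((seller, txn) ->
            System.out.println(seller + " -> " + txn.map(Txn::amount)));
    }
}
```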
>>>
>>> So there are arguments both for and against having these collectors
>>> collect to Optional.  (If we don't, we should document the value
>>> associated with no results, which is almost certainly null for minBy,
>>> maxBy, and reducing(op)).
>>>
>>> On 6/12/2013 1:15 PM, Brian Goetz wrote:
>>>> I've done a pass on the standard Collectors to adapt them to the
>>>> post-transform.  Significant changes:
>>>>
>>>>   - All factory methods that returned Collector<T,R> now return
>>>> Collector<T,?,R>.  (It is good that no factory method leaks its
>>>> internal type.)  We can continue to discuss mitigation plans on this,
>>>> if necessary, in a separate thread.
>>>>
>>>>   - The accumulator function in Collector is now back to a BiConsumer
>>>> rather than a BiFunction.  This simplified a number of implementations.
>>>> The STRICTLY_MUTATIVE characteristic goes away entirely.
>>>>
>>>>   - toList is now back to strict ArrayList, as Remi requested.
>>>>
>>>>   - toStringBuilder can now hide its StringBuilder, and collect to a
>>>> String instead.  So I renamed it "concatenating" (and also extended it
>>>> to collect CharSequence instead of String.)
>>>>
>>>>   - toStringJoiner can similarly hide the internal StringJoiner, so was
>>>> renamed to "joining(delimiter)".  (Confusion with database joins is
>>>> possible; I'm open to a better name.)  Also on the to-do list: Paul
>>>> suggested a way to support the full form of StringJoiner (with prefix
>>>> and postfix) so I'll add an overload for that.
>>>>
>>>>   - The various reducing collectors can now use a mutable internal box
>>>> class, and hide that as an implementation detail, eliminating the
>>>> internal boxing in sumBy().
>>>>
>>>>   - It would be nice to overload sumBy(mapper) with int, long, and
>>>> double versions, but unfortunately we have crossed the boundary of what
>>>> type inference can disambiguate.  We have some choices here:
>>>>     - Have a single sumBy(ToLongFunction<T>)
>>>>     - Rename to summingXxx, allowing summingInt(ToIntFunction),
>>>> summingLong(ToLongFunction), ...
>>>>
>>>>   - I want to add averaging() collectors (and now can), which would
>>>> have to follow whatever naming choice we select above.
>>>>
>>>>   - Related, we have separately named toXxxSummaryStatistics which
>>>> follow the same pattern.  If we go with summingInt/averagingInt, maybe
>>>> this becomes summarizingInt?  We also have the opportunity now to make
>>>> the resulting statistics immutable on completion -- do we want to do
>>>> that?
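
For comparison, the Xxx-suffixed naming is what java.util.stream.Collectors ended up with in Java 8: summingInt, averagingInt, and summarizingInt each take a ToIntFunction, so the compiler never has to choose between int, long, and double overloads. A short sketch; the class name and the sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

final class SummingDemo {
    // Hypothetical sample data.
    static final List<String> WORDS = Arrays.asList("a", "bb", "ccc");

    static int totalLength() {
        return WORDS.stream().collect(Collectors.summingInt(String::length));
    }

    static double averageLength() {
        return WORDS.stream().collect(Collectors.averagingInt(String::length));
    }

    static IntSummaryStatistics lengthStats() {
        return WORDS.stream().collect(Collectors.summarizingInt(String::length));
    }

    public static void main(String[] args) {
        System.out.println(totalLength());
        System.out.println(averageLength());
        System.out.println(lengthStats());
    }
}
```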
>>>>
>>>> To put it all in one place, here are the advantages of this additional
>>>> feature:
>>>>
>>>>   - It is the first thing that nearly every user asks for when they
>>>> see Collector; its lack is a significant gap.  We had wanted this from
>>>> the beginning; earlier versions of Collector made it impossible, but
>>>> later evolutions made it possible again.
>>>>   - It makes possible Collectors like averaging(), which people want
>>>> and which were previously not practical.
>>>>   - It enables Collectors to enforce invariants in the final result
>>>> that cannot be enforced in the intermediate accumulation, such as tree
>>>> balancing, immutability, etc.
>>>>   - It enables Collectors like "toStringBuilder" to not leak their
>>>> internal state (StringBuilder) into the user code, but instead provide
>>>> the result type that the user actually wants (String).
>>>>   - It eliminates the complexity of STRICTLY_MUTATIVE.
>>>>   - It eliminates the performance overhead of boxing during reduction.
>>>>
>>>> Taken together, I see these benefits as a huge step forward.  I realize
>>>> there are some rough edges and we can continue to discuss how to file
>>>> them down, or whether we wish to live with them.
>>>>
>>>> I'll be checking these into lambda shortly and posting a link to the
>>>> docs for more detailed review.
>>>>
>>>> On 5/28/2013 6:23 PM, Brian Goetz wrote:
>>>>> Adding the ability to have a post-transform function raises some
>>>>> questions about how the standard collectors should change to
>>>>> accommodate it.  These fall into two categories: "Should we?" and
>>>>> "How?"
>>>>>
>>>>> For collectors like toStringBuilder, we can now collect to a String
>>>>> and not expose the intermediate StringBuilder type.  This is both
>>>>> closer to what the user wants and allows for better implementation
>>>>> hiding:
>>>>>
>>>>> static Collector<String, ?, String> toStringBuilder() { ... }
>>>>>
>>>>> Of course, now the name is wrong.  So it would need a new name.
>>>>> (Ditto for toStringJoiner.)
>>>>>
>>>>> It also makes sense to have a new combinator that can attach a
>>>>> post-transform to an existing Collector (name is just a
>>>>> placeholder):
>>>>>
>>>>> <T, I, R> Collector<T, I, R> transforming(Function<I, R>,
>>>>> Collector<T, ?, I>)
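
The combinator that shipped in Java 8 in this role is Collectors.collectingAndThen(downstream, finisher), which takes its arguments in the opposite order from the placeholder above. A short sketch of how it reads in use; the class name, the countDistinct helper, and the sample data are hypothetical.

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class TransformingDemo {
    // Collects to a Set, then applies Set::size as the post-transform,
    // so the caller sees only the final count and never the Set.
    static int countDistinct(Stream<String> words) {
        return words.collect(
            Collectors.collectingAndThen(Collectors.toSet(), Set::size));
    }

    public static void main(String[] args) {
        System.out.println(countDistinct(Stream.of("a", "b", "a")));
    }
}
```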
>>>>>
>>>>> A harder question is how much to introduce immutability. For
>>>>> example, one negative of the current toList() collector is that the
>>>>> returned list is sometimes, but not always, immutable.  It would be
>>>>> nice to be able to commit to something.  We could easily make it
>>>>> immutable with a post-transform of Collections::unmodifiableList.  At
>>>>> first, this seems a no-brainer.  But after more thought, it's
>>>>> definitely a "should we?"
>>>>>
>>>>> Consider how this plays as a downstream collector.  The simplest form
>>>>> of groupingBy -- groupingBy(f) -- expands to groupingBy(f, toList()).
>>>>> If we made toList always return an immutable List, then we would have
>>>>> to apply the post-transform to every value of the resulting map,
>>>>> likely via a (sequential) Map.replaceAll on the simplest groupingBy
>>>>> operation, even when the user didn't care about immutability.  Making
>>>>> every groupingBy user pay for this seems like a lot. (Alternately,
>>>>> the default toList() could still return an immutable list, but the
>>>>> default groupingBy could use a different downstream collector.)
>>>>>
>>>>> One option is to have mutable and immutable versions of every
>>>>> Collection/Map-bearing Collector.  But this is a 2x explosion of
>>>>> Collectors, after we did so much work to pare back the size of the
>>>>> Collector set.   Another is to have combinators for adding
>>>>> immutability to Collection, List, Set, and Map.   Then an immutable
>>>>> groupingBy would be:
>>>>>
>>>>> collect(asImmutableMap(groupingBy(f, asImmutableList(toList()))));
>>>>>
>>>>> Wordy, but not terrible, and probably better than imposing the costs
>>>>> on everyone?
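
With the API as it shipped in Java 8, the same opt-in shape can be sketched using Collectors.collectingAndThen and the wrappers in java.util.Collections, in place of the hypothetical asImmutableMap and asImmutableList. These wrappers produce unmodifiable views; treating that as "immutable" is an assumption of this sketch. The class name, the grouping by String::length, and the sample data are hypothetical too.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ImmutableGroupingDemo {
    static Map<Integer, List<String>> byLength(Stream<String> words) {
        // Downstream: collect each group to a list, then wrap it.
        Collector<String, ?, List<String>> lists =
            Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList);
        // Group by length using the wrapping downstream collector.
        Collector<String, ?, Map<Integer, List<String>>> grouped =
            Collectors.groupingBy(String::length, lists);
        // Finally wrap the map itself.
        return words.collect(
            Collectors.collectingAndThen(grouped, Collections::unmodifiableMap));
    }

    public static void main(String[] args) {
        System.out.println(byLength(Stream.of("a", "bb", "c")));
    }
}
```

Callers who do not care about immutability keep using plain groupingBy(f) and pay nothing extra.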
>>>>>
>>>>>
>>>>>
>

