Collector / Collectors

Brian Goetz brian.goetz at oracle.com
Wed Jun 26 11:47:28 PDT 2013


Here's my proposal for (1).  Simply adds Characteristics... arguments to 
of().

     /**
      * Returns a new {@code Collector} described by the given {@code 
supplier},
      * {@code accumulator}, and {@code combiner} functions.  The resulting
      * {@code Collector} has the {@code 
Collector.Characteristics.IDENTITY_FINISH}
      * characteristic.
      *
      * @param supplier The supplier function for the new collector
      * @param accumulator The accumulator function for the new collector
      * @param combiner The combiner function for the new collector
      * @param characteristics The collector characteristics for the new
      *                        collector
      * @param <T> The type of input elements for the new collector
      * @param <R> The type of intermediate accumulation result, and 
final result,
      *           for the new collector
      * @return the new {@code Collector}
      */
     static<T, R> Collector<T, R, R> of(Supplier<R> supplier,
                                        BiConsumer<R, T> accumulator,
                                        BinaryOperator<R> combiner,
                                        Characteristics... 
characteristics) {
         Set<Characteristics> cs = (characteristics.length == 0)
                                   ? Collectors.CH_ID
                                   : 
Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH,
 
     characteristics));
         return new Collectors.CollectorImpl<>(supplier, accumulator, 
combiner, cs);
     }

     /**
      * Returns a new {@code Collector} described by the given {@code 
supplier},
      * {@code accumulator}, {@code combiner}, and {@code finisher} 
functions.
      *
      * @param supplier The supplier function for the new collector
      * @param accumulator The accumulator function for the new collector
      * @param combiner The combiner function for the new collector
      * @param finisher The finisher function for the new collector
      * @param characteristics The collector characteristics for the new
      *                        collector
      * @param <T> The type of input elements for the new collector
      * @param <A> The intermediate accumulation type of the new collector
      * @param <R> The final result type of the new collector
      * @return the new {@code Collector}
      */
     static<T, A, R> Collector<T, A, R> of(Supplier<A> supplier,
                                           BiConsumer<A, T> accumulator,
                                           BinaryOperator<A> combiner,
                                           Function<A, R> finisher,
                                           Characteristics... 
characteristics) {
         Set<Characteristics> cs = Collectors.CH_NOID;
         if (characteristics.length > 0) {
             cs = EnumSet.noneOf(Characteristics.class);
             Collections.addAll(cs, characteristics);
             cs = Collections.unmodifiableSet(cs);
         }
         return new Collectors.CollectorImpl<>(supplier, accumulator, 
combiner, finisher, cs);
     }




On 6/26/2013 2:26 PM, Brian Goetz wrote:
> As promised, here are what I see as the two last loose ends in
> Collector(s), both of which came out of the spec review.
>
> 1.  Collector.of() doesn't let you set the characteristics.  This means
> you can't make a concurrent collector.
>
> This is easily fixed by adding Characteristic... arguments to the two
> static factory methods.  There would be some slight ambiguity if it were
> legal for the finisher to be null, but a null finisher is invalid, so
> this is not a real concern.
>
> So
>      static<T, A, R> Collector<T, A, R> of(Supplier<A> supplier,
>                                            BiConsumer<A, T> accumulator,
>                                            BinaryOperator<A> combiner,
>                                            Function<A, R> finisher) {
>
> would become
>
>      static<T, A, R> Collector<T, A, R> of(Supplier<A> supplier,
>                                            BiConsumer<A, T> accumulator,
>                                            BinaryOperator<A> combiner,
>                                            Function<A, R> finisher,
>                                            Characteristic...
> characteristics) {
>
>
> 2.  Making the one-arg reducing(), and minBy/maxBy, return Optional
> means that queries like "tallest person by city" end up with Optional in
> the value:
>
>    Map<City, Optional<Person>> m
>      = people.collect(groupingBy(Person::getCity,
>                                  maxBy(comparing(Person::getHeight)));
>
> Which is doubly bad because the optionals here will *always* be present,
> since otherwise there'd be no associated key.
>
> I can see a few ways to address this:
>   - Provide non-optional versions of minBy/maxBy/reducing, which would
> probably have to return null for "no elements".
>   - Provide a way to add a finisher to a Collector, which is more
> general (and could be used, say, to turn toList() into something that
> always collects to an immutable list.)
>
> The latter could, in turn, be done in one of two ways:
>
>   - Add a andThen(f) method to Collector.  Then the above would read:
>
>     groupingBy(Person::getCity,
>                maxBy(comparing(Person::getHeight))
>                  .andThen(Optional::get))
>
>   - Add a combinator:
>
>     groupingBy(Person::getCity,
>                finishing(maxBy(comparing(Person::getHeight)),
>                          Optional::get)));
>
> I prefer the former because it reads better in usage; using a combinator
> function feels a little "inside out."
>
>
> On 6/25/2013 4:26 PM, Brian Goetz wrote:
>> Here's the review comments we got.  Most have already been dealt with.
>> One additional suggestion for a new Collector (transforming) comes out
>> of this.
>>
>>  From Mike:
>>   - Need extra disclaimers on summingDouble/averagingDouble.  Done.
>>
>>  From Sam:
>>   - Fix the examples where things have changed to Optional.  Done.
>> See [1]
>>
>>  From Joe:
>>   - <code> tags leaking into output.  This turns out to be a Javadoc bug
>> when you have a @link to a class that is not present.  Will not appear
>> in final output.
>>   - Example section repeated between Collector and Collectors.  I think
>> this is OK; users may land on either first, and I think these examples
>> are important.  Sadly Javadoc has no normal form.
>>   - General spec nits (some addressed already, others will be addressed
>> in final pass).
>>
>>  From Remi:
>>   - I don't understand how to create a Collector instead of forEach in
>> this case:
>>      ConcurrentHashMap<String, LongAdder> map
>>          = new ConcurrentHashMap<>();
>>      Arrays.stream(args).parallel()
>>            .forEach( arg -> map.computeIfAbsent(arg, k -> new
>> LongAdder()).increment() );  See [2]
>>
>>
>> [1] After changing the example:
>>
>>   *     Map<Department, Employee> highestPaidByDept
>>   *         = employees.stream()
>>   * .collect(Collectors.groupingBy(Employee::getDepartment,
>>   * Collectors.maxBy(Comparators.comparing(Employee::getSalary))));
>>
>> to
>>
>>   *     Map<Department, Optional<Employee>> highestPaidByDept
>>   *         = employees.stream()...
>>
>> this suggests we might want a combinator to add a post-function to an
>> existing Collector, such as:
>>
>>    <T,A,R,U> Collector<T,A,U>
>>              transforming(Collector<T,A,R> collector,
>>                           Function<R,U> transformer)
>>
>> that can be used in the downstream of a groupingBy to transform away the
>> Optional at the end:
>>
>>    groupingBy(Employee::getDepartment,
>>               transforming(maxBy(...), Optional::get))
>>
>> THis way, the user can get a Map<Department, Employee> rather than a
>> Map<Departhment, Optional<Employee>>.
>>
>>
>> [2] This can be done, but it involves boxing.
>>
>> Collector<Long, LongAdder, LongAdder> c =
>>    Collectors.of(LongAdder::new,
>>                  (a, v) -> a.increment(),
>>                  (a, b) -> /* merge LongAdder tables by key */);
>>
>> ConcurrentMap<String, LongAdder> map
>>     = stream.collect(groupingByConcurrent(Function.identity(), c));
>>
>>
>>
>> On 6/25/2013 4:01 PM, Brian Goetz wrote:
>>> I've closed the survey.  You can see responses here:
>>>
>>> https://www.surveymonkey.com/sr.aspx?sm=fAIHxonslEapeh3et_2biuRHRGUer22Q8awZI1yaJrsT0_3d
>>>
>>>
>>>
>>>
>>> I'll be summarizing and responding to these soon.
>>>
>>> On 6/18/2013 11:58 AM, Brian Goetz wrote:
>>>> I think I may (finally!) have a final API candidate for Collector /
>>>> Collectors.  Updated docs here:
>>>>
>>>>    http://cr.openjdk.java.net/~briangoetz/doctmp/doc/
>>>>
>>>> Salient recent changes:
>>>>   - Incorporation of finishing function into Collector
>>>>   - Renamings in Collector API
>>>>   - Morph toStringBuilder, toStringJoiner into joining()
>>>>   - Add prefix/suffix version of joining()
>>>>   - Addition of summingBy{Int,Long,Double}
>>>>   - Addition of averagingBy{Int,Long,Double}
>>>>   - Rename toXxxSummaryStatistics to summarizingXxx
>>>>   - no-see reducing, and minBy/maxBy collectors return Optional
>>>>   - elimination of canned merger functions from API
>>>>
>>>> Please review the API and spec.  I have set up a SurveyMonkey with the
>>>> usual password to collect review comments at:
>>>>
>>>>    https://www.surveymonkey.com/s/3TTBJ7K
>>>>
>>>>
>>


More information about the lambda-libs-spec-experts mailing list