Proposal to enhance Stream.collect
Tagir Valeev
amaembo at gmail.com
Mon Mar 4 03:08:16 UTC 2019
Hello August.
I don't support the proposed spec change, but I have a few comments
about the implementation (in case OpenJDK reviewers do support it).
1. joining(): Setting the initial StringBuilder size to the number of
stream elements would be a good idea only under the assumption that
every stream element has length 1, which is wrong in 99.9% of
real-life cases.
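A tiny self-contained illustration of that mismatch (the class name and
sample values are mine, not taken from the webrev):

import java.util.List;
import java.util.stream.Collectors;

// With 3 elements, a StringBuilder presized to 3 characters would have to
// grow almost immediately: the joined result is much longer than the count.
public class JoiningSizeDemo {
    public static void main(String[] args) {
        List<String> elements = List.of("alpha", "bravo", "charlie");
        String joined = elements.stream().collect(Collectors.joining(", "));
        System.out.println(elements.size()); // 3
        System.out.println(joined.length()); // 21
    }
}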
2. flatMapping(), filtering(): Passing the original size through to the
downstream collector is plain wrong here, as the number of resulting
elements can be bigger (in the flatMapping case) or arbitrarily
smaller (in both cases) than the stream size. In the latter case you
may waste a large amount of memory: a stream of a million elements
could collapse to 1-2 elements, but you would still allocate an
ArrayList sized for a million elements.
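For example (my own snippet, not part of the webrev), the upstream size
is a useless hint here:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// A million inputs collapse to two outputs; presizing the downstream
// ArrayList to 1_000_000 slots would waste roughly 4 MB for nothing.
public class FilteringSizeDemo {
    public static void main(String[] args) {
        List<Integer> survivors = IntStream.range(0, 1_000_000)
                .boxed()
                .collect(Collectors.filtering(i -> i < 2, Collectors.toList()));
        System.out.println(survivors); // [0, 1]
    }
}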
3. toMap(): the HashMap constructor parameter is an initial capacity,
not the number of elements. By supplying the element count and ignoring
the load factor you may force the HashMap to resize its table once,
which is suboptimal. You should pass the estimated number of elements
divided by the default load factor (and check corner cases as well).
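A minimal sketch of that capacity math (the helper name is mine; it is
roughly what Guava's Maps.newHashMapWithExpectedSize does):

import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

public class MapSizing {
    // Divide the expected mapping count by the default load factor (0.75)
    // so the table is big enough up front and never resizes while filling.
    static <K, V> IntFunction<Map<K, V>> sizedMapSupplier() {
        return expected -> {
            if (expected <= 0) {
                return new HashMap<>(); // corner case: no useful size hint
            }
            int capacity = (int) Math.ceil(expected / 0.75);
            return new HashMap<>(capacity);
        };
    }

    public static void main(String[] args) {
        // For 1000 expected mappings this requests capacity 1334 (a 2048-slot
        // table with resize threshold 1536). Passing 1000 directly would give
        // a 1024-slot table that resizes once it reaches 768 entries.
        Map<String, Integer> m =
                MapSizing.<String, Integer>sizedMapSupplier().apply(1000);
        System.out.println(m.isEmpty()); // true, ready for ~1000 puts
    }
}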
4. teeing(): you defer the null check to the actual call of supplier or
sizedSupplier. This is very bad, as a broken collector (e.g. one
returning null from sizedSupplier) may not fail fast for a long time if
it is never used with a sized stream.
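What I mean by fail-fast, as a sketch (the names are illustrative, not
taken from the webrev):

import java.util.Objects;
import java.util.function.IntFunction;
import java.util.function.Supplier;

class EagerChecks {
    // Validate both suppliers when the composed collector is constructed,
    // not when a SIZED stream eventually happens to call sizedSupplier.
    static <A> void requireSuppliers(Supplier<A> supplier,
                                     IntFunction<A> sizedSupplier) {
        Objects.requireNonNull(supplier, "supplier");
        Objects.requireNonNull(sizedSupplier, "sizedSupplier");
    }
}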
5. ReduceOps: you also defer a null check and capture the Collector
object itself rather than the function. This produces a subtle
behavioral change for no reason.
6. ReferencePipeline::collect: I don't see why such a huge change is necessary.
With best regards,
Tagir Valeev.
On Mon, Mar 4, 2019 at 7:36 AM August Nagro <augustnagro at gmail.com> wrote:
>
> Hi everyone,
>
> My implementation is at https://github.com/AugustNagro/presize-collectors-bench and I have a webrev here: http://august.nagro.us/presized-collectors/webrev/.
>
> The changes that I made were:
>
> Add `default IntFunction<A> sizedSupplier()` to Collector (a rough sketch follows after this list)
> Add 2 new Collector.of helper methods taking a sized supplier
> Update the applicable collectors in Collectors to provide a sizedSupplier
> Have ReferencePipeline & ReduceOps call sizedSupplier when appropriate
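>
> A rough, simplified sketch of that default method (my illustration of the
> bullets above, not the actual webrev code; the real change is on
> java.util.stream.Collector):
>
> import java.util.function.IntFunction;
> import java.util.function.Supplier;
>
> interface SizedCollector<T, A, R> { // stand-in for Collector<T, A, R>
>     Supplier<A> supplier();
>
>     // Takes the expected element count; the default ignores the hint and
>     // delegates to supplier(), so existing collectors keep working unchanged.
>     default IntFunction<A> sizedSupplier() {
>         return size -> supplier().get();
>     }
> }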
>
> Some notes/questions I have:
>
> I considered adding Collector.Characteristics.PRESIZEABLE, but figured it was not needed given that sizedSupplier should always delegate to supplier.
> Collector.sizedSupplier could be a LongFunction instead of IntFunction (since Sink.begin provides long sizes), but every Collection type I've looked at uses int for initialCapacity, so I think IntFunction makes more sense.
> StringJoiner should be updated to take an initialCapacity (should I commit this change in the webrev, or leave it for another enhancement?)
> What tests should I add for this enhancement?
>
> There are some benchmarks in the github repo. I was initially surprised to see that my bare-bones parallel stream did not show a throughput improvement like the serial one, but after some debugging I think it is because Collector.combiner dominates the runtime.
>
> Regards,
>
> August
>
> On Thu, Feb 28, 2019 at 12:56 AM Tagir Valeev <amaembo at gmail.com> wrote:
>>
>> Hello!
>>
>> I wouldn't use a presized HashSet, because you never know how many duplicates to expect. What if the input stream has a million elements, but only two of them are distinct? Do you really want to allocate a hash table for a million elements in advance?
>>
>> For toMap() without a custom supplier and merger, preallocation sounds reasonable, as in this case a duplicate key is an exceptional condition, so we may expect a specific number of elements.
>>
>> On Thu, Feb 28, 2019 at 11:38 August Nagro <augustnagro at gmail.com> wrote:
>>>
>>> Tagir,
>>>
>>> Great to see the validating benchmarks.
>>>
>>> > I think it's the best solution given the fact that very few collectors may benefit from the exact known size, and this benefit usually disappears when collectors are composed (e.g. using groupingBy: downstream toList() would not win anything if it provides a sizedSupplier).
>>>
>>> I would like to see your benchmarks for that statement too. After all, HashMap, HashSet, etc. all have presizing constructors; aren't those there for a reason?
>>>
>>> So I am still convinced that presizing is important for other collector types, including custom collectors. Although I'm open to having my mind changed.
>>>
>>> I'm happy to contribute an implementation as well; I was hoping that this (smallish) patch would help me learn the OpenJDK process and make future contributions easier!
>>>
>>> - August
>>>
>>> On Wed, Feb 27, 2019 at 1:06 AM Tagir Valeev <amaembo at gmail.com> wrote:
>>>>
>>>> Hello!
>>>>
>>>> > A less intrusive API direction might be a version of Collector whose
>>>> > supplier function took a size-estimate argument; this might even help in
>>>> > parallel since it allows for intermediate results to start with a better
>>>> > initial size guess. (And this could be implemented as a default that
>>>> > delegates to the existing supplier.) Still, not really sure this
>>>> > carries its weight.
>>>>
>>>> It's interesting that such an optimization is possible without an API
>>>> change, using "hidden knowledge". While the public API stays the same,
>>>> the collect method implementation may check whether a custom non-public
>>>> Collector implementation is supplied, and in that case use the sized
>>>> supplier.
>>>> I think it's the best solution given the fact that very few collectors
>>>> may benefit from the exact
>>>> known size, and this benefit usually disappears when collectors are
>>>> composed (e.g. using groupingBy:
>>>> downstream toList() would not win anything if it provides a sizedSupplier).
>>>>
>>>> I created a quick patch and launched a quick benchmark like this:
>>>> return new SplittableRandom(0).ints(length, 0, 100).boxed().collect(Collectors.toList());
>>>> For length = 100, 10000, 1000000
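>>>>
>>>> (My reconstruction of the JMH harness around that expression; the
>>>> actual benchmark class is not shown in this thread:)
>>>>
>>>> import java.util.List;
>>>> import java.util.SplittableRandom;
>>>> import java.util.concurrent.TimeUnit;
>>>> import java.util.stream.Collectors;
>>>> import org.openjdk.jmh.annotations.*;
>>>>
>>>> @BenchmarkMode(Mode.AverageTime)
>>>> @OutputTimeUnit(TimeUnit.NANOSECONDS)
>>>> @State(Scope.Benchmark)
>>>> public class ToListBench {
>>>>     @Param({"100", "10000", "1000000"})
>>>>     public int length;
>>>>
>>>>     @Benchmark
>>>>     public List<Integer> toList() {
>>>>         return new SplittableRandom(0).ints(length, 0, 100)
>>>>                 .boxed()
>>>>                 .collect(Collectors.toList());
>>>>     }
>>>> }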
>>>>
>>>> Here are the results for vanilla 13-ea+8:
>>>> length = 100, avg time = 1434.469 ± 27.147 ns/op, alloc rate = 1712 B/op
>>>> length = 10000, avg time = 155032.390 ± 2657.927 ns/op, alloc rate = 169280 B/op
>>>> length = 1000000, avg time = 27099621.763 ± 1979054.500 ns/op, alloc rate = 14586750 B/op
>>>>
>>>> Patched:
>>>> length = 100, avg time = 1414.480 ± 30.040 ns/op, alloc rate = 768 B/op
>>>> length = 10000, avg time = 137338.320 ± 1316.789 ns/op, alloc rate = 40368 B/op
>>>> length = 1000000, avg time = 17654635.871 ± 210607.441 ns/op, alloc rate = 4000382 B/op
>>>>
>>>> As you can see, exact allocation really helps to reduce alloc pressure
>>>> and produces better performance for big lists.
>>>>
>>>> If this kind of patch is acceptable for the Stream API, I can file a
>>>> ticket and submit a webrev.
>>>>
>>>> With best regards,
>>>> Tagir Valeev
>>>>
>>>> The patch follows:
>>>>
>>>> --- src/java.base/share/classes/java/util/stream/Collectors.java
>>>> (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a)
>>>> +++ src/java.base/share/classes/java/util/stream/Collectors.java
>>>> (revision 53626+:e2fc434b410a+)
>>>> @@ -50,6 +50,7 @@
>>>> import java.util.function.BinaryOperator;
>>>> import java.util.function.Consumer;
>>>> import java.util.function.Function;
>>>> +import java.util.function.IntFunction;
>>>> import java.util.function.Predicate;
>>>> import java.util.function.Supplier;
>>>> import java.util.function.ToDoubleFunction;
>>>> @@ -194,23 +195,34 @@
>>>> */
>>>> static class CollectorImpl<T, A, R> implements Collector<T, A, R> {
>>>> private final Supplier<A> supplier;
>>>> + private final IntFunction<A> sizedSupplier;
>>>> private final BiConsumer<A, T> accumulator;
>>>> private final BinaryOperator<A> combiner;
>>>> private final Function<A, R> finisher;
>>>> private final Set<Characteristics> characteristics;
>>>>
>>>> CollectorImpl(Supplier<A> supplier,
>>>> + IntFunction<A> sizedSupplier,
>>>> BiConsumer<A, T> accumulator,
>>>> BinaryOperator<A> combiner,
>>>> Function<A,R> finisher,
>>>> Set<Characteristics> characteristics) {
>>>> this.supplier = supplier;
>>>> + this.sizedSupplier = sizedSupplier;
>>>> this.accumulator = accumulator;
>>>> this.combiner = combiner;
>>>> this.finisher = finisher;
>>>> this.characteristics = characteristics;
>>>> }
>>>>
>>>> + CollectorImpl(Supplier<A> supplier,
>>>> + BiConsumer<A, T> accumulator,
>>>> + BinaryOperator<A> combiner,
>>>> + Function<A,R> finisher,
>>>> + Set<Characteristics> characteristics) {
>>>> + this(supplier, null, accumulator, combiner, finisher, characteristics);
>>>> + }
>>>> +
>>>> CollectorImpl(Supplier<A> supplier,
>>>> BiConsumer<A, T> accumulator,
>>>> BinaryOperator<A> combiner,
>>>> @@ -228,6 +240,10 @@
>>>> return supplier;
>>>> }
>>>>
>>>> + IntFunction<A> sizedSupplier() {
>>>> + return sizedSupplier;
>>>> + }
>>>> +
>>>> @Override
>>>> public BinaryOperator<A> combiner() {
>>>> return combiner;
>>>> @@ -275,8 +291,11 @@
>>>> */
>>>> public static <T>
>>>> Collector<T, ?, List<T>> toList() {
>>>> - return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
>>>> + return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new,
>>>> + ArrayList::new,
>>>> + List::add,
>>>> (left, right) -> { left.addAll(right); return left; },
>>>> + castingIdentity(),
>>>> CH_ID);
>>>> }
>>>>
>>>> Index: src/java.base/share/classes/java/util/stream/ReduceOps.java
>>>> ===================================================================
>>>> --- src/java.base/share/classes/java/util/stream/ReduceOps.java
>>>> (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a)
>>>> +++ src/java.base/share/classes/java/util/stream/ReduceOps.java
>>>> (revision 53626+:e2fc434b410a+)
>>>> @@ -36,6 +36,7 @@
>>>> import java.util.function.BinaryOperator;
>>>> import java.util.function.DoubleBinaryOperator;
>>>> import java.util.function.IntBinaryOperator;
>>>> +import java.util.function.IntFunction;
>>>> import java.util.function.LongBinaryOperator;
>>>> import java.util.function.ObjDoubleConsumer;
>>>> import java.util.function.ObjIntConsumer;
>>>> @@ -155,13 +156,20 @@
>>>> public static <T, I> TerminalOp<T, I>
>>>> makeRef(Collector<? super T, I, ?> collector) {
>>>> Supplier<I> supplier = Objects.requireNonNull(collector).supplier();
>>>> + @SuppressWarnings("unchecked")
>>>> + IntFunction<I> sizedSupplier = collector instanceof Collectors.CollectorImpl ?
>>>> + ((Collectors.CollectorImpl<?, I, ?>) collector).sizedSupplier() : null;
>>>> BiConsumer<I, ? super T> accumulator = collector.accumulator();
>>>> BinaryOperator<I> combiner = collector.combiner();
>>>> class ReducingSink extends Box<I>
>>>> implements AccumulatingSink<T, I, ReducingSink> {
>>>> @Override
>>>> public void begin(long size) {
>>>> - state = supplier.get();
>>>> + if (sizedSupplier != null && size >= 0 && size <= Integer.MAX_VALUE) {
>>>> + state = sizedSupplier.apply((int) size);
>>>> + } else {
>>>> + state = supplier.get();
>>>> + }
>>>> }
>>>>
>>>> @Override