Proposal to enhance Stream.collect
August Nagro
augustnagro at gmail.com
Mon Mar 4 06:18:22 UTC 2019
Tagir,
I slapped my forehead when I saw that StringBuilder's initialCapacity is
the number of chars, not the number of added elements!
The CollectorOp class I added for ReferencePipeline.collect(Collector) is
there so that a concurrent + unordered collector may be pre-sized. I
thought it was needed for completeness (letting concurrent collectors
benefit from pre-sizing just like serial ones). But maybe it's not worth
the change?
I hope I can change your mind about the spec change. We agree that
pre-sizing collections is important, and the benchmarks show that Streams
supporting pre-sizing can reduce allocations (by up to 60% for ArrayLists)
while often improving throughput. The question, then, is whether the ends
are worth the means. Take a look at this (non-exhaustive) list of
collection types that stand to benefit from having Collector.sizedSupplier();
a minimal sketch follows the list:
- Eclipse Collections. The library provides its own Collectors2 utility
  class [1], and the authors have high performance / low memory usage as a
  goal.
  - MutableList: used in Collectors2.toList(). Collector.sizedSupplier()
    could be specified with Lists.mutable.ofInitialCapacity(int).
  - BooleanArrayList(int initialCapacity)
  - HashBag(int size)
  - BooleanList, IntList, etc.
- Guava. A quick Google search found libraries like
  https://github.com/yanaga/guava-stream that aggregate Collectors for
  Guava collections.
- The FutureJoiner I shared earlier
- JDK classes that don't have a Collectors factory method (e.g.
  ArrayDeque(int numElements))
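
To make the shape concrete, here is a minimal sketch of a toList()-style
collector under the proposed API; sizedSupplier() is the proposed hook, not
an existing Collector method, and plain ArrayList stands in for what a
library like Eclipse Collections could do with Lists.mutable.ofInitialCapacity(int):

    import java.util.ArrayList;
    import java.util.EnumSet;
    import java.util.List;
    import java.util.Set;
    import java.util.function.*;
    import java.util.stream.Collector;

    // Sketch only: the stream pipeline would call sizedSupplier() instead of
    // supplier() when the exact element count is known, so the backing list
    // is allocated once at the right size.
    final class PresizedToList<T> implements Collector<T, List<T>, List<T>> {
        @Override public Supplier<List<T>> supplier()            { return ArrayList::new; }
        // Proposed addition; the default on Collector would delegate to supplier().
        public IntFunction<List<T>> sizedSupplier()              { return ArrayList::new; }
        @Override public BiConsumer<List<T>, T> accumulator()    { return List::add; }
        @Override public BinaryOperator<List<T>> combiner()      { return (l, r) -> { l.addAll(r); return l; }; }
        @Override public Function<List<T>, List<T>> finisher()   { return Function.identity(); }
        @Override public Set<Characteristics> characteristics()  { return EnumSet.of(Characteristics.IDENTITY_FINISH); }
    }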
Thanks, Tagir, for the comments; I've updated the webrev:
http://august.nagro.us/presized-collectors/webrev2/index.html
- August
[1]:
https://www.eclipse.org/collections/javadoc/9.2.0/org/eclipse/collections/impl/collector/Collectors2.html
On Sun, Mar 3, 2019 at 9:08 PM Tagir Valeev <amaembo at gmail.com> wrote:
> Hello August.
>
> I'm not supporting the proposed spec change, but I have a few comments
> about the implementation (in case the OpenJDK reviewers support it).
>
> 1. joining(): Setting the initial StringBuilder size (a number of
> characters) to the number of stream elements would be a good idea only
> under the assumption that every stream element has length 1, which is
> wrong in 99.9% of real-life cases.
> 2. flatMapping(), filtering(): Passing the original size through to the
> downstream collector is plain wrong here, as the number of resulting
> elements can be either bigger (in the flatMapping case) or arbitrarily
> smaller (in both cases) than the stream size. In the latter case you may
> simply waste a large amount of memory: a stream of a million elements
> could be collapsed to 1-2 elements, but you would still allocate an
> ArrayList sized for a million elements.
> 3. toMap(): the HashMap constructor parameter is an initial capacity,
> not the number of elements. By supplying it while ignoring the load
> factor you may force the HashMap to resize the table once, which is
> suboptimal. You should pass the estimated number of elements divided by
> the default load factor (and check corner cases as well; see the sketch
> after this list).
> 4. teeing(): you defer the null-check to the actual call of supplier or
> sizedSupplier. This is very bad, as a broken collector (e.g. one
> returning null from sizedSupplier) may not fail fast for a long time if
> it's never used for a sized stream.
> 5. ReduceOps: you also defer a null-check and capture not the function
> but the Collector object itself. This produces a subtle behavioral
> change for no reason.
> 6. ReferencePipeline::collect: I don't see why such a huge change is
> necessary.
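
Regarding point 3 above, a minimal sketch of the capacity arithmetic,
assuming the default 0.75 load factor (the helper name is illustrative,
not part of the patch):

    // Translate an expected element count into a HashMap initial capacity large
    // enough that inserting that many distinct keys never triggers a resize.
    static int presizedHashMapCapacity(long expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("negative size: " + expectedSize);
        }
        long capacity = (long) Math.ceil(expectedSize / 0.75d);  // default load factor
        return (int) Math.min(capacity, Integer.MAX_VALUE);      // clamp the corner case
    }

    // usage: new HashMap<K, V>(presizedHashMapCapacity(expectedSize))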
>
> With best regards,
> Tagir Valeev.
>
> On Mon, Mar 4, 2019 at 7:36 AM August Nagro <augustnagro at gmail.com> wrote:
> >
> > Hi everyone,
> >
> > My implementation is at https://github.com/AugustNagro/presize-collectors-bench
> > and I have a webrev here: http://august.nagro.us/presized-collectors/webrev/.
> >
> > The changes that I made were:
> >
> > - Add `default IntFunction<A> sizedSupplier()` to Collector
> > - Add 2 new Collector.of helper methods taking a sized supplier (usage
> >   sketch below)
> > - Update the applicable collectors in Collectors to provide a sizedSupplier
> > - Have ReferencePipeline & ReduceOps call sizedSupplier when appropriate
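
A rough usage sketch of the second change above, from the user's side; the
IntFunction-taking Collector.of overload is the proposed addition (it does
not exist in today's JDK), and ArrayDeque is only an illustrative target:

    // Hypothetical: the second argument is the proposed sized supplier, which the
    // pipeline would call with the known element count when it is available.
    Collector<String, ?, ArrayDeque<String>> toDeque =
        Collector.of(ArrayDeque::new,    // Supplier<ArrayDeque<String>>
                     ArrayDeque::new,    // IntFunction<ArrayDeque<String>> (proposed)
                     ArrayDeque::add,    // accumulator
                     (left, right) -> { left.addAll(right); return left; });  // combiner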
> >
> > Some notes/questions I have:
> >
> > - I considered adding Collector.Characteristics.PRESIZEABLE, but figured
> >   it was not needed given that sizedSupplier should always delegate to
> >   supplier.
> > - Collector.sizedSupplier could be a LongFunction instead of IntFunction
> >   (since Sink.begin provides long sizes), but every Collection type I've
> >   looked at uses int for initialCapacity, so I think IntFunction makes
> >   more sense.
> > - StringJoiner should be updated to take an initialCapacity (should I
> >   commit this change in the webrev, or leave it for another enhancement?)
> > - What tests should I add for this enhancement?
> >
> > There are some benchmarks in the GitHub repo. I was initially surprised
> > that my bare-bones parallel stream did not show a throughput improvement
> > like the serial one, but after doing some debugging I think it is due to
> > Collector.combiner dominating the runtime.
> >
> > Regards,
> >
> > August
> >
> > On Thu, Feb 28, 2019 at 12:56 AM Tagir Valeev <amaembo at gmail.com> wrote:
> >>
> >> Hello!
> >>
> >> I wouldn't use a presized HashSet, because you never know how many
> >> duplicates to expect. What if the input stream has a million elements,
> >> but only two of them are distinct? Do you really want to allocate a
> >> hash table for a million elements in advance?
> >>
> >> For toMap() without a custom supplier and merger, preallocation sounds
> >> reasonable, as in that case a duplicate key is an exceptional case, so
> >> we may expect a specific number of elements.
> >>
> >>> Thu, Feb 28, 2019, 11:38 August Nagro augustnagro at gmail.com:
> >>>
> >>> Tagir,
> >>>
> >>> Great to see the validating benchmarks.
> >>>
> >>> > I think it's the best solution given the fact that very few
> >>> > collectors may benefit from the exact known size, and this benefit
> >>> > usually disappears when collectors are composed (e.g. using groupingBy:
> >>> > downstream toList() would not win anything if it provides a sizedSupplier).
> >>>
> >>> I would like to see your benchmarks for that statement too. After all,
> >>> HashMap, HashSet, etc. all have presizing constructors; aren't those
> >>> there for a reason?
> >>>
> >>> So I am still convinced that presizing is important for other collector
> >>> types, including custom collectors, although I'm open to having my mind
> >>> changed.
> >>>
> >>> I'm happy to contribute an implementation as well; I was hoping that
> >>> this (smallish) patch would help me learn the OpenJDK process and make
> >>> future contributions easier!
> >>>
> >>> - August
> >>>
> >>> On Wed, Feb 27, 2019 at 1:06 AM Tagir Valeev <amaembo at gmail.com> wrote:
> >>>>
> >>>> Hello!
> >>>>
> >>>> > A less intrusive API direction might be a version of Collector whose
> >>>> > supplier function took a size-estimate argument; this might even help
> >>>> > in parallel since it allows for intermediate results to start with a
> >>>> > better initial size guess. (And this could be implemented as a default
> >>>> > that delegates to the existing supplier.) Still, not really sure this
> >>>> > carries its weight.
> >>>>
> >>>> It's interesting that such an optimization is possible without an API
> >>>> change, using "hidden knowledge". While the public API stays the same,
> >>>> the collect method implementation may check whether the JDK's own
> >>>> non-public Collector implementation is supplied, and in that case may
> >>>> use the sized supplier. I think it's the best solution given the fact
> >>>> that very few collectors may benefit from the exact known size, and
> >>>> this benefit usually disappears when collectors are composed (e.g.
> >>>> using groupingBy: a downstream toList() would not win anything if it
> >>>> provides a sizedSupplier).
> >>>>
> >>>> I created a quick patch and launched a quick benchmark like this:
> >>>>     return new SplittableRandom(0).ints(length, 0, 100).boxed().collect(Collectors.toList());
> >>>> for length = 100, 10000, 1000000.
> >>>>
> >>>> Here's results for vanilla 13-ea+8:
> >>>> length = 100,     avg time = 1434.469 ± 27.147 ns/op,          alloc rate = 1712 B/op
> >>>> length = 10000,   avg time = 155032.390 ± 2657.927 ns/op,      alloc rate = 169280 B/op
> >>>> length = 1000000, avg time = 27099621.763 ± 1979054.500 ns/op, alloc rate = 14586750 B/op
> >>>>
> >>>> Patched:
> >>>> length = 100,     avg time = 1414.480 ± 30.040 ns/op,          alloc rate = 768 B/op
> >>>> length = 10000,   avg time = 137338.320 ± 1316.789 ns/op,      alloc rate = 40368 B/op
> >>>> length = 1000000, avg time = 17654635.871 ± 210607.441 ns/op,  alloc rate = 4000382 B/op
> >>>>
> >>>> As you can see, exact allocation really helps to reduce alloc pressure
> >>>> and produces better performance for big lists.
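
For context, a rough JMH harness matching the quoted one-liner (an assumed
setup, not necessarily the exact benchmark that produced the numbers above;
the B/op figures would come from running with JMH's GC profiler, -prof gc):

    import java.util.List;
    import java.util.SplittableRandom;
    import java.util.concurrent.TimeUnit;
    import java.util.stream.Collectors;
    import org.openjdk.jmh.annotations.*;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Benchmark)
    public class ToListBench {
        @Param({"100", "10000", "1000000"})
        public int length;

        @Benchmark
        public List<Integer> toList() {
            // Same pipeline as quoted above: boxed random ints collected into a List.
            return new SplittableRandom(0).ints(length, 0, 100)
                                          .boxed()
                                          .collect(Collectors.toList());
        }
    }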
> >>>>
> >>>> If this kind of patch is acceptable for the Stream API, I can file a
> >>>> ticket and submit a webrev.
> >>>>
> >>>> With best regards,
> >>>> Tagir Valeev
> >>>>
> >>>> The patch follows:
> >>>>
> >>>> --- src/java.base/share/classes/java/util/stream/Collectors.java
> >>>> (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a)
> >>>> +++ src/java.base/share/classes/java/util/stream/Collectors.java
> >>>> (revision 53626+:e2fc434b410a+)
> >>>> @@ -50,6 +50,7 @@
> >>>> import java.util.function.BinaryOperator;
> >>>> import java.util.function.Consumer;
> >>>> import java.util.function.Function;
> >>>> +import java.util.function.IntFunction;
> >>>> import java.util.function.Predicate;
> >>>> import java.util.function.Supplier;
> >>>> import java.util.function.ToDoubleFunction;
> >>>> @@ -194,23 +195,34 @@
> >>>> */
> >>>> static class CollectorImpl<T, A, R> implements Collector<T, A, R> {
> >>>> private final Supplier<A> supplier;
> >>>> + private final IntFunction<A> sizedSupplier;
> >>>> private final BiConsumer<A, T> accumulator;
> >>>> private final BinaryOperator<A> combiner;
> >>>> private final Function<A, R> finisher;
> >>>> private final Set<Characteristics> characteristics;
> >>>>
> >>>> CollectorImpl(Supplier<A> supplier,
> >>>> + IntFunction<A> sizedSupplier,
> >>>> BiConsumer<A, T> accumulator,
> >>>> BinaryOperator<A> combiner,
> >>>> Function<A,R> finisher,
> >>>> Set<Characteristics> characteristics) {
> >>>> this.supplier = supplier;
> >>>> + this.sizedSupplier = sizedSupplier;
> >>>> this.accumulator = accumulator;
> >>>> this.combiner = combiner;
> >>>> this.finisher = finisher;
> >>>> this.characteristics = characteristics;
> >>>> }
> >>>>
> >>>> + CollectorImpl(Supplier<A> supplier,
> >>>> + BiConsumer<A, T> accumulator,
> >>>> + BinaryOperator<A> combiner,
> >>>> + Function<A,R> finisher,
> >>>> + Set<Characteristics> characteristics) {
> >>>> + this(supplier, null, accumulator, combiner, finisher, characteristics);
> >>>> + }
> >>>> +
> >>>> CollectorImpl(Supplier<A> supplier,
> >>>> BiConsumer<A, T> accumulator,
> >>>> BinaryOperator<A> combiner,
> >>>> @@ -228,6 +240,10 @@
> >>>> return supplier;
> >>>> }
> >>>>
> >>>> + IntFunction<A> sizedSupplier() {
> >>>> + return sizedSupplier;
> >>>> + }
> >>>> +
> >>>> @Override
> >>>> public BinaryOperator<A> combiner() {
> >>>> return combiner;
> >>>> @@ -275,8 +291,11 @@
> >>>> */
> >>>> public static <T>
> >>>> Collector<T, ?, List<T>> toList() {
> >>>> - return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
> >>>> + return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new,
> >>>> + ArrayList::new,
> >>>> + List::add,
> >>>> (left, right) -> { left.addAll(right); return left; },
> >>>> + castingIdentity(),
> >>>> CH_ID);
> >>>> }
> >>>>
> >>>> Index: src/java.base/share/classes/java/util/stream/ReduceOps.java
> >>>> ===================================================================
> >>>> --- src/java.base/share/classes/java/util/stream/ReduceOps.java
> >>>> (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a)
> >>>> +++ src/java.base/share/classes/java/util/stream/ReduceOps.java
> >>>> (revision 53626+:e2fc434b410a+)
> >>>> @@ -36,6 +36,7 @@
> >>>> import java.util.function.BinaryOperator;
> >>>> import java.util.function.DoubleBinaryOperator;
> >>>> import java.util.function.IntBinaryOperator;
> >>>> +import java.util.function.IntFunction;
> >>>> import java.util.function.LongBinaryOperator;
> >>>> import java.util.function.ObjDoubleConsumer;
> >>>> import java.util.function.ObjIntConsumer;
> >>>> @@ -155,13 +156,20 @@
> >>>> public static <T, I> TerminalOp<T, I>
> >>>> makeRef(Collector<? super T, I, ?> collector) {
> >>>> Supplier<I> supplier = Objects.requireNonNull(collector).supplier();
> >>>> + @SuppressWarnings("unchecked")
> >>>> + IntFunction<I> sizedSupplier = collector instanceof Collectors.CollectorImpl ?
> >>>> + ((Collectors.CollectorImpl<?, I, ?>) collector).sizedSupplier() : null;
> >>>> BiConsumer<I, ? super T> accumulator = collector.accumulator();
> >>>> BinaryOperator<I> combiner = collector.combiner();
> >>>> class ReducingSink extends Box<I>
> >>>> implements AccumulatingSink<T, I, ReducingSink> {
> >>>> @Override
> >>>> public void begin(long size) {
> >>>> - state = supplier.get();
> >>>> + if (sizedSupplier != null && size >= 0 && size <= Integer.MAX_VALUE) {
> >>>> + state = sizedSupplier.apply((int) size);
> >>>> + } else {
> >>>> + state = supplier.get();
> >>>> + }
> >>>> }
> >>>>
> >>>> @Override
>