Proposal to enhance Stream.collect
Calling Stream.collect(Collector) is a popular terminal stream operation. But because the collect methods are given no detail of the stream's characteristics, collectors are not as efficient as they could be.

For example, consider a non-parallel, sized stream that is to be collected as a List. This is a very common case for streams with a Collection source. Given those stream characteristics, Collector.supplier() could create the list with the correct initial capacity (since the merging function will never be called), but the current implementation prevents this.

I should note that the characteristics important to collectors are those defined by Spliterator: Spliterator::characteristics, Spliterator::estimateSize, and Spliterator::getExactSizeIfKnown.

One way this enhancement could be implemented is by adding a method Stream.collect(Function<ReadOnlySpliterator, Collector> collectorBuilder). ReadOnlySpliterator would declare the spliterator methods mentioned above, and Spliterator would be made to implement this interface.

For example, here is a gist with what Collectors.toList could look like: https://gist.github.com/AugustNagro/e66a0ddf7d47b4f11fec8760281bb538

ReadOnlySpliterator may need to be replaced with some stream-specific abstraction, however, since Stream.spliterator() does not return with the correct characteristics. The below code returns false, for example (is this a bug?):

    Stream.of(1, 2, 3).parallel().map(i -> i + 1).spliterator()
          .hasCharacteristics(Spliterator.CONCURRENT)

Looking forward to your thoughts,

- August Nagro
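The proposed shape can be sketched outside the JDK. Below, `ReadOnlySpliterator` and the `collect` helper are assumptions modeled on the proposal (a static method standing in for the suggested Stream overload, since Stream itself cannot be modified here); the pre-sizing `sizedToList` builder uses only real JDK API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class SizedCollectSketch {

    // Hypothetical read-only view over the stream's spliterator state
    // (name and shape taken from the proposal, not real JDK API).
    interface ReadOnlySpliterator {
        int characteristics();
        long estimateSize();
        long getExactSizeIfKnown();
    }

    // Sketch of the proposed overload: the caller passes a function that
    // builds a Collector once the stream's characteristics are known.
    static <T, R> R collect(Stream<T> stream,
                            Function<ReadOnlySpliterator, Collector<T, ?, R>> builder) {
        Spliterator<T> sp = stream.spliterator();
        ReadOnlySpliterator view = new ReadOnlySpliterator() {
            public int characteristics()      { return sp.characteristics(); }
            public long estimateSize()        { return sp.estimateSize(); }
            public long getExactSizeIfKnown() { return sp.getExactSizeIfKnown(); }
        };
        return StreamSupport.stream(sp, false).collect(builder.apply(view));
    }

    // A toList that pre-sizes its ArrayList when the exact size is known.
    static <T> Collector<T, ?, List<T>> sizedToList(ReadOnlySpliterator s) {
        long exact = s.getExactSizeIfKnown();
        if (exact >= 0 && exact <= Integer.MAX_VALUE) {
            return Collectors.toCollection(() -> new ArrayList<>((int) exact));
        }
        return Collectors.toList();
    }

    public static void main(String[] args) {
        List<Integer> r = collect(Stream.of(1, 2, 3).map(i -> i + 1),
                                  SizedCollectSketch::sizedToList);
        System.out.println(r);   // [2, 3, 4]
    }
}
```

Whether the sized branch is taken depends on whether the pipeline preserved the SIZED characteristic; either way the result is the same, which is exactly the point of keeping the optimization transparent.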
If you already know the size and are not going to parallelize your stream, you can simply use Collectors.toCollection:

    Stream.of(1, 2, 3)
          .map(i -> i + 1)
          .collect(Collectors.toCollection(() -> new ArrayList<>(3)));

It could perhaps be a bit easier if Collectors.toList were overloaded to accept the initial size, but that would restrict the method to always use ArrayList (or another List with a predefined capacity).

On 23/02/2019 23:27, August Nagro wrote:
We did consider this problem when designing the Collector API; for example, it would have been nice if we could have a `toArray()` collector that had all the optimizations of `Stream::toArray`.

When we looked into it, we found a number of concerning details that caused us to turn back (many of which you've already identified), such as the difficulty of managing parallelism, the intrusion into the API, etc. What we found is that all this additional complexity was basically in aid of only a few use cases -- such as collecting into a pre-sized ArrayList. Where are the next hundred use cases for such a mechanism that would justify this incremental API complexity? We didn't see them, but maybe there are some.

A less intrusive API direction might be a version of Collector whose supplier function took a size-estimate argument; this might even help in parallel, since it allows for intermediate results to start with a better initial size guess. (And this could be implemented as a default that delegates to the existing supplier.) Still, not really sure this carries its weight.
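This "less intrusive" direction can be sketched as a default method. The interface below is a reduced, hypothetical stand-in for java.util.stream.Collector, and `sizedSupplier` is an assumed method name, not real JDK API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.Supplier;

public class SizedSupplierSketch {

    // Hypothetical shape of the suggestion, reduced to the two methods that
    // matter: the existing zero-arg supplier plus a size-taking variant
    // whose default delegates back to it.
    interface SizedSupplierCollector<A> {
        Supplier<A> supplier();

        default IntFunction<A> sizedSupplier() {
            return sizeEstimate -> supplier().get();   // default ignores the estimate
        }
    }

    public static void main(String[] args) {
        // Existing collectors keep working unchanged through the default...
        SizedSupplierCollector<List<String>> legacy = () -> ArrayList::new;
        List<String> a = legacy.sizedSupplier().apply(100);   // plain ArrayList

        // ...while an upgraded toList-style collector can pre-size.
        SizedSupplierCollector<List<String>> sized = new SizedSupplierCollector<>() {
            public Supplier<List<String>> supplier() { return ArrayList::new; }
            public IntFunction<List<String>> sizedSupplier() { return ArrayList::new; }
        };
        List<String> b = sized.sizedSupplier().apply(100);    // new ArrayList<>(100)

        System.out.println(a.isEmpty() && b.isEmpty());   // true
    }
}
```

The default method is what makes the change source- and binary-compatible: no existing Collector implementation needs to know the method exists.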
The below code returns false, for example (is this a bug?):
Stream.of(1,2,3).parallel().map(i -> i+1).spliterator().hasCharacteristics(Spliterator.CONCURRENT)
Not a bug. The `Stream::spliterator` method (along with `iterator`) is provided as an "escape hatch" for operations that need to get at the elements but which cannot be easily expressed using the Stream API. This method makes a good-faith attempt to propagate a reasonable set of characteristics (for a stream with no intermediate ops, it does delegate to the underlying source for its spliterator), but, given that `Stream` is not in fact a data structure, when there is nontrivial computation on the actual source, a relatively bare-bones spliterator is provided.

While this could probably be improved in specific cases, the return on effort (and risk) is likely to be low, because `Stream::spliterator` is already an infrequently used method, and it would only matter in a small fraction of those cases. So you're in "corner case of a corner case" territory.

On 2/23/2019 5:27 PM, August Nagro wrote:
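The two behaviors described above can be observed directly: with no intermediate ops the stream hands back the source spliterator with its characteristics intact, while a pipeline with ops returns a wrapping spliterator; and CONCURRENT is a property of a concurrently-modifiable source collection, never of a parallel stream.

```java
import java.util.Spliterator;
import java.util.stream.Stream;

public class SpliteratorCharacteristics {
    public static void main(String[] args) {
        // No intermediate ops: the stream delegates to the source spliterator,
        // so the array-backed source's SIZED characteristic survives.
        Spliterator<Integer> direct = Stream.of(1, 2, 3).spliterator();
        System.out.println(direct.hasCharacteristics(Spliterator.SIZED));   // true

        // With an intermediate op, a wrapping spliterator is returned.
        Spliterator<Integer> wrapped =
            Stream.of(1, 2, 3).parallel().map(i -> i + 1).spliterator();
        // CONCURRENT describes a source that may be safely modified while
        // traversing (e.g. a ConcurrentHashMap view), not parallel execution,
        // so it is never reported here.
        System.out.println(wrapped.hasCharacteristics(Spliterator.CONCURRENT)); // false
    }
}
```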
Thanks for the feedback. I really like Brian's suggestion of a default Collector.supplier(sizeEstimate), since exposing the full stream characteristics is overkill.

I don't think the usefulness of this improvement is limited to pre-sizing ArrayLists, though. Some other collectors that come to mind are StringJoiner (Collectors.joining) and my FutureJoiner [1], which collects a stream of CompletableFuture<T> into a CompletableFuture<List<T>>. And even if Collectors.toList were the only collector to benefit from pre-sizing, I think it would still be worth the effort, as this is perhaps the most-used Collector.

[1]: https://gist.github.com/AugustNagro/e92c48e39f891f16500f36c33839383a

On Sun, Feb 24, 2019 at 8:51 AM Brian Goetz <brian.goetz@oracle.com> wrote:
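A collector in the spirit of the FutureJoiner mentioned above can be sketched with real JDK API (this is an illustration, not the actual gist): both the accumulator list and the result list are containers that a size estimate would let us allocate at exactly the right capacity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class FutureJoinerSketch {

    // Stream<CompletableFuture<T>> -> CompletableFuture<List<T>>.
    // Accumulate the futures into a list, then finish by composing them.
    static <T> Collector<CompletableFuture<T>, ?, CompletableFuture<List<T>>> joining() {
        return Collector.<CompletableFuture<T>, List<CompletableFuture<T>>,
                          CompletableFuture<List<T>>>of(
            ArrayList::new,                              // pre-sizing opportunity #1
            List::add,
            (a, b) -> { a.addAll(b); return a; },
            futures -> CompletableFuture
                .allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(v -> {
                    // All futures are complete here, so join() cannot block.
                    List<T> results = new ArrayList<>(futures.size()); // #2
                    for (CompletableFuture<T> f : futures) results.add(f.join());
                    return results;
                }));
    }

    public static void main(String[] args) {
        CompletableFuture<List<Integer>> joined =
            Stream.of(CompletableFuture.completedFuture(1),
                      CompletableFuture.completedFuture(2))
                  .collect(FutureJoinerSketch.<Integer>joining());
        System.out.println(joined.join());   // [1, 2]
    }
}
```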
Hello!
A less intrusive API direction might be a version of Collector whose supplier function took a size-estimate argument; this might even help in parallel since it allows for intermediate results to start with a better initial size guess. (And this could be implemented as a default that delegates to the existing supplier.) Still, not really sure this carries its weight.
It's interesting that such an optimization is possible without an API change, using "hidden knowledge". While the public API stays the same, the collect method implementation may check whether the JDK's own non-public Collector implementation was supplied, and in that case use the sized supplier. I think this is the best solution, given that very few collectors can benefit from an exactly known size, and this benefit usually disappears when collectors are composed (e.g. with groupingBy: a downstream toList() would not win anything by providing a sizedSupplier).

I created a quick patch and ran a quick benchmark like this:

    return new SplittableRandom(0).ints(length, 0, 100).boxed()
            .collect(Collectors.toList());

for length = 100, 10000, 1000000.

Here are the results for vanilla 13-ea+8:

    length = 100:      avg time = 1434.469 ± 27.147 ns/op,           alloc rate = 1712 B/op
    length = 10000:    avg time = 155032.390 ± 2657.927 ns/op,       alloc rate = 169280 B/op
    length = 1000000:  avg time = 27099621.763 ± 1979054.500 ns/op,  alloc rate = 14586750 B/op

Patched:

    length = 100:      avg time = 1414.480 ± 30.040 ns/op,           alloc rate = 768 B/op
    length = 10000:    avg time = 137338.320 ± 1316.789 ns/op,       alloc rate = 40368 B/op
    length = 1000000:  avg time = 17654635.871 ± 210607.441 ns/op,   alloc rate = 4000382 B/op

As you can see, exact allocation really helps to reduce allocation pressure and yields better performance for big lists.

If this kind of patch is acceptable for the Stream API, I can file a ticket and submit a webrev.
With best regards,
Tagir Valeev

The patch follows:

--- src/java.base/share/classes/java/util/stream/Collectors.java	(revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a)
+++ src/java.base/share/classes/java/util/stream/Collectors.java	(revision 53626+:e2fc434b410a+)
@@ -50,6 +50,7 @@
 import java.util.function.BinaryOperator;
 import java.util.function.Consumer;
 import java.util.function.Function;
+import java.util.function.IntFunction;
 import java.util.function.Predicate;
 import java.util.function.Supplier;
 import java.util.function.ToDoubleFunction;
@@ -194,23 +195,34 @@
      */
     static class CollectorImpl<T, A, R> implements Collector<T, A, R> {
         private final Supplier<A> supplier;
+        private final IntFunction<A> sizedSupplier;
         private final BiConsumer<A, T> accumulator;
         private final BinaryOperator<A> combiner;
         private final Function<A, R> finisher;
         private final Set<Characteristics> characteristics;
 
         CollectorImpl(Supplier<A> supplier,
+                      IntFunction<A> sizedSupplier,
                       BiConsumer<A, T> accumulator,
                       BinaryOperator<A> combiner,
                       Function<A,R> finisher,
                       Set<Characteristics> characteristics) {
             this.supplier = supplier;
+            this.sizedSupplier = sizedSupplier;
             this.accumulator = accumulator;
             this.combiner = combiner;
             this.finisher = finisher;
             this.characteristics = characteristics;
         }
 
+        CollectorImpl(Supplier<A> supplier,
+                      BiConsumer<A, T> accumulator,
+                      BinaryOperator<A> combiner,
+                      Function<A,R> finisher,
+                      Set<Characteristics> characteristics) {
+            this(supplier, null, accumulator, combiner, finisher, characteristics);
+        }
+
         CollectorImpl(Supplier<A> supplier,
                       BiConsumer<A, T> accumulator,
                       BinaryOperator<A> combiner,
@@ -228,6 +240,10 @@
             return supplier;
         }
 
+        IntFunction<A> sizedSupplier() {
+            return sizedSupplier;
+        }
+
         @Override
         public BinaryOperator<A> combiner() {
             return combiner;
@@ -275,8 +291,11 @@
      */
     public static <T> Collector<T, ?, List<T>> toList() {
-        return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
+        return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new,
+                                   ArrayList::new,
+                                   List::add,
                                    (left, right) -> { left.addAll(right); return left; },
+                                   castingIdentity(),
                                    CH_ID);
     }

Index: src/java.base/share/classes/java/util/stream/ReduceOps.java
===================================================================
--- src/java.base/share/classes/java/util/stream/ReduceOps.java	(revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a)
+++ src/java.base/share/classes/java/util/stream/ReduceOps.java	(revision 53626+:e2fc434b410a+)
@@ -36,6 +36,7 @@
 import java.util.function.BinaryOperator;
 import java.util.function.DoubleBinaryOperator;
 import java.util.function.IntBinaryOperator;
+import java.util.function.IntFunction;
 import java.util.function.LongBinaryOperator;
 import java.util.function.ObjDoubleConsumer;
 import java.util.function.ObjIntConsumer;
@@ -155,13 +156,20 @@
     public static <T, I> TerminalOp<T, I>
     makeRef(Collector<? super T, I, ?> collector) {
         Supplier<I> supplier = Objects.requireNonNull(collector).supplier();
+        @SuppressWarnings("unchecked")
+        IntFunction<I> sizedSupplier = collector instanceof Collectors.CollectorImpl ?
+                ((Collectors.CollectorImpl<?, I, ?>) collector).sizedSupplier() : null;
         BiConsumer<I, ? super T> accumulator = collector.accumulator();
         BinaryOperator<I> combiner = collector.combiner();
         class ReducingSink extends Box<I>
                 implements AccumulatingSink<T, I, ReducingSink> {
             @Override
             public void begin(long size) {
-                state = supplier.get();
+                if (sizedSupplier != null && size >= 0 && size <= Integer.MAX_VALUE) {
+                    state = sizedSupplier.apply((int) size);
+                } else {
+                    state = supplier.get();
+                }
             }
 
             @Override
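The benchmark workload quoted above is reproducible standalone (the JMH harness, averaging, and allocation profiling are omitted; under JMH this method body is what gets measured for each length):

```java
import java.util.List;
import java.util.SplittableRandom;
import java.util.stream.Collectors;

public class ToListWorkload {
    // Tagir's workload: box `length` random ints in [0, 100) and collect
    // them with Collectors.toList(). The fixed seed makes runs deterministic.
    static List<Integer> run(int length) {
        return new SplittableRandom(0).ints(length, 0, 100).boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> l = run(100);
        System.out.println(l.size());   // 100
    }
}
```

Because the random source is SIZED, `begin(long size)` in the patched ReduceOps receives the exact element count, which is why the patched numbers show the allocation-rate drop.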
Tagir,

Great to see the validating benchmarks.
I think it's the best solution given the fact that very few collectors may benefit from the exact known size, and this benefit usually disappears when collectors are composed (e.g. using groupingBy: downstream toList() would not win anything if it provides a sizedSupplier).
I would like to see your benchmarks for that statement too. After all, HashMap, HashSet, etc. all have pre-sizing constructors; aren't those there for a reason? So I am still convinced that pre-sizing is important for other collector types, including custom collectors. Although I'm open to having my mind changed.

I'm happy to contribute an implementation as well; I was hoping that this (smallish) patch would help me learn the OpenJDK process and make future contributions easier!

- August

On Wed, Feb 27, 2019 at 1:06 AM Tagir Valeev <amaembo@gmail.com> wrote:
Hello!

I wouldn't use a pre-sized HashSet, because you never know how many duplicates to expect. What if the input stream has a million elements, but only two of them are distinct? Do you really want to allocate a hash table for a million elements in advance?

For toMap() without a custom supplier and merger, preallocation sounds reasonable, as in that case a duplicate key is an exceptional condition, so we may expect a specific number of elements.

On Thu, 28 Feb 2019 at 11:38, August Nagro <augustnagro@gmail.com> wrote:
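The toMap() case has one wrinkle worth showing: a HashMap resizes when its size exceeds capacity times load factor, so the element count must be inflated by the default load factor (0.75) to avoid rehashing. A sketch using only real JDK API (the `mapCapacity` helper is illustrative, not a JDK method):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PresizedToMapSketch {
    // Capacity such that `expectedSize` mappings fit without a resize,
    // given HashMap's default load factor of 0.75.
    static int mapCapacity(int expectedSize) {
        return (int) (expectedSize / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int n = 1000;
        // Keys are unique here, so the stream size is a valid estimate;
        // the map-factory overload of toMap() lets us pre-size today.
        Map<Integer, Integer> squares = Stream.iterate(0, i -> i + 1).limit(n)
            .collect(Collectors.toMap(Function.identity(), i -> i * i,
                                      (a, b) -> a,
                                      () -> new HashMap<>(mapCapacity(n))));
        System.out.println(squares.size());   // 1000
        // For a HashSet, by contrast, the distinct count -- not the stream
        // size -- is the right estimate, and it is unknown in advance,
        // which is exactly Tagir's objection.
    }
}
```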
Hi everyone,

My implementation is at https://github.com/AugustNagro/presize-collectors-bench and I have a webrev here: http://august.nagro.us/presized-collectors/webrev/.

The changes that I made were:

- Add `default IntFunction<A> sizedSupplier()` to Collector
- Add 2 new Collector.of helper methods taking a sized supplier
- Update the applicable collectors in Collectors to provide a sizedSupplier
- Have ReferencePipeline & ReduceOps call sizedSupplier when appropriate

Some notes/questions I have:

- I considered adding Collector.Characteristics.PRESIZEABLE, but figured it was not needed, given that sizedSupplier should always delegate to supplier.
- Collector.sizedSupplier could be a LongFunction instead of an IntFunction (since Sink.begin provides long sizes), but every Collection type I've looked at uses int for initialCapacity, so I think IntFunction makes more sense.
- StringJoiner should be updated to take an initialCapacity (should I commit this change in the webrev, or leave it for another enhancement?)
- What tests should I add for this enhancement?

There are some benchmarks in the github repo. I was initially surprised to see that my bare-bones parallel stream did not have a throughput improvement like the serial one, but after some debugging I think it is because Collector.combiner dominates the runtime.

Regards,
August

On Thu, Feb 28, 2019 at 12:56 AM Tagir Valeev <amaembo@gmail.com> wrote:
Hello!
I wouldn't use a presized HashSet, because you never know how many duplicates to expect. What if the input stream has a million elements, but only two of them are distinct? Do you really want to allocate a hash table for a million elements in advance?
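Tagir's million-elements/two-distinct example can be made concrete. A minimal sketch (class name is mine): a stream of a million elements that collapses to a two-element set, exactly the case where presizing the HashSet to the stream size would allocate a table several hundred thousand times too large.

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class DistinctDemo {
    public static void main(String[] args) {
        // A million elements but only two distinct values: presizing a
        // HashSet to the stream size here would waste almost all of it.
        Set<Integer> set = IntStream.range(0, 1_000_000)
                .map(i -> i % 2)
                .boxed()
                .collect(Collectors.toSet());
        System.out.println(set.size()); // prints 2
    }
}
```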
For toMap() without a custom supplier and merger, preallocation sounds reasonable, as in that case a duplicate key is an exceptional condition, so we may expect a specific number of elements.
On Thu, Feb 28, 2019 at 11:38 August Nagro <augustnagro@gmail.com> wrote:
Tagir,
Great to see the validating benchmarks.
> I think it's the best solution given the fact that very few collectors may benefit from the exact known size, and this benefit usually disappears when collectors are composed (e.g. using groupingBy: downstream toList() would not win anything if it provides a sizedSupplier).
I would like to see your benchmarks for that statement too. After all, HashMap, HashSet, etc. all have presizing constructors; aren't those there for a reason?
So I am still convinced that presizing is important for other collector types, including custom collectors, although I'm open to having my mind changed.
I'm happy to contribute an implementation as well; I was hoping that this (smallish) patch would help me learn the OpenJDK process and make future contributions easier!
- August
On Wed, Feb 27, 2019 at 1:06 AM Tagir Valeev <amaembo@gmail.com> wrote:
Hello!
> A less intrusive API direction might be a version of Collector whose supplier function took a size-estimate argument; this might even help in parallel since it allows for intermediate results to start with a better initial size guess. (And this could be implemented as a default that delegates to the existing supplier.) Still, not really sure this carries its weight.
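The quoted direction can be sketched outside the JDK. Everything below is my illustration, not a proposed JDK API: `SizeAwareCollector`, `createContainer`, and `PresizedToList` are hypothetical names for a container factory that takes a size estimate and, by default, ignores it and delegates to the existing supplier.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Hypothetical interface: the default ignores the estimate, so existing
// collectors keep working and only presizeable ones need to override.
interface SizeAwareCollector<T, A, R> extends Collector<T, A, R> {
    default A createContainer(long sizeEstimate) {
        return supplier().get(); // default: delegate to the plain supplier
    }
}

// A toList-style collector that overrides the default to presize.
class PresizedToList<T> implements SizeAwareCollector<T, List<T>, List<T>> {
    public Supplier<List<T>> supplier() { return ArrayList::new; }
    public BiConsumer<List<T>, T> accumulator() { return List::add; }
    public BinaryOperator<List<T>> combiner() {
        return (a, b) -> { a.addAll(b); return a; };
    }
    public Function<List<T>, List<T>> finisher() { return Function.identity(); }
    public Set<Characteristics> characteristics() {
        return EnumSet.of(Characteristics.IDENTITY_FINISH);
    }
    @Override
    public List<T> createContainer(long sizeEstimate) {
        return (sizeEstimate >= 0 && sizeEstimate <= Integer.MAX_VALUE)
                ? new ArrayList<>((int) sizeEstimate)
                : new ArrayList<>();
    }
}
```

A sized terminal op could call createContainer(size) instead of supplier().get(); the collector still works as a plain Collector everywhere else.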
It's interesting that such an optimization is possible without an API change, using "hidden knowledge". While the public API stays the same, the collect method implementation may check whether a known non-public Collector implementation is supplied, and in that case use the sized supplier. I think it's the best solution, given that very few collectors may benefit from the exact known size, and this benefit usually disappears when collectors are composed (e.g. using groupingBy: a downstream toList() would not win anything if it provides a sizedSupplier).
I created a quick patch and launched a quick benchmark like this:

return new SplittableRandom(0).ints(length, 0, 100).boxed().collect(Collectors.toList());

for length = 100, 10000, 1000000.
Here are the results for vanilla 13-ea+8:
length = 100, avg time = 1434,469 ± 27,147 ns/op, alloc rate = 1712 B/op
length = 10000, avg time = 155032,390 ± 2657,927 ns/op, alloc rate = 169280 B/op
length = 1000000, avg time = 27099621,763 ± 1979054,500 ns/op, alloc rate = 14586750 B/op
Patched:
length = 100, avg time = 1414,480 ± 30,040 ns/op, alloc rate = 768 B/op
length = 10000, avg time = 137338,320 ± 1316,789 ns/op, alloc rate = 40368 B/op
length = 1000000, avg time = 17654635,871 ± 210607,441 ns/op, alloc rate = 4000382 B/op
As you can see, exact allocation really helps to reduce alloc pressure and produces better performance for big lists.
If such kind of patch is acceptable for Stream API, I can file a ticket and submit a webrev.
With best regards, Tagir Valeev
The patch follows:
--- src/java.base/share/classes/java/util/stream/Collectors.java (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a)
+++ src/java.base/share/classes/java/util/stream/Collectors.java (revision 53626+:e2fc434b410a+)
@@ -50,6 +50,7 @@
 import java.util.function.BinaryOperator;
 import java.util.function.Consumer;
 import java.util.function.Function;
+import java.util.function.IntFunction;
 import java.util.function.Predicate;
 import java.util.function.Supplier;
 import java.util.function.ToDoubleFunction;
@@ -194,23 +195,34 @@
      */
     static class CollectorImpl<T, A, R> implements Collector<T, A, R> {
         private final Supplier<A> supplier;
+        private final IntFunction<A> sizedSupplier;
         private final BiConsumer<A, T> accumulator;
         private final BinaryOperator<A> combiner;
         private final Function<A, R> finisher;
         private final Set<Characteristics> characteristics;

         CollectorImpl(Supplier<A> supplier,
+                      IntFunction<A> sizedSupplier,
                       BiConsumer<A, T> accumulator,
                       BinaryOperator<A> combiner,
                       Function<A,R> finisher,
                       Set<Characteristics> characteristics) {
             this.supplier = supplier;
+            this.sizedSupplier = sizedSupplier;
             this.accumulator = accumulator;
             this.combiner = combiner;
             this.finisher = finisher;
             this.characteristics = characteristics;
         }

+        CollectorImpl(Supplier<A> supplier,
+                      BiConsumer<A, T> accumulator,
+                      BinaryOperator<A> combiner,
+                      Function<A,R> finisher,
+                      Set<Characteristics> characteristics) {
+            this(supplier, null, accumulator, combiner, finisher, characteristics);
+        }
+
         CollectorImpl(Supplier<A> supplier,
                       BiConsumer<A, T> accumulator,
                       BinaryOperator<A> combiner,
@@ -228,6 +240,10 @@
             return supplier;
         }

+        IntFunction<A> sizedSupplier() {
+            return sizedSupplier;
+        }
+
         @Override
         public BinaryOperator<A> combiner() {
             return combiner;
@@ -275,8 +291,11 @@
      */
     public static <T> Collector<T, ?, List<T>> toList() {
-        return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
+        return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new,
+                                   ArrayList::new,
+                                   List::add,
                                    (left, right) -> { left.addAll(right); return left; },
+                                   castingIdentity(),
                                    CH_ID);
     }

Index: src/java.base/share/classes/java/util/stream/ReduceOps.java
===================================================================
--- src/java.base/share/classes/java/util/stream/ReduceOps.java (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a)
+++ src/java.base/share/classes/java/util/stream/ReduceOps.java (revision 53626+:e2fc434b410a+)
@@ -36,6 +36,7 @@
 import java.util.function.BinaryOperator;
 import java.util.function.DoubleBinaryOperator;
 import java.util.function.IntBinaryOperator;
+import java.util.function.IntFunction;
 import java.util.function.LongBinaryOperator;
 import java.util.function.ObjDoubleConsumer;
 import java.util.function.ObjIntConsumer;
@@ -155,13 +156,20 @@
     public static <T, I> TerminalOp<T, I>
     makeRef(Collector<? super T, I, ?> collector) {
         Supplier<I> supplier = Objects.requireNonNull(collector).supplier();
+        @SuppressWarnings("unchecked")
+        IntFunction<I> sizedSupplier = collector instanceof Collectors.CollectorImpl ?
+                ((Collectors.CollectorImpl<?, I, ?>) collector).sizedSupplier() : null;
         BiConsumer<I, ? super T> accumulator = collector.accumulator();
         BinaryOperator<I> combiner = collector.combiner();
         class ReducingSink extends Box<I>
                 implements AccumulatingSink<T, I, ReducingSink> {
             @Override
             public void begin(long size) {
-                state = supplier.get();
+                if (sizedSupplier != null && size >= 0 && size <= Integer.MAX_VALUE) {
+                    state = sizedSupplier.apply((int) size);
+                } else {
+                    state = supplier.get();
+                }
             }

             @Override
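The begin(long) logic in the ReduceOps hunk can be modeled as a standalone class (the class and field names here are mine; only the size check mirrors the patch): use the sized supplier only when a non-negative size that fits in an int is known, otherwise fall back to the plain supplier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.Supplier;

// Standalone model of the patched ReducingSink.begin(long).
class SizedStart<A> {
    final Supplier<A> supplier;
    final IntFunction<A> sizedSupplier; // null when the collector has none

    SizedStart(Supplier<A> supplier, IntFunction<A> sizedSupplier) {
        this.supplier = supplier;
        this.sizedSupplier = sizedSupplier;
    }

    A begin(long size) {
        // -1 is the "size unknown" value a sink receives from the pipeline.
        if (sizedSupplier != null && size >= 0 && size <= Integer.MAX_VALUE) {
            return sizedSupplier.apply((int) size);
        }
        return supplier.get();
    }
}
```

For toList(), both factories are ArrayList constructor references: new SizedStart<List<Integer>>(ArrayList::new, ArrayList::new).begin(n) yields a presized list when n is known, and an unsized one when begin receives -1.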
Hello August.

I'm not supporting the proposed spec change, but I have a few comments about the implementation (in case OpenJDK reviewers would support it).

1. joining(): Setting the initial StringBuilder size to a number of characters equal to the number of stream elements would be a good idea only under the assumption that every stream element has length 1, which is wrong in 99.9% of real-life cases.

2. flatMapping(), filtering(): Passing the original size through to the downstream collector is plain wrong here, as the number of resulting elements can be either bigger (in the flatMapping case) or arbitrarily smaller (in both cases) than the stream size. In the latter case you may waste a big amount of memory: a stream of a million elements could be collapsed to 1-2 elements, but you would still allocate an ArrayList for a million elements.

3. toMap(): the HashMap constructor parameter is an initial capacity, not the number of elements. Supplying it while ignoring the load factor may force the HashMap to resize the table once, which is suboptimal. You should pass the estimated number of elements divided by the default load factor (and check corner cases as well).

4. teeing(): you defer the null-check to the actual call of supplier or sizedSupplier. This is very bad, as a broken collector (e.g. one returning null from sizedSupplier) may not fail fast for a long time if it's never used for a sized stream.

5. ReduceOps: you also defer a null-check, and capture not the function but the Collector object itself. This produces a subtle behavioral change for no reason.

6. ReferencePipeline::collect: I don't see why such a huge change is necessary.

With best regards, Tagir Valeev.

On Mon, Mar 4, 2019 at 7:36 AM August Nagro <augustnagro@gmail.com> wrote:
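Tagir's third point can be made concrete: for n expected mappings, the smallest initial capacity that avoids a resize under HashMap's default 0.75 load factor is ceil(n / 0.75), since HashMap resizes once its size exceeds capacity * loadFactor. A hypothetical helper (the name is mine; it is not part of either patch):

```java
public class MapSizing {
    // Smallest initial capacity such that `expected` mappings fit in a
    // HashMap without a resize at the default 0.75 load factor.
    // (HashMap additionally rounds the capacity up to a power of two.)
    static int presizedMapCapacity(long expected) {
        if (expected < 0) throw new IllegalArgumentException("expected < 0");
        long cap = (long) Math.ceil(expected / 0.75);
        return (int) Math.min(cap, Integer.MAX_VALUE);
    }
}
```

For example, 12 expected mappings need capacity 16, not 12: new HashMap<>(12) would hit the threshold of 12 * 0.75 = 9 entries and resize once.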
Tagir,

I slapped my forehead when I saw that StringBuilder's initialCapacity is the number of chars, not the number of added elements!

The CollectorOp class I added for ReferencePipeline.collect(Collector) is so that a concurrent + unordered collector may be pre-sized. I thought it was needed for completeness (let concurrent collectors benefit from pre-sizing just like serial ones). But maybe it's not worth the change?

I hope I can change your mind about the spec change. We agree that pre-sizing collections is important, and the benchmarks show that streams supporting pre-sizing can reduce allocations (up to 60% for ArrayLists) while often improving throughput. The question then is whether the ends are worth the means. Take a look at this (non-exhaustive) list of collections that stand to benefit from having Collector.sizedSupplier():

- Eclipse Collections. The library provides its own Collectors2 utility class [1], and the authors have high performance / low memory usage as a goal.
  - MutableList - used in Collectors2.toList(). Collector.sizedSupplier() could be specified with Lists.mutable.ofInitialCapacity(int).
  - BooleanArrayList(int initialCapacity)
  - HashBag(int size)
  - BooleanList, IntList, etc.
- Guava. A quick Google search found libraries like https://github.com/yanaga/guava-stream that aggregate Collectors for Guava collections.
- The FutureJoiner I shared earlier
- JDK classes that don't have a Collectors factory method (ex. ArrayDeque(int numElements))

Thanks Tagir for the comments; I updated the webrev: http://august.nagro.us/presized-collectors/webrev2/index.html

- August

[1]: https://www.eclipse.org/collections/javadoc/9.2.0/org/eclipse/collections/im...

On Sun, Mar 3, 2019 at 9:08 PM Tagir Valeev <amaembo@gmail.com> wrote:
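For the ArrayDeque case in the list above, the status-quo workaround is to bake the known size into a toCollection supplier by hand, which only works when the caller, not the stream, already knows the element count:

```java
import java.util.ArrayDeque;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class DequeCollect {
    public static void main(String[] args) {
        int n = 1000; // the size must be known to the caller in advance
        ArrayDeque<Integer> deque = IntStream.range(0, n)
                .boxed()
                .collect(Collectors.toCollection(() -> new ArrayDeque<>(n)));
        System.out.println(deque.size()); // prints 1000
    }
}
```

A sized supplier on Collector would let the pipeline provide n automatically whenever the stream is SIZED.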
So, unfortunately I think this particular API evolution idiom is kind of questionable. There is a new method, sizedSupplier, with a default that delegates to the unsized supplier. But that leaves implementors in a position where they don't really know which one to implement, and the use of characteristics to hint at which one to call is kind of convoluted.

You're focused on the benefits: that in some cases, collection can be faster. That's good. But one of the big costs here is that an interface that was simple is now considerably less so. That's a cost; not convinced the two are in balance.
As you can see, exact allocation really helps to reduce alloc pressure and produces better performance for big lists.
If such kind of patch is acceptable for Stream API, I can file a ticket and submit a webrev.
With best regards, Tagir Valeev
The patch follows:
--- src/java.base/share/classes/java/util/stream/Collectors.java (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a) +++ src/java.base/share/classes/java/util/stream/Collectors.java (revision 53626+:e2fc434b410a+) @@ -50,6 +50,7 @@ import java.util.function.BinaryOperator; import java.util.function.Consumer; import java.util.function.Function; +import java.util.function.IntFunction; import java.util.function.Predicate; import java.util.function.Supplier; import java.util.function.ToDoubleFunction; @@ -194,23 +195,34 @@ */ static class CollectorImpl<T, A, R> implements Collector<T, A, R> { private final Supplier<A> supplier; + private final IntFunction<A> sizedSupplier; private final BiConsumer<A, T> accumulator; private final BinaryOperator<A> combiner; private final Function<A, R> finisher; private final Set<Characteristics> characteristics;
CollectorImpl(Supplier<A> supplier, + IntFunction<A> sizedSupplier, BiConsumer<A, T> accumulator, BinaryOperator<A> combiner, Function<A,R> finisher, Set<Characteristics> characteristics) { this.supplier = supplier; + this.sizedSupplier = sizedSupplier; this.accumulator = accumulator; this.combiner = combiner; this.finisher = finisher; this.characteristics = characteristics; }
+ CollectorImpl(Supplier<A> supplier, + BiConsumer<A, T> accumulator, + BinaryOperator<A> combiner, + Function<A,R> finisher, + Set<Characteristics> characteristics) { + this(supplier, null, accumulator, combiner, finisher, characteristics); + } + CollectorImpl(Supplier<A> supplier, BiConsumer<A, T> accumulator, BinaryOperator<A> combiner, @@ -228,6 +240,10 @@ return supplier; }
+ IntFunction<A> sizedSupplier() { + return sizedSupplier; + } + @Override public BinaryOperator<A> combiner() { return combiner; @@ -275,8 +291,11 @@ */ public static <T> Collector<T, ?, List<T>> toList() { - return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add, + return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, + ArrayList::new, + List::add, (left, right) -> { left.addAll(right); return left; }, + castingIdentity(), CH_ID); }
Index: src/java.base/share/classes/java/util/stream/ReduceOps.java IDEA additional info: Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP <+>UTF-8 =================================================================== --- src/java.base/share/classes/java/util/stream/ReduceOps.java (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a) +++ src/java.base/share/classes/java/util/stream/ReduceOps.java (revision 53626+:e2fc434b410a+) @@ -36,6 +36,7 @@ import java.util.function.BinaryOperator; import java.util.function.DoubleBinaryOperator; import java.util.function.IntBinaryOperator; +import java.util.function.IntFunction; import java.util.function.LongBinaryOperator; import java.util.function.ObjDoubleConsumer; import java.util.function.ObjIntConsumer; @@ -155,13 +156,20 @@ public static <T, I> TerminalOp<T, I> makeRef(Collector<? super T, I, ?> collector) { Supplier<I> supplier = Objects.requireNonNull(collector).supplier(); + @SuppressWarnings("unchecked") + IntFunction<I> sizedSupplier = collector instanceof Collectors.CollectorImpl ? + ((Collectors.CollectorImpl<?, I, ?>) collector).sizedSupplier() : null; BiConsumer<I, ? super T> accumulator = collector.accumulator(); BinaryOperator<I> combiner = collector.combiner(); class ReducingSink extends Box<I> implements AccumulatingSink<T, I, ReducingSink> { @Override public void begin(long size) { - state = supplier.get(); + if (sizedSupplier != null && size >= 0 && size <= Integer.MAX_VALUE) { + state = sizedSupplier.apply((int) size); + } else { + state = supplier.get(); + } }
@Override
I think documentation could help this (and maybe removing the two added `of` helpers so an explicit collector must be created), but you're right about the cost. If the consensus is not to change Collector, then maybe Tagir's suggestion for an internal-only change can be implemented?

On Mon, Mar 4, 2019 at 2:53 PM Brian Goetz <brian.goetz@oracle.com> wrote:
So, unfortunately I think this particular API evolution idiom is kind of questionable. There is a new method, sizedSupplier, with a default that delegates to the unsized supplier. But, that leaves implementors in a position where they don’t really know which one to implement, and the use of characteristics to hint at which one to call is kind of convoluted.
You’re focused on the benefits — that in some cases, collection can be faster. That’s good. But one of the big costs here is that an interface that was simple is now considerably less so. That’s a cost; not convinced the two are in balance.
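The evolution idiom Brian is describing could be sketched roughly as follows. This is a hypothetical toy interface (assumed names, not `java.util.stream.Collector` itself), showing a new default method whose default implementation ignores the size hint and delegates to the existing zero-arg supplier, so old implementations keep compiling but implementors are left unsure which method callers will actually use:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.Supplier;

// Hypothetical sketch of the idiom under discussion, not the real API:
// a new default method that ignores the size hint and delegates to the
// pre-existing zero-arg supplier, so old implementations work unchanged.
interface SizedSupplierSketch<A> {
    Supplier<A> supplier();                  // the pre-existing method

    default IntFunction<A> sizedSupplier() { // the proposed addition
        return size -> supplier().get();     // default: ignore the hint
    }
}

public class SizedSupplierDemo {
    public static void main(String[] args) {
        // An implementor that only provides supplier() still works:
        SizedSupplierSketch<List<Integer>> c = () -> ArrayList::new;
        System.out.println(c.sizedSupplier().apply(10).size()); // 0
    }
}
```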
On Mar 4, 2019, at 12:35 AM, August Nagro <augustnagro@gmail.com> wrote:
Hi everyone,
My implementation is at https://github.com/AugustNagro/presize-collectors-bench and I have a webrev here: http://august.nagro.us/presized-collectors/webrev/.
The changes that I made were:
- Add `default IntFunction<A> sizedSupplier()` to Collector
- Add 2 new Collector.of helper methods taking a sized supplier
- Update the applicable collectors in Collectors to provide a sizedSupplier
- Have ReferencePipeline & ReduceOps call sizedSupplier when appropriate
Some notes/questions I have:
- I considered adding Collector.Characteristics.PRESIZEABLE, but figured it was not needed given that sizedSupplier should always delegate to supplier.
- Collector.sizedSupplier could be a LongFunction instead of IntFunction (since Sink.begin provides long sizes), but every Collection type I've looked at uses int for initialCapacity, so I think IntFunction makes more sense.
- StringJoiner should be updated to take an initialCapacity (should I commit this change in the webrev, or leave it for another enhancement?)
- What tests should I add for this enhancement?
There are some benchmarks in the github repo. I was initially surprised when I saw that my bare-bones parallel stream did not have a throughput improvement like the serial one, but after doing some debugging I think it is from Collector.combiner dominating the runtime.
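A plausible illustration of that combiner cost (a sketch, not the benchmark code from the repo): presizing only speeds up allocation of each leaf's list, while the combiner used by `Collectors.toList()` is `left.addAll(right)`, which still copies every element each time partial results are merged.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of why presized leaf lists help less in parallel: the toList()
// combiner, left.addAll(right), copies every element of the right-hand
// partial result, so merging can dominate regardless of presizing.
public class CombinerCost {
    public static void main(String[] args) {
        List<Integer> left = new ArrayList<>(500_000);
        List<Integer> right = new ArrayList<>(500_000);
        for (int i = 0; i < 500_000; i++) {
            left.add(i);
            right.add(i);
        }
        left.addAll(right); // O(n) copy per merge step
        System.out.println(left.size()); // 1000000
    }
}
```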
Regards,
August
On Thu, Feb 28, 2019 at 12:56 AM Tagir Valeev <amaembo@gmail.com> wrote:
Hello!
I wouldn't use a presized HashSet, because you never know how many duplicates are expected. What if the input stream has a million elements, but only two of them are distinct? Do you really want to allocate a hash table for a million elements in advance?
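A small illustration of that point: the stream's size is a poor capacity hint for a Set, because duplicates collapse. Here a million input elements yield only two distinct values, so a table presized for a million entries would be almost entirely wasted.

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// A million input elements, but only two distinct values survive
// collection into a Set — presizing from the stream's SIZED estimate
// would over-allocate by roughly six orders of magnitude.
public class FewDistinct {
    public static void main(String[] args) {
        Set<Integer> s = IntStream.range(0, 1_000_000)
                .map(i -> i % 2) // only the values 0 and 1
                .boxed()
                .collect(Collectors.toSet());
        System.out.println(s.size()); // 2
    }
}
```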
For toMap() without a custom supplier or merger, preallocation sounds reasonable, as in this case a duplicate key is an exceptional condition, so we may expect a specific number of elements.
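One wrinkle a sized supplier for toMap() would have to consider: HashMap's "initial capacity" is the hash table size, not an element count, so the size hint should be scaled by the load factor (0.75 by default) or the map still resizes once it fills past capacity × 0.75. A sketch with an assumed helper name (`newPresizedHashMap` is illustrative, not a JDK method):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: convert an expected element count into a HashMap capacity
// large enough to avoid any resize at the default load factor of 0.75.
public class PresizedMap {
    static <K, V> Map<K, V> newPresizedHashMap(int expectedSize) {
        int capacity = (int) Math.ceil(expectedSize / 0.75);
        return new HashMap<>(capacity);
    }

    public static void main(String[] args) {
        Map<String, Integer> m = newPresizedHashMap(100);
        for (int i = 0; i < 100; i++) {
            m.put("k" + i, i); // fits without triggering a resize
        }
        System.out.println(m.size()); // 100
    }
}
```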
Thu, Feb 28, 2019, 11:38 August Nagro augustnagro@gmail.com:
Tagir,
Great to see the validating benchmarks.
I think it's the best solution given the fact that very few collectors may benefit from the exact known size, and this benefit usually disappears when collectors are composed (e.g. using groupingBy: downstream toList() would not win anything if it provides a sizedSupplier).
I would like to see your benchmarks for that statement too. After all, HashMap, HashSet, etc. all have presizing constructors; aren't those there for a reason?
So I am still convinced that presizing is important for other collector types, including custom collectors. Although I'm open to having my mind changed.
I'm happy to contribute an implementation as well; I was hoping that this (smallish) patch would help me learn the OpenJDK process and make future contributions easier!
- August
On Wed, Feb 27, 2019 at 1:06 AM Tagir Valeev <amaembo@gmail.com> wrote:
Hello!
A less intrusive API direction might be a version of Collector whose supplier function took a size-estimate argument; this might even help in parallel since it allows for intermediate results to start with a better initial size guess. (And this could be implemented as a default that delegates to the existing supplier.) Still, not really sure this carries its weight.
It's interesting that such optimization is possible without API change, using the "hidden knowledge". While public API stays the same, the collect method implementation may check if custom non-public Collector implementation is supplied, and in this case may use the sized supplier. I think it's the best solution given the fact that very few collectors may benefit from the exact known size, and this benefit usually disappears when collectors are composed (e.g. using groupingBy: downstream toList() would not win anything if it provides a sizedSupplier).
I created a quick patch and launched a quick benchmark like this:

return new SplittableRandom(0).ints(length, 0, 100).boxed().collect(Collectors.toList());

For length = 100, 10000, 1000000.
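For anyone who wants to reproduce the numbers, Tagir's one-liner expanded into a standalone form (the JMH harness itself is omitted; class and method names here are assumed):

```java
import java.util.List;
import java.util.SplittableRandom;
import java.util.stream.Collectors;

// Standalone form of the benchmarked operation: collect `length`
// pseudo-random ints in [0, 100) into a List via Collectors.toList().
public class ToListBench {
    static List<Integer> collect(int length) {
        return new SplittableRandom(0)
                .ints(length, 0, 100)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(collect(10_000).size()); // 10000
    }
}
```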
Here are the results for vanilla 13-ea+8:
length = 100, avg time = 1434,469 ± 27,147 ns/op, alloc rate = 1712 B/op
length = 10000, avg time = 155032,390 ± 2657,927 ns/op, alloc rate = 169280 B/op
length = 1000000, avg time = 27099621,763 ± 1979054,500 ns/op, alloc rate = 14586750 B/op

Patched:
length = 100, avg time = 1414,480 ± 30,040 ns/op, alloc rate = 768 B/op
length = 10000, avg time = 137338,320 ± 1316,789 ns/op, alloc rate = 40368 B/op
length = 1000000, avg time = 17654635,871 ± 210607,441 ns/op, alloc rate = 4000382 B/op
As you can see, exact allocation really helps to reduce alloc pressure and produces better performance for big lists.
If such kind of patch is acceptable for Stream API, I can file a ticket and submit a webrev.
With best regards, Tagir Valeev
The patch follows:
--- src/java.base/share/classes/java/util/stream/Collectors.java (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a)
+++ src/java.base/share/classes/java/util/stream/Collectors.java (revision 53626+:e2fc434b410a+)
@@ -50,6 +50,7 @@
 import java.util.function.BinaryOperator;
 import java.util.function.Consumer;
 import java.util.function.Function;
+import java.util.function.IntFunction;
 import java.util.function.Predicate;
 import java.util.function.Supplier;
 import java.util.function.ToDoubleFunction;
@@ -194,23 +195,34 @@
      */
     static class CollectorImpl<T, A, R> implements Collector<T, A, R> {
         private final Supplier<A> supplier;
+        private final IntFunction<A> sizedSupplier;
         private final BiConsumer<A, T> accumulator;
         private final BinaryOperator<A> combiner;
         private final Function<A, R> finisher;
         private final Set<Characteristics> characteristics;

         CollectorImpl(Supplier<A> supplier,
+                      IntFunction<A> sizedSupplier,
                       BiConsumer<A, T> accumulator,
                       BinaryOperator<A> combiner,
                       Function<A,R> finisher,
                       Set<Characteristics> characteristics) {
             this.supplier = supplier;
+            this.sizedSupplier = sizedSupplier;
             this.accumulator = accumulator;
             this.combiner = combiner;
             this.finisher = finisher;
             this.characteristics = characteristics;
         }

+        CollectorImpl(Supplier<A> supplier,
+                      BiConsumer<A, T> accumulator,
+                      BinaryOperator<A> combiner,
+                      Function<A,R> finisher,
+                      Set<Characteristics> characteristics) {
+            this(supplier, null, accumulator, combiner, finisher, characteristics);
+        }
+
         CollectorImpl(Supplier<A> supplier,
                       BiConsumer<A, T> accumulator,
                       BinaryOperator<A> combiner,
@@ -228,6 +240,10 @@
             return supplier;
         }

+        IntFunction<A> sizedSupplier() {
+            return sizedSupplier;
+        }
+
         @Override
         public BinaryOperator<A> combiner() {
             return combiner;
@@ -275,8 +291,11 @@
      */
     public static <T> Collector<T, ?, List<T>> toList() {
-        return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
+        return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new,
+                                   ArrayList::new,
+                                   List::add,
                                    (left, right) -> { left.addAll(right); return left; },
+                                   castingIdentity(),
                                    CH_ID);
     }

Index: src/java.base/share/classes/java/util/stream/ReduceOps.java
===================================================================
--- src/java.base/share/classes/java/util/stream/ReduceOps.java (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a)
+++ src/java.base/share/classes/java/util/stream/ReduceOps.java (revision 53626+:e2fc434b410a+)
@@ -36,6 +36,7 @@
 import java.util.function.BinaryOperator;
 import java.util.function.DoubleBinaryOperator;
 import java.util.function.IntBinaryOperator;
+import java.util.function.IntFunction;
 import java.util.function.LongBinaryOperator;
 import java.util.function.ObjDoubleConsumer;
 import java.util.function.ObjIntConsumer;
@@ -155,13 +156,20 @@
     public static <T, I> TerminalOp<T, I>
     makeRef(Collector<? super T, I, ?> collector) {
         Supplier<I> supplier = Objects.requireNonNull(collector).supplier();
+        @SuppressWarnings("unchecked")
+        IntFunction<I> sizedSupplier = collector instanceof Collectors.CollectorImpl ?
+                ((Collectors.CollectorImpl<?, I, ?>) collector).sizedSupplier() : null;
         BiConsumer<I, ? super T> accumulator = collector.accumulator();
         BinaryOperator<I> combiner = collector.combiner();
         class ReducingSink extends Box<I>
                 implements AccumulatingSink<T, I, ReducingSink> {
             @Override
             public void begin(long size) {
-                state = supplier.get();
+                if (sizedSupplier != null && size >= 0 && size <= Integer.MAX_VALUE) {
+                    state = sizedSupplier.apply((int) size);
+                } else {
+                    state = supplier.get();
+                }
             }

             @Override
participants (4):
- August Nagro
- Brian Goetz
- Rob Spoor
- Tagir Valeev