Custom spliterator for Collections.nCopies(n, obj).stream()

newer
RFR - 8027634: Support @argfiles...

Tagir F. Valeev

28 Jul 2015 28 Jul '15

8:37 a.m.

Hello! Current implementation of Collections.nCopies().stream() is as follows: http://hg.openjdk.java.net/jdk9/dev/jdk/file/f160dec9a350/src/java.base/shar... public Stream<E> stream() { return IntStream.range(0, n).mapToObj(i -> element); } @Override public Stream<E> parallelStream() { return IntStream.range(0, n).parallel().mapToObj(i -> element); } The problem is that it adds a lambda expression to the RangeIntSpliterator type profile which can be polluted by some other code and vice versa: using nCopies().stream() may pollute the type profile for other code making it slower. Another thing which is missing in current implementation is unordered mode. This collection is unordered by nature, its stream is similar to Stream.generate(), so to my opinion it should be unordered which may improve the parallel reduction in some cases. This can be improved by introducing the custom spliterator class which is quite simple: https://gist.github.com/amaembo/62f3efee9923b1468e86 On pre-polluted type profile with simple mapping and reduction using custom spliterator is about 25-30% faster in both parallel and sequential cases as benchmarking shows (performed on 4-core cpu). What do you think? With best regards, Tagir Valeev.

Show replies by date

Paul Sandoz

28 Jul 28 Jul

1:01 p.m.

Hi Tagir, I would agree with you out about changing to use unordered() except that it is a List that is returned, whose Spliterator is specified to report ORDERED. I don’t particular want to add a special spliterator for this case to avoid some profile pollution. Will it not just push the pollution further down the road to Spliterator.forEachRemaining? or to within other code? Alas i think profile pollution is current fact of JDK life when inverting control with lambdas. What we really require is a better way to specialise the hot loops. Paul. On 28 Jul 2015, at 10:37, Tagir F. Valeev <amaembo@gmail.com> wrote:

...

Hello!

Current implementation of Collections.nCopies().stream() is as follows:

http://hg.openjdk.java.net/jdk9/dev/jdk/file/f160dec9a350/src/java.base/shar...

public Stream<E> stream() { return IntStream.range(0, n).mapToObj(i -> element); }

@Override public Stream<E> parallelStream() { return IntStream.range(0, n).parallel().mapToObj(i -> element); }

The problem is that it adds a lambda expression to the RangeIntSpliterator type profile which can be polluted by some other code and vice versa: using nCopies().stream() may pollute the type profile for other code making it slower.

Another thing which is missing in current implementation is unordered mode. This collection is unordered by nature, its stream is similar to Stream.generate(), so to my opinion it should be unordered which may improve the parallel reduction in some cases.

This can be improved by introducing the custom spliterator class which is quite simple: https://gist.github.com/amaembo/62f3efee9923b1468e86

On pre-polluted type profile with simple mapping and reduction using custom spliterator is about 25-30% faster in both parallel and sequential cases as benchmarking shows (performed on 4-core cpu).

What do you think?

With best regards, Tagir Valeev.

Tagir F. Valeev

30 Jul 30 Jul

6:08 a.m.

Hello! PS> I don’t particular want to add a special spliterator for this PS> case to avoid some profile pollution. Will it not just push the PS> pollution further down the road to Spliterator.forEachRemaining? or to within other code? I just thought that the current idea is to create specialized spliterators for most methods returning streams. If not, why String.chars()/AbstractStringBuilder.chars() in JDK9 use new IntCharArraySpliterator instead of already existing CharBuffer.wrap(this).chars() which produce similar performance in both sequential and parallel cases? Also for String an alternative would be return IntStream.range(0, value.length).map(i -> value[i]); Which is actually similar to Collections.nCopies().stream(). Also note that Collections class already contains singletonSpliterator which creates an anonymous class. With my proposed change it can be replaced with new ConstantSpliterator(1, element) without performance drop, so actual number of classes will not increase. At very least why creating two distinct lambdas in CopiesList.stream() and CopiesList.parallelStream()? They duplicate "i -> element", for which javac creates two separate methods (like lambda$stream$95(int) and lambda$parallelStream$96(int)) and in runtime two distinct anonymous classes may be created. It could be written instead public Stream<E> parallelStream() { return stream().parallel(); } With best regards, Tagir Valeev. PS> Alas i think profile pollution is current fact of JDK life when PS> inverting control with lambdas. What we really require is a better way to specialise the hot loops. PS> Paul. PS> On 28 Jul 2015, at 10:37, Tagir F. Valeev <amaembo@gmail.com> wrote:

...

...
Hello!

Current implementation of Collections.nCopies().stream() is as follows:

http://hg.openjdk.java.net/jdk9/dev/jdk/file/f160dec9a350/src/java.base/shar...

public Stream<E> stream() { return IntStream.range(0, n).mapToObj(i -> element); }

@Override public Stream<E> parallelStream() { return IntStream.range(0, n).parallel().mapToObj(i -> element); }

The problem is that it adds a lambda expression to the RangeIntSpliterator type profile which can be polluted by some other code and vice versa: using nCopies().stream() may pollute the type profile for other code making it slower.

Another thing which is missing in current implementation is unordered mode. This collection is unordered by nature, its stream is similar to Stream.generate(), so to my opinion it should be unordered which may improve the parallel reduction in some cases.

This can be improved by introducing the custom spliterator class which is quite simple: https://gist.github.com/amaembo/62f3efee9923b1468e86

On pre-polluted type profile with simple mapping and reduction using custom spliterator is about 25-30% faster in both parallel and sequential cases as benchmarking shows (performed on 4-core cpu).

What do you think?

With best regards, Tagir Valeev.

Paul Sandoz

7:42 a.m.

On 30 Jul 2015, at 08:08, Tagir F. Valeev <amaembo@gmail.com> wrote:

...

Hello!

PS> I don’t particular want to add a special spliterator for this PS> case to avoid some profile pollution. Will it not just push the PS> pollution further down the road to Spliterator.forEachRemaining? or to within other code?

I just thought that the current idea is to create specialized spliterators for most methods returning streams.

Not without some evaluation of the benefits compared to the increased cost of code.

...

If not, why String.chars()/AbstractStringBuilder.chars() in JDK9 use new IntCharArraySpliterator instead of already existing CharBuffer.wrap(this).chars() which produce similar performance in both sequential and parallel cases? Also for String an alternative would be

return IntStream.range(0, value.length).map(i -> value[i]);

Since String is a commonly used class i opted for the most efficient solution (both for characters and code points, and shared across String and the builders).

...

Which is actually similar to Collections.nCopies().stream().

Yes, but i doubt it is as commonly used as String and thus I don’t think the argument of profile pollution is sufficient reason to justify a specific implementation. In this case convenience won over more code.

...

Also note that Collections class already contains singletonSpliterator which creates an anonymous class. With my proposed change it can be replaced with new ConstantSpliterator(1, element) without performance drop, so actual number of classes will not increase.

With reuse it becomes more compelling :-) In both cases of singleton/nCopies the spliterator characteristics can be the same and that of the already existing singleton spliterator implementation. I would be happy to accept a patch (with tests, if existing tests do not cover this already, i suspect they might but we still need to check). Have you signed the OCA [1]. If so i can accept a patch from you and publish as a webrev for review.

...

At very least why creating two distinct lambdas in CopiesList.stream() and CopiesList.parallelStream()? They duplicate "i -> element", for which javac creates two separate methods (like lambda$stream$95(int) and lambda$parallelStream$96(int)) and in runtime two distinct anonymous classes may be created. It could be written instead

public Stream<E> parallelStream() { return stream().parallel(); }

Yes, good point. Thanks, Paul. [1] http://www.oracle.com/technetwork/community/oca-486395.html

Tagir F. Valeev

22 Aug 22 Aug

9:17 a.m.

New subject: [PATCH] Re: Custom spliterator for Collections.nCopies(n, obj).stream()

Hello! PS> With reuse it becomes more compelling :-) In both cases of PS> singleton/nCopies the spliterator characteristics can be the same PS> and that of the already existing singleton spliterator implementation. The only difference is the DISTINCT characteristic. I think it's good to report it based on the list size, so Collections.nCopies(1, obj).spliterator() can report DISTINCT as well. PS> I would be happy to accept a patch (with tests, if existing tests PS> do not cover this already, i suspect they might but we still need PS> to check). Have you signed the OCA [1]. If so i can accept a patch PS> from you and publish as a webrev for review. Here's my patch to the Collections class. Implementation of the ConstantSpliterator is added, singletonSpliterator method now uses it as well as CopiesList::spliterator. CopiesList::stream and CopiesList::parallelStream methods are removed as unnecessary. The resulting bytecode is roughly 750 bytes less after applying my patch. As for tests, it seems that test/java/util/Spliterator/SpliteratorTraversingAndSplittingTest.java covers both singletonSpliterator and nCopies().spliterator() pretty well. I checked that these tests succeed with my changes while failed when some mistake in ConstantSpliterator is introduced. If you think that more tests are necessary, please suggest what exactly should be tested and where to put them. With best regards, Tagir Valeev.

Tagir F. Valeev

2:27 p.m.

New subject: [PATCH] Re: Custom spliterator for Collections.nCopies(n, obj).stream()

Seems that the patch attachment was filtered out. Just for the case here it is: diff --git a/src/java.base/share/classes/java/util/Collections.java b/src/java.base/share/classes/java/util/Collections.java --- a/src/java.base/share/classes/java/util/Collections.java +++ b/src/java.base/share/classes/java/util/Collections.java @@ -4702,43 +4702,7 @@ * @return A singleton {@code Spliterator} */ static <T> Spliterator<T> singletonSpliterator(final T element) { - return new Spliterator<T>() { - long est = 1; - - @Override - public Spliterator<T> trySplit() { - return null; - } - - @Override - public boolean tryAdvance(Consumer<? super T> consumer) { - Objects.requireNonNull(consumer); - if (est > 0) { - est--; - consumer.accept(element); - return true; - } - return false; - } - - @Override - public void forEachRemaining(Consumer<? super T> consumer) { - tryAdvance(consumer); - } - - @Override - public long estimateSize() { - return est; - } - - @Override - public int characteristics() { - int value = (element != null) ? Spliterator.NONNULL : 0; - - return value | Spliterator.SIZED | Spliterator.SUBSIZED | Spliterator.IMMUTABLE | - Spliterator.DISTINCT | Spliterator.ORDERED; - } - }; + return new ConstantSpliterator<>(1, element); } /** @@ -5061,20 +5025,64 @@ return new CopiesList<>(toIndex - fromIndex, element); } - // Override default methods in Collection - @Override - public Stream<E> stream() { - return IntStream.range(0, n).mapToObj(i -> element); - } - - @Override - public Stream<E> parallelStream() { - return IntStream.range(0, n).parallel().mapToObj(i -> element); - } - @Override public Spliterator<E> spliterator() { - return stream().spliterator(); + return new ConstantSpliterator<>(n, element); + } + } + + private static final class ConstantSpliterator<T> implements Spliterator<T> { + long est; + T element; + + public ConstantSpliterator(long est, T element) { + this.est = est; + this.element = element; + } + + @Override + public Spliterator<T> trySplit() { + long est = this.est; + if (est >= 2) { + est >>= 1; + Spliterator<T> prefix = new ConstantSpliterator<>(est, element); + this.est -= est; + return prefix; + } + return null; + } + + @Override + public boolean tryAdvance(Consumer<? super T> action) { + Objects.requireNonNull(action); + if (est <= 0) + return false; + action.accept(element); + est--; + return true; + } + + @Override + public void forEachRemaining(Consumer<? super T> action) { + Objects.requireNonNull(action); + T element = this.element; + for (long r = est; r > 0; r--) { + action.accept(element); + } + est = 0; + } + + @Override + public long estimateSize() { + return est; + } + + @Override + public int characteristics() { + int nonNull = (element != null) ? NONNULL : 0; + int distinct = (est <= 1) ? DISTINCT : 0; + + return SIZED | SUBSIZED | IMMUTABLE | ORDERED | nonNull | distinct; } } With best regards, Tagir Valeev. TFV> Hello! PS>> With reuse it becomes more compelling :-) In both cases of PS>> singleton/nCopies the spliterator characteristics can be the same PS>> and that of the already existing singleton spliterator implementation. TFV> The only difference is the DISTINCT characteristic. I think it's good TFV> to report it based on the list size, so TFV> Collections.nCopies(1, obj).spliterator() can report DISTINCT as well. PS>> I would be happy to accept a patch (with tests, if existing tests PS>> do not cover this already, i suspect they might but we still need PS>> to check). Have you signed the OCA [1]. If so i can accept a patch PS>> from you and publish as a webrev for review. TFV> Here's my patch to the Collections class. Implementation of the TFV> ConstantSpliterator is added, singletonSpliterator method now uses it TFV> as well as CopiesList::spliterator. CopiesList::stream and TFV> CopiesList::parallelStream methods are removed as unnecessary. TFV> The resulting bytecode is roughly 750 bytes less after applying my TFV> patch. TFV> As for tests, it seems that TFV> test/java/util/Spliterator/SpliteratorTraversingAndSplittingTest.java TFV> covers both singletonSpliterator and nCopies().spliterator() pretty TFV> well. I checked that these tests succeed with my changes while failed TFV> when some mistake in ConstantSpliterator is introduced. If you think TFV> that more tests are necessary, please suggest what exactly should be TFV> tested and where to put them. TFV> With best regards, TFV> Tagir Valeev.

3852

Age (days ago)

3877

Last active (days ago)

List overview

Download

5 comments

2 participants

participants (2)

Paul Sandoz
Tagir F. Valeev