Speed optimization of Spliterators.spliteratorUnknownSize for parallel case

Fri Jul 10 18:01:36 UTC 2015

Hello!

The problem of ineffective parallelization of spliteratorUnknownSize
for low-N/high-Q tasks was already discussed in the thread "Exploiting
concurrency with IteratorSpliterator of unknown size"
[ http://mail.openjdk.java.net/pipermail/lambda-dev/2014-March/011968.html ]
It was proposed to modify some heuristics and/or add some parameters
to control them.

I have a simpler idea which allows to run in parallel high-Q tasks for
N <~ 3000 about twice faster without any API changes while having no
visible impact for high-N or sequential cases. Here's the gist with
proof-of-concept implementation (UnknownSizeSpliterator.java), JMH
benchmark (IteratorTest.java) and benchmarking results (performed
on Quad-Core i5-3340, Win7 64bit):
https://gist.github.com/amaembo/781c64a3b4f48fdeb196

The problem which I noticed is the following. When trySplit is
requested and JDK IteratorSpliterator fills the buffer, it may
suddenly discover that the source iterator has no more elements. At
this point the IteratorSpliterator splits in very uneven manner.
Namely it dumps everything into the ArraySpliterator leaving no
elements for itself. As it did not report the size previously, the
pipeline engine assumes that both parts are roughly equal which
result in very poor performance.

I merged both IteratorSpliterator and ArraySpliterator into single
class UnknownSizeSpliterator. When it sees that input spliterator has
no more elements, it converts itself into array-mode and splits right
here to two equal parts, thus the first call of trySplit for N < 1024
performs actual split.

Note that I did not change the heuristics at all, thus for big-N
tasks the results should be the same. My implementation is a little
bit simplified (no additional characteristics, no precondition
checks, no primitive implementations), but I guess current version is
ok for preliminary testing to decide whether it's good or bad.

This implementation might improve the performance of existing methods
like BufferedReader.lines, Files.lines, Pattern.splitAsStream when
using the parallel stream as well as user-defined iterator-based
spliterators. Also it maybe useful for JDK-8072727 feature
[ https://bugs.openjdk.java.net/browse/JDK-8072727 ].

What do you think?

With best regards,
Tagir Valeev.