Exploiting concurrency with IteratorSpliterator of unknown size

Mon Mar 24 14:12:26 UTC 2014

Hi Paul,

> You did the right thing and wrote your own Spliterator, but why did you
have
> to copy the ArraySpliterator code, can use the factory method on
Spliterators?

Yes, I realized this soon after posting here. I was initially following the
original
IteratorSpliterator implementation, which makes a direct call to the
constructor.

> The approach to extract parallelism from a sequential source was designed
to
> not particularly favour one scenario over another (cost per element, #
elements,
> reduction), and we did not want to expose any controls like the
arithmetic
> progression properties since most people won't know what to do with these
> (in hindsight this might have been OK on the
Spliterators.AbstractSpliterator
> implementation).

What I would generally like to find in the library is some sort of a
template
(abstract class) to make a custom spliterator policy easier to implement. I
realize it is not trivial to get such an API right, making it serve as many
needs
as possible without an explosion of complexity.

In the specific example of BufferedReader.lines there is the additional
issue of
the anonymous Iterator implementation which is burried in the lib.

On a more general level, the problem may be with the way a Spliterator
couples
the concerns of splitting and iteration. A Spliterator based on the classic
Iterator
solves a part of this problem because iterating is delegated to a separate
object;
however the Iterator interface itself is quite cumbersome because it forces
us to
prefetch the next item and cache it when hasNext is called.

So if there was a partial Spliterator interface (which Spliterator itself
would then
extend) which had only tryAdvance and forEachRemaining, and if an
implementation of that could be directly obtained from BufferedReader and
other similar objects,
then perhaps providing our own Spliterator implementation in terms of that
would be
nicer.

> Unfortunately in your case i am assuming the cost-per-line is a dominant
> factor over number lines to be processed.

Yes, that is what's happening. There is one other thing I'm wondering about:
couldn't the ArraySpliterator returned from IteratorSpliterator be further
split into
smaller chunks? Generally, how do the "recursive decomposition" mechanics
play
out in our scenario here?

Regards,
Marko Topolnik