Exploiting concurrency with IteratorSpliterator of unknown size

Paul Sandoz paul.sandoz at oracle.com
Mon Mar 24 09:31:31 UTC 2014


Hi Marko,

You did the right thing in writing your own Spliterator, but why did you have to copy the ArraySpliterator code? Can you not use the factory method on Spliterators?

    public static <T> Spliterator<T> spliterator(Object[] array,
                                                 int additionalCharacteristics)
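
A minimal sketch of how that factory method could stand in for the copied ArraySpliterator. The class name FixedBatchSpliterator, its constructor parameters and the parallelStream helper are hypothetical names made up for this illustration and are not part of the JDK. The sketch uses the overload of Spliterators.spliterator that also takes fromIndex and toIndex, so that a partially filled final batch is handled.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper (not a JDK class): wraps an Iterator of unknown size
// and splits off fixed-size batches instead of arithmetically growing ones.
final class FixedBatchSpliterator<T> implements Spliterator<T> {
    private final Iterator<? extends T> source;
    private final int batchSize;

    FixedBatchSpliterator(Iterator<? extends T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (!source.hasNext()) {
            return false;
        }
        action.accept(source.next());
        return true;
    }

    @Override
    public Spliterator<T> trySplit() {
        if (!source.hasNext()) {
            return null; // nothing left to split off
        }
        // Buffer up to batchSize elements from the sequential source.
        Object[] batch = new Object[batchSize];
        int n = 0;
        while (n < batchSize && source.hasNext()) {
            batch[n++] = source.next();
        }
        // The public factory method replaces the private ArraySpliterator;
        // the returned spliterator also reports SIZED and SUBSIZED.
        return Spliterators.spliterator(batch, 0, n, characteristics());
    }

    @Override
    public long estimateSize() {
        return Long.MAX_VALUE; // size is unknown
    }

    @Override
    public int characteristics() {
        return ORDERED;
    }

    // Hypothetical convenience method for building a parallel stream.
    static <T> Stream<T> parallelStream(Iterator<? extends T> source, int batchSize) {
        return StreamSupport.stream(new FixedBatchSpliterator<T>(source, batchSize), true);
    }
}
```

For the BufferedReader case, something like FixedBatchSpliterator.parallelStream(reader.lines().iterator(), 100) might be a starting point; the batch size of 100 is the value Marko reported, not a recommendation.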

The approach to extracting parallelism from a sequential source was designed not to favour any one scenario over another (cost per element, number of elements, reduction). We did not want to expose controls such as the arithmetic progression properties, since most people won't know what to do with them (in hindsight, exposing them might have been OK on the Spliterators.AbstractSpliterator implementation).

Unfortunately, in your case I am assuming the cost per line is the dominant factor, rather than the number of lines to be processed.
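
To make the effect on small inputs concrete, here is a rough sketch that performs a single trySplit on the spliterator returned by Spliterators.spliteratorUnknownSize and counts what lands on each side. The class and method names (UnknownSizeSplitDemo, splitOnce, drain) are invented for this illustration. With the initial batch of 1024 mentioned in this thread, an input shorter than that should end up entirely in the first batch, leaving nothing to run in parallel.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.IntStream;

// Hypothetical demo class: shows how one split divides a source of unknown size.
final class UnknownSizeSplitDemo {

    // Counts the remaining elements of a spliterator by traversing it.
    static long drain(Spliterator<?> s) {
        long[] count = {0};
        s.forEachRemaining(e -> count[0]++);
        return count[0];
    }

    // Returns {elements in the split-off prefix, elements left in the source}.
    static long[] splitOnce(int elements) {
        Iterator<Integer> it = IntStream.range(0, elements).iterator();
        Spliterator<Integer> rest =
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED);
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixCount = (prefix == null) ? 0 : drain(prefix);
        return new long[] { prefixCount, drain(rest) };
    }

    public static void main(String[] args) {
        for (int size : new int[] { 500, 3000 }) {
            long[] parts = splitOnce(size);
            System.out.println(size + " elements: prefix=" + parts[0]
                    + ", remaining=" + parts[1]);
        }
    }
}
```

When each element is expensive to process, a remaining count of zero after the first split means the whole input is handled as one sequential chunk, which matches the behaviour described for small inputs.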

Paul.


On Mar 22, 2014, at 10:31 PM, Marko Topolnik <marko.topolnik at gmail.com> wrote:

> I have a use case where I process a BufferedReader#lines() and each line takes a substantial amount of time (say 20 ms). The processing is easily parallelizable, however for smaller input sizes, little or no parallelization is attempted due to the batch size step of 1024 hardcoded into IteratorSpliterator when there is no size estimate.
> 
> As a workaround I have coded a modified IteratorSpliterator which takes the batch size as a parameter and keeps it fixed (no arithmetic increasing). With a batch size of 100 I achieve full load on all four cores on my laptop.
> 
> Since such an approach is far from elegant (taking more than 100 lines of code, which include a copy-paste of the private ArraySpliterator and the anonymous Iterator over BufferedReader's lines), I was motivated to address this mailing list in search of a better, more idiomatic way of achieving good parallelism for my scenario. What could I do instead of reimplementing a Spliterator from scratch?
> 
> Regards,
> Marko Topolnik
> 
> 


