RFR: 8280915: Better parallelization for AbstractSpliterator and IteratorSpliterator when size is unknown [v5]

Thu Mar 3 23:52:05 UTC 2022

On Thu, 10 Feb 2022 04:30:34 GMT, Tagir F. Valeev <tvaleev at openjdk.org> wrote:

>> See the bug description for details.
>> 
>> I propose a simple solution. Let's allow ArraySpliterator to be non-SIZED and report artificial estimatedSize(), much bigger than the real one. This will allow AbstractSpliterator and IteratorSpliterator to produce prefix whose size is comparable to Long.MAX_VALUE (say, starting with Long.MAX_VALUE/2), and this will enable further splitting of the prefix. This change will drastically improve parallel streaming for affected streams of size <= 1024 and significantly improve for streams of size 1025..20000. The cost is higher-grained splitting for huge streams of unknown size. This might add a minor overhead for such scenarios which, I believe, is completely tolerable.
>> 
>> No public API changes are necessary, sequential processing should not be affected, except an extra field in ArraySpliterator which increases a footprint by 8 bytes.
>> 
>> I added a simple test using an artificial collector to ensure that at least two non-empty parts are created when parallelizing Stream.iterate source. More testing ideas are welcome.
>
> Tagir F. Valeev has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Update copyright year
>  - Cosmetic fixes

For unknown-sized iterators we have to bias towards some strategy (recognizing that the source is a poor choice for parallelism).

Currently the "peeled" split containing an array of elements (copied from the iterator) is never split, since the array length will never in practice be greater than the size threshold. The parallelism is derived from the splits of the iterator, and this is biased towards a large number of elements.

In your approach each  "peeled" split effectively gets to use half of the total parallelism. Over subsequent splits of the source that's gonna over provision tasks, specifically after two splits (or 2 * 2^10 + 2^10 elements), and this is biased towards a smaller number of elements.

In the issue you point out a number of relevant stream sources. In practice they are unlikely to have millions of elements, and probably more likely fitting into the first few steps of the arithmetic splitting sequence. If the cost-per-element is reasonably high that would overcome any over provisioning.

Overall it looks reasonable. I ran it through tier 1/2 testing and there were no failures. If you don't mind I would like to let this percolate a little in my mind (i.e. sleep on it for a day or two).

-------------

PR: https://git.openjdk.java.net/jdk/pull/7279