8072784: Better spliterator implementation for BitSet.stream()

Tue Nov 22 04:00:15 UTC 2016

Hello!

> Hi Tagir,
>
> In the original issue i was pondering using SIZED for smaller bit sets and non-SIZED for larger bit sets, since we stride over the longs themselves when calculating the size (should be intrinsic to count the set bits, at least on x86). Supporting both is fairly simple, but you are correct not reporting SIZED would simplify things further and it may not matter in practice.
>
> Can you log an issue? I will follow up with some performance analysis.

Sure, here it is:
https://bugs.openjdk.java.net/browse/JDK-8170159

Probably it would be optimal not to call cardinality until it's
actually queried. Unfortunately current stream implementation always
queries getExactSizeIfKnown() (passing it into sink.begin()) even if
it's not actually used (which is most of the cases).

With best regards,
Tagir Valeev.

>
> Thanks,
> Paul.
>
>> On 21 Nov 2016, at 05:59, Tagir F. Valeev <amaembo at gmail.com> wrote:
>>
>> Hello!
>>
>> Is it necessary to report SIZED characteristic and calculate the
>> cardinality in advance making two-pass traversal? This improves
>> bitSet.stream().toArray() performance (also bitSet.stream().count(),
>> but it's rare case in practice). However this is just waste of time
>> for any other stream operation. It's possible to remove SIZED
>> characteristic using "wordsInUse << ADDRESS_BITS_PER_WORD" as initial
>> size estimation (or better getFence() result). This would make
>> implementation simpler and a little bit faster for all scenarios
>> excluding toArray().
>>
>> What do you think?
>>
>> With best regards,
>> Tagir Valeev
>>
>>
>> PS> Hi,
>>
>> PS> Please review this patch implementing a spliterator for BitSet:
>>
>> PS>
>> PS> http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8072784-bitset-stream-spliterator/webrev/
>>
>> PS> The spliterator is SIZED but not SUBSIZED, the bit set’s
>> PS> cardinality is used (same for the iterator) as the size of the
>> PS> root spliterator, and estimates from that are derived thereafter
>> PS> for splits. Splitting attempts to balance the small vs. large, dense vs. sparse cases.
>>
>> PS> Most of the complexity is dealing with the edge case of there
>> PS> being a bit set at Integer.MAX_VALUE.
>>
>>
>> PS> Testing-wise i have leveraged the existing tests. It would be
>> PS> nice to consider placing the spliterator testing functionality
>> PS> into a separate library for reuse as
>> PS> SpliteratorTraversingAndSplittingTest is getting large. Likewise
>> PS> for more formally for streams, which is possible to reuse but a
>> PS> little clunky. I can log issues for those.
>>
>> PS> Paul.
>>
>