Gatherer: spliterator characteristics are not propagated

forax at univ-mlv.fr forax at univ-mlv.fr
Mon Jan 22 18:56:40 UTC 2024


> From: "Viktor Klang" <viktor.klang at oracle.com>
> To: "Remi Forax" <forax at univ-mlv.fr>
> Cc: "core-libs-dev" <core-libs-dev at openjdk.java.net>, "Paul Sandoz"
> <psandoz at openjdk.java.net>
> Sent: Monday, January 22, 2024 4:22:11 PM
> Subject: Re: Gatherer: spliterator characteristics are not propagated

> Hi Rémi,

Hello, 

> For instance, stateless is neither recessive nor dominant, since the composition
> of two stateless operations is only ever stateless if they both are greedy as
> well: [
> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/Gatherers.java#L588
> |
> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/Gatherers.java#L588
> ]

Okay, so choosing SEQUENTIAL vs PARALELLIZABLE is not that important given that the combination is ad-hoc, reflecting the characterristics is enough. 

> So even if making it represented as ints (more like Spliterator, rather than
> Collector) makes things faster, it's still both work to track, propagate, and
> also becomes a side-channel that needs to remain in sync with the actual
> implementation of the logic.

The flags are in sync with the implementation because the only way to create a Gatherer if through the factory methods and those factory methods (and only them) compute the proper combination of SEQUENTIAL | STATELESS | GREEDY. A user should not be able to set those flags. Only the flags KEEP_* are settable by a user. 

> One could argue that logic such as: someCollection.stream().map(…).count() is a
> performance bug/inefficiency in an of itself as it would be faster to do
> someCollection.size().

The stream implementation has a whole mechanism in place to propagate/preverse flags like SIZED, DISTINCT or SORTED. For me, discussing about the merit of this mechanism seems a little off topic. I would prefer the Gatherer to be a good citizen and works seemlessly with the other intermediary operations. 

> Cheers,
>
regards, 
Rémi 

> Viktor Klang
> Software Architect, Java Platform Group
> Oracle

> From: forax at univ-mlv.fr <forax at univ-mlv.fr>
> Sent: Saturday, 20 January 2024 17:40
> To: Viktor Klang <viktor.klang at oracle.com>
> Cc: core-libs-dev <core-libs-dev at openjdk.java.net>; Paul Sandoz
> <psandoz at openjdk.java.net>
> Subject: [External] : Re: Gatherer: spliterator characteristics are not
> propagated

>> From: "Viktor Klang" <viktor.klang at oracle.com>
>> To: "Remi Forax" <forax at univ-mlv.fr>
>> Cc: "core-libs-dev" <core-libs-dev at openjdk.java.net>, "Paul Sandoz"
>> <psandoz at openjdk.java.net>
>> Sent: Thursday, January 18, 2024 5:14:38 PM
>> Subject: Re: Gatherer: spliterator characteristics are not propagated

>>> And for A.andThen(B), A.flags & B.flags should work, the stream is sorted if
>>> both gatherers keep it sorted.

>> That is unfortunately not the case. That would presume that you can implement
>> the composition such that it can preserve all the common flags. Some flags
>> could be "dominant" and some "recessive" to use genetics nomenclature.

> Some flags of the stream pipeline are "recessive", mainly SHORT_CIRCUIT, but not
> the characteristics of the Gatherer which can have the corresponding "dominant"
> flag, GREEDY, in this case.
> And the same for sequential, the flag should be PARALELIZABLE and not
> SEQUENTIAL.

> The idea is that the Gatherer characteristics can have the same bit set at the
> same position as the stream op flags (as defined by StreamOpFlag).
> So KEEP_DISTINCT is in position 0, KEEP_SORTED in in position 2 and KEEP_SIZED
> is in position 3.
> For GREEDY, we use the same position as SHORT_CIRCUIT and we will flip the bit
> (using XOR) when we want to extract the stream op flags from the
> characteristics

> All the factory methods call the generic of() with a combination of
> PARALELIZABLE and STATELESS and the user can adds the characteristics GREEDY,
> KEEP_DISTINCT, KEEP_SORTED and KEEP_SIZED (otherwise an exception should be
> raised).

> In StreamOpFlag, there are two unused positions (14 and 15), that's perfect for
> our two new states PARALELIZABLE and STATELESS, so no problem here (technically
> we can also reuse positions of the Spliterator characteristic given that those
> flags are masked before being sent to the GathererOp super constructor).

> The way to transform a Gatherer characteristics op to a stream flags op is to
> flip the bits corresponding to SHORT_CIRCUIT, add the highter bit of all flags
> but SHORT-CIRCUIT (because stream op flags are encoded using 2 bits) and mask
> to only retain SHORT_CIRCUIT, KEEP_DISTINCT, KEEP_SORTED and KEEP_SIZED.

> public static int toOpFlags ( int characteristics ) {
> return (( characteristics ^ SHORT_CIRCUIT_MASK ) | HIGHER_BITS ) &
> STREAM_OP_MASK ;
> }

> see below for a full script.

>>> I suppose that if those flags already exist, it's because they have a purpose
>>> and i do not understand how it can make the other operations slower.

>> Extra invocations, extra storage, and extra composition overhead is not free.
>> Since Stream is one-shot you need to include the construction cost with the
>> execution cost. For something like an empty Stream construction cost scan be
>> 90+% of the total costs.

> If you create a Gatherer, the characteristics is a constant (so the validity
> check is removed, it's just a mask and a test) so the result of calling
> toOpFlags() is a constant too.

> If the factory method is not inlined, the cost is 3 bitwise operations which is
> I believe faster than the "instanceof Greedy" used in the current code.

>> Cheers,
>>
> regards,
> Rémi

> ---
> public class GathererFlag {
> // cut and paste from StreamOpFlag
> /**
> * The bit pattern for setting/injecting a flag.
> */
> private static final int SET_BITS = 0b01 ;

> /**
> * The bit pattern for clearing a flag.
> */
> private static final int CLEAR_BITS = 0b10 ;

> /**
> * The bit pattern for preserving a flag.
> */
> private static final int PRESERVE_BITS = 0b11 ;

> private static int position ( int opFlagSet ) {
> return Integer . numberOfTrailingZeros ( opFlagSet ) >> 1 ;
> }

> private static final int DISTINCT_POSITION = position ( StreamOpFlag .
> IS_DISTINCT );
> private static final int SORTED_POSITION = position ( StreamOpFlag . IS_SORTED
> );
> private static final int SIZED_POSITION = position ( StreamOpFlag . IS_SIZED );

> private static final int SHORT_CIRCUIT_POSITION = position ( StreamOpFlag .
> IS_SHORT_CIRCUIT );
> private static final int STATELESS_POSITION = 14 ;
> private static final int PARELLIZABLE_POSITION = 15 ;

> public static final int PARELLIZABLE = SET_BITS << ( PARELLIZABLE_POSITION << 1
> );
> public static final int STATELESS = SET_BITS << ( STATELESS_POSITION << 1 );
> public static final int GREEDY = SET_BITS << ( SHORT_CIRCUIT_POSITION << 1 );

> public static final int KEEP_DISTINCT = SET_BITS << ( DISTINCT_POSITION << 1 );
> public static final int KEEP_SORTED = SET_BITS << ( SORTED_POSITION << 1 );
> public static final int KEEP_SIZED = SET_BITS << ( SIZED_POSITION << 1 );

> private static final int SHORT_CIRCUIT_MASK = SET_BITS << (
> SHORT_CIRCUIT_POSITION << 1 );

> // no GREEDY here
> private static final int HIGHER_BITS = ( PARELLIZABLE | STATELESS |
> KEEP_DISTINCT | KEEP_SORTED | KEEP_SIZED ) << 1 ;

> private static final int STREAM_OP_MASK =
> (( GREEDY | KEEP_DISTINCT | KEEP_SORTED | KEEP_SIZED ) << 1 ) |
> GREEDY | KEEP_DISTINCT | KEEP_SORTED | KEEP_SIZED ;

> public static String toString ( int characteristics ) {
> return Stream . of ( characteristics )
> .< String >mapMulti(( f , consumer ) -> {
> if (( f & PARELLIZABLE ) == PARELLIZABLE ) {
> consumer .accept( "PARELLIZABLE" );
> }
> if (( f & STATELESS ) == STATELESS ) {
> consumer .accept( "STATELESS" );
> }
> if (( f & GREEDY ) == GREEDY ) {
> consumer .accept( "GREEDY" );
> }
> if (( f & KEEP_DISTINCT ) == KEEP_DISTINCT ) {
> consumer .accept( "KEEP_DISTINCT" );
> }
> if (( f & KEEP_SORTED ) == KEEP_SORTED ) {
> consumer .accept( "KEEP_SORTED" );
> }
> if (( f & KEEP_SIZED ) == KEEP_SIZED ) {
> consumer .accept( "KEEP_SIZED" );
> }
> })
> .collect( Collectors . joining ( ", " ));
> }

> public static int toOpFlags ( int characteristics ) {
> return (( characteristics ^ SHORT_CIRCUIT_MASK ) | HIGHER_BITS ) &
> STREAM_OP_MASK ;
> }

> public static String toOpFlagsString ( int opFlags ) {
> return Arrays . stream ( StreamOpFlag . values ())
> .map( op -> {
> if ( op .isPreserved( opFlags )) {
> return "preserved " + op ;
> }
> if ( op .isCleared( opFlags )) {
> return "cleared " + op ;
> }
> if ( op .isKnown( opFlags )) {
> return "set " + op ;
> }
> return "?? " + op ;
> })
> .collect( Collectors . joining ( ", " ));
> }

> void main () {
> var characteristics = PARELLIZABLE | STATELESS | GREEDY | KEEP_DISTINCT |
> KEEP_SORTED | KEEP_SIZED ;
> System . out .println( toOpFlagsString ( toOpFlags ( characteristics )));

> var characteristics2 = STATELESS | KEEP_DISTINCT | KEEP_SIZED ;
> System . out .println( toOpFlagsString ( toOpFlags ( characteristics2 )));
> }
> }

>> Viktor Klang
>> Software Architect, Java Platform Group
>> Oracle

>> From: forax at univ-mlv.fr <forax at univ-mlv.fr>
>> Sent: Thursday, 18 January 2024 16:17
>> To: Viktor Klang <viktor.klang at oracle.com>
>> Cc: core-libs-dev <core-libs-dev at openjdk.java.net>; Paul Sandoz
>> <psandoz at openjdk.java.net>
>> Subject: [External] : Re: Gatherer: spliterator characteristics are not
>> propagated

>>> From: "Viktor Klang" <viktor.klang at oracle.com>
>>> To: "Remi Forax" <forax at univ-mlv.fr>
>>> Cc: "core-libs-dev" <core-libs-dev at openjdk.java.net>, "Paul Sandoz"
>>> <psandoz at openjdk.java.net>
>>> Sent: Thursday, January 18, 2024 3:36:07 PM
>>> Subject: Re: Gatherer: spliterator characteristics are not propagated

>>> I suspect that it is a rather slippery slope, once KEEP-flags are added, then
>>> others will want to be able to have INJECT-flags, and then people might have
>>> different opinions w.r.t. the default should be to clear all flags etc.

>>> And that's even before one looks at the composition-part of it, what are the
>>> flags for A.andThen(B)? (then extrapolate to N compositions and the available
>>> set of flags always approaches 0)

>>> I spent quite a bit of time on this and in the end tracking all this info, and
>>> making sure that the flags of implementations correspond to the actual
>>> behavior, just ended up costing performance for most streams and introduced an
>>> extra dimension to creation and maintenance which I had a hard time justifying.

>> It can be a slippery slope if we were designing from the ground up but the
>> stream implementation already exists and SORTED, DISTINCT and SIZED are the
>> flags that are already tracked by the current implementation.

>> Currently only SHORT_CIRCUIT is set (if not greedy),
>> see [
>> https://urldefense.com/v3/__https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/GathererOp.java*L209__;Iw!!ACWV5N9M2RV99hQ!PhMxqlDzLWPRuwYc7ECRKNPVs0BtnoE-RdT-Jdkng7S-iFuERAHYcWvJ-OMKGLrkPdSrUl3xj1R9ypyeqeWI$
>> |
>> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/GathererOp.java#L209
>> ]

>> And for A.andThen(B), A.flags & B.flags should work, the stream is sorted if
>> both gatherers keep it sorted.

>>> Making specific, rare, combinations of operations faster at the expense of
>>> making 99% of all others slower is a hard pill for most to swallow.

>> I suppose that if those flags already exist, it's because they have a purpose
>> and i do not understand how it can make the other operations slower.

>>> Cheers,
>>>
>> regards,
>> Rémi

>>> Viktor Klang
>>> Software Architect, Java Platform Group
>>> Oracle

>>> From: forax at univ-mlv.fr <forax at univ-mlv.fr>
>>> Sent: Thursday, 18 January 2024 10:28
>>> To: Viktor Klang <viktor.klang at oracle.com>
>>> Cc: core-libs-dev <core-libs-dev at openjdk.java.net>; Paul Sandoz
>>> <psandoz at openjdk.java.net>
>>> Subject: [External] : Re: Gatherer: spliterator characteristics are not
>>> propagated

>>>> From: "Viktor Klang" <viktor.klang at oracle.com>
>>>> To: "Remi Forax" <forax at univ-mlv.fr>, "core-libs-dev"
>>>> <core-libs-dev at openjdk.java.net>
>>>> Sent: Wednesday, January 17, 2024 8:48:07 PM
>>>> Subject: Re: Gatherer: spliterator characteristics are not propagated

>>>> Hi Rémi,

>>>> You can find some of my benches here: [
>>>> https://urldefense.com/v3/__https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/util/stream/ops/ref__;!!ACWV5N9M2RV99hQ!JJy6F9NoL6wKZQK5158up_fTRvH8X7F6JK8T7Euuf8vzbSQbr23eWa9S_yb61ksONVrLrdesCF_au5zyje2l$
>>>> |
>>>> https://github.com/openjdk/jdk/tree/master/test/micro/org/openjdk/bench/java/util/stream/ops/ref
>>>> ]

>>>> Initially I had Characteristics such as ORDERED etc on Gatherer but it just
>>>> didn't end up worth it when looking at the bench results over a wide array of
>>>> stream sizes and number of operations.

>>> I think there are 3 gatherer characteristics that make sense: KEEP_SORTED,
>>> KEEP_DISTINCT and KEEP_SIZED,
>>> all of them say that if the stream was sorted/distinct/sized then the stream
>>> returned by a call to gather() is still sorted (with the same comparator),
>>> distinct or sized.

>>> As examples, map() is KEEP_SIZED, filter() is KEEP_SORTED | KEEP_DISTINCT and
>>> windowFixed is KEEP_DISTINCT.

>>> [CC Paul, so he can correct me if i'm saying something stupid]

>>> Now for the benchmarks, it depends what you want to measure, benchmarking
>>> streams is tricky. This is what i know about benchmarking streams.
>>> First, the JIT has two ways to profile types at runtime,
>>> Either a method takes a function as parameter
>>> void map(Function function) {
>>> function.apply(...)
>>> }
>>> and when map is called with a subtype of Function, the JIT will propagate the
>>> exact type when map is inlined,
>>> Or a method use a field
>>> class Op {
>>> Function function;

>>> void map() {
>>> function.apply(...)
>>> }
>>> }
>>> in that case, the VM records the profile of function.apply() and if there are
>>> more than two different profiles, the VM declare profile poluttion and do not
>>> try to optimize.

>>> The Stream implementation tries very hard to use only parameters instead of
>>> fields, that's why it does not use classical Iterator that are pull iterator (a
>>> filter iterator requires a field) but a Spliterator which is a push iterator,
>>> the element is sent as parameter of the consumer.That's also why collect does
>>> not use the builder pattern (that accumulate values in fields) but a Collector
>>> that publish the functions to be called as parameter.

>>> Obvisously, this is more complex than that, a Collector stores the functions in
>>> fields so it should not work well but the implementation uses a record that
>>> plays well with escape analysis. Escape analysis see the fields of an instance
>>> as parameters so the functions of a Collector are correctly propagated (if the
>>> escape analysis works). And lambdas are using invokedynamic, and the VM tries
>>> really hard to inline invokedynamic, so lambdas (that captures value or not)
>>> are routinely fully inlined with the intermediate operation of a stream.

>>> In your tests, i've not seen comparaisons between an existing method like map()
>>> or filter() followed by a sorted()/distinct()/count()/toList(), i.e. operations
>>> where the characteristics KEEP_* have an impact and their equivalent using a
>>> Gatherer.

>>>> Cheers,
>>>>
>>> regards,
>>> Rémi

>>>> Viktor Klang
>>>> Software Architect, Java Platform Group
>>>> Oracle

>>>> From: core-libs-dev <core-libs-dev-retn at openjdk.org> on behalf of Remi Forax
>>>> <forax at univ-mlv.fr>
>>>> Sent: Wednesday, 17 January 2024 16:48
>>>> To: core-libs-dev <core-libs-dev at openjdk.java.net>
>>>> Subject: Gatherer: spliterator characteristics are not propagated
>>>> While doing some benchmarking of the Gatherer API, i've found that the
>>>> characteristics of the spliterator was not propagated by the method
>>>> Stream.gather() making the stream slower than it should.

>>>> As an example, there is no way when reimplementing map() using a Gatherer to say
>>>> that this intermediate operation keep the size, which is important if the
>>>> terminal operation is toList() because if the size is known, toList() will
>>>> presize the List and avoid the creation of an intermediary ArrayList.

>>>> See [
>>>> https://urldefense.com/v3/__https://github.com/forax/we_are_all_to_gather/blob/master/src/main/java/com/gihtub/forax/wearealltogather/bench/MapGathererBenchmark.java__;!!ACWV5N9M2RV99hQ!JJy6F9NoL6wKZQK5158up_fTRvH8X7F6JK8T7Euuf8vzbSQbr23eWa9S_yb61ksONVrLrdesCF_auzwTY8aB$
>>>> |
>>>> https://github.com/forax/we_are_all_to_gather/blob/master/src/main/java/com/gihtub/forax/wearealltogather/bench/MapGathererBenchmark.java
>>>> ]

>>>> I think that adding a way to propagate the spliterator characteristics through a
>>>> Gatherer would greatly improve the performance of commons streams (at least all
>>>> the ones that end with a call to toList).

>>>> I have some idea of how to do that, but I prefer first to hear if i've overlook
>>>> something and if improving the performance is something worth changing the API.

>>>> regards,
>>>> Rémi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20240122/668a6036/attachment-0001.htm>


More information about the core-libs-dev mailing list