EnumeratedStream
ІП-24 Олександр Ротань
rotan.olexandr at gmail.com
Sun Apr 21 01:29:47 UTC 2024
Gatherers could be effectively terminal, but I don't think the Gatherer API
designers intended them to be. In the related JEP, gatherers are described
as a way to declare custom intermediate operations, so introducing
"terminal" gatherers would be misleading.
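For reference, the "effectively terminal" pattern under discussion can be
sketched roughly as below. This assumes a JDK where
java.util.stream.Gatherer is a final API (24 or later); the class name
ManyToOneGatherer and the counting() helper are made up for illustration
and are not part of any proposal.

```java
import java.util.stream.Gatherer;
import java.util.stream.Stream;

// Sketch only: needs a JDK with the final Gatherer API (24+).
final class ManyToOneGatherer {

    // Hypothetical helper: counts elements and emits one value from the finisher.
    static <T> Gatherer<T, long[], Long> counting() {
        return Gatherer.ofSequential(
                () -> new long[1],
                Gatherer.Integrator.ofGreedy((state, element, downstream) -> {
                    state[0]++;   // touch only the state, never the downstream
                    return true;  // keep consuming upstream elements
                }),
                // The single result is pushed once, after the last element.
                (state, downstream) -> downstream.push(state[0]));
    }

    // The "terminal" step is really gather(...) followed by findAny().
    static long demo() {
        return Stream.of("a", "b", "c")
                .gather(ManyToOneGatherer.<String>counting())
                .findAny()
                .orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The result still has to be unwrapped with findAny().orElseThrow(), which is
the part that reads as a workaround rather than a terminal operation.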
As for performance, even leaving aside the gather() method itself,
creating an instance of an Indexed object for each value in the stream is
costly and might turn into a nightmare for infinite streams.
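To make the allocation concern concrete, here is a minimal sketch that
contrasts a wrapper-per-element design with a callback that receives the
index as a primitive. Indexed, IndexedConsumer, withIndex and
forEachIndexed are hypothetical names chosen for this example, and both
variants only make sense for sequential streams. No timing is claimed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class IndexedAllocationSketch {

    // Hypothetical carrier type; one instance is allocated per stream element.
    record Indexed<T>(long index, T value) {}

    // Hypothetical callback that takes the index as a primitive long.
    interface IndexedConsumer<T> {
        void accept(long index, T value);
    }

    // Wrapper-based: every element becomes a new Indexed object.
    static <T> Stream<Indexed<T>> withIndex(Stream<T> source) {
        long[] next = {0};
        return source.sequential().map(value -> new Indexed<T>(next[0]++, value));
    }

    // Callback-based: the counter is shared state, with no wrapper per element.
    static <T> void forEachIndexed(Stream<T> source, IndexedConsumer<? super T> action) {
        long[] next = {0};
        source.sequential().forEachOrdered(value -> action.accept(next[0]++, value));
    }

    static List<String> demo() {
        List<String> out = new ArrayList<>();
        withIndex(Stream.of("a", "b")).forEach(i -> out.add(i.index() + "=" + i.value()));
        forEachIndexed(Stream.of("c", "d"), (index, value) -> out.add(index + "=" + value));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```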
As for the internals of the Stream API, I am not aware of their current
state, so there is not much I can say here, except that adding a new type
of stream that only slightly extends existing functionality might not be
that harmful, I guess.
Regarding API exposure, I don't think that moving it from the stream itself
to a Gatherers factory would make much of a difference, since to see the
index-aware methods the user must explicitly convert the stream to an
enumerated one.
I also think I didn't express my point about index consistency clearly
enough. Consider the following, with 1-based indices:
List.of(1,2,3).stream().enumerated().filter((idx, val) -> idx*val >
2).map((idx, val) -> idx * val)
Result : (4, 9)
With your approach:
List.of(1,2,3).stream().gather(Gatherers.filterIndexed((idx, val) ->
idx*val > 2)).gather(Gatherers.mapIndexed((idx, val) -> idx * val))
Result : (2, 6)
Not only is the second option much more verbose, but the indexes of the
stream are also inconsistent between operations.
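The difference can be reproduced with plain streams. The sketch below
assumes 1-based indices and a predicate of idx * val > 2, which is the
reading under which the results (4, 9) and (2, 6) come out. Indexed,
enumerate, stableIndices and perOperationIndices are hypothetical names;
neither enumerated() nor Gatherers.filterIndexed exists in the JDK today.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class IndexConsistencyDemo {

    // Hypothetical pair of a 1-based index and the element it belongs to.
    record Indexed<T>(int index, T value) {}

    // Attaches 1-based indices to the elements of a list.
    static <T> List<Indexed<T>> enumerate(List<T> values) {
        return IntStream.range(0, values.size())
                .mapToObj(i -> new Indexed<T>(i + 1, values.get(i)))
                .collect(Collectors.toList());
    }

    // Indices are assigned once and carried through both operations.
    static List<Integer> stableIndices(List<Integer> input) {
        return enumerate(input).stream()
                .filter(e -> e.index() * e.value() > 2)
                .map(e -> e.index() * e.value())
                .collect(Collectors.toList());
    }

    // Each operation re-enumerates whatever elements reach it.
    static List<Integer> perOperationIndices(List<Integer> input) {
        List<Integer> kept = enumerate(input).stream()
                .filter(e -> e.index() * e.value() > 2)
                .map(Indexed::value)
                .collect(Collectors.toList());
        return enumerate(kept).stream()
                .map(e -> e.index() * e.value())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(stableIndices(List.of(1, 2, 3)));       // [4, 9]
        System.out.println(perOperationIndices(List.of(1, 2, 3))); // [2, 6]
    }
}
```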
PS: Regarding findIndex, I did some benchmarking today that you might like
to take a look at. On average, the list-based version outperforms the
collector-based and gatherer-based ones by more than 10 times. Also, lists
in the stdlib don't have any hashCode-based implementations. I wrote 19
implementations in List subclasses, and none of those classes had anything
but simple traversal logic inside indexOf. But that's a topic for another
thread.
On Sun, Apr 21, 2024, 04:03 - <liangchenblue at gmail.com> wrote:
> On Sat, Apr 20, 2024 at 7:44 PM ІП-24 Олександр Ротань <
> rotan.olexandr at gmail.com> wrote:
>
>> Also, an enumerated stream should support index-aware terminal
>> operations, which gatherers are incapable of, so it would also require
>> creating index-aware collectors. I am not aware if this is even possible, but
>> this looks like another separate functionality in a different place, and some
>> developers might simply not be aware of its existence. I think that we as
>> language devs should think not only about what is possible in the language, but
>> also about whether it is comfortable and obvious for the user.
>>
> Gatherers can become many-to-one; notice it has a state A, it can totally
> choose to only emit a single element R in its finisher (i.e. its integrator
> only touches state and element and ignores the downstream), then you can
> use findAny().orElseThrow() to access that single collector result. That
> said, the factory I proposed can try to wrap Collectors the same way it
> wraps Gatherers. See conclusion below.
>
>> On Sun, Apr 21, 2024, 03:36 ІП-24 Олександр Ротань <
>> rotan.olexandr at gmail.com> wrote:
>>
>>> Yes, I think every possible intermediate operation could be made index
>>> aware using gatherers. The point is: should it be?
>>>
>>> As developers of the JDK itself, we are not limited in the ways we can
>>> provide tools for Java users, especially when it comes to adding completely
>>> new features rather than modifying existing APIs.
>>>
> Creating a new type of pipeline is only going to blow up the complexity
> of Streams; having IntStream, LongStream, DoubleStream and Stream, together
> with the 4-way spliterators (Spliterator.OfInt etc.) and iterators
> (PrimitiveIterator.OfInt etc.), is already an API nightmare. And the
> indexing will multiply all the intermediate and terminal operations by 2,
> further blowing up the API, which I don't believe is the right direction to
> go.
>
>>> The gatherer-based approach makes it look as if we were developers of a
>>> third-party library who have to look for workarounds instead of directly
>>> adding the features we need. Its syntax is wordier without any payoff in
>>> flexibility, and it would obviously be slower and more memory-hungry.
>>>
> Gatherer is not a "third party hook", but an essential API that
> represents all possible stream operations, including Collector. Gatherer
> would not be slow; it already supports short-circuiting and should not add
> extra overheads, as Gatherer is like an API for all possible stream
> operations.
>
>>> To me it seems that implementing this using a gatherer would only introduce
>>> unnecessary intermediate steps in the operation's internal pipeline without
>>> any visible payoff.
>>>
> Implementing with Gatherer would reduce useless API exposure, as indexed
> operations aren't that frequently used and are useless in parallel
> scenarios. Especially that these indexed operations aren't optimizable by
> stream subclasses, much like how findIndex is not helpful in Lists as
> Predicates can't be easily decoded like Object equivalence/hashCode, which
> some Lists can use to speed up indexOf.
>
>>
>>> Also, indexes created this way will be inconsistent between operations,
>>> and I am not sure if that is what we are looking for.
>>>
> We declare an index-aware gatherer and the factory I described converts it to an
> index-unaware, sequential gatherer; the factory gatherer prepares indices
> before calling our index-aware gatherer.
>
> For my factory, if you think my 4-line syntax above is too verbose, we can
> encapsulate those to become
> public static <T> Gatherer<T, ?, T> filter(Predicate<Indexed<T>> predicate)
> etc.
>
> And the primary methods will be:
> public static <T, A, R> Gatherer<T, ?, R> indexed(Gatherer<Indexed<T>, A,
> R> gatherer)
> public static <T, A, R> Collector<T, ?, R> indexed(Collector<Indexed<T>,
> A, R> collector)
>
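To check whether the Collector variant of indexed(...) quoted above is
even implementable, here is a minimal sketch using only Collector.of. It
assumes sequential streams (the combiner simply throws), and Indexed,
State and the class name are hypothetical; this is one possible reading of
the proposed signature, not an agreed design.

```java
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class IndexedCollectorSketch {

    // Hypothetical carrier type pairing a 0-based index with an element.
    record Indexed<T>(long index, T value) {}

    // Holds the wrapped collector's container plus the running index.
    private static final class State<A> {
        final A container;
        long next;
        State(A container) { this.container = container; }
    }

    // Wraps an index-aware collector so it can consume a plain stream.
    static <T, A, R> Collector<T, ?, R> indexed(Collector<Indexed<T>, A, R> downstream) {
        BiConsumer<A, Indexed<T>> accumulator = downstream.accumulator();
        return Collector.<T, State<A>, R>of(
                () -> new State<A>(downstream.supplier().get()),
                (state, element) ->
                        accumulator.accept(state.container, new Indexed<T>(state.next++, element)),
                // Assumption: indices are only meaningful sequentially, so merging is refused.
                (left, right) -> { throw new UnsupportedOperationException("sequential streams only"); },
                state -> downstream.finisher().apply(state.container));
    }

    static List<String> demo() {
        Collector<Indexed<String>, ?, List<String>> labelled =
                Collectors.mapping((Indexed<String> i) -> i.index() + ":" + i.value(),
                        Collectors.toList());
        return Stream.of("a", "b", "c").collect(indexed(labelled));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that this still allocates one Indexed per element, so it does not
address the allocation concern raised earlier in this message.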