JEP 473: Proposal for new built-in gatherer `indexed`

Viktor Klang viktor.klang at oracle.com
Thu Dec 5 17:45:58 UTC 2024


Hi!

I've been thinking a bit about the problem and concluded that it's not necessarily worth adding a new type to represent an index and an element (or trying to repurpose something existing), as I think the ergonomics of providing a mapper BiFunction are better, something along the lines of:

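public static <T, R> Gatherer<T, ?, R> mapIndexed(BiFunction<Long, ? super T, ? extends R> mapper) {
   // Mutable per-evaluation state: the index of the next element
   class Index { long at = 0; }
   return Gatherer.ofSequential(
       Index::new,
       // Map (current index, element), push the result downstream, advance the index
       (idx, e, d) -> d.push(mapper.apply(idx.at++, e))
   );
}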

(Using longs for indexing seems sensible for something which could have unbounded length)

Which would mean that if you have your own Pair class, or want to represent it as a Map.Entry, it's pretty straightforward to do:

stream.gather(mapIndexed(Pair::of))...
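
To make the usage concrete, here is a minimal self-contained sketch. It assumes a JDK where the Gatherer API is available (final in JDK 24, preview in JDK 22 and 23). The Pair record and the class and method names MapIndexedDemo, pairs and labels are hypothetical, introduced only for illustration; mapIndexed is the sketch from above.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.stream.Gatherer;

class MapIndexedDemo {
    // Hypothetical pair type, only for this illustration; not a JDK class.
    record Pair<A, B>(A first, B second) {
        static <A, B> Pair<A, B> of(A first, B second) {
            return new Pair<>(first, second);
        }
    }

    // Same shape as the mapIndexed sketch above.
    static <T, R> Gatherer<T, ?, R> mapIndexed(BiFunction<Long, ? super T, ? extends R> mapper) {
        class Index { long at = 0; }
        return Gatherer.ofSequential(
            Index::new,
            (idx, e, d) -> d.push(mapper.apply(idx.at++, e))
        );
    }

    // Pairs each element with its index; the explicitly typed local
    // variable just spells out what Pair::of is being adapted to.
    static List<Pair<Long, String>> pairs(List<String> items) {
        Gatherer<String, ?, Pair<Long, String>> indexer = mapIndexed(Pair::of);
        return items.stream().gather(indexer).toList();
    }

    // No pair type needed when the index is consumed right away.
    static List<String> labels(List<String> items) {
        return items.stream()
                .gather(mapIndexed((i, s) -> i + ": " + s))
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(pairs(List.of("a", "b", "c")));
        System.out.println(labels(List.of("a", "b", "c"))); // [0: a, 1: b, 2: c]
    }
}
```

The second method is arguably the main ergonomic win of the mapper approach: callers who only need the index to compute something never have to materialize an (index, element) carrier at all.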

It is unfortunate that parallelization takes a hit in this use-case, but knowing which indices a sub-segment of the Stream has depends on the size of the stream being known, and I wouldn't be surprised if out-of-order processing of indices turned out to be surprising to people, so perhaps an ofSequential(…) isn't all that bad.
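
As a rough illustration of that trade-off, the sketch below runs the same gatherer on a parallel, ordered stream. My understanding is that a gatherer created with ofSequential is evaluated sequentially, in encounter order, even when the surrounding pipeline is parallel, so each element should still receive the index of its encounter position. The class and method names ParallelIndexDemo and indicesFollowEncounterOrder are hypothetical, and a JDK with the Gatherer API (final in JDK 24) is assumed.

```java
import java.util.function.BiFunction;
import java.util.stream.Gatherer;
import java.util.stream.IntStream;

class ParallelIndexDemo {
    // Same shape as the mapIndexed sketch above.
    static <T, R> Gatherer<T, ?, R> mapIndexed(BiFunction<Long, ? super T, ? extends R> mapper) {
        class Index { long at = 0; }
        return Gatherer.ofSequential(
            Index::new,
            (idx, e, d) -> d.push(mapper.apply(idx.at++, e))
        );
    }

    // Streams 0..size-1 in parallel and checks that every value was
    // handed the index equal to its encounter position.
    static boolean indicesFollowEncounterOrder(int size) {
        return IntStream.range(0, size)
                .boxed()
                .parallel()
                .gather(mapIndexed((index, value) -> index == value.longValue()))
                .allMatch(matches -> matches);
    }

    public static void main(String[] args) {
        System.out.println(indicesFollowEncounterOrder(10_000));
    }
}
```

The cost is that the gather step itself does not run in parallel; only the stages around it can.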

That said, new Gatherers should only be included in the stdlib after a thorough evaluation of need.

Cheers,
√


Viktor Klang
Software Architect, Java Platform Group
Oracle
________________________________
From: core-libs-dev <core-libs-dev-retn at openjdk.org> on behalf of Olexandr Rotan <rotanolexandr842 at gmail.com>
Sent: Thursday, 5 December 2024 18:20
To: Henrik Wall <xehpuk.dev at gmail.com>
Cc: core-libs-dev <core-libs-dev at openjdk.org>
Subject: Re: JEP 473: Proposal for new built-in gatherer `indexed`


Hi. I made a similar proposal (the one Chen mentioned) approximately half a year ago. At the time I insisted on creating a Stream sub-interface, and even got a working prototype for sequential streams, but parallelisation caused such a huge complexity blowup that I decided to drop it. Gatherers can be used pretty easily for this task, but only via ofSequential, which sacrifices parallelism. So basically, parallelism (or performance) is the pain point here. I am not saying that it is impossible to reconcile enumeration and parallelisation, but it would require huge effort and invasive changes to the current *Pipeline implementations, or enormous amounts of code duplication.

On Thu, Dec 5, 2024, 18:48 Henrik Wall <xehpuk.dev at gmail.com> wrote:
Hey folks,

Not having access to the index of an element of a stream is often a
reason to fall back to a traditional loop, at least for me. I'd love
to have `Gatherers.indexed()` that looks something like this:

public static <TR> Gatherer<TR, ?, Map.Entry<Integer, TR>> indexed() {
    return Gatherer.ofSequential(
            () -> new int[1],
            Gatherer.Integrator.ofGreedy((state, element, downstream) ->
                    downstream.push(Map.entry(state[0]++, element)))
    );
}
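
For reference, a minimal usage sketch of the proposed gatherer, assuming a JDK with the Gatherer API (final in JDK 24). The class and method names IndexedDemo and describe are hypothetical; indexed() is copied from the proposal above. One thing to keep in mind with this shape is that Map.entry rejects null keys and values, so a stream containing null elements would throw a NullPointerException.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Gatherer;

class IndexedDemo {
    // The proposed gatherer, as written above.
    static <TR> Gatherer<TR, ?, Map.Entry<Integer, TR>> indexed() {
        return Gatherer.ofSequential(
                () -> new int[1],
                Gatherer.Integrator.ofGreedy((state, element, downstream) ->
                        downstream.push(Map.entry(state[0]++, element)))
        );
    }

    // Renders each element as "index=element".
    static List<String> describe(List<String> items) {
        return items.stream()
                .gather(indexed())
                .map(e -> e.getKey() + "=" + e.getValue())
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(describe(List.of("x", "y", "z"))); // [0=x, 1=y, 2=z]
    }
}
```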

(Potentially with a custom pair class to avoid auto-boxing.)
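
A sketch of what such a custom pair class could look like, assuming a JDK with the Gatherer API (final in JDK 24). The record Indexed and the names IndexedRecordDemo and weightedLength are hypothetical, not part of any proposal. The primitive int component keeps the index unboxed, and unlike Map.entry a record also tolerates null elements.

```java
import java.util.List;
import java.util.stream.Gatherer;

class IndexedRecordDemo {
    // Hypothetical carrier type; the index is stored as a primitive int.
    record Indexed<T>(int index, T value) {}

    // Same structure as the proposal, but pushing Indexed instead of Map.Entry.
    static <T> Gatherer<T, ?, Indexed<T>> indexed() {
        return Gatherer.ofSequential(
                () -> new int[1],
                Gatherer.Integrator.ofGreedy((state, element, downstream) ->
                        downstream.push(new Indexed<T>(state[0]++, element)))
        );
    }

    // Sums index * length over all strings, without boxing the index.
    static int weightedLength(List<String> items) {
        return items.stream()
                .gather(indexed())
                .mapToInt(p -> p.index() * p.value().length())
                .sum();
    }

    public static void main(String[] args) {
        // 0*1 + 1*2 + 2*3
        System.out.println(weightedLength(List.of("a", "bb", "ccc"))); // 8
    }
}
```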

In other popular languages like Python or Rust, this is also called `enumerate`.

Any chance to get that in a future release?

Regards,
Henrik