[External] : Re: Stream Gatherers (JEP 473) feedback

Archie Cobbs archie.cobbs at gmail.com
Mon Sep 23 16:53:48 UTC 2024


Hi Viktor,

Thanks for the response. I concur with all of your points below - nice
summary/analysis. Looking forward to this new use case.

-Archie

On Mon, Sep 23, 2024 at 10:38 AM Viktor Klang <viktor.klang at oracle.com>
wrote:

> Hi Archie!
>
> >I recently encountered a real-world use case for Stream Gatherers and
> thought I'd mention it (apologies if you've seen these ideas already).
> These might even be worth considering including as standard offerings.
>
> No need to preface your email with an apology 🙂 I'm happy to get
> thoughts, feedback, and ideas!
>
> You have a pretty interesting use-case, and performing a "sub-group
> reorder" is an interesting use-case.
> Pulling on that string a bit more, it would seem like its primary
> use-cases would lie in scenarios where you're consuming things like data
> files (think CSVs etc where there's a predefined order/grouping to the
> entries) or consuming things like database results, would that be a fair
> characterization?
>
> Interestingly, one of the exercises I went through during the design of
> Gatherer was to reimplement the java.util.stream.Stream interface only
> using Gatherer and Collector, so I took a stab at things like sorted() etc.
> As is common in the situation of operating on top of pre-ordered data,
> parallelization tends to be costly to implement (either by having to forego
> it, or by needing to perform much of the operation in the finisher after
> aggregating elements in the integrator).
>
> Quickly thinking about the operation, I presume you'd need to be able to
> track the current "primary group", collect elements into something like an
> ArrayList, and upon switching to a new primary group, you sort the list
> according to the secondary group, emit all elements for as long as
> Downstream::push allows you to, clear the list, enqueue the new element
> which had the new primary group. Then in the finisher you perform the same
> operation, presuming that that's the signal that there's no more elements
> in the last primary group.
>
> But I think you can relax the requirement that the primary group must be
> Comparable, you only need to have a BiPredicate and test (prev, next) for
> sameness, and if you get false, you presume that `next` belongs to a new
> primary group. But since you're going to sort the group on the secondary,
> that'll need to be comparable.
>
> Cheers,
>>
>
> *Viktor Klang*
> Software Architect, Java Platform Group
> Oracle
>
> ------------------------------
> *From:* Archie Cobbs <archie.cobbs at gmail.com>
> *Sent:* Friday, 20 September 2024 19:20
> *To:* Viktor Klang <viktor.klang at oracle.com>
> *Cc:* core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
> *Subject:* [External] : Re: Stream Gatherers (JEP 473) feedback
>
> Hi Viktor,
>
> I recently encountered a real-world use case for Stream Gatherers and
> thought I'd mention it (apologies if you've seen these ideas already).
> These might even be worth considering including as standard offerings.
>
> Motivating example: Suppose you have a large list of people sorted by
> LastName. Your goal is to print the list ordered by LastName, then by
> FirstName. The list is too large to re-sort the whole thing.
>
> Instead, you only need to "fixup" the ordering, by re-sorting just the
> groups of people with the same last name.
>
> If you happen to know that there are no more than 100 people with the
> exact same last nam, then one option would be a Gatherer that re-sorts a
> stream, but only up to some maximum window size (if two elements were out
> of order and separated by more than the window size, then sorted output
> would no longer be guaranteed).
>
> In the above example, you would set the window size to 100 and it might
> look something like this:
>
> listOfPeople.stream()   // sorted by lastName only
>   .gather(Gatherers.windowSort(100,
> Comparator.comparing(Person::lastName).thenComparing(Person::firstName)));
>
> Another (probably better) option would be a "secondary sort" Gatherer.
> This would take an already sorted stream and then apply a secondary sort.
> It would collect items having the same primary sort key (however many there
> were), and then re-sort those groups all at once using the secondary key:
>
> listOfPeople.stream()   // sorted by lastName only
>   .gather(Gatherers.secondarySort(Comparator.comparing(Person::lastName),
> Comparator.comparing(Person::firstName)));
>
> This kind of thing is common when querying a database that doesn't have
> the required composite index corresponding to a desired composite sort,
> e.g., there is an index on LastName and an index on FirstName, but no
> composite index on LastName+FirstName, etc.
>
>
-- 
Archie L. Cobbs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20240923/adf7efef/attachment.htm>


More information about the core-libs-dev mailing list