[External] : Re: Stream Gatherers (JEP 473) feedback

Viktor Klang viktor.klang at oracle.com
Mon Sep 23 15:38:28 UTC 2024


Hi Archie!

>I recently encountered a real-world use case for Stream Gatherers and thought I'd mention it (apologies if you've seen these ideas already). These might even be worth considering including as standard offerings.

No need to preface your email with an apology 🙂 I'm happy to get thoughts, feedback, and ideas!

You have a pretty interesting use-case, and performing a "sub-group reorder" is an interesting use-case.
Pulling on that string a bit more, it would seem like its primary use-cases would lie in scenarios where you're consuming things like data files (think CSVs etc where there's a predefined order/grouping to the entries) or consuming things like database results, would that be a fair characterization?

Interestingly, one of the exercises I went through during the design of Gatherer was to reimplement the java.util.stream.Stream interface only using Gatherer and Collector, so I took a stab at things like sorted() etc. As is common in the situation of operating on top of pre-ordered data, parallelization tends to be costly to implement (either by having to forego it, or by needing to perform much of the operation in the finisher after aggregating elements in the integrator).

Quickly thinking about the operation, I presume you'd need to be able to track the current "primary group", collect elements into something like an ArrayList, and upon switching to a new primary group, you sort the list according to the secondary group, emit all elements for as long as Downstream::push allows you to, clear the list, enqueue the new element which had the new primary group. Then in the finisher you perform the same operation, presuming that that's the signal that there's no more elements in the last primary group.

But I think you can relax the requirement that the primary group must be Comparable, you only need to have a BiPredicate and test (prev, next) for sameness, and if you get false, you presume that `next` belongs to a new primary group. But since you're going to sort the group on the secondary, that'll need to be comparable.

Cheers,
√


Viktor Klang
Software Architect, Java Platform Group
Oracle

________________________________
From: Archie Cobbs <archie.cobbs at gmail.com>
Sent: Friday, 20 September 2024 19:20
To: Viktor Klang <viktor.klang at oracle.com>
Cc: core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
Subject: [External] : Re: Stream Gatherers (JEP 473) feedback

Hi Viktor,

I recently encountered a real-world use case for Stream Gatherers and thought I'd mention it (apologies if you've seen these ideas already). These might even be worth considering including as standard offerings.

Motivating example: Suppose you have a large list of people sorted by LastName. Your goal is to print the list ordered by LastName, then by FirstName. The list is too large to re-sort the whole thing.

Instead, you only need to "fixup" the ordering, by re-sorting just the groups of people with the same last name.

If you happen to know that there are no more than 100 people with the exact same last nam, then one option would be a Gatherer that re-sorts a stream, but only up to some maximum window size (if two elements were out of order and separated by more than the window size, then sorted output would no longer be guaranteed).

In the above example, you would set the window size to 100 and it might look something like this:

listOfPeople.stream()   // sorted by lastName only
  .gather(Gatherers.windowSort(100, Comparator.comparing(Person::lastName).thenComparing(Person::firstName)));

Another (probably better) option would be a "secondary sort" Gatherer. This would take an already sorted stream and then apply a secondary sort. It would collect items having the same primary sort key (however many there were), and then re-sort those groups all at once using the secondary key:

listOfPeople.stream()   // sorted by lastName only
  .gather(Gatherers.secondarySort(Comparator.comparing(Person::lastName), Comparator.comparing(Person::firstName)));

This kind of thing is common when querying a database that doesn't have the required composite index corresponding to a desired composite sort, e.g., there is an index on LastName and an index on FirstName, but no composite index on LastName+FirstName, etc.

-Archie

--
Archie L. Cobbs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20240923/a71f863d/attachment-0001.htm>


More information about the core-libs-dev mailing list