Improving the Gatherer public API

Viktor Klang viktor.klang at oracle.com
Wed Jan 17 17:02:46 UTC 2024


Hi Rémi,

Thanks for your thoughts!

Improving the documentation of Gatherer / Gatherers is definitely something I am intending to have inspired by feedback received as people try it out, so your thoughts here are most welcome.

For the longest time Gatherer had Characteristics (just like Collector, which Gatherer borrows its design from) but in the end it didn't carry its weight and tracking the characteristics separately from the actual behavior turned out to be a source of both poor performance (since characteristics need to be composed) and a source of subtle defects.

The concern around the identity-equals of a lambda is a bit of a non-issue since we are comparing instances of the interfaces, and the default values are not lambdaform:https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/stream/Gatherers.java#L453

PS. Forgetting ofGreedy shouldn't affect semantics, it's just an evaluation hint which can result in higher performance.

Cheers,
√

Cheers,
√


Viktor Klang
Software Architect, Java Platform Group
Oracle
________________________________
From: core-libs-dev <core-libs-dev-retn at openjdk.org> on behalf of Remi Forax <forax at univ-mlv.fr>
Sent: Wednesday, 17 January 2024 16:38
To: core-libs-dev <core-libs-dev at openjdk.java.net>
Subject: Improving the Gatherer public API

Hello,
i've played quite a lot with the Gatherer API and overall, i quite like how things work conceptually.
But i think the public API can be improved.


First, the documentation is not fully clear that a Gatherer has 3 characteristics,
1/ Is it sequential or parallelizable (can be parallel is ask)
2/ Is it stateless or stateful
3/ Is the integrator greedy or not.

There is also, is there a finisher or not, but this is less important.

When creating a Gatherer, a user can find the right method "Gatherer.of" by following this state diagram

    // if sequential
      // if stateless
        // if greedy
        Gatherer.ofSequential(Gatherer.Integrator.ofGreedy(integrator), finisher?)
        // otherwise short-circuit
        Gatherer.ofSequential(integrator, finisher?)
      // otherwise statefull
        // if greedy
        Gatherer.ofSequential(initializer, Gatherer.Integrator.ofGreedy(integrator), finisher?)
        // otherwise short-circuit
        Gatherer.ofSequential(initializer, integrator, finisher?)
    // otherwise parallelizable
      // if stateless
        // if greedy
        Gatherer.of(Gatherer.Integrator.ofGreedy(integrator), finisher?)
        // otherwise short-circuit
        Gatherer.of(integrator, finisher?)
      // otherwise stateful
        // if greedy
        Gatherer.of(initializer, Gatherer.Integrator.ofGreedy(integrator), combiner, finisher)
        // otherwise short-circuit
        Gatherer.of(initializer, integrator, combiner, finisher)

The first issue I stumble several time is that i've forgotten to use Integrator.ofGreedy(). I'm not the only one, at least both Viktor and Nicolai Parlog had the same issue. It's to easy to forget to call Integrator.ofGreedy().

I think the API should be slightly modified to force the user to make a choice between greedy or short-cuircuit.

I'm proposing to add a new parameter for all the factory methods, in front of the integrator, forcing the user to make a choice
  enum IntegratorKind {
    GREEDY, SHORT_CIRCUIT
  }

so the state diagram becomes

    // if sequential
      // if stateless
        // if greedy
        Gatherer.ofSequential(GREEDY, integrator, finisher?)
        // otherwise short-circuit
        Gatherer.ofSequential(SHORT_CIRCUIT, integrator, finisher?)
      // otherwise stateful
        // if greedy
        Gatherer.ofSequential(initializer, GREEDY, integrator, finisher?)
        // otherwise short-circuit
        Gatherer.ofSequential(initializer, SHORT_CIRCUIT, integrator, finisher?)
    // otherwise parallelizable
      // if stateless
        // if greedy
        Gatherer.of(GREEDY, integrator, finisher?)
        // otherwise short-circuit
        Gatherer.of(SHORT_CIRCUIT, integrator, finisher?)
      // otherwise stateful
        // if greedy
        Gatherer.of(initializer, GREEDY, integrator, combiner, finisher)
        // otherwise short-circuit
        Gatherer.of(initializer, SHORT_CIRCUIT, integrator, combiner, finisher)

The second issue is that it's hard to implement a Gatherer that itself need a Gatherer as parameter (like andThen/compose). This is due to the fact that querying if a Gatherer is SEQUENTIAL, STATELESS or GREEDY is far for obvious. Instead of having a method characteristics() like a Collector, the way to query the characteristics is very add-hoc, using either the default combiner/initializer or instanceof on the integrator.

  SEQUENTIAL: combiner == defaultCombiner
  STATELESS:  initializer == defaultInitializer
  GREEDY: integrator instanceof Greedy

I think the API should be simple if the default combiner/initializer and finisher were removed a method characteristics (and a default method hasCHaracteristics) were added.
It would be more like a Collector, simpler to user, and the default implementation of the combiner/initializer/finisher could be an impleemntation detail instead of being part of the API.

As a side note: the exact semantics of == on a lambda is not specified so the actual implementation ask users to rely on the way the current OpenJDK implementation works.

They are more mails to come on two other issues not fully realated to the public API of the Gatherer, so i'm trying to keep the mail short.

regards,
Rémi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20240117/9495be72/attachment.htm>


More information about the core-libs-dev mailing list