Improving the Gatherer public API

Remi Forax forax at univ-mlv.fr
Wed Jan 17 15:38:56 UTC 2024


Hello,
i've played quite a lot with the Gatherer API and overall, i quite like how things work conceptually.
But i think the public API can be improved.


First, the documentation is not fully clear that a Gatherer has 3 characteristics,
1/ Is it sequential or parallelizable (can be parallel is ask)
2/ Is it stateless or stateful
3/ Is the integrator greedy or not.

There is also, is there a finisher or not, but this is less important.

When creating a Gatherer, a user can find the right method "Gatherer.of" by following this state diagram

    // if sequential
      // if stateless
        // if greedy
        Gatherer.ofSequential(Gatherer.Integrator.ofGreedy(integrator), finisher?)
        // otherwise short-circuit
        Gatherer.ofSequential(integrator, finisher?)
      // otherwise statefull
        // if greedy
        Gatherer.ofSequential(initializer, Gatherer.Integrator.ofGreedy(integrator), finisher?)
        // otherwise short-circuit
        Gatherer.ofSequential(initializer, integrator, finisher?)
    // otherwise parallelizable
      // if stateless
        // if greedy
        Gatherer.of(Gatherer.Integrator.ofGreedy(integrator), finisher?)
        // otherwise short-circuit
        Gatherer.of(integrator, finisher?)
      // otherwise stateful
        // if greedy
        Gatherer.of(initializer, Gatherer.Integrator.ofGreedy(integrator), combiner, finisher)
        // otherwise short-circuit
        Gatherer.of(initializer, integrator, combiner, finisher)

The first issue I stumble several time is that i've forgotten to use Integrator.ofGreedy(). I'm not the only one, at least both Viktor and Nicolai Parlog had the same issue. It's to easy to forget to call Integrator.ofGreedy().

I think the API should be slightly modified to force the user to make a choice between greedy or short-cuircuit.

I'm proposing to add a new parameter for all the factory methods, in front of the integrator, forcing the user to make a choice
  enum IntegratorKind {
    GREEDY, SHORT_CIRCUIT
  }

so the state diagram becomes

    // if sequential
      // if stateless
        // if greedy
        Gatherer.ofSequential(GREEDY, integrator, finisher?)
        // otherwise short-circuit
        Gatherer.ofSequential(SHORT_CIRCUIT, integrator, finisher?)
      // otherwise stateful
        // if greedy
        Gatherer.ofSequential(initializer, GREEDY, integrator, finisher?)
        // otherwise short-circuit
        Gatherer.ofSequential(initializer, SHORT_CIRCUIT, integrator, finisher?)
    // otherwise parallelizable
      // if stateless
        // if greedy
        Gatherer.of(GREEDY, integrator, finisher?)
        // otherwise short-circuit
        Gatherer.of(SHORT_CIRCUIT, integrator, finisher?)
      // otherwise stateful
        // if greedy
        Gatherer.of(initializer, GREEDY, integrator, combiner, finisher)
        // otherwise short-circuit
        Gatherer.of(initializer, SHORT_CIRCUIT, integrator, combiner, finisher)  

The second issue is that it's hard to implement a Gatherer that itself need a Gatherer as parameter (like andThen/compose). This is due to the fact that querying if a Gatherer is SEQUENTIAL, STATELESS or GREEDY is far for obvious. Instead of having a method characteristics() like a Collector, the way to query the characteristics is very add-hoc, using either the default combiner/initializer or instanceof on the integrator. 

  SEQUENTIAL: combiner == defaultCombiner
  STATELESS:  initializer == defaultInitializer
  GREEDY: integrator instanceof Greedy

I think the API should be simple if the default combiner/initializer and finisher were removed a method characteristics (and a default method hasCHaracteristics) were added.
It would be more like a Collector, simpler to user, and the default implementation of the combiner/initializer/finisher could be an impleemntation detail instead of being part of the API.

As a side note: the exact semantics of == on a lambda is not specified so the actual implementation ask users to rely on the way the current OpenJDK implementation works.

They are more mails to come on two other issues not fully realated to the public API of the Gatherer, so i'm trying to keep the mail short.

regards,
Rémi


More information about the core-libs-dev mailing list