RFR: JDK-8170945: Collectors$Partition should override more Map methods
Viktor Klang
duke at openjdk.org
Tue Apr 11 08:47:34 UTC 2023
On Tue, 11 Apr 2023 01:18:53 GMT, Stuart Marks <smarks at openjdk.org> wrote:
>> Adds overrides for common Map operations to avoid having to allocate the entrySet for Collectors$Partition maps.
>
> I don't think there is any useful relationship between this work at SequencedCollections / SequencedMap. The return type of this collector is simply `Map<...>` so the type information won't be visible and the new methods won't be visible either.
>
> It would be useful to examine actual code to see how this is used. SourceGraph to the rescue:
>
> https://sourcegraph.com/search?q=context%3Aglobal+lang%3Ajava+Collectors.partitioningBy&patternType=standard&sm=1&groupBy=repo
>
> I looked at most of the results on the first page of hits, probably about 20-30 or so. Almost every single one was followed by a `map.get(true)` or `map.get(false)`, sometimes immediately, sometimes later in the method, or sometimes in a caller which accepted the partition map as a return value. (The only exception was Google Error Prone, which iterated over the map's keys not using its `keySet()` but using a for-each loop over `ImmutableList.of(false, true)` or `ImmutableList.of(true, false)` and then getting the value, which seemed kind of ... overkill to me.)
>
> Anyway what this tells me is that `get()` is the most important thing to optimize, and filling out things like the various Map views is probably wasted effort.
>
> As an aside, it's interesting to look at how code uses `partitioningBy`. For example, I saw some graph algorithm that was processing edges based on whether the edge was incoming or outgoing. The code looked something like this:
>
> Map<Boolean, List<Edge>> partitionedEdges = edges.stream()....collect(partitioningBy(Edge::isOutgoing));
> for (var edge : partitionedEdges.get(Boolean.TRUE)) { // outgoing edges
> ... many lines of code ...
> }
> ... many lines of code ...
> for (var edge : partitionedEdges.get(Boolean.FALSE)) { // incoming edges
> ...
> }
>
> (I note that lots of code used `Boolean.TRUE` and `Boolean.FALSE`, presumably to avoid boxing overhead, which is small, because it doesn't allocate anything, but which is a method call instead of a field load.)
>
> Anyway though my conclusion is that having a boolean map is a lousy way to represent the return value. If you have a map, which elements are under true and which are false, if there's no obvious correlation between true/false and the property you're partitioning by, such as incoming/outgoing? Maybe you'd want something like this:
>
> enum Direction { INCOMING, OUTGOING }
>
> and then you could do a `groupingBy` to get a `Map<Direction, List<Edge>>` but dealing with a map is still kind of clunky.
>
> Maybe what you want is a record instead!
>
> record EdgesByDirection(List<Edge> incoming, List<Edge> outgoing) { }
>
> OK, enough digression. If anybody is going to put effort in this area, I think it would be better to explore alternative directions such as the above instead of optimizing the "map" that's returned by the `partitioningBy` collector.
>
> --
>
> For this change, I'd definitely keep the `get` override. You have to override `entrySet` because it's abstract in `AbstractMap`, so something like this would work:
>
> return Set.of(Map.entry(false, forFalse), Map.entry(true, forTrue));
>
> I'm not sure it's worth overriding even `size` and `isEmpty`; those just get delegated to the `entrySet` anyway (so there is some creation overhead) but nobody seems to use them anyway.
@stuart-marks Yes, it would perhaps make more sense to have a dedicated structure to represent the return value. But as you say, it'd be a different issue altogether. I'll leave it at the proposed changes and hope we can integrate this. :)
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13347#issuecomment-1502924810
More information about the core-libs-dev
mailing list