RFR: JDK-8170945: Collectors$Partition should override more Map methods
Stuart Marks
smarks at openjdk.org
Tue Apr 11 01:22:42 UTC 2023
On Wed, 5 Apr 2023 09:19:57 GMT, Viktor Klang <duke at openjdk.org> wrote:
> Adds overrides for common Map operations to avoid having to allocate the entrySet for Collectors$Partition maps.
I don't think there is any useful relationship between this work at SequencedCollections / SequencedMap. The return type of this collector is simply `Map<...>` so the type information won't be visible and the new methods won't be visible either.
It would be useful to examine actual code to see how this is used. SourceGraph to the rescue:
https://sourcegraph.com/search?q=context%3Aglobal+lang%3Ajava+Collectors.partitioningBy&patternType=standard&sm=1&groupBy=repo
I looked at most of the results on the first page of hits, probably about 20-30 or so. Almost every single one was followed by a `map.get(true)` or `map.get(false)`, sometimes immediately, sometimes later in the method, or sometimes in a caller which accepted the partition map as a return value. (The only exception was Google Error Prone, which iterated over the map's keys not using its `keySet()` but using a for-each loop over `ImmutableList.of(false, true)` or `ImmutableList.of(true, false)` and then getting the value, which seemed kind of ... overkill to me.)
Anyway what this tells me is that `get()` is the most important thing to optimize, and filling out things like the various Map views is probably wasted effort.
As an aside, it's interesting to look at how code uses `partitioningBy`. For example, I saw some graph algorithm that was processing edges based on whether the edge was incoming or outgoing. The code looked something like this:
Map<Boolean, List<Edge>> partitionedEdges = edges.stream()....collect(partitioningBy(Edge::isOutgoing));
for (var edge : partitionedEdges.get(Boolean.TRUE)) { // outgoing edges
... many lines of code ...
}
... many lines of code ...
for (var edge : partitionedEdges.get(Boolean.FALSE)) { // incoming edges
...
}
(I note that lots of code used `Boolean.TRUE` and `Boolean.FALSE`, presumably to avoid boxing overhead, which is small, because it doesn't allocate anything, but which is a method call instead of a field load.)
Anyway though my conclusion is that having a boolean map is a lousy way to represent the return value. If you have a map, which elements are under true and which are false, if there's no obvious correlation between true/false and the property you're partitioning by, such as incoming/outgoing? Maybe you'd want something like this:
enum Direction { INCOMING, OUTGOING }
and then you could do a `groupingBy` to get a `Map<Direction, List<Edge>>` but dealing with a map is still kind of clunky.
Maybe what you want is a record instead!
record EdgesByDirection(List<Edge> incoming, List<Edge> outgoing) { }
OK, enough digression. If anybody is going to put effort in this area, I think it would be better to explore alternative directions such as the above instead of optimizing the "map" that's returned by the `partitioningBy` collector.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13347#issuecomment-1502548705
More information about the core-libs-dev
mailing list