RFR 9: JDK-8040892 Incorrect message in Exception thrown by Collectors.toMap(Function, Function)

Fri Apr 25 17:31:52 UTC 2014

On 04/25/2014 06:13 PM, Peter Levart wrote:
> Hi Paul,
>
> On 04/25/2014 02:12 PM, Paul Sandoz wrote:
>> On Apr 23, 2014, at 11:06 AM, Peter Levart<peter.levart at gmail.com>  wrote:
>>
>>> Hi,
>>>
>>> I propose a patch for:
>>>
>>>     https://bugs.openjdk.java.net/browse/JDK-8040892
>>>
>> +1, nice use of putIfAbsent. Just double checking, did you run the stream unit tests?
>
> Yes, all 60 /java/util/stream jtreg tests pass.
>
>> I think key step was to separate map merge and accumulate functions for the toMap* methods not accepting a value merge function. Now that is the case one could still use merge, but that would be less efficient than putIfAbsent since that does not require a lambda capturing the key.
>>
>>
>> That fix caused me to think that we should update the value merge function accepting toMap* methods for the case when the merge function returns null, as removing an entry would be most undesirable.
>
> Yes, removing doesn't make sense. It's undeterministic. For example, 
> if the merge function always returns null, then duplicate keys result 
> in entry being absent, triplicate keys result in entry (the last one 
> merged) being present, quadruple keys result in entry being absent 
> again, etc ...

Hi Paul,

To make any sense of null return from the mergeFunction, which could 
mean that an entry with that key is absent from resulting map, such 
Collector would have to keep some more state - the collecting map (which 
is also returned at the end) and the set of removed keys. For example:

     public static <T, K, U, M extends Map<K, U>>
     Collector<T, ?, M> toMap(Function<? super T, ? extends K> keyMapper,
                              Function<? super T, ? extends U> valueMapper,
                              BinaryOperator<U> mergeFunction,
                              Supplier<M> mapSupplier) {

         class State {
             final M map = mapSupplier.get();
             final Set<K> removedKeys = new HashSet<>();
             M map() { return map; }
         }

         BiConsumer<State, T> accumulator = (state, element) -> {
                 K k = keyMapper.apply(element);
                 if (!state.removedKeys.contains(k)) {
                     U u = state.map.merge(k, 
valueMapper.apply(element), mergeFunction);
                     if (u == null) state.removedKeys.add(k);
                 }
         };

         BinaryOperator<State> merger = (state1, state2) -> {
state1.map.keySet().removeAll(state2.removedKeys);
state2.map.keySet().removeAll(state1.removedKeys);
             state1.removedKeys.addAll(state2.removedKeys);
             for (Map.Entry<K,U> e : state2.map.entrySet()) {
                 U u = state1.map.merge(e.getKey(), e.getValue(), 
mergeFunction);
                 if (u == null) state1.removedKeys.add(e.getKey());
             }
             return state1;
         };

         return new CollectorImpl<>(State::new, accumulator, merger, 
State::map, CH_ID);
     }

...but the concurrent variant would be tricky to implement.

Regards, Peter

>
>> A simple fix would be to throw an NPE if the merge function returns null, if it is acceptable to induce a side-effect of a removal, which seems reasonable given a partial result would any be returned if a null check is performed on the result of the merge function, otherwise that merge function requires wrapping e.g. mergeFunction.andThen(Objects::requireNonNull).
>
> I think that allowing null return from mergeFunction to remove the 
> entry from the (maybe intermediate) map is just confusing. Throwing 
> NPE is the most sensible thing we can do here.
>
> Regards, Peter
>
>> Paul.
>>
>>
>>> Here's the webrev:
>>>
>>> http://cr.openjdk.java.net/~plevart/jdk9-dev/Collectors.duplicateKey/webrev.02/
>>>
>>> There has already been a discussion on the lambda-dev about this bug:
>>>
>>> http://mail.openjdk.java.net/pipermail/lambda-dev/2014-April/012005.html
>>>
>>>
>>> Initially a fix has been proposed which used a private RuntimeException subclass thrown by the merger function in order to transport the values that attempted to be merged out of Map.merge() method to an outer context where they could be combined with the key to produce a friendlier "duplicate key" message. It turns out that there is an alternative approach that doesn't require exception acrobatics - using Map.putIfAbsent() instead of Map.merge(). I checked both methods in Map defaults and in HashMap implementations for possible semantic differences. Same behaviour can be achieved by requiring that the value passed to Map.putIfAbsent(key, value) is non-null and throwing NPE if it isn't.
>>>
>>>
>>> Regards, Peter
>