BiCollector

Paul Sandoz paul.sandoz at oracle.com
Thu Jun 14 15:36:16 UTC 2018


Hi Peter,

I am not concerned about the performance of Map.Entry. I find its use awkward, for reasons similar to those Brian outlined. Tagir’s approach using a finisher nicely sidesteps this at the expense of another function. If in the future we have an officially blessed pair/2-tuple class we can overload the collector factory method.
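
To make the finisher idea concrete, here is a minimal sketch of what such a collector could look like. It is not Tagir's actual code: the names bisecting, Pair and averageOf are hypothetical, and the sketch assumes the two delegate results are merged by a caller-supplied BiFunction so that no Map.Entry appears in the result type. It reports no characteristics, which is the conservative choice.

```java
import java.util.function.BiFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class BisectingSketch {

    // Hypothetical mutable holder for the two delegate accumulation containers.
    private static final class Pair<A, B> {
        A first;
        B second;

        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }
    }

    // Hypothetical factory: feeds every element to both delegates and merges
    // the two finished results with the supplied function.
    static <T, A1, A2, R1, R2, R> Collector<T, ?, R> bisecting(
            Collector<T, A1, R1> c1,
            Collector<T, A2, R2> c2,
            BiFunction<? super R1, ? super R2, R> merger) {
        return Collector.<T, Pair<A1, A2>, R>of(
            () -> new Pair<A1, A2>(c1.supplier().get(), c2.supplier().get()),
            (p, t) -> {
                c1.accumulator().accept(p.first, t);
                c2.accumulator().accept(p.second, t);
            },
            (p, q) -> {
                p.first = c1.combiner().apply(p.first, q.first);
                p.second = c2.combiner().apply(p.second, q.second);
                return p;
            },
            p -> merger.apply(c1.finisher().apply(p.first),
                              c2.finisher().apply(p.second)));
    }

    // Example use: sum and count in a single pass, merged into an average.
    static double averageOf(Integer... values) {
        return Stream.of(values).collect(
            bisecting(
                Collectors.summingInt(Integer::intValue),
                Collectors.counting(),
                (sum, count) -> sum.doubleValue() / count));
    }

    public static void main(String[] args) {
        System.out.println(averageOf(2, 4, 9));
    }
}
```

The extra function is the cost mentioned above: every caller has to say how the two results are combined, even when a plain pair would do.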

Regarding BiStream: a Stream could be bisected into a BiStream, then merged back into a Stream or bi-collected (something which I did not get time to look at). Having gone back and looked at the prototype work, I tend to see a bisecting collector as complementary in this respect, since we have similar operations for collection as we do on Stream.

Paul. 

> On Jun 14, 2018, at 12:29 AM, Peter Levart <peter.levart at gmail.com> wrote:
> 
> Hi Paul,
> 
> On 06/11/18 19:10, Paul Sandoz wrote:
>> Hi Peter,
>> 
>> I like it and can see it being useful, thanks for sharing. 
>> 
>> I am hesitating a little about it being in the JDK because there is the larger abstraction of a BiStream, where a similar form of collection would naturally fit (but perhaps without the intersection constraints for the characteristics?). We experimented a few times with BiStream and got quite far but decided to pull back due to the lack of value types and specialized generics. So I dunno how this might turn out in the future and whether your BiCollector fits nicely into such a future model.
>> 
>> What are your thoughts on this?
> 
> Well, I don't see the need to pack the two results into a Map.Entry (or any similar) container as a drawback. It's not a performance drawback for sure, because this is not happening on the stream-element scale, but on the scale of the final result or of intermediate accumulation results (the latter only in the parallel non-CONCURRENT scenario). In the non-parallel scenario, only one Map.Entry object (or two for non-IDENTITY_FINISH) is created.
> 
> I also don't see a larger abstraction like BiStream as a natural fit for a similar thing. As I understand BiStream attempts (maybe I haven't seen the right ones?), they are more about passing pairs of elements down the pipeline. BiCollector OTOH is "splitting" the single element pipeline at the final collection stage, with the purpose of constructing two independent collections from a single pass of the original single-element Stream. This is more about single pass than anything else. And single pass, I think, is inevitably such that it can only execute a single collection strategy (CONCURRENT vs. not CONCURRENT), regardless of the type of the stream (Stream vs. BiStream). Or have you prototyped a combined strategy in BiStream?
> 
> Regards, Peter
> 
>> FWIW I would call it a “splitting” or “bisecting” collector, e.g. “s.collect(bisecting(…))”
>> 
>> Paul.
>> 
>> 
>> 
>> 
>>> On Jun 11, 2018, at 5:39 AM, Peter Levart <peter.levart at gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Have you ever wanted to perform a collection of the same Stream into two different targets using two Collectors? Say you wanted to collect Map.Entry elements into two parallel lists, one containing the keys and the other the values. Or you wanted to collect elements into groups by some key, but also count them at the same time? Currently this is not possible to do with a single Stream. You have to create two identical streams, so you end up passing Supplier<Stream> to other methods instead of a bare Stream.
>>> 
>>> I created a little utility Collector implementation that serves the purpose quite well:
>>> 
>>> /**
>>>  * A {@link Collector} implementation taking two delegate Collector(s) and producing a result composed
>>>  * of the two results produced by the delegate collectors, wrapped in a {@link Map.Entry} object.
>>>  *
>>>  * @param <T> the type of elements collected
>>>  * @param <K> the type of the result collected by the 1st delegate collector
>>>  * @param <V> the type of the result collected by the 2nd delegate collector
>>>  */
>>> public class BiCollector<T, K, V> implements Collector<T, Map.Entry<Object, Object>, Map.Entry<K, V>> {
>>>     private final Collector<T, Object, K> keyCollector;
>>>     private final Collector<T, Object, V> valCollector;
>>> 
>>>     @SuppressWarnings("unchecked")
>>>     public BiCollector(Collector<T, ?, K> keyCollector, Collector<T, ?, V> valCollector) {
>>>         this.keyCollector = (Collector) Objects.requireNonNull(keyCollector);
>>>         this.valCollector = (Collector) Objects.requireNonNull(valCollector);
>>>     }
>>> 
>>>     @Override
>>>     public Supplier<Map.Entry<Object, Object>> supplier() {
>>>         Supplier<Object> keySupplier = keyCollector.supplier();
>>>         Supplier<Object> valSupplier = valCollector.supplier();
>>>         return () -> new AbstractMap.SimpleImmutableEntry<>(keySupplier.get(), valSupplier.get());
>>>     }
>>> 
>>>     @Override
>>>     public BiConsumer<Map.Entry<Object, Object>, T> accumulator() {
>>>         BiConsumer<Object, T> keyAccumulator = keyCollector.accumulator();
>>>         BiConsumer<Object, T> valAccumulator = valCollector.accumulator();
>>>         return (accumulation, t) -> {
>>>             keyAccumulator.accept(accumulation.getKey(), t);
>>>             valAccumulator.accept(accumulation.getValue(), t);
>>>         };
>>>     }
>>> 
>>>     @Override
>>>     public BinaryOperator<Map.Entry<Object, Object>> combiner() {
>>>         BinaryOperator<Object> keyCombiner = keyCollector.combiner();
>>>         BinaryOperator<Object> valCombiner = valCollector.combiner();
>>>         return (accumulation1, accumulation2) -> new AbstractMap.SimpleImmutableEntry<>(
>>>             keyCombiner.apply(accumulation1.getKey(), accumulation2.getKey()),
>>>             valCombiner.apply(accumulation1.getValue(), accumulation2.getValue())
>>>         );
>>>     }
>>> 
>>>     @Override
>>>     public Function<Map.Entry<Object, Object>, Map.Entry<K, V>> finisher() {
>>>         Function<Object, K> keyFinisher = keyCollector.finisher();
>>>         Function<Object, V> valFinisher = valCollector.finisher();
>>>         return accumulation -> new AbstractMap.SimpleImmutableEntry<>(
>>>             keyFinisher.apply(accumulation.getKey()),
>>>             valFinisher.apply(accumulation.getValue())
>>>         );
>>>     }
>>> 
>>>     @Override
>>>     public Set<Characteristics> characteristics() {
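>>>         // Start from an empty EnumSet: EnumSet.copyOf throws IllegalArgumentException
>>>         // when given an empty collection that is not itself an EnumSet.
>>>         EnumSet<Characteristics> intersection = EnumSet.noneOf(Characteristics.class);
>>>         intersection.addAll(keyCollector.characteristics());
>>>         intersection.retainAll(valCollector.characteristics());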
>>>         return intersection;
>>>     }
>>> }
>>> 
>>> 
>>> Do you think this class is general enough to be part of standard Collectors repertoire?
>>> 
>>> For example, accessed via factory method Collectors.toBoth(Collector coll1, Collector coll2), bi-collection could then be coded simply as:
>>> 
>>>         Map<String, Integer> map = ...
>>> 
>>>         Map.Entry<List<String>, List<Integer>> keys_values =
>>>             map.entrySet()
>>>                .stream()
>>>                .collect(
>>>                    toBoth(
>>>                        mapping(Map.Entry::getKey, toList()),
>>>                        mapping(Map.Entry::getValue, toList())
>>>                    )
>>>                );
>>> 
>>> 
>>>         Map.Entry<Map<Integer, Long>, Long> histogram_count =
>>>             ThreadLocalRandom
>>>                 .current()
>>>                 .ints(100, 0, 10)
>>>                 .boxed()
>>>                 .collect(
>>>                     toBoth(
>>>                         groupingBy(Function.identity(), counting()),
>>>                         counting()
>>>                     )
>>>                 );
>>> 
>>> 
>>> Regards, Peter
>>> 
> 
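
A side note for anyone experimenting with the characteristics() intersection in the quoted class: a delegate collector may report its characteristics as an empty set that is not an EnumSet, and EnumSet.copyOf rejects such an input. The sketch below shows one way to compute the intersection that tolerates this. The class and method names (CharacteristicsIntersection, intersect, copyOfRejectsEmptyPlainSet) are hypothetical, and Collections.emptySet() stands in for whatever set a delegate happens to return.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import java.util.stream.Collector.Characteristics;

final class CharacteristicsIntersection {

    // Intersection that is safe for any Set implementation, including empty ones.
    static Set<Characteristics> intersect(Set<Characteristics> a, Set<Characteristics> b) {
        EnumSet<Characteristics> result = EnumSet.noneOf(Characteristics.class);
        result.addAll(a);
        result.retainAll(b);
        return result;
    }

    // Returns true if EnumSet.copyOf rejects an empty set that is not an EnumSet.
    static boolean copyOfRejectsEmptyPlainSet() {
        Set<Characteristics> none = Collections.emptySet();
        try {
            EnumSet.copyOf(none);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<Characteristics> none = Collections.emptySet();
        Set<Characteristics> both =
            EnumSet.of(Characteristics.IDENTITY_FINISH, Characteristics.UNORDERED);
        Set<Characteristics> unordered = EnumSet.of(Characteristics.UNORDERED);

        System.out.println(intersect(both, unordered));   // prints [UNORDERED]
        System.out.println(intersect(none, both));        // prints []
        System.out.println(copyOfRejectsEmptyPlainSet()); // prints true
    }
}
```

Taking the intersection is the conservative rule: the combined collector only claims a characteristic such as IDENTITY_FINISH or CONCURRENT when both delegates claim it.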


