Possible groupingBy simplification?
Brian Goetz
brian.goetz at oracle.com
Wed Apr 10 11:11:19 PDT 2013
After staring at groupingBy and toMap for a while, I think there's a
nice middle ground which should address the key use cases while reducing
a little bit of the "which one do I use":
    groupingBy(f)
    groupingBy(f, downstreamCollector)
    groupingBy(f, mapSupplier, downstreamCollector)

    toMap(keyFn, valFn)
    toMap(keyFn, valFn, mergeFn)
    toMap(keyFn, valFn, mergeFn, mapSupplier)
This cuts variants of each from 4 to 3, but more importantly, orders
them into a nice telescoping set.
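For illustration, here is a rough sketch of how the two telescoping sets read in use. It assumes the signatures as they later shipped in java.util.stream.Collectors in Java 8, which may differ from the draft under discussion. The class name, method names, and sample words are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TelescopingCollectors {
    // Hypothetical sample data, not from the thread.
    static final List<String> WORDS =
            Arrays.asList("apple", "fig", "kiwi", "plum", "pear");

    // groupingBy(f): word length -> list of words of that length
    static Map<Integer, List<String>> groupByLength() {
        return WORDS.stream().collect(Collectors.groupingBy(String::length));
    }

    // groupingBy(f, downstreamCollector): word length -> number of words
    static Map<Integer, Long> countByLength() {
        return WORDS.stream().collect(
                Collectors.groupingBy(String::length, Collectors.counting()));
    }

    // groupingBy(f, mapSupplier, downstreamCollector): same, in a TreeMap
    static TreeMap<Integer, Long> sortedCountByLength() {
        return WORDS.stream().collect(
                Collectors.groupingBy(String::length, TreeMap::new, Collectors.counting()));
    }

    // toMap(keyFn, valFn): word -> its length (keys are assumed unique here)
    static Map<String, Integer> lengthByWord() {
        return WORDS.stream().collect(
                Collectors.toMap(Function.identity(), String::length));
    }

    // toMap(keyFn, valFn, mergeFn): colliding keys are merged by joining the values
    static Map<Integer, String> joinedByLength() {
        return WORDS.stream().collect(
                Collectors.toMap(String::length, Function.identity(), (a, b) -> a + "," + b));
    }

    // toMap(keyFn, valFn, mergeFn, mapSupplier): same, in a TreeMap
    static TreeMap<Integer, String> sortedJoinedByLength() {
        return WORDS.stream().collect(
                Collectors.toMap(String::length, Function.identity(),
                        (a, b) -> a + "," + b, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(groupByLength());
        System.out.println(countByLength());
        System.out.println(sortedCountByLength());
        System.out.println(lengthByWord());
        System.out.println(joinedByLength());
        System.out.println(sortedJoinedByLength());
    }
}
```

Each form adds exactly one argument to the one before it, which is the telescoping property.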
Those wanting the groupingBy(f, mapSupplier) version should be able to
figure out easily (with aid from doc) that they can use groupingBy(f,
mapSupplier, toList()).
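A small sketch of that substitution, again assuming the Collectors signatures as they later shipped in Java 8. The class name, method names, and sample words are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupingWithMapSupplier {
    // Hypothetical sample data, not from the thread.
    static final List<String> WORDS = Arrays.asList("pear", "fig", "apple", "plum");

    // groupingBy(f): the library picks the map implementation
    static Map<Integer, List<String>> grouped() {
        return WORDS.stream().collect(Collectors.groupingBy(String::length));
    }

    // groupingBy(f, mapSupplier, toList()): same grouping, caller picks the map
    static TreeMap<Integer, List<String>> groupedSorted() {
        return WORDS.stream().collect(
                Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(groupedSorted());
        // Same entries either way; only the map implementation differs.
        System.out.println(groupedSorted().equals(grouped()));
    }
}
```

The only cost relative to a dedicated groupingBy(f, mapSupplier) overload is spelling out toList() as the third argument.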
On 4/10/2013 1:10 PM, Remi Forax wrote:
> Joe,
> collect(toList(), groupingBy(f));
> => how do you express the fact that you may want to group in cascade ?
>
> collect(groupingBy(f)).toList()
> => what is the resulting type of collect(groupingBy(f)) ?
> is it a super-type of Stream ?
>
> Brian,
> I'm fine with the proposed changes.
>
> Rémi
>
> On 04/10/2013 06:42 PM, Joe Bowbeer wrote:
>>
>> Correction: All the grouping(f) should be groupingBy(f)
>>
>> On Apr 10, 2013 9:37 AM, "Joe Bowbeer" <joe.bowbeer at gmail.com> wrote:
>>
>> For consistency with minBy and friends, all the 'By' methods
>> should take a single argument: f. Hence grouping(f).
>>
>> No-arg and one-arg forms are the easiest to use and maintain. Just
>> the additional comma, and which pair of parens contains it, is a
>> significant burden.
>>
>> The most readable forms of collect that have an explicit toList()
>> would be of the form:
>>
>> collect(grouping(f)).toList();
>>
>> or maybe
>>
>> collect(toList(), groupingBy(f));
>>
>> Joe
>>
>> On Apr 10, 2013 2:35 AM, "Paul Sandoz" <paul.sandoz at oracle.com> wrote:
>>
>>
>> On Apr 9, 2013, at 11:56 PM, Joe Bowbeer <joe.bowbeer at gmail.com> wrote:
>>
>> > I like the most popular form. In fact, I think it's the only one that I've
>> > used.
>> >
>> > The argument that users will gain by removing their most common form seems
>> > kind of far-fetched.
>> >
>>
>> If each method in Collectors does just one conceptual thing we
>> can concisely express in documentation it is easier to
>> remember and therefore easier to read the code, easier to find
>> in documentation be it using the IDE or otherwise. Thus to me
>> that suggests removing conceptual variants or renaming them.
>>
>> If the list variants were called say groupingByToList that
>> would ensure the "one conceptual thing": classifies elements
>> by key, and collects elements associated with that key to a
>> list. But I suspect we might not require those methods if the
>> leap of stream.collect(toList()) can be grasped.
>>
>> The same applies to toMap. I think it is easier to
>> understand/read if it does just one conceptual thing: elements
>> are keys, elements are mapped to values, conflicting keys
>> result in an exception. If that does not fit one's requirements
>> use groupingBy.
>>
>> Paul.
>>
>> > In my experience, I do a ctrl-space and look for my target return type on
>> > the right-hand-side of the IDE popup, and then I try to fill in the missing
>> > information, such as parameters. In this case, having to provide toList()
>> > would probably be a stumbling block for me, as the IDE is not as good when
>> > it comes to suggesting expressions for parameters.
>> >
>> > I sort of like the symmetry with collect(toList()) but not enough to make
>> > up for the loss.
>> >
>> >
>> >
>> > On Tue, Apr 9, 2013 at 2:16 PM, Brian Goetz <brian.goetz at oracle.com> wrote:
>> >
>> >> Paul suggested the following possible simplification for groupingBy. It
>> >> is somewhat counterintuitive at first glance, in that it removes the most
>> >> commonly used form (!), but might make things easier to grasp in the long
>> >> run (aided by good docs.)
>> >>
>> >> Recall we currently have four forms of groupingBy:
>> >>
>> >> // classifier only -- maps keys to list of matching elements
>> >> Collector<T, Map<K, List<T>>>
>> >> groupingBy(Function<? super T, ? extends K> classifier)
>> >>
>> >> // Like above, but with explicit map ctor
>> >> <T, K, M extends Map<K, List<T>>>
>> >> Collector<T, M>
>> >> groupingBy(Function<? super T, ? extends K> classifier,
>> >>            Supplier<M> mapFactory)
>> >>
>> >> // basic cascaded form
>> >> Collector<T, Map<K, D>>
>> >> groupingBy(Function<? super T, ? extends K> classifier,
>> >>            Collector<T, D> downstream)
>> >>
>> >> // cascaded form with explicit ctor
>> >> <T, K, D, M extends Map<K, D>>
>> >> Collector<T, M>
>> >> groupingBy(Function<? super T, ? extends K> classifier,
>> >>            Supplier<M> mapFactory,
>> >>            Collector<T, D> downstream)
>> >>
>> >> Plus four corresponding forms for groupingByConcurrent.
>> >>
>> >> The first form is likely to be the most common, as it is the traditional
>> >> "group by". It is equivalent to:
>> >>
>> >> groupingBy(classifier, toList());
>> >>
>> >> The proposal is: Drop the first two forms. Just as users can learn that
>> >> to collect elements into a list, you do:
>> >>
>> >> collect(toList())
>> >>
>> >> people can learn that to do the simple form of groupBy, you can do:
>> >>
>> >> collect(groupingBy(f, toList()));
>> >>
>> >> Which also reads perfectly well.
>> >>
>> >> By cutting the number of forms in half, it helps users to realize that
>> >> groupingBy does just one thing -- classifies elements by key, and collects
>> >> elements associated with that key. Obviously the docs for groupingBy can
>> >> show examples of the simple grouping as well as more sophisticated
>> >> groupings.
>> >>
>> >>
>>
>
More information about the lambda-libs-spec-observers
mailing list