summary of point lambdafication survey

Tue Sep 27 12:51:44 PDT 2011

I think your're right about everything there.

Imho (and the h is important here!), convenience functions are a Good Thing
for a few reasons:
  * they're convenient
  * they encourage certain paradigms (ie, "use this map function rather than
writing a loop yourself")
  * they provide a common language, which makes it easier to read other
people's code
  * they provide people with example code for "free"

By the last point, I mean that someone who wants to understand how map can
be written in terms of reduce has an obvious place to look -- the source
code for map. This is especially useful when you're trying to teach people a
new concept such as functional programming.

I realize now that my original email could have been construed as
questioning whether you should include a map-written-in-terms-of-reduce
convenience. If so, my apologies -- that wasn't my intent. I was just trying
to get clarification on whether map would be "its own thing," or whether it
would be written in terms of reduce (which I think it should be, if only for
that fourth bullet point).

On Tue, Sep 27, 2011 at 1:56 PM, Stuart Marks <stuart.marks at oracle.com>wrote:

> After mulling this over for a couple days, I realized I wasn't getting
> anywhere staring at the Haskell code you wrote, inasmuch as I don't know
> Haskell. :-) So I decided to rewrite scenario #2 ("groupBy") in Java, using
> fold (which is called "reduce" in the current prototype).
>
>        Map<String,List<String>> map =
>            list.reduce(new HashMap<String,List<String>>()**, (map, str) ->
> {
>                String initial = str.substring(0, 1);
>                List<String> val = map.get(initial);
>                if (val == null) {
>                    val = new ArrayList<String>();
>                    map.put(initial, val);
>                }
>                val.add(str);
>                return map;
>            });
>
> Ugh! So this is what you mean by "a bit more work." Most of this is because
> Java lacks things like tuples or a MultiMap. So let's assume the existence
> of some kind of MultiMap:
>
>        MultiMap<String,String> map =
>            list.reduce(new MultiMap<String,List<String>>(**), (map, str)
> -> {
>                map.put(str.substring(0, 1), str);
>                return map;
>            });
>
> Now as you say if map.put() were to return the map itself (it doesn't in
> any of the multimaps that I've seen, but let's run with this) we could
> simply write:
>
>        MultiMap<String,String> map =
>            list.reduce(new MultiMap<String,List<String>>(**),
>                (map, str) -> map.put(str.substring(0, 1), str));
>
> This is nice, but perhaps not quite as nice as:
>
>    MultiMap<String, String> map = groupBy(list, s -> s.substring(0, 1));
>
> or even
>
>    Map<String, List<String>> map = groupBy(list, s -> s.substring(0, 1));
>
> So, back to your point. It does seem that a fold ("reduce") operation is
> fairly general, and that other operations such as toMap and groupBy can be
> implemented in terms of fold. Does that mean it's not necessary to have
> these other operations in the library? Or are they common enough that
> they're worthwhile to have as a convenience?
>
> s'marks
>
>
>
>
>
>
>
> On 9/23/11 4:01 PM, Yuval Shavit wrote:
>
>> In a world where the result of inserting into a map is the map that you've
>> just
>> inserted into (Java isn't that world, but it can be made into that world
>> in a
>> couple lines), I think both scenarios you describe can be implemented with
>> a
>> fold. That said, in your #2 scenario, if you really have a Map<K,V> and
>> not a
>> MultiMap<K,V>, then the fold's function does have to do a bit more work.
>>
>> Forgive me if this is overreaching myself, but since I'm reading up on
>> Haskell,
>> I gave both of your directions a go:
>>
>>     import Data.Map as M
>>     toMap1 :: (Ord k) => (e->k) -> (e->v) -> Map k v -> [e] -> Map k v
>>     toMap1 kf vf = foldr (\e-> M.insert (kf e) (vf e))
>>
>>     toMap2 :: (Ord k) => (e->k) -> (e->v) -> Map k [v] -> [e] -> Map k [v]
>>     toMap2 kf vf = foldr (doinsert kf vf)
>>
>>     doinsert :: (Ord k) => (e->k) -> (e->v) -> e -> Map k [v] ->  Map k
>> [v]
>>     doinsert kf vf e m =
>>         let k = kf e
>>             v = vf e
>>             m' = case (M.lookup k m) of
>>                 Nothing     -> M.insert k [v] m
>>                 (Just vs)   -> M.insert k (v:vs) m
>>             in m'
>>
>> *Main> toMap1 show (*2) empty [1..5]
>> fromList [("1",2),("2",4),("3",6),("4",**8),("5",10)]
>>
>> *Main> toMap2 odd id empty [1..5]
>> fromList [(False,[2,4]),(True,[1,3,5])]
>>
>> -Yuval
>>
>> On Fri, Sep 23, 2011 at 5:55 PM, Stuart Marks <stuart.marks at oracle.com
>> <mailto:stuart.marks at oracle.**com <stuart.marks at oracle.com>>> wrote:
>>
>>    On 9/23/11 1:39 PM, Yuval Shavit wrote:
>>
>>        Collections.toMap sounds like a convenience function for a fold, is
>>        that about
>>        right?
>>
>>
>>    I don't think so. This might be because of how heavily I had condensed
>> the
>>    original submission. Let me explain.
>>
>>    As it stood, the original submission was interesting but not quite
>> right in
>>    my view. It seemed to lead in either of two different directions, both
>> of
>>    which are potentially very useful in their own right. Perhaps the
>> submitter
>>    could post here to clarify, in case I've misinterpreted something.
>>
>>    1) The first direction is, given a Set and a Mapper, return a Map:
>>
>>    <K,V> Map<K,V> toMap(Set<K> set, Mapper<K,V> mapper)
>>
>>    Conceptually this applies the mapper function to every element K of the
>>    set, giving a value V; these (K,V) pairs are stored into a map which is
>>    then returned. It's an open question whether this is done immediately,
>> done
>>    lazily, whether results are cached, etc.
>>
>>    2) The other direction is, given a collection of values and a Mapper
>> that
>>    generates keys, return a Map that maps each key to the values from
>> which it
>>    was generated.
>>
>>    I've seen this called groupBy elsewhere. The problem is, how to
>> represent
>>    multiple values for a single key. Other systems use tuples or
>> multi-valued
>>    maps. For this example I'll use List<V> as the value of each map entry:
>>
>>    <K,V> Map<K,List<V>> groupBy(Collection<V> coll, Mapper<V,K> mapper)
>>
>>    (Proper wildcards elided.)
>>
>>    For each value in the collection, run the mapper on it to get a key;
>> then
>>    put this key-value pair into a map. If there are multiple values
>>    corresponding to the same key, the values get put into a list. For
>>    uniformity you'd always put the values into a list, even if there's
>> only
>>    one value.
>>
>>    For example,
>>
>>        List<String> list = Arrays.asList("Beulah", "Maude", "Gertrude",
>>    "Beatrice", "Gladys", "Georgia", "Mildred", "Mabel");
>>        Map<String, String> map = groupBy(list, s -> s.substring(0, 1));
>>
>>    would result in map containing
>>
>>    "B" => ["Beulah", "Beatrice"]
>>    "G" => ["Gertrude", "Gladys", "Georgia"]
>>    "M" => ["Maude", "Mildred", "Mabel"]
>>
>>    s'marks
>>
>>
>>