summary of point lambdafication survey

Tue Sep 27 10:56:04 PDT 2011

After mulling this over for a couple days, I realized I wasn't getting anywhere 
staring at the Haskell code you wrote, inasmuch as I don't know Haskell. :-) So 
I decided to rewrite scenario #2 ("groupBy") in Java, using fold (which is 
called "reduce" in the current prototype).

         Map<String,List<String>> map =
             list.reduce(new HashMap<String,List<String>>(), (map, str) -> {
                 String initial = str.substring(0, 1);
                 List<String> val = map.get(initial);
                 if (val == null) {
                     val = new ArrayList<String>();
                     map.put(initial, val);
                 }
                 val.add(str);
                 return map;
             });

Ugh! So this is what you mean by "a bit more work." Most of this is because 
Java lacks things like tuples or a MultiMap. So let's assume the existence of 
some kind of MultiMap:

         MultiMap<String,String> map =
             list.reduce(new MultiMap<String,List<String>>(), (map, str) -> {
                 map.put(str.substring(0, 1), str);
                 return map;
             });

Now as you say if map.put() were to return the map itself (it doesn't in any of 
the multimaps that I've seen, but let's run with this) we could simply write:

         MultiMap<String,String> map =
             list.reduce(new MultiMap<String,List<String>>(),
                 (map, str) -> map.put(str.substring(0, 1), str));

This is nice, but perhaps not quite as nice as:

     MultiMap<String, String> map = groupBy(list, s -> s.substring(0, 1));

or even

     Map<String, List<String>> map = groupBy(list, s -> s.substring(0, 1));

So, back to your point. It does seem that a fold ("reduce") operation is fairly 
general, and that other operations such as toMap and groupBy can be implemented 
in terms of fold. Does that mean it's not necessary to have these other 
operations in the library? Or are they common enough that they're worthwhile to 
have as a convenience?

s'marks

On 9/23/11 4:01 PM, Yuval Shavit wrote:
> In a world where the result of inserting into a map is the map that you've just
> inserted into (Java isn't that world, but it can be made into that world in a
> couple lines), I think both scenarios you describe can be implemented with a
> fold. That said, in your #2 scenario, if you really have a Map<K,V> and not a
> MultiMap<K,V>, then the fold's function does have to do a bit more work.
>
> Forgive me if this is overreaching myself, but since I'm reading up on Haskell,
> I gave both of your directions a go:
>
>      import Data.Map as M
>      toMap1 :: (Ord k) => (e->k) -> (e->v) -> Map k v -> [e] -> Map k v
>      toMap1 kf vf = foldr (\e-> M.insert (kf e) (vf e))
>
>      toMap2 :: (Ord k) => (e->k) -> (e->v) -> Map k [v] -> [e] -> Map k [v]
>      toMap2 kf vf = foldr (doinsert kf vf)
>
>      doinsert :: (Ord k) => (e->k) -> (e->v) -> e -> Map k [v] ->  Map k [v]
>      doinsert kf vf e m =
>          let k = kf e
>              v = vf e
>              m' = case (M.lookup k m) of
>                  Nothing     -> M.insert k [v] m
>                  (Just vs)   -> M.insert k (v:vs) m
>              in m'
>
> *Main> toMap1 show (*2) empty [1..5]
> fromList [("1",2),("2",4),("3",6),("4",8),("5",10)]
>
> *Main> toMap2 odd id empty [1..5]
> fromList [(False,[2,4]),(True,[1,3,5])]
>
> -Yuval
>
> On Fri, Sep 23, 2011 at 5:55 PM, Stuart Marks <stuart.marks at oracle.com
> <mailto:stuart.marks at oracle.com>> wrote:
>
>     On 9/23/11 1:39 PM, Yuval Shavit wrote:
>
>         Collections.toMap sounds like a convenience function for a fold, is
>         that about
>         right?
>
>
>     I don't think so. This might be because of how heavily I had condensed the
>     original submission. Let me explain.
>
>     As it stood, the original submission was interesting but not quite right in
>     my view. It seemed to lead in either of two different directions, both of
>     which are potentially very useful in their own right. Perhaps the submitter
>     could post here to clarify, in case I've misinterpreted something.
>
>     1) The first direction is, given a Set and a Mapper, return a Map:
>
>     <K,V> Map<K,V> toMap(Set<K> set, Mapper<K,V> mapper)
>
>     Conceptually this applies the mapper function to every element K of the
>     set, giving a value V; these (K,V) pairs are stored into a map which is
>     then returned. It's an open question whether this is done immediately, done
>     lazily, whether results are cached, etc.
>
>     2) The other direction is, given a collection of values and a Mapper that
>     generates keys, return a Map that maps each key to the values from which it
>     was generated.
>
>     I've seen this called groupBy elsewhere. The problem is, how to represent
>     multiple values for a single key. Other systems use tuples or multi-valued
>     maps. For this example I'll use List<V> as the value of each map entry:
>
>     <K,V> Map<K,List<V>> groupBy(Collection<V> coll, Mapper<V,K> mapper)
>
>     (Proper wildcards elided.)
>
>     For each value in the collection, run the mapper on it to get a key; then
>     put this key-value pair into a map. If there are multiple values
>     corresponding to the same key, the values get put into a list. For
>     uniformity you'd always put the values into a list, even if there's only
>     one value.
>
>     For example,
>
>         List<String> list = Arrays.asList("Beulah", "Maude", "Gertrude",
>     "Beatrice", "Gladys", "Georgia", "Mildred", "Mabel");
>         Map<String, String> map = groupBy(list, s -> s.substring(0, 1));
>
>     would result in map containing
>
>     "B" => ["Beulah", "Beatrice"]
>     "G" => ["Gertrude", "Gladys", "Georgia"]
>     "M" => ["Maude", "Mildred", "Mabel"]
>
>     s'marks
>
>