Design for collections upgrades

Sun Mar 13 13:15:19 PDT 2011

If we name List.filter() and Iterable.filter() the same thing there will be no way to get to the default lazy version of filter() on a List even if the Iterable version of filter() is lazy without either naming the methods differently or requiring something that changes a List into something that doesn't implement List but does implement Iterable or another streaming interface. Am I missing something? 

One solution might be to put those methods on Iterator and require .iterator() (simply another name for .asStream() really). This would be kind of nice in that the for() loop would now use the lazy versions by default.

Higher-level question: Does anyone really think that developers should use the eager versions or are we putting them in there as some kind of idiot proof implementation?

Sam

On Mar 13, 2011, at 12:47 PM, Rémi Forax wrote:

> asStream() is not required at all.
> 
> The question is more, should we mix lazy and eager method in the same 
> interface or not.
> If the answer is No. asStream() is a way to specify that you want to go 
> to the lazy world.
> If the answer is Yes. asStream is not needed and basically all static 
> methods can goes into their
> interface as extension method.
> 
> Rémi
> 
> 
> On 03/13/2011 08:01 PM, Sam Pullara wrote:
>> For the last year or so I have been using the Google Collections library, now distributed as part of Guava. It has most of the APIs that we are talking about and I think that regular Java programmers are using them. They separate out Lists, Iterables, Sets, Maps, Collections and Iterators. Most of the code that I have written would work great with the methods in these classes added as extension methods to the normal APIs. Function literals would of course slot right in. Here are some links to their APIs:
>> 
>> http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Iterables.html
>> http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Iterators.html
>> http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Lists.html
>> http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Sets.html
>> http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Maps.html
>> http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Collections2.html
>> 
>> My understanding is that they have an advantage over extension methods in the caller determines how to treat the target collection. For example, if you use:
>> 
>> public class Iterables {
>> public static<F,T>  Iterable<T>  transform(Iterable<F>  fromIterable, Function<? super F,? extends T>  function)...
>> }
>> 
>> You get streaming behavior whereas the equivalent in Lists is evaluated eagerly:
>> 
>> public class Lists {
>> public static<F,T>  List<T>  transform(List<F>  fromList, Function<? super F,? extends T>  function)
>> }
>> 
>> Apparently with extension methods we don't have this luxury as the type at the call site does not pick the extension method default but instead the underlying implementation does. Because of this it appears that the current proposals for things like .asStream() will actually end up being more verbose than using their Guava counterparts once function literals are available. As for the naming conventions, Guava went with filter/transform. Personally, I renamed them f/t since chaining together several in Guava gets a little silly with such long names since they show up next to one another. Here is some current code of mine:
>> 
>>                 return l(t(f(s.bagService.featured(0, 30L, category), new P<Row<Bag, byte[]>>() {
>>                   public boolean apply(@Nullable Row<Bag, byte[]>  input) {
>>                     return input != null&&  ids.add(input.value.owner());
>>                   }
>>                 }), new F<Row<Bag, byte[]>, BagCodeFactory.BagCode>() {
>>                   public BagCodeFactory.BagCode apply(Row<Bag, byte[]>  input) {
>>                     return new BagCodeFactory(s).getBag(page, input);
>>                   }
>>                 }), count);
>> 
>> l = limit, t = transform, f = filter
>> 
>> In JDK8 with function literals it would look like (featured() returns an Iterable):
>> 
>>                 return l(t(f(s.bagService.featured(0, 30L, category), #{ input ->  input != null&&  ids.add(input.value.owner())}),
>> 			#{ input ->  new BagCodeFactory(s).getBag(page, input)}), count);
>> 
>> Now, using some of the suggestions on the list so far, it could look like:
>> 
>>                 return s.bagService.featured(0, 30L, category).asStream().fiter(#{ input ->  input != null&&  ids.add(input.value.owner())})
>> 			.transform(#{ input ->  new BagCodeFactory(s).getBag(page, input)}).limit(count);
>> 
>> Which isn't too bad though since featured() is already returning an Iterable it feels redundant to asStream() it especially since it really is an Iterable only and doesn't use an underlying Java collection at all. It concerns me that the call site can't control behavior through typing. If we only put the extension methods on Iterable/Iterator it would be pretty easy for the developer to see that the operations are lazy and there would never be any confusion as to the behavior you would get at the call site.
>> 
>> Sam
>> 
>> On Mar 13, 2011, at 4:12 AM, Steven Simpson wrote:
>> 
>>> On 10/03/11 13:57, Rémi Forax wrote:
>>>>  Yes, we need to provide methods that filter/map directly
>>>> the content of a collections.
>>>> But I don't think it's a good idea to name them filter or map.
>>>> 
>>>> Why not filterAll and mapAll ?
>>> I'm starting to lose track of this thread, but I recall the following
>>> points:
>>> 
>>>    * Lazy operations are desirable, as are eager ones, and in-place ones.
>>>    * There could be a type or family thereof for lazy collections, e.g.
>>>      Stream or Iterable.
>>>    * Collections currently don't have methods for generating new
>>>      collections, so adding them is a considerable shift.  They do have
>>>      in-place methods, and new methods like 'filter' share the same
>>>      tense, suggesting that they should be in-place methods too.
>>> 
>>> In case it hasn't yet been mentioned, I'd like to add that collections
>>> also support /views/ in several places, e.g. subList, entrySet, etc.  Do
>>> we need a lazy type when we can have views?  I haven't pinned it down in
>>> my mind yet, but I suspect that views are slightly different to streams,
>>> in that they also allow us to modify the original collection.  Or, to
>>> put it another way, they differ from mutating methods, in that we can
>>> choose not to perform any mutation.
>>> 
>>> When we want to delete a portion of a list, we write:
>>> 
>>>  list.subList(a, b).clear()
>>> 
>>> ...because the sublist view allows us to modify the original list.  But
>>> if we want to extract a sublist, and leave the original untouched, we write:
>>> 
>>>  new ArrayList(list.subList(a, b))
>>> 
>>> List.subList could be regarded as lazy, until we apply a mutating method
>>> to it like clear().  And if we never do that, it stays lazy even while
>>> we do an eager copy.
>>> 
>>> Can we do the same with filter (using a noun like "selection" to be
>>> consistent with subList)?
>>> 
>>>  list.selection(pred).clear(); // mutating original; removeAll(pred)?
>>>  new ArrayList(list.selection(pred)); // preserving original
>>> 
>>> Views, having ordinary collection types, remain Iterable too:
>>> 
>>>  for (Element elem : list.selection(pred)) { ... }
>>> 
>>> I suppose my point is that there is already a precedent for (mutable)
>>> laziness in the collection framework, in the form of views.
>>> 
>>> Cheers,
>>> 
>>> Steven
>>> 
>>> -- 
>>> 
>>> 
>>> 
>>> 
>> 
> 
>