haskell prelude

Fri Jun 22 15:16:56 PDT 2012

As a preview of the next iteration of the streams library (you may have 
noticed the checkins in the it2-bootstrap branch), one of the things we 
may do (depending on whether performance testing shows it to be worth 
the cost) is to maintain a set of stream "state" bits to represent 
interesting properties -- such as elements are sorted, elements are 
unique, elements are known to be non-null, the size of the stream is 
known and finite.  This can be used to optimize pipelines as they are 
being built.  For example, a SortedSet is known to be unique and sorted; 
filtering is known to preserve uniqueness and sortedness but not size; 
mapping is known to preserve size but not uniqueness or sortedness.

We can use this to optimize the execution based on the dynamic 
properties of the stream pipeline.  For example, given:

   printSorted(StreamSource<T> s) {
     s.sorted(comparator).forEach(System.out::println);
   }

and you call it with a SortedSet, when the method builds the pipeline, 
by the time it gets to the sorted() stage, it knows that the upstream 
source is sorted (known dynamic property of SortedSet) and the sorted() 
stage becomes a no-op.  Similarly, if you know the size of a stream 
source (say, it is a collection and not a random Iterable) and you pipe 
it through a chain of size-preserving stages:

   coll.map(...).sorted(...).toArray();

we can exactly size the array since we know the size of the source and 
we know the stages don't mess with the size.

But this is done not with static types, but with adding a 
bitmap-returning "getStreamState()" method from the stream sources (and 
a bitmap-modifying getStreamState(upstreamState) method to the pipeline 
stages.)

We still have to be careful in what directions we define the bits; they 
need to all be of the form "has known good property" rather than "has 
dangerous property".  That way, if the pipeline goes through a stage we 
know nothing about, we can clear the bit and only lose optimizations, 
rather than make assumptions that might be false.  (So "has infinite 
size" is not as useful as you might think, since if the bit "washes 
off", we might make the mistake of passing it to forEach.  And "has 
finite size" is not all that useful, since we wouldn't want to reject a 
stream from forEach that is not known to be finite, since it might still 
be finite (and the user might know that.))

On 6/22/2012 5:27 PM, Brian Goetz wrote:
> Such "marker" interfaces mostly just give you a false sense of
> confidence, because markers can easily "wash off".  For example, let's
> say you transform an Iterator<T>  to an Iterator<U>  via a mapping
> function from T ->  U.  (See Iterators.map for example code.)  That's
> trivial to write, but now the resulting Iterator no longer extends
> InfiniteIterator.  Oops.
>
> Trying to use the static type system to express dynamic properties is a
> losing game.  (As a cautionary tale, see serialization, which has played
> this game and lost.)
>
>
>
> On 6/22/2012 5:00 PM, David Conrad wrote:
>> On Fri, 22 Jun 2012 09:29:19 +0300, Olexandr Demura<
>> oleksander.demura at gmail.com>   wrote:
>>
>>>
>>> that library written with haskell laziness in mind which is not present in
>>> java.
>>> you can add additional laziness by replacing `fibs.eval(y, z + y)`
>>> with Iterable `() ->   fibs.eval(y, z + y).iterator()`,
>>> but that would not help, since `cons` requests iterator immediately,
>>> causing the same problem.
>>> so you need one more additional layer - lazy Iterator.
>>> but Iterator is not a SAM, so it can't be named elegant:
>>>
>>>
>>
>> It is possible, and maybe even advisable, to make a SAM
>> type which is a subtype of Iterator, for infinite sequences:
>>
>>
>> interface InfiniteIterator<E>   extends Iterator<E>   {
>>       boolean hasNext() default {
>>           return true;
>>       }
>> }
>>
>> interface InfiniteIterable<E>   extends Iterable<E>   {
>>       InfiniteIterator<E>   iterator();
>> }
>>
>>
>> This has the great advantage that it also provides a tagging
>> Interface that one can check for in Iterables::count or
>> Iterables::forEach or a hypothetical Iterables::last(int n) or
>> other places where blindly consuming an infinite sequence
>> would do Bad Things(TM).
>>
>> If this became idiomatic and standard APIs looked for it, it
>> might head off some problems.
>>
>> David
>>
>