Natural Encounter Order (was RE: Stream#generate() vs. iterate())
Brian Goetz
brian.goetz at oracle.com
Sun Oct 6 02:38:06 PDT 2013
Yes, you've hit the point exactly. If you don't care about order in the answer, you can always call unordered - and this is cheap - if you are going to do a parallel op that might be constrained by ordering. But the key is, you, the user , are the only one who knows whether you care about ordering or not. So we can't guess. If you don't care, tell us so, and we can give you a possibly faster execution.
Sent from my iPhone
On Oct 6, 2013, at 9:44 AM, Samir Talwar <samir at noodlesandwich.com> wrote:
> A few things.
>
> Firstly, in your example you used the `range` method, which is both ordered
> and sorted by definition. While it hasn't been explicitly sorted, not
> keeping it sorted as a steam would confuse and surprise pretty much
> everyone.
>
> On that topic: "ordered" and "sorted" mean different things, "ordered"
> meaning the same thing as "indexed", although with less attention paid to
> implementation. As you saw, a TreeSet is both ordered and sorted, whereas a
> HashSet is neither. Lists and arrays are ordered but not necessarily
> sorted; for example, `ImmutableList.of(1, 5, 2, 6, 3)` is ordered, but
> clearly unsorted.
>
> Lastly, there's no harm in throwing in a call to `unordered` whenever
> you're not fussed about order for whatever reason. My understanding is that
> if the stream is already unordered, it will just return itself.
>
> – Samir.
> On 6 Oct 2013 09:28, "Millies, Sebastian" <Sebastian.Millies at softwareag.com>
> wrote:
>
>> Thanks for that information. Part of my confusion may be just a language
>> thing. I'd call the property
>> of having a first, second etc. element "indexed", not "ordered". In
>> German, "ordered" is more akin to
>> "sorted". (But perhaps even in English, cf. SQL ORDER BY clause?).
>>
>> For that reason, I'd not have expected to have to unorder() a stream that
>> had never been
>> explicitly sort() 'ed.
>>
>> Or in other words, I'd have expected sorted()/unordered() and
>> sequential()/parallel() to
>> be pairs of operations that converted along the two dimensions, with
>> "unordered sequential" always
>> the default unless explicitly coded otherwise.
>>
>> Setting aside the choice of words, is there some rigorous definition of
>> "natural encounter order" that
>> will let me know whether a stream will in fact be ordered or unordered
>> when I create it?
>> Consider the following:
>>
>> TreeSet<String> t = new TreeSet<>();
>> HashSet<String> h = new HashSet<>();
>> Set<String> s;
>>
>> s = t;
>> Stream<String> tStream = s.parallelStream();
>>
>> assert(tStream.spliterator().hasCharacteristics(Spliterator.ORDERED)) ;
>>
>> s = h;
>> Stream<String> hStream = s.parallelStream();
>> assert( !
>> hStream.spliterator().hasCharacteristics(Spliterator.ORDERED));
>>
>> Usually having a Set reference I will not know whether the object has been
>> created
>> a TreeSet or HashSet. Similarly, having a Map.EntrySet I will usually not
>> know whether the
>> underlying Map was a LinkedHashMap. So for good parallel performance I
>> will be forced
>> to always explicitly call unordered(), or conduct that complicated test on
>> the spliterator's
>> characteristics. Unless I'm overlooking something, I am not happy with
>> the concept "natural
>> encounter order", because it seems unknowable.
>>
>> -- Sebastian
>>
>>> -----Original Message-----
>>> From: Brian Goetz [mailto:brian.goetz at oracle.com]
>>> Sent: Saturday, October 05, 2013 9:56 PM
>>> To: Millies, Sebastian
>>> Cc: lambda-dev at openjdk.java.net
>>> Subject: Re: Stream#generate() vs. iterate()
>>>
>>> Sequential and unordered are orthogonal characteristics.
>>>
>>> Ordered/unordered means that the source has a natural encounter order,
>> and that
>>> stream operations should respect that order. For example, lists and
>> arrays have a
>>> natural encounter order; they have a first element, a second element,
>> etc. Whereas
>>> a HashSet has no natural encounter order; processing the elements in one
>> order is as
>>> good as any other.
>>>
>>> Sequential/parallel has to do with whether the stream operations will
>> execute
>>> sequentially or in parallel when the terminal operation is initiated.
>>>
>>> generate() returns a sequential, unordered stream. You can turn that
>> into a parallel,
>>> unordered stream with generate(f).parallel().
>>>
>>> On Oct 5, 2013, at 6:54 PM, Millies, Sebastian wrote:
>>>
>>>> I'm a bit confused, perhaps it's just terminology:
>>>>
>>>> Looking at the Javadoc (in b106) for Stream#generate(Supplier) it says
>> it returns a
>>> sequential stream.
>>>> In your post you say it returns an unordered stream. In what way can a
>> sequential
>>> stream be unordered?
>>>>
>>>> -- Sebastian
>>>>
>>>> -----Original Message-----
>>>> From: lambda-dev-bounces at openjdk.java.net [mailto:lambda-dev-
>>> bounces at openjdk.java.net] On Behalf Of Brian Goetz
>>>> Sent: Saturday, October 05, 2013 6:13 PM
>>>> To: Arne Siegel
>>>> Cc: lambda-dev at openjdk.java.net
>>>> Subject: Re: stream.parallel().limit() not usable
>>>>
>>>> [snip]
>>>>
>>>> You might also do better with Stream.generate, since it creates an
>> unordered
>>> stream:
>>>>
>>>> Stream.generate(generatorFunction)
>>>> .parallel()
>>>> ...
>>>>
>>>>
>>>>
>>>>
>>>> Software AG – Sitz/Registered office: Uhlandstraße 12, 64297
>> Darmstadt, Germany
>>> – Registergericht/Commercial register: Darmstadt HRB 1562 -
>> Vorstand/Management
>>> Board: Karl-Heinz Streibich (Vorsitzender/Chairman), Dr. Wolfram Jost,
>> Arnd
>>> Zinnhardt; - Aufsichtsratsvorsitzender/Chairman of the Supervisory
>> Board: Dr. Andreas
>>> Bereczky - http://www.softwareag.com
>
More information about the lambda-dev
mailing list