[concurrency-interest] class SplittableRandom

Peter Levart peter.levart at gmail.com
Fri Jul 12 03:59:32 PDT 2013


On 07/12/2013 02:56 AM, Aaron Grunthal wrote:
> On 11.07.2013 16:29, Peter Levart wrote:
>> On 07/11/2013 01:14 AM, Aaron Grunthal wrote:
>>> Would this be of any use in a case like this?
>>>
>>> List<List<Int>> a;
>>>
>>> a.parallelStream().filter(...).map(e->e.get(random.nextInt(e.size()))).reduce(...) 
>>>
>>>
>>>
>>> i.e. a filter, random sample, reduce chain?
>>>
>>> I don't see a way to basically use two (or N) streams (the source and
>>> the random numbers) and merge them in lockstep at one stage.
>>>
>>
>> In the absence of Stream.join(), we would need something like the
>> following in the Stream API:
>
> Thinking about .join() and parallel processing: this would only 
> seem possible [assuming it's order-preserving] with a trySplit(int) 
> aiming for specific sizes that match the splitting of the other 
> stream. I.e. it would require sources that allow arbitrary seeks.
>
> That made me think about seekable PRNGs, such as salsa20. Wouldn't it 
> be better to have a seekable random and perform splits like one would 
> on an ordered, finite-sized, random-access list? An infinite version 
> without seek offset wraparound would still be obtainable from that 
> with a per-split reseed.

The following library:

http://www.iro.umontreal.ca/~simardr/ssj/doc/html/index.html?umontreal/iro/lecuyer/rng/RandomStream.html

It takes a different approach. The API allows the full period of the 
PRNG to be split into adjacent streams, and each such stream to be 
further split into sub-streams (the latter can be rewound or skipped 
ahead). The lengths and numbers of streams are usually very large, so 
there is no danger of wraparound or overlap. Each separate generator 
instance generates numbers from a separate stream, but there is a 
central point where contention can occur when individual instances are 
created. SplittableRandom, by contrast, only has contention when 
constructing root instances; its split(), on the other hand, has no 
contention at all.
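
As a minimal sketch of that difference (class and variable names below 
are mine, purely for illustration): constructing a root SplittableRandom 
is the only step that touches shared state, while split() works entirely 
off the state of the instance it is called on.

    import java.util.SplittableRandom;

    public class SplitContentionSketch {
        public static void main(String[] args) {
            // Constructing a root instance draws its default seed from a
            // shared sequence; this is the only point where threads can
            // contend.
            SplittableRandom root = new SplittableRandom();

            // split() derives a new, statistically independent generator
            // purely from the state of the instance it is called on. When
            // each task splits the instance handed to it, no shared state
            // is touched, so there is nothing to contend on.
            SplittableRandom forLeftSubtask  = root.split();
            SplittableRandom forRightSubtask = root.split();

            System.out.println(forLeftSubtask.nextInt(100));
            System.out.println(forRightSubtask.nextInt(100));
        }
    }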

>
> As I understand it SplittableRandom - when run as parallel stream - 
> doesn't offer the same sequence as the sequential version. And even 
> the splits would depend on the thread pool size, so it'll be difficult 
> to re-run some stream with the same pseudorandom sequence, which 
> sometimes is necessary, e.g. to debug things.

This could be "fixed" by providing another constructor that sets the 
initial "gamma" value for the root instance in addition to the seed. The 
beauty of fork-join is that although individual tasks can run at 
different speeds and at different times from run to run, the 
decomposition is typically data-driven and stable. And if 
SplittableRandom.split() calls follow Task.fork() calls, the random 
sequences consumed in each task are stable as well.
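
For illustration, here is a rough sketch of that idea using only the 
existing seed-only constructor (the gamma-setting constructor mentioned 
above is just a proposal, and the task class below is hypothetical): 
split() calls mirror fork() calls, so which random sub-sequence serves 
which sub-range depends only on the decomposition, not on scheduling, 
and a re-run with the same seed consumes the same pseudorandom 
sequences.

    import java.util.SplittableRandom;
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Sums random samples over an index range. The decomposition depends
    // only on the range size and the threshold, so re-running with the
    // same seed yields the same result regardless of thread scheduling.
    public class ReproducibleRandomSum extends RecursiveTask<Long> {
        static final int THRESHOLD = 1_000;
        final int lo, hi;
        final SplittableRandom rng;

        ReproducibleRandomSum(int lo, int hi, SplittableRandom rng) {
            this.lo = lo; this.hi = hi; this.rng = rng;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += rng.nextInt(100);
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            // split() mirrors fork(): the forked left half gets a derived
            // generator, the right half keeps this one.
            ReproducibleRandomSum left = new ReproducibleRandomSum(lo, mid, rng.split());
            left.fork();
            ReproducibleRandomSum right = new ReproducibleRandomSum(mid, hi, rng);
            return right.compute() + left.join();
        }

        public static void main(String[] args) {
            ForkJoinPool pool = ForkJoinPool.commonPool();
            long a = pool.invoke(new ReproducibleRandomSum(0, 1_000_000, new SplittableRandom(42)));
            long b = pool.invoke(new ReproducibleRandomSum(0, 1_000_000, new SplittableRandom(42)));
            System.out.println(a == b);   // prints true: same seed, same decomposition
        }
    }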

Regards, Peter

>
>>
>> Stream<T> {
>>
>>     <R, S> Stream<R> splitMap(S seed,
>>                               UnaryOperator<S> splitter,
>>                               BiFunction<? super T, ? super S, ? extends R> mapper);
>> }
>>
>>
>> Your example would then read:
>>
>> List<List<Int>> a = ...;
>>
>> a.parallelStream()
>>  .filter(...)
>>  .splitMap(
>>      new SplittableRandom(),
>>      sr -> sr.split(),
>>      (list, sr) -> list.get(sr.nextInt(list.size()))
>>  )
>>  .reduce(...)
>>
>>
>> But I don't know whether such an API would be usable in other scenarios, though.
>>
>> Regards, Peter
>>
>>
>>> On 10.07.2013 21:13, Doug Lea wrote:
>>>> [Note: I'm also posting this on the openjdk core-libs-dev list.]
>>>>
>>>>
>>>> We expect that using random numbers in parallel Stream computations
>>>> will be common. (We know it is common in parallel computing in
>>>> general.) But we had left support for it in an unsatisfactory state.
>>>> If you want to create a stream of random numbers to drive a parallel
>>>> computation, you'd choose among two options, neither of them providing
>>>> what you probably want: (1) Use a stream based on a single shared
>>>> java.util.Random object, in which case your program will encounter
>>>> stunning slowdowns when run with many cores; or (2) Use a stream based
>>>> on ThreadLocalRandom, which avoids contention, but gives you no
>>>> control over the use or properties of the per-thread singleton Random
>>>> object. While the ThreadLocalRandom option is great for many purposes,
>>>> you wouldn't want to use it in, say, a high-quality Monte Carlo
>>>> simulation.


