Cancelation -- use cases

Howard Lovatt howard.lovatt at gmail.com
Sat Dec 29 17:51:06 PST 2012


In my parallel library I use a thrown BREAK exception, which has the same meaning as break in a for loop. It wasn't difficult to terminate the other parallel tasks and discard their results if they had already been computed. 

For a map-like operation over elements a1, a2, a3, etc., if a3 threw BREAK then a1 and a2 would be returned and the rest discarded, regardless of how the evaluation was split across parallel tasks. 
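A minimal sketch of the mechanism (the names are illustrative, not my library's actual API); it is shown serially for brevity, but the parallel version behaves the same way, with later results discarded even if another split has already computed them:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Stackless sentinel exception used like `break`.
final class Break extends RuntimeException {
    static final Break BREAK = new Break();
    private Break() { super(null, null, false, false); }
}

public class MapUntilBreak {
    // Map f over the input, keeping the results computed before the first
    // element that throws BREAK; that element and all later ones are discarded.
    static <T, R> List<R> map(List<T> input, Function<T, R> f) {
        final List<R> results = new ArrayList<>();
        for (final T t : input) {
            try {
                results.add(f.apply(t));
            } catch (final Break b) {
                break;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // a3 throws BREAK, therefore a1 and a2 are returned: prints [A1, A2]
        System.out.println(map(Arrays.asList("a1", "a2", "a3", "a4"), s -> {
            if (s.equals("a3")) { throw Break.BREAK; }
            return s.toUpperCase();
        }));
    }
}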

For a reduce-like operation I construct a binary tree that is evaluated in parallel via fork/join:

1  2  3  4  5  6  7  8
 12    34    56    78
   1234        5678
       12345678

If 3 threw BREAK then 12 would be returned; if 4 threw BREAK then 123 would be returned; and so on. BREAK is therefore harder to handle for reduce and might involve additional computation, e.g. for 4 throwing, a further reduce of 12 with 3 is required (56, 78, and 5678 are discarded if already evaluated). 
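Here is a rough sketch of how such a tree can handle BREAK (again illustrative names, not the actual library code): each subtree reports the reduction of the elements it kept, plus a flag saying a BREAK occurred, and a broken left subtree causes the whole right subtree to be discarded:

import java.util.List;
import java.util.concurrent.RecursiveTask;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// The same stackless sentinel as in the map sketch above.
final class Break extends RuntimeException {
    static final Break BREAK = new Break();
    private Break() { super(null, null, false, false); }
}

// Partial result of a subtree: the reduction of the elements kept so far,
// plus a flag telling ancestors that a BREAK occurred and that everything
// to the right must be discarded.
final class Partial<R> {
    final R value;
    final boolean broke;
    Partial(R value, boolean broke) { this.value = value; this.broke = broke; }
}

final class BreakableReduce<T, R> extends RecursiveTask<Partial<R>> {
    private final List<T> elems;
    private final int lo, hi;           // half-open range [lo, hi)
    private final R identity;           // must satisfy op(x, identity) = x
    private final Function<T, R> eval;  // leaf evaluation; may throw BREAK
    private final BinaryOperator<R> op; // combiner; may also throw BREAK

    BreakableReduce(List<T> elems, int lo, int hi, R identity,
                    Function<T, R> eval, BinaryOperator<R> op) {
        this.elems = elems; this.lo = lo; this.hi = hi;
        this.identity = identity; this.eval = eval; this.op = op;
    }

    @Override protected Partial<R> compute() {
        if (hi - lo == 1) {             // a leaf, e.g. one of 1..8 above
            try {
                return new Partial<>(eval.apply(elems.get(lo)), false);
            } catch (Break b) {
                return new Partial<>(identity, true);
            }
        }
        int mid = (lo + hi) >>> 1;
        BreakableReduce<T, R> right =
            new BreakableReduce<>(elems, mid, hi, identity, eval, op);
        right.fork();                   // speculative evaluation of the right half
        Partial<R> l =
            new BreakableReduce<>(elems, lo, mid, identity, eval, op).compute();
        if (l.broke) {                  // e.g. 3 broke: 4..8 are discarded even
            right.cancel(true);         // if already evaluated, and 12 is returned
            return l;
        }
        Partial<R> r = right.join();
        try {
            // e.g. 4 broke: r is (3, broke), so the further reduce of 12 with 3
            // happens here, giving (123, broke)
            return new Partial<>(op.apply(l.value, r.value), r.broke);
        } catch (Break b) {
            // the combine itself may break: reducing 12 with 3 throws, so 12
            return new Partial<>(l.value, true);
        }
    }
}

A broken leaf returns the identity, so op(x, identity) = x makes it drop out of the combine; and if the cancel arrives too late, the right-hand results are simply discarded, matching the behaviour described above.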

I have a number of use cases, all numerical, that evaluate whether the result has converged. They look at the current reduce value and check whether a further reduce would be significant; if not, they terminate the reduction. The parallel reductions are speculative. You can view 4 throwing as 4 saying "I am not significant". Note that 3 might not be significant either, and it might throw when the reduce of 12 with 3 is attempted, in which case 12 is returned. 
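As a toy stand-in for those convergence use cases (not one of my real ones), the sketch above can sum the series 1/k^2 in parallel, with each term throwing BREAK to declare itself insignificant:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;

// Uses Break, Partial, and BreakableReduce from the sketch above.
public class ConvergentSum {
    public static void main(String[] args) {
        final List<Integer> ks = new ArrayList<>();
        for (int k = 1; k <= 2_000_000; k++) { ks.add(k); }
        Partial<Double> sum = new ForkJoinPool().invoke(
            new BreakableReduce<>(ks, 0, ks.size(), 0.0,
                k -> {
                    double term = 1.0 / ((double) k * k);
                    if (term < 1e-12) { throw Break.BREAK; } // "I am not significant"
                    return term;
                },
                (a, b) -> a + b));
        System.out.println(sum.value); // approaches pi^2 / 6 ~ 1.6449
    }
}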

Sent from my iPad

On 29/12/2012, at 7:18 AM, Jim Mayer <jim at pentastich.org> wrote:

> Hi Brian,
> 
> I suspect that using limit as you suggest would be just fine.  Another
> consideration is that an order-preserving cancel is a potentially
> heavyweight operation.  I believe that most users would expect "cancel"
> to have an immediate effect.
> 
> Jim Mayer
> On Dec 28, 2012 12:07 PM, "Brian Goetz" <brian.goetz at oracle.com> wrote:
> 
>> Right, or "find me the first N primes / first N solutions to this
>> equation."
>> 
>> So, the question is, are these examples outside of what we mean for this
>> 'cancel' facility to be?  Would a straight limit(n) (which is
>> order-respecting) do the job in this kind of case, freeing up cancel/while
>> to handle only unordered (temporal-based, quality-based) restrictions?  Or
>> are problems like "find me as many contiguous primes as you can in 5
>> minutes" important enough to try to support through streams?
>> 
>> (My gut says no.  I think people just want to do things like event
>> processing, where you listen for interesting stuff, until you're told to
>> stop listening, at which point you don't care about the information
>> whizzing past.)
>> 
>> On 12/28/2012 11:37 AM, Sam Pullara wrote:
>> 
>>> I can see that if you were doing an expensive calculation over an
>>> infinite series of terms, and you cancel after some condition, you may
>>> have to keep all the terms that match before the condition. Maybe
>>> something like calculating Pi that stops once a term is less than a
>>> certain size would be a reasonable example? That could be done in
>>> parallel but would need to gather all the terms up to the cutoff.
>>> 
>>> Sam
>>> 
>>> On Dec 28, 2012, at 11:20 AM, Brian Goetz <brian.goetz at oracle.com> wrote:
>>> 
>>> I've been working through some alternatives for cancellation support in
>>>> infinite streams.  Looking to gather some use case background to help
>>>> evaluate the alternatives.
>>>> 
>>>> In the serial case, the "gate" approach works fine -- after some
>>>> criterion is met, stop sending elements downstream.  The pipeline
>>>> flushes the elements it has, and completes early.
>>>> 
>>>> In the parallel unordered case, the gate approach similarly works fine
>>>> -- after the cancelation criterion occurs, no new splits are created, and
>>>> existing splits dispense no more elements.  The computation similarly
>>>> quiesces after elements currently being processed are completed, possibly
>>>> along with any up-tree merging to combine results.
>>>> 
>>>> It is the parallel ordered case that is tricky.  Supposing we partition
>>>> a stream into
>>>>  (a1,a2,a3), (a4,a5,a6)
>>>> 
>>>> And suppose further we happen to be processing a5 when the bell goes
>>>> off.  Do we want to wait for all a_i, i<5, to finish before letting the
>>>> computation quiesce?
>>>> 
>>>> My gut says: for the things we intend to cancel, most of them will be
>>>> order-insensitive anyway.  Things like:
>>>> 
>>>> - Find the best possible move after thinking for 5 seconds
>>>> - Find the first solution that is better than X
>>>> - Gather solutions until we have 100 of them
>>>> 
>>>> I believe the key use case for cancelation here will be when we are
>>>> chewing on potentially infinite streams of events (probably backed by IO)
>>>> where we want to chew until we're asked to shut down, and want to get as
>>>> much parallelism as we can cheaply.  Which suggests to me the intersection
>>>> between order-sensitive stream pipelines and cancelable stream pipelines is
>>>> going to be pretty small indeed.
>>>> 
>>>> Anyone want to add to this model of use cases for cancelation?
>>> 

