C/P/N/Q par vs. seq break-even analysis with 10ms think time

Wed Oct 17 07:31:26 PDT 2012

FYI, I have updated fjp-trace to track UNPARK->UNPARKED edges. This
makes it possible to infer the cost of unparking a thread; please see the
updated chart at [4]. Notice the red edges there: each one starts at the
UNPARK request for a thread and ends at UNPARKED, when that thread has
actually woken up.
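
For anyone who wants to get a feel for a single UNPARK->UNPARKED edge
outside of fjp-trace, here is a minimal standalone sketch. It is not how
fjp-trace records the edges; the class and method names are made up for
illustration, and it only uses LockSupport.park()/unpark() and
System.nanoTime(). The 10ms sleep is an assumption that this is long
enough for the sleeper thread to actually park.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper, not part of fjp-trace.
public class UnparkLatency {

    /** One sample: nanoseconds from just before unpark() to the sleeper resuming. */
    static long measureOnceNanos() {
        AtomicBoolean requested = new AtomicBoolean(false);
        AtomicLong wokenAt = new AtomicLong();
        CountDownLatch started = new CountDownLatch(1);

        Thread sleeper = new Thread(() -> {
            started.countDown();
            // park() may return spuriously, so re-check the flag in a loop.
            while (!requested.get()) {
                LockSupport.park();
            }
            wokenAt.set(System.nanoTime());        // the "UNPARKED" end of the edge
        });
        sleeper.start();

        try {
            started.await();
            Thread.sleep(10);                      // assumption: enough time to park
            long requestedAt = System.nanoTime();  // the "UNPARK" end of the edge
            requested.set(true);
            LockSupport.unpark(sleeper);
            sleeper.join();                        // makes wokenAt visible here
            return wokenAt.get() - requestedAt;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }

    public static void main(String[] args) {
        long[] samples = new long[20];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = measureOnceNanos();
        }
        Arrays.sort(samples);
        System.out.println("unpark latency, min ns:    " + samples[0]);
        System.out.println("unpark latency, median ns: " + samples[samples.length / 2]);
    }
}
```

The numbers from such a loop are only a rough lower bound: inside FJP
the unparks compete with work stealing and with each other, so the red
edges on the chart can be noticeably longer.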

It looks very much like FJP rampup lag: we try to wake up many threads
exponentially, but that is still too slow.
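
To illustrate why exponential wakeup can still be slow on this
timescale, here is a toy model, not a description of the actual FJP
signalling code. It assumes that in every round each awake worker wakes
exactly one more worker, and that a round costs one unpark latency. The
worker count and the latency in main() are hypothetical inputs, not
measurements.

```java
// Toy model only; class and method names are made up for illustration.
public class RampupModel {

    /** Rounds of doubling needed until `workers` threads are awake, starting from one. */
    static int doublingRounds(int workers) {
        int rounds = 0;
        for (int awake = 1; awake < workers; awake *= 2) {
            rounds++;
        }
        return rounds;
    }

    /** Estimated rampup time if every round costs one unpark latency. */
    static long rampupMicros(int workers, long unparkMicros) {
        return (long) doublingRounds(workers) * unparkMicros;
    }

    public static void main(String[] args) {
        int workers = 24;        // hypothetical pool size
        long unparkMicros = 30;  // hypothetical per-unpark latency
        System.out.println("rounds: " + doublingRounds(workers));
        System.out.println("estimated rampup, us: " + rampupMicros(workers, unparkMicros));
    }
}
```

Under these assumptions the rampup is a handful of unpark latencies
laid end to end, so with unparks costing tens of microseconds it would
take up a large part of a ~500us task.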

-Aleksey.

On 10/16/2012 07:27 PM, Aleksey Shipilev wrote:
> Hi,
> 
> This is a more thorough analysis of what's going on at the break-even
> point in the C/P/N/Q experiment [1]. I've taken an fjp-trace [2] profile
> at the break-even point, and the results are here [3]. A new feature
> in fjp-trace can reconstruct the entire decomposition tree, which you
> might want to peek at here [4].
> 
> Observations:
>  - notice that the handoff from the submitter to FJP takes quite a bit
> of time, roughly 70us in this case;
>  - the entire task finishes in ~500us, but the trace shows execution for
> only ~310us. This is due to the fjp-trace architecture, which cannot
> record the JOIN in external submitters (yet). This might very well mean
> the handoff back to the blocked submitter takes another 100us;
>  - threads wake up rather slowly (on this timescale); full-blown
> parallelism lasts for only about 50us.
> 
> So, here's what we've got on the table. If I understand this data
> correctly, then the 500us execution divides roughly as:
>    ~70us: handoff to FJP
>   ~200us: FJP rampup
>    ~50us: FJP steady (even though lots of balancing)
>   ~100us: result handoff
> 
> That means if we want to pursue parallel decompositions on a smaller
> scale, we need to figure out the rampup effects first. I have yet to
> figure out whether the rampup effects are due to sequential
> decomposition in the lambda code, or to genuine threading lag.
> 
> Another thing is the interface between the submitter and the FJP. I
> vaguely recall the infrastructure for allowing submitters to run the
> tasks themselves in place, but how much effort would it take to get
> that to at least experimental readiness? (Also, I don't see how/if
> CountedCompleters could interoperate with submitters in this case; is
> there any option to make the submitter the last completer?)
> 
> Thanks,
> Aleksey.
> 
> [1]
> http://mail.openjdk.java.net/pipermail/lambda-dev/2012-October/006150.html
> [2] https://github.com/shipilev/fjp-trace
> [3] http://shipilev.net/pub/jdk/lambda/20121003-fjpwakeup/
> [4]
> http://shipilev.net/pub/jdk/lambda/20121003-fjpwakeup/forkjoin.trace.p24-subtrees.png
>