ForkJoin wakeup update
Doug Lea
dl at cs.oswego.edu
Thu Oct 18 11:03:42 PDT 2012
On 10/18/12 13:29, Aleksey Shipilev wrote:
> Following Doug's advice, I've made a few experimental changes, described
> below:
> * vanilla: plain vanilla FJP
> * wakeup-3: wake up 3 threads instead of 2 during wakeups
> * wakeup-4: wake up 4 threads instead of 2 during wakeups
> * scan-busyloop: replace park in scan() with yield
>
> On the same machine [4], the scores are:
> vanilla: 429 +- 22 us/op
> wakeup-3: 415 +- 15 us/op
> wakeup-4: 390 +- 18 us/op
> scan-busyloop: 150 +- 15 us/op
>
Thanks! This leads to a much more precise version of
my main observations about FJ in other contexts:
* The break-even point (in terms of elements * cost per element)
can be reduced by about a factor of three at the expense of wasting
the world's power by keeping parallel worker threads spinning,
which we really do not want to do.
* Short of that, the case for being only a bit more aggressive
(and possibly wasteful) in initial startup wakeups isn't all that
compelling but might be worth more tuning effort.
* The main systems-level goal should be to find ways to make
wakeups faster. Possibly entailing support of some sort of bulk
signalling facilities?
-Doug
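
The tradeoff in the points above (parked workers are cheap but slow to wake, spinning workers wake fast but burn CPU) can be seen in a small standalone sketch. This is not ForkJoinPool's scan() code: the class IdleWaitSketch, the Strategy enum, and the handoffNanos method are hypothetical names made up for illustration, and the 50 ms sleep is an arbitrary assumption to let the worker go idle. It only times one handoff from a signalling thread to one idle worker.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical illustration only; not ForkJoinPool's actual scan() logic.
class IdleWaitSketch {
    /** How an idle worker waits for the "work available" flag. */
    enum Strategy { PARK, YIELD }

    /**
     * Returns the nanoseconds between the signaller setting the flag
     * and the idle worker observing it, for a single handoff.
     */
    static long handoffNanos(Strategy strategy) {
        AtomicBoolean workAvailable = new AtomicBoolean(false);
        AtomicLong observedAt = new AtomicLong();

        Thread worker = new Thread(() -> {
            while (!workAvailable.get()) {
                if (strategy == Strategy.PARK) {
                    LockSupport.park();   // blocks until unparked (may also return spuriously)
                } else {
                    Thread.yield();       // stays runnable and keeps polling the flag
                }
            }
            observedAt.set(System.nanoTime());
        });
        worker.start();

        try {
            Thread.sleep(50);             // assumption: long enough for the worker to go idle
            long signalledAt = System.nanoTime();
            workAvailable.set(true);
            if (strategy == Strategy.PARK) {
                LockSupport.unpark(worker);   // the wakeup whose cost is under discussion
            }
            worker.join();
            return observedAt.get() - signalledAt;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while timing handoff", e);
        }
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": " + handoffNanos(s) + " ns");
        }
    }
}
```

A single sample like this is noisy and depends heavily on the OS and machine, so it should be read as a way to see the effect, not as a benchmark. The YIELD variant also occupies a core for the whole time it waits, which is exactly the cost objected to above.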
> The fjp-trace results with tracing enabled are here [3]. The analysis
> for their subtrees follows:
>
> vanilla:
> +60us: first handoff (dominated by unpark)
> +240us: fjp rampup (all threads to wake up)
> +20us: full parallelism
> +15us: catch up (the final completer lags behind?)
> +10us: result handoff
>
> wakeup-3:
> +80us: first handoff (dominated by unpark)
> +180us: fjp rampup (all threads to wake up)
> +40us: full parallelism
> +20us: catch up (the final completer lags behind?)
> +10us: result handoff
>
> wakeup-4:
> +70us: first handoff (dominated by unpark)
> +170us: fjp rampup (all threads to wake up)
> +50us: full parallelism
> +30us: catch up (the final completer lags behind?)
> +10us: result handoff
>
> scan-busyloop:
> +5us: first handoff (nothing to unpark)
> +10us: fjp rampup (pre-balancing)
> +55us: full parallelism
> +10us: catch up (the final completer lags behind?)
> +10us: result handoff
>
> Thanks,
> Aleksey.
>
> [1] http://mail.openjdk.java.net/pipermail/lambda-dev/2012-October/006167.html
> [2] https://github.com/shipilev/fjp-trace
> [3] http://shipilev.net/pub/jdk/lambda/20121017-fjpwakeup/
> [4] 2x8x2 Xeon E5-2680 (SandyBridge) running Solaris 11, and 20121003
> lambda nightly with -d64 -XX:-TieredCompilation -XX:+UseParallelOldGC
> -XX:+UseNUMA -XX:-UseBiasedLocking -XX:+UseCondCardMark
>
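
To compare the four trace breakdowns quoted above, it helps to add up the phases and see how much of each run is spent before full parallelism is reached. The following is a small sketch of that arithmetic using only the numbers from the quoted analysis. The class name PhaseSums and its methods are hypothetical, and treating "first handoff" plus "fjp rampup" as startup cost is an assumption of this sketch, not something the trace analysis states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the numbers are the per-phase microseconds quoted above.
class PhaseSums {
    // Phase order: first handoff, fjp rampup, full parallelism, catch up, result handoff.
    static final Map<String, int[]> PHASES = new LinkedHashMap<>();
    static {
        PHASES.put("vanilla",       new int[] {60, 240, 20, 15, 10});
        PHASES.put("wakeup-3",      new int[] {80, 180, 40, 20, 10});
        PHASES.put("wakeup-4",      new int[] {70, 170, 50, 30, 10});
        PHASES.put("scan-busyloop", new int[] { 5,  10, 55, 10, 10});
    }

    /** Sum of all traced phases for one configuration, in microseconds. */
    static int total(String config) {
        int sum = 0;
        for (int us : PHASES.get(config)) {
            sum += us;
        }
        return sum;
    }

    /** Rounded percentage of traced time spent in first handoff plus rampup. */
    static int startupPercent(String config) {
        int[] p = PHASES.get(config);
        return Math.round(100f * (p[0] + p[1]) / total(config));
    }

    public static void main(String[] args) {
        for (String config : PHASES.keySet()) {
            System.out.println(config + ": total " + total(config)
                    + " us, startup " + startupPercent(config) + "%");
        }
    }
}
```

The totals come out to 345, 330, 330 and 90 us, with startup shares of 87%, 79%, 73% and 17% respectively. These sums are lower than the us/op scores reported earlier (429, 415, 390, 150); the message does not say how the two sets of numbers relate, so the sketch makes no attempt to reconcile them.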