Danger of Memory Reachability with Recursive Fiber Code

Dr Heinz M. Kabutz heinz at javaspecialists.eu
Mon May 6 20:04:44 UTC 2019


Understood, but a lot of people equate concurrency with parallelism.  They
have seen nonsense demos by prominent Java luminaries that show speed-ups
of 1000x.  Well, when all you do is sleep, it’s easy to speed up :-P

I’m speaking at JAX in Germany tomorrow afternoon about how to do divide
and conquer with CompletableFutures.  So I’m readying myself for the
inevitable question of “wouldn’t this be MUCH faster with Fibers?”

The factorial example can be written easily with parallel streams and
map/reduce.  However, I’m currently using ForkJoin with other recursive
algorithms like ToomCook3 for BigInteger, and I’d like to try to do that
with fibers. It can be fairly coarse-grained.
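
For reference, here is a sketch of the parallel-stream map/reduce version
(class name is my own; the reduce combines partial products in a tree,
much like the divide-and-conquer versions below):

```java
import java.math.BigInteger;
import java.util.function.IntFunction;
import java.util.stream.IntStream;

// Sketch: factorial as a parallel map/reduce over 1..n.
public class FactorialParallelStream implements IntFunction<BigInteger> {
    public BigInteger apply(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .mapToObj(BigInteger::valueOf)       // map: int -> BigInteger
                .reduce(BigInteger.ONE, BigInteger::multiply); // reduce: product
    }

    public static void main(String... args) {
        System.out.println(new FactorialParallelStream().apply(5)); // 120
    }
}
```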

Heinz

On Mon, 06 May 2019 at 22:24, Ron Pressler <ron.pressler at oracle.com> wrote:

> P.S.
>
> Not related to the memory issue but to my point #2:
>
> Fibers are designed to make blocking cheap, and the more you block the
> more they’re helpful.
> As a rule of thumb, if your fiber never blocks that’s a sign that you
> probably shouldn’t use a fiber.
> Maybe in the future fibers will be useful even for such use cases, but
> that’s not our current focus.
>
>
> On May 6, 2019 at 8:19:52 PM, Ron Pressler (ron.pressler at oracle.com)
> wrote:
>
> Hi Heinz. A few things:
>
> 1. The main component contributing to a fiber’s footprint is the stack.
> The stack can be cleared and collected as soon as the fiber terminates
> (even long before it is joined), but the code to do that is currently
> disabled so that we could run some tests.
> (If you look at Continuation.postYieldCleanup you can clearly see the line
> `this.stack = null;` and some others commented out, and a comment saying
> "The following are disabled just for some testing".)
>
> 2. While it is possible that fibers will eventually be lightweight enough
> to be used for parallelism, streams are the preferred approach for
> parallelism.
> Fibers are designed and optimized for concurrency, i.e., IO or complex
> interactions between units (e.g. with message passing), not
> for pure computation.
>
> 3. Speaking of coding idioms, structured concurrency is explicitly
> designed to disallow the use of methods that spawn fibers and return. While
> this could be useful for parallelism, fibers are intended for concurrency,
> and we believe that this style is an anti-pattern for concurrency.
>
> Ron
>
>
> On May 6, 2019 at 4:32:35 PM, Dr Heinz M. Kabutz (heinz at javaspecialists.eu)
> wrote:
>
> Let's start with a very basic single-threaded Factorial function. I use
> divide and conquer to avoid having a stack that is too deep:
>
> import java.math.*;
> import java.util.function.*;
>
> public class FactorialBasic implements IntFunction<BigInteger> {
>     public BigInteger apply(int n) {
>         return apply(0, n);
>     }
>
>     private BigInteger apply(int from, int to) {
>         if (from == to) {
>             if (from == 0) return BigInteger.ONE;
>             else return BigInteger.valueOf(from);
>         }
>         int mid = (from + to) >>> 1;
>
>         int leftFrom = from;
>         int leftTo = mid;
>         int rightFrom = mid + 1;
>         int rightTo = to;
>
>         var left = apply(leftFrom, leftTo);
>         var right = apply(rightFrom, rightTo);
>
>         return left.multiply(right);
>     }
> }
>
> To parallelize this with CompletableFuture is trivial:
>
> import java.math.*;
> import java.util.concurrent.*;
> import java.util.function.*;
>
> public class FactorialCompletableFuture implements IntFunction<BigInteger> {
>     public BigInteger apply(int n) {
>         return apply(0, n).join();
>     }
>
>     private CompletableFuture<BigInteger> apply(int from, int to) {
>         if (from == to) {
>             if (from == 0) return CompletableFuture.completedFuture(BigInteger.ONE);
>             else return CompletableFuture.completedFuture(BigInteger.valueOf(from));
>         }
>         int mid = (from + to) >>> 1;
>
>         int leftFrom = from;
>         int leftTo = mid;
>         int rightFrom = mid + 1;
>         int rightTo = to;
>
>         var left = apply(leftFrom, leftTo);
>         var right = apply(rightFrom, rightTo);
>
>         return left.thenCombineAsync(right, BigInteger::multiply);
>     }
> }
>
> However, when we now write this with Fibers, we have several choices of
> how to write it. For example, we could write it like this, where we do
> the recursive calls inside the Fiber:
>
> import java.math.*;
> import java.util.concurrent.*;
> import java.util.function.*;
>
> public class FactorialFibers1 implements IntFunction<BigInteger> {
>     public BigInteger apply(int n) {
>         return apply(0, n).join();
>     }
>
>     public Fiber<BigInteger> apply(int from, int to) {
>         if (from == to) {
>             if (from == 0) return Fiber.schedule(() -> BigInteger.ONE);
>             else return Fiber.schedule(() -> BigInteger.valueOf(from));
>         }
>         int mid = (from + to) >>> 1;
>
>         int leftFrom = from;
>         int leftTo = mid;
>         int rightFrom = mid + 1;
>         int rightTo = to;
>
>         return Fiber.schedule(ForkJoinPool.commonPool(), () -> {
>             var left = apply(leftFrom, leftTo);
>             var right = apply(rightFrom, rightTo);
>             return left.join().multiply(right.join());
>         });
>     }
> }
>
> Or like this, where we do the recursive calls before the Fiber.schedule():
>
> import java.math.*;
> import java.util.concurrent.*;
> import java.util.function.*;
>
> public class FactorialFibers2 implements IntFunction<BigInteger> {
>     public BigInteger apply(int n) {
>         return apply(0, n).join();
>     }
>
>     public Fiber<BigInteger> apply(int from, int to) {
>         if (from == to) {
>             if (from == 0) return Fiber.schedule(() -> BigInteger.ONE);
>             else return Fiber.schedule(() -> BigInteger.valueOf(from));
>         }
>         int mid = (from + to) >>> 1;
>
>         int leftFrom = from;
>         int leftTo = mid;
>         int rightFrom = mid + 1;
>         int rightTo = to;
>
>         var left = apply(leftFrom, leftTo);
>         var right = apply(rightFrom, rightTo);
>
>         return Fiber.schedule(ForkJoinPool.commonPool(),
>                 () -> left.join().multiply(right.join()));
>     }
> }
>
> Seems almost the same, right? In terms of concurrency, there is not a
> big difference. But in terms of memory reachability, there seems to be.
>
> In the case of FactorialFibers1, the left and right fiber objects can be
> collected once we have called join() on them. In the case of
> FactorialFibers2, the lambda object holds implicit references to them
> (the captured locals become fields of the lambda object), so they cannot
> be collected while the lambda is reachable. In other words,
> FactorialFibers1 has only local variables pointing to the left and right
> fibers, whereas FactorialFibers2 has fields pointing to them. For local
> variables, the JVM's liveness analysis can recognize when they will never
> be read again and treat them as dead. It cannot do so with fields.
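>
> The effect is easy to demonstrate without fibers at all. This sketch (my
> own illustration, plain Java, no Loom APIs) uses WeakReferences to show
> that an object captured by a lambda stays strongly reachable through the
> lambda object, whereas one referenced only by a cleared local variable
> can be collected:
>
> ```java
> import java.lang.ref.WeakReference;
>
> public class CaptureDemo {
>     public static void main(String... args) {
>         // An object referenced only by a local variable: once the
>         // variable is cleared, nothing keeps it strongly reachable.
>         byte[] plain = new byte[1_000_000];
>         WeakReference<byte[]> plainRef = new WeakReference<>(plain);
>         plain = null;                        // no more strong references
>         System.gc();                         // full GC clears the weak ref
>         System.out.println("local cleared: " + (plainRef.get() == null));
>
>         // The same object captured by a lambda: the lambda object holds
>         // a copy of the reference in a synthetic field, so the array
>         // stays strongly reachable as long as the lambda itself does.
>         byte[] inLambda = new byte[1_000_000];
>         WeakReference<byte[]> lambdaRef = new WeakReference<>(inLambda);
>         Runnable task = () -> System.out.println(inLambda.length);
>         System.gc();
>         System.out.println("captured alive: " + (lambdaRef.get() != null));
>         task.run(); // keep the lambda live past the GC
>     }
> }
> ```
>
> The same applies to the fibers captured by the lambda in
> FactorialFibers2: as long as the scheduled lambda is reachable, so are
> left and right.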
>
> Now for some results.
>
> Here is a test class:
>
> import java.math.*;
> import java.util.*;
> import java.util.function.*;
> import java.util.stream.*;
>
> // -XX:+UseParallelGC -Xmx8g -verbose:gc
> public class FactorialAllTest {
>     public static void main(String... args) {
>         Stream.of(
>                 new FactorialBasic(),
>                 new FactorialCompletableFuture(),
>                 new FactorialFibers1(),
>                 new FactorialFibers2()
>         ).forEach(fact -> {
>             for (int j = 0; j < 3; j++) System.gc();
>             check(fact, 100_000, 708_218);
>             Timer gcInvoker = new Timer(true);
>             gcInvoker.schedule(new TimerTask() {
>                 public void run() {
>                     // Cause full GC every 5 seconds during run
>                     // to get rid of prematurely tenured objects
>                     System.gc();
>                 }
>             }, 5000, 5000);
>             long time = System.nanoTime();
>             System.out.println(fact.getClass().getSimpleName());
>             check(fact, 5_000_000, 49_524_628);
>             time = System.nanoTime() - time;
>             gcInvoker.cancel();
>             System.out.println(time / 1_000_000 + "ms");
>             System.out.println();
>         });
>     }
>
>     protected static void check(IntFunction<BigInteger> func, int n, int bitcount) {
>         int actual = func.apply(n).bitCount();
>         if (actual != bitcount)
>             throw new AssertionError(func.getClass().getSimpleName() +
>                     "(" + n + ") incorrect. " +
>                     "expected=" + bitcount + ", actual=" + actual);
>         System.out.printf("%s(%d) OK%n", func.getClass().getSimpleName(), n);
>     }
> }
>
> It is not supposed to test the performance of the various approaches at
> this point. I have other test classes for that. Since Fibers are still
> very much in development, I don't think it would be fair to let them
> face-off against an established construct like CompletableFutures.
>
> All I'm interested in at this point is the astounding difference in
> resident set size between the various approaches. A small change can
> cause a whole bunch of memory to stay strongly reachable.
>
> As we continue along the path of Project Loom, I believe it will be
> necessary to establish good coding idioms that others can follow.
> Letting them code whatever they want and then hoping it will perform
> well is going to lead to disappointment.
>
> I will mention my thoughts on the "structured concurrency" approaches in
> another email :-)
>
> Regards
>
> Heinz
> --
> Dr Heinz M. Kabutz (PhD CompSci)
> Author of "The Java™ Specialists' Newsletter" - www.javaspecialists.eu
> Java Champion - www.javachampions.org
> JavaOne Rock Star Speaker
> Tel: +30 69 75 595 262
> Skype: kabutz
>
--
Dr Heinz M. Kabutz (PhD CompSci)
Author of "The Java(tm) Specialists' Newsletter"
Sun/Oracle Java Champion
JavaOne Rockstar Speaker
http://www.javaspecialists.eu
Tel: +30 69 75 595 262
Skype: kabutz

