slow performance of loom continuations

Fri Sep 7 18:42:41 UTC 2018

> Could I recommend JMH?

this isn't intended as a benchmark (kilim does have some but i haven't
looked at them). but this example has been representative of how i use
kilim (mostly simple state machines). performance of this example in kilim
is roughly 25% slower than "pure" java, and that's what i see in my state
machines too. and that's a price i'm happy to pay in exchange for scalable
imperative code
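
a minimal sketch of what a "pure" java baseline for this example might look like, to make the comparison concrete. this is an assumption about the shape of such a baseline, not the code that was actually measured: the class name XorshiftPlain and the step() method are hypothetical, and the generator body is copied from the quoted example with the yield replaced by a plain method return.

```java
public class XorshiftPlain {
    // generator state, same seeds as the continuation version (hypothetical baseline)
    long s0 = 103, s1 = 17;
    long result;

    // one step of the generator: the body of the while(true) loop without the yield
    void step() {
        long x = s0;
        long y = s1;
        s0 = y;
        x ^= (x << 23);
        s1 = x ^ y ^ (x >> 17) ^ (y >> 26);
        result = s1 + y;
    }

    // same driver loop as the continuation version, with step() in place of ctu.run()
    long loop(long num) {
        long val = 0;
        for (long ii = 0; ii < num; ii++) {
            step();
            val = val ^ result;
        }
        return val;
    }

    public static void main(String[] args) {
        long cycles = 200000;
        XorshiftPlain xor = new XorshiftPlain();
        long start = System.nanoTime();
        long val = xor.loop(cycles);
        long duration = System.nanoTime() - start;
        System.out.format("plain: %-10.2f nanos/op, %30d%n", 1.0 * duration / cycles, val);
    }
}
```

no timings are claimed for this sketch. it has no warmup phase, so a single run mostly shows that the state machine and the plain loop compute the same sequence.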

it seems premature to benchmark project loom at this point - it isn't
feature complete yet (i'm guessing that tail calls will dramatically help
this example). i assume that when it is, a suite of some sort will be
developed. i'm just trying to verify that i'm using the api correctly, and
to understand whether this is a use case that the team considers important
long term

On Fri, Sep 7, 2018 at 3:27 AM, ags <andrzej.grzesik at gmail.com> wrote:

> Could I recommend JMH?
>
> On Fri, 7 Sep 2018 at 05:48, seth lytle <seth.lytle at gmail.com> wrote:
>
>> i ported a Xorshift implementation from kilim to project loom using the
>> prototype that was announced in august. the Continuation apis are similar -
>> the only changes are ctu.run() instead of run() and
>> Continuation.yield(SCOPE) instead of kilim.Fiber.yield(), and the code
>> runs and produces the same answers. however, performance with loom is on
>> the order of 500x slower. i tried both f46dc5c01b7d (2% faster) and
>> 544bfe4ccd84
>>
>> kilim: 32.30      nanos/op,           -4971871656801550503
>> kilim: 36.99      nanos/op,            2357119851256129241
>> kilim: 32.60      nanos/op,            8340372355868382387
>>
>> loom : 18303.18   nanos/op,           -4971871656801550503
>> loom : 18107.61   nanos/op,            2357119851256129241
>> loom : 18184.15   nanos/op,            8340372355868382387
>>
>>
>> with -XX:+UnlockExperimentalVMOptions -XX:+UseNewCode
>>
>> loom : 2072.82    nanos/op,           -4971871656801550503
>> loom : 1879.37    nanos/op,            2357119851256129241
>> loom : 1841.58    nanos/op,            8340372355868382387
>>
>>
>> am i using the api correctly ? i do use exceptions in other continuations,
>> so UseNewCode isn't really an option for me
>>
>> i realize that this is an early prototype. do you have any idea what
>> performance you're ultimately shooting for for this sort of loop ?
>>
>> is there a tradeoff between performance in these simple cases and being
>> able to weave "most" production code ?
>>
>>
>>
>> here's the full example:
>>
>>
>> public class Xorshift {
>>     static final ContinuationScope SCOPE = new ContinuationScope() {};
>>     Continuation ctu = new Continuation(SCOPE,this::execute);
>>     long result;
>>
>>     void warmup(long num) {
>>         long warmup = 5000000000L;
>>         final long start = System.nanoTime();
>>         long dummy = 0;
>>         while (System.nanoTime() - start < warmup)
>>             dummy = dummy ^ loop(num);
>>         System.out.println("warmup: " + dummy);
>>     }
>>     void cycle(long num) {
>>         final long start = System.nanoTime();
>>         long val = loop(num);
>>         long duration = System.nanoTime() - start;
>>         System.out.format("loom : %-10.2f nanos/op, %30d\n",
>>                           1.0*duration/num, val);
>>     }
>>     public long loop(long num) {
>>         long val = 0;
>>         for (long ii=0; ii < num; ii++) {
>>             ctu.run();
>>             val = val ^ result;
>>         }
>>         return val;
>>     }
>>     public void execute() {
>>         long x, y, s0=103, s1=17;
>>         while (true) {
>>             x = s0;
>>             y = s1;
>>             s0 = y;
>>             x ^= (x << 23);
>>             s1 = x ^ y ^ (x >> 17) ^ (y >> 26);
>>             result = (s1 + y);
>>             Continuation.yield(SCOPE);
>>         }
>>     }
>>
>>     public static void main(String[] args) {
>>
>>         long cycles = 200000;
>>         int reps = 10;
>>         if (args.length == 0) {
>>             System.out.println("args: number of cycles, number of repeats");
>>             System.out.format("\t no args provided using defaults: %d %d\n",
>>                               cycles, reps);
>>         }
>>         try { cycles = Long.parseLong(args[0]); } catch (Exception ex) {}
>>         try { reps = Integer.parseInt(args[1]); } catch (Exception ex) {}
>>
>>         new Xorshift().warmup(cycles);
>>         Xorshift xor = new Xorshift();
>>
>>         for (int jj=0; jj < reps; jj++)
>>             xor.cycle(cycles);
>>     }
>> }
>>
> --
> ags
>