slow performance of loom continuations
seth lytle
seth.lytle at gmail.com
Fri Sep 7 18:42:41 UTC 2018
> Could I recommend JMH?
this isn't intended as a benchmark (kilim does have some but i haven't
looked at them). but this example has been representative of how i use
kilim (mostly simple state machines). performance of this example in kilim
is roughly 25% slower than "pure" java, and that's what i see in my state
machines too. and that's a price i'm happy to pay in exchange for scalable
imperative code.

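for reference, a minimal sketch of what a "pure" java baseline for this example could look like: the same xorshift step as the quoted Xorshift.execute(), but called as an ordinary method with no continuation and no yield. the class name PlainXorshift and the method next() are hypothetical, made up for this illustration, and are not part of kilim or loom. there is no warmup phase here, so the timing that main() prints is only a rough indication and not a benchmark.

```java
// Hypothetical baseline (not from kilim or loom): the same xorshift step
// as Xorshift.execute(), expressed as a plain method call.
class PlainXorshift {
    // generator state, seeded the same way as the continuation version
    private long s0 = 103, s1 = 17;

    // One generator step; stands in for a ctu.run() / Continuation.yield(SCOPE) round trip.
    long next() {
        long x = s0;
        long y = s1;
        s0 = y;
        x ^= (x << 23);
        s1 = x ^ y ^ (x >> 17) ^ (y >> 26);
        return s1 + y;
    }

    // XOR together num consecutive outputs, mirroring Xorshift.loop(num).
    long loop(long num) {
        long val = 0;
        for (long ii = 0; ii < num; ii++)
            val ^= next();
        return val;
    }

    public static void main(String[] args) {
        long cycles = 200000;
        PlainXorshift gen = new PlainXorshift();
        long start = System.nanoTime();
        long val = gen.loop(cycles);
        long duration = System.nanoTime() - start;
        // no warmup was run, so treat this number as indicative only
        System.out.format("plain: %-10.2f nanos/op, %30d%n", 1.0 * duration / cycles, val);
    }
}
```

the arithmetic is copied from the quoted example unchanged (including the signed >> shifts), so a fresh instance should produce the same sequence of values that the continuation version stores in result.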
it seems premature to benchmark project loom at this point - it isn't
feature complete yet (i'm guessing that tail calls will dramatically help
this example). i assume that when it is, a suite of some sort will be
developed. i'm just trying to verify that i'm using the api correctly, and
to understand whether this is a use case that the team considers important
long term.

On Fri, Sep 7, 2018 at 3:27 AM, ags <andrzej.grzesik at gmail.com> wrote:
> Could I recommend JMH?
>
> On Fri, 7 Sep 2018 at 05:48, seth lytle <seth.lytle at gmail.com> wrote:
>
>> i ported a Xorshift implementation from kilim to project loom using the
>> prototype that was announced in august. the Continuation apis are similar -
>> the only changes are ctu.run() instead of run()
>> and Continuation.yield(SCOPE) instead of kilim.Fiber.yield(), and the code
>> runs and produces the same answers. however, performance with loom is on
>> the order of 500x slower. i tried both f46dc5c01b7d (2% faster) and
>> 544bfe4ccd84
>>
>> kilim: 32.30 nanos/op, -4971871656801550503
>> kilim: 36.99 nanos/op, 2357119851256129241
>> kilim: 32.60 nanos/op, 8340372355868382387
>>
>> loom : 18303.18 nanos/op, -4971871656801550503
>> loom : 18107.61 nanos/op, 2357119851256129241
>> loom : 18184.15 nanos/op, 8340372355868382387
>>
>>
>> with -XX:+UnlockExperimentalVMOptions -XX:+UseNewCode
>>
>> loom : 2072.82 nanos/op, -4971871656801550503
>> loom : 1879.37 nanos/op, 2357119851256129241
>> loom : 1841.58 nanos/op, 8340372355868382387
>>
>>
>> am i using the api correctly ? i do use exceptions in other continuations,
>> so UseNewCode isn't really an option for me
>>
>> i realize that this is an early prototype. do you have any idea what
>> performance you're ultimately shooting for for this sort of loop ?
>>
>> is there a tradeoff between performance in these simple cases and being
>> able to weave "most" production code ?
>>
>>
>>
>> here's the full example:
>>
>>
>> public class Xorshift {
>>     static final ContinuationScope SCOPE = new ContinuationScope() {};
>>     Continuation ctu = new Continuation(SCOPE,this::execute);
>>     long result;
>>
>>     void warmup(long num) {
>>         long warmup = 5000000000L;
>>         final long start = System.nanoTime();
>>         long dummy = 0;
>>         while (System.nanoTime() - start < warmup)
>>             dummy = dummy ^ loop(num);
>>         System.out.println("warmup: " + dummy);
>>     }
>>     void cycle(long num) {
>>         final long start = System.nanoTime();
>>         long val = loop(num);
>>         long duration = System.nanoTime() - start;
>>         System.out.format("loom : %-10.2f nanos/op, %30d\n",
>>                 1.0*duration/num, val);
>>     }
>>     public long loop(long num) {
>>         long val = 0;
>>         for (int ii=0; ii < num; ii++) {
>>             ctu.run();
>>             val = val ^ result;
>>         }
>>         return val;
>>     }
>>     public void execute() {
>>         long x, y, s0=103, s1=17;
>>         while (true) {
>>             x = s0;
>>             y = s1;
>>             s0 = y;
>>             x ^= (x << 23);
>>             s1 = x ^ y ^ (x >> 17) ^ (y >> 26);
>>             result = (s1 + y);
>>             Continuation.yield(SCOPE);
>>         }
>>     }
>>
>>     public static void main(String[] args) {
>>
>>         long cycles = 200000;
>>         int reps = 10;
>>         if (args.length == 0) {
>>             System.out.println("args: number of cycles, number of repeats");
>>             System.out.format("\t no args provided using defaults: %d %d\n",cycles,reps);
>>         }
>>         try { cycles = Long.parseLong(args[0]); } catch (Exception ex) {}
>>         try { reps = Integer.parseInt(args[1]); } catch (Exception ex) {}
>>
>>         new Xorshift().warmup(cycles);
>>         Xorshift xor = new Xorshift();
>>
>>         for (int jj=0; jj < reps; jj++)
>>             xor.cycle(cycles);
>>     }
>> }
>>
> --
> ags
>