Threaded interpreter in Java

Fri Aug 5 00:36:50 UTC 2016

Hi Krys, 

On August 3, 2016 3:41:46 PM PDT, Krystal Mok <rednaxelafx at gmail.com> wrote:
>(Having missed last night's dinner and all the fun there...)
>
>When Rémi started talking about this interpreter today, I thought it'd
>be
>tailcall related.

I would love that it was tailcall related too. people seams to reduce tailcall to a feature of some obscure languages but in reality each time you have to do some argument transformations/injections/checking in a generic way, it screams for tailcall.

>
>In college, my introductory course to computer systems was based on the
>book "Introduction to Computing Systems: From Bits & Gates to C &
>Beyond",
>where it introduces a small educational 16-bit architecture called
>LC-2. We
>were given a LC-2 simulator to run our "assembly" programs.
>Later on, when I've learned enough Java to be dangerous, I wanted to
>write
>a more powerful simulator (interpreter) of the LC-2 ISA in Java.
>
>My first version was a most straightforward switch-threaded one.
>Nothing
>much to be said about that.
>
>The next version, wanting to mimic a indirect-threaded style, was
>structured in very much the same way Rémi presented in the link, but
>MethodHandle was still years from available so I didn't have the luxury
>of
>using it. Instead, I modeled the opcode handlers as "single-method
>objects", and do a virtual call at the end of each handler.
>(So, instead of Rémi's MH calls, I'm just using a table lookup +
>virtual
>call)
>
>public class NopHandler extends AbstractHandler // or implements
>Handler
>{
>  public void execute(State s) {
>    HANDLERS[s.code[s.pc++]].execute(s); // dispatch
>  }
>}
>
>That's didn't work out quite well. The performance was okay-ish with
>small
>program inputs. But HotSpot's JIT compilers doesn't implement tail call
>optimizations (well, recursive inlining sort of counts...), so with a
>long-enough program the stack blows up. And it didn't take that long of
>a
>LC-2 assembler program to make it blow up.
>

To avoid of blowing the stack, the trick is to stop the "trace" at the end of a basic block, i.e. when you have a jump or a return. It also means that things like null check, things that should not jump, has to be encoded with a different bytecode that conditional jump because you don't want to stop the trace for them.

>I realized that I had to have proper tail calls to make it work. So I
>added
>a central dispatch loop to serve as a trampoline, e.g.
>
>Handler next = getNextHandler();
>while (next != null) {
>  next = next.execute(s);
>}
>
>where my Handler.execute() would still perform the lookup, but not the
>actual invocation. As expected, the performance was no where as good as
>directly making the virtual call at the tail position in each handler.
>This
>central dispatch loop makes the unpredictable-ness of the
>Handler.execute()
>significant.
>
>Need a good trampoline. Need proper tail calls.

YES !

>
>- Kris

Rémi 

>
>On Wed, Aug 3, 2016 at 3:14 PM, John Rose <john.r.rose at oracle.com>
>wrote:
>
>> On Aug 3, 2016, at 1:22 PM, Remi Forax <forax at univ-mlv.fr> wrote:
>> >
>> > Charles ask if we can make a fast register interpreter in Java,
>> > here is my take on a threaded-like interpreter
>> >
>> > https://gist.github.com/forax/f38b533e089217cfc4d0ae3c6e2de9c9
>>
>> Nicely done.  We were talking last night at dinner about making
>> bytecode interpreters go fast, and this is helpful.
>>
>> I think there may be a combinator here, for bytecode dispatch loops.
>> (This feels like the PIC combinator, both fundamental and tricky to
>get
>> right.)
>> A "bytecode" dispatch combinator might provide a pattern like the
>> following:
>>
>> MethodHandle init, pred, step, fini, index;
>> @NonNull MethodHandle[] tab;
>> R dispatch(A… a) {
>>    V v = init(a…);  // one here; there might be many v…
>>    while (pred(a…)) {
>>       v = step(v, a…);
>>       // execute one "instruction":
>>       int i = index(v, a…);
>>       tab[i].invokeExact(v, a…);
>>    }
>>    return fini(V, a…);
>> }
>>
>> The explicit table would be recopied internally to an @Stable array
>for
>> constant folding.
>> The various index values could be tracked and turned into traces (or
>some
>> other profile
>> driven structure) which would be compiled together, in specialized
>> segments of the unrolled
>> interpreter loop.
>>
>> Or maybe just the table lookup part alone makes a good combinator, to
>be
>> paired
>> with the loop combinators?
>>
>> Comments welcome…
>>
>> — John
>>
>> P.S. Another new RFE:
>> experimental hook for creating heisenboxes
>> https://bugs.openjdk.java.net/browse/JDK-8163133
>> _______________________________________________
>> mlvm-dev mailing list
>> mlvm-dev at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>>
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>mlvm-dev mailing list
>mlvm-dev at openjdk.java.net
>http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.