CFV: New Project: Mobile: JDK Ports to Modern Mobile Platforms

Tue Nov 10 20:35:14 UTC 2015

> On Nov 10, 2015, at 5:42 AM, Edward Nevill <edward.nevill at gmail.com> wrote:
> 
> On Thu, 2015-11-05 at 14:50 +0000, Andrew Haley wrote:
>> On 11/05/2015 01:25 PM, Bob Vandette wrote:
>>> This eliminates the possibility of using the Hotspot JIT (Just-in-
>>> time compiler) but since the Hotspot template interpreter is also
>>> dynamically generated, we can’t use that form of the interpreter.
>>> We have enhanced our closed ARM ports to statically generate the
>>> interpreter for iOS but since these sources are not available in the
>>> open source forest, we’ll use Zero initially to provide a working
>>> solution for the Mobile project.  I welcome the maintainers of the
>>> open aarch64 port to enhance that port to enable static code
>>> generation of your interpreter so that we won’t have to use Zero.
>>> The shared code changes required for static code generation have
>>> already been integrated into the JDK9 master sources.
>> 
>> OK, thanks.  It's definitely worth us having a look at that.

This sounds a bit like RewriteBytecodes and RewriteFrequentPairs taken to the
extreme.  This would be a useful general Hotspot enhancement if it weren’t for the fact
that interpreter performance matters less and less each day.  With TieredCompilation
and our eventual AOT implementation, we’ll be spending less time interpreting.

As a performance improvement for iOS it might be worth doing on its own, but if
we were to implement something like this, why not go straight to ahead-of-time compilation
that leverages the existing JITs?  You have to be careful what you compile, but you can get a nice
performance improvement with very little code generation.

Bob.

> 
> One possibility would be to use a JIT (C1, C2 or the ARM microJIT) to
> compile to some intermediate representation which would be more amenable
> to interpretation.
> 
> I am thinking of an intermediate representation where each
> opcode/operand pair is 128 bits and the dispatch code is simply:
> 
> ldp Ropcode, Roperand, [Rpc], #16
> br Ropcode
> 
> I.e., the opcode is simply the address of a static routine that handles
> that opcode.
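A minimal C sketch of that 128-bit format, assuming an LP64 target where a function pointer and an int64_t are each 64 bits, so one instruction is exactly what a single `ldp ... #16` fetches. The `VM` struct and the handler names (`push_imm`, `iadd`, `halt`) are hypothetical. A C compiler will not generally turn the indirect call into a tail branch, so a loop stands in for `br Ropcode`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interpreter state: a small Java int operand stack. */
typedef struct VM { int32_t stack[64]; int sp; } VM;

typedef struct Insn Insn;
/* A handler executes one instruction and returns the next one (NULL = stop). */
typedef const Insn *(*Handler)(VM *vm, const Insn *pc);

/* One instruction = two 64-bit words on LP64: the "opcode" is the handler address. */
struct Insn {
    Handler op;       /* address of the static routine for this opcode */
    int64_t operand;  /* immediate operand, ignored by some handlers */
};

/* Push the immediate operand (hypothetical opcode). */
static const Insn *push_imm(VM *vm, const Insn *pc) {
    vm->stack[vm->sp++] = (int32_t)pc->operand;
    return pc + 1;
}

/* Pop two ints, push their sum with Java's wrap-around semantics. */
static const Insn *iadd(VM *vm, const Insn *pc) {
    vm->sp--;
    vm->stack[vm->sp - 1] =
        (int32_t)((uint32_t)vm->stack[vm->sp - 1] + (uint32_t)vm->stack[vm->sp]);
    return pc + 1;
}

static const Insn *halt(VM *vm, const Insn *pc) {
    (void)vm; (void)pc;
    return NULL;
}

/* Dispatch loop: the C analogue of "ldp Ropcode, Roperand, [Rpc], #16; br Ropcode".
   Returns the value left on top of the stack. */
static int32_t run(VM *vm, const Insn *code) {
    const Insn *pc = code;
    while (pc != NULL)
        pc = pc->op(vm, pc);
    return vm->stack[vm->sp - 1];
}
```

For example, the program `{push_imm, 2}, {push_imm, 3}, {iadd, 0}, {halt, 0}` leaves 5 on the stack. Returning the next `pc` from each handler keeps the sketch portable; a real implementation would tail-jump from handler to handler as in the assembly.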
> 
> So, for example, you could fold the sequence
> 
> iload N
> iload M
> iadd
> istore O
> 
> to a single opcode
> 
> &iadd_three_op <O><M><N>
> 
> where <O>, <M> and <N> are encoded in the second 64-bit word.
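A hedged C sketch of that fold: it recognises the raw bytecode pattern `iload N; iload M; iadd; istore O` (JVM opcode values 0x15, 0x60 and 0x36, with the one-byte local index of the non-wide forms) and emits a single 128-bit instruction. The operand layout (O in bits 16..23, M in bits 8..15, N in bits 0..7), the `Frame` struct and all function names are assumptions for illustration; the short forms `iload_0`..`iload_3` and `wide` are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* JVM opcode values from the class-file specification. */
enum { OP_ILOAD = 0x15, OP_IADD = 0x60, OP_ISTORE = 0x36 };

/* Non-wide iload/istore take a one-byte index, so 256 locals suffice here. */
typedef struct { int32_t locals[256]; } Frame;

typedef struct Insn Insn;
typedef const Insn *(*Handler)(Frame *f, const Insn *pc);
struct Insn { Handler op; uint64_t operand; };

/* Assumed layout of the second 64-bit word: <O><M><N>, high to low. */
static uint64_t pack_omn(uint8_t o, uint8_t m, uint8_t n) {
    return ((uint64_t)o << 16) | ((uint64_t)m << 8) | (uint64_t)n;
}

/* Folded "iload N; iload M; iadd; istore O": locals[O] = locals[N] + locals[M]. */
static const Insn *iadd_three_op(Frame *f, const Insn *pc) {
    uint8_t o = (uint8_t)(pc->operand >> 16);
    uint8_t m = (uint8_t)(pc->operand >> 8);
    uint8_t n = (uint8_t)pc->operand;
    /* Java int addition wraps modulo 2^32; add as unsigned to avoid C overflow UB. */
    f->locals[o] = (int32_t)((uint32_t)f->locals[n] + (uint32_t)f->locals[m]);
    return pc + 1;
}

static const Insn *halt(Frame *f, const Insn *pc) {
    (void)f; (void)pc;
    return NULL;
}

/* If the 7-byte pattern starts at bc[i], write one folded instruction to *out. */
static int try_fold(const uint8_t *bc, size_t len, size_t i, Insn *out) {
    if (i + 7 > len) return 0;
    if (bc[i] != OP_ILOAD || bc[i + 2] != OP_ILOAD ||
        bc[i + 4] != OP_IADD || bc[i + 5] != OP_ISTORE)
        return 0;
    out->op = iadd_three_op;
    out->operand = pack_omn(bc[i + 6], bc[i + 3], bc[i + 1]);  /* O, M, N */
    return 1;
}

static void run(Frame *f, const Insn *pc) {
    while (pc != NULL)
        pc = pc->op(f, pc);
}
```

Four interpreted bytecodes (and four dispatches) become one dispatch of `iadd_three_op`; the price is one handler per folded pattern.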
> 
> If you further allowed for some simple register allocation (as per the
> ARM microJIT) with 4 registers assigned to the top 4 stack locations and
> 4 registers assigned to 4 locals (probably, but not necessarily local_0
> to local_3) then the above sequence could be reduced to
> 
> &iadd_O_M_N
> 
> I.e., there would be a dedicated opcode for iadd O, M, N, which simply does:
> 
> mov Ropcode, Roperand ; no operand, so operand becomes next opcode
> ldr Roperand, [Rpc], #8
> add RO, RM, RN ; do the op
> br Ropcode
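That handler consumes only 8 bytes of code: the word it already loaded as an "operand" turns out to be the next opcode. A hedged C sketch of the resulting mixed one-word/two-word stream, assuming a union code word and four hypothetical register-cached locals `R[0..3]`; the handlers `load_r0`, `load_r1` and `iadd_r2_r1_r0` are invented for illustration. Each handler sets up the next dispatch itself, as in the assembly, and `run()` plays the role of `br Ropcode`.

```c
#include <stdint.h>

typedef struct Regs Regs;
typedef void (*Handler)(Regs *r);

/* A code word holds either a handler address or an immediate. */
typedef union { Handler op; int64_t imm; } Word;

struct Regs {
    const Word *pc;   /* Rpc: next unread code word */
    Handler opcode;   /* Ropcode: handler to run next */
    Word operand;     /* Roperand: word after the opcode, already loaded */
    int32_t R[4];     /* hypothetical register-cached Java locals */
    int running;
};

/* Normal dispatch: ldp Ropcode, Roperand, [Rpc], #16 */
static void fetch_pair(Regs *r) {
    r->opcode = r->pc[0].op;
    r->operand = r->pc[1];
    r->pc += 2;
}

/* Two-word instructions: the prefetched operand really is an immediate. */
static void load_r0(Regs *r) { r->R[0] = (int32_t)r->operand.imm; fetch_pair(r); }
static void load_r1(Regs *r) { r->R[1] = (int32_t)r->operand.imm; fetch_pair(r); }

/* One-word instruction in the style of iadd_O_M_N, with O=2, M=1, N=0. */
static void iadd_r2_r1_r0(Regs *r) {
    r->opcode = r->operand.op;   /* mov Ropcode, Roperand */
    r->operand = *r->pc++;       /* ldr Roperand, [Rpc], #8 */
    r->R[2] = (int32_t)((uint32_t)r->R[1] + (uint32_t)r->R[0]);  /* add RO, RM, RN */
}

static void halt(Regs *r) { r->running = 0; }

/* Initial fetch, then keep branching to whatever handler was set up last. */
static void run(Regs *r, const Word *code) {
    r->pc = code;
    r->running = 1;
    fetch_pair(r);
    while (r->running)
        r->opcode(r);            /* br Ropcode */
}
```

A program such as `load_r0 40; load_r1 2; iadd_r2_r1_r0; halt` occupies 6 words rather than 8. Because every instruction prefetches the word after it, the code stream needs one padding word after the final instruction.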
> 
> I think it would be possible to get quite good performance using this
> technique, especially if you could statically compile longer code
> sequences based on profiling information.
> 
> All the best,
> Ed.