Improving the performance of OpenJDK

Wed Feb 18 02:57:43 PST 2009

Gary Benson wrote:
> Hi Ed,
> 
> I haven't looked into the code particularly -- it's pretty difficult
> to locate your stuff in that massive patch -- but here are my initial
> thoughts.
> 
> Edward Nevill wrote:
>> Splitting the loop like this improves the code generated by gcc in a
>> number of ways. Firstly it improves register allocation because the
>> compiler is not trying to allocate registers across complex
>> code. This code is infrequently executed, but the compiler has no
>> way of knowing, and tends to give the complex code more priority for
>> register allocations (since it is the deepest, most nested piece of
>> code, it must be the most frequently executed, right? Wrong!!!).
> 
> I don't know if this would make a huge difference, but there's a
> conditional, LOTS_OF_REGS, defined in bytecodeInterpreter_zero.hpp,
> that specifies register keywords for several variables in the
> bytecode interpreter's main loop.  It might be worth turning it on
> for ARM and seeing if it has an effect.

I suspect it'd make things worse.  ARM has only 16 registers, and some
of those are fixed by the ABI.

The idea of separating frequently-executed code from stuff that is only
used occasionally is a good one.  Every compiler, and certainly gcc,
finds it difficult to do a good job of allocating registers in a large
routine.  It's especially hard for ARM, which is register-starved.
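
To make the idea concrete, here is a rough sketch of that kind of
split.  It is not Ed's patch: the opcodes, run() and slow_rare_op() are
all made up for illustration, and I'm assuming gcc's noinline and cold
function attributes.  The rarely-taken case is pushed out of line so
that its temporaries don't compete for registers with the hot loop.

```c
#include <stdint.h>

/* Hypothetical opcodes, invented for this sketch only. */
enum { OP_INC = 0, OP_ADD = 1, OP_RARE = 2, OP_HALT = 3 };

/*
 * Infrequently executed path, kept out of line.  noinline stops gcc
 * folding it back into the loop; cold hints that it is rarely called.
 */
__attribute__((noinline, cold))
static int32_t slow_rare_op(int32_t acc)
{
    /* Stand-in for the complex, rarely needed work. */
    return acc * 3 + 7;
}

/* Hot dispatch loop: only the simple cases are handled inline. */
int32_t run(const uint8_t *pc)
{
    int32_t acc = 0;

    for (;;) {
        switch (*pc++) {
        case OP_INC:
            acc += 1;
            break;
        case OP_ADD:
            acc += *pc++;            /* one-byte immediate operand */
            break;
        case OP_RARE:
            acc = slow_rare_op(acc); /* out-of-line call */
            break;
        case OP_HALT:
            return acc;
        default:
            return -1;               /* unknown opcode */
        }
    }
}
```

Running the program { OP_INC, OP_ADD, 5, OP_RARE, OP_HALT } through
run() gives 25.  Whether this helps in practice depends on how often the
slow path is really taken, so it needs measuring rather than assuming.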

>> get_native_u2() and get_Java_u2() ... This seems to be a misguided
>> attempt by the original authors to optimise reading of halfwords
>> (judging by the comment immediately preceding the code).
> 
> It's not an optimization, it's to do unaligned access on hardware that
> doesn't support it.  I'm guessing ARM does allow unaligned access by
> the fact that your code didn't segfault instantly ;)

ARM doesn't support unaligned loads.  The new ARM code as posted is

	ldrsb	r0, [java_pc, #0]
	ldrb	r1, [java_pc, #1]
	orr	r1, r1, r0, lsl #8

i.e. two byte loads.
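
For anyone who doesn't read ARM assembler, a C sketch of roughly what
those three instructions compute follows.  The function name
read_java_s2() is my own invention, not something from the patch or
from HotSpot; it simply assembles a signed, big-endian halfword from two
single-byte loads, so alignment never comes into it.

```c
#include <stdint.h>

/*
 * Hypothetical helper: read a signed 16-bit big-endian value from the
 * bytecode stream using two byte loads.
 */
static int32_t read_java_s2(const uint8_t *pc)
{
    int32_t hi = (int8_t)pc[0];  /* sign-extended high byte, like ldrsb */
    int32_t lo = pc[1];          /* zero-extended low byte, like ldrb   */

    /*
     * Same result as "orr r1, r1, r0, lsl #8".  Multiplying rather than
     * shifting avoids left-shifting a negative value in C.
     */
    return hi * 256 + lo;
}
```

With the bytes 0xFF 0xFE this returns -2, and with 0x01 0x02 it returns
258.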

Andrew.