[aarch64-port-dev ] C2 error on ARM sim only regarding call to runtime from non-compiled code

Fri Oct 4 09:40:55 PDT 2013

When running on the ARM sim Ed found that routine check_compiled_frame
throws an error claiming that a call out to the runtime is occurring from
a non-compiler generated address. This happens when the runtime is
entered from one of the OptoRuntime stubs (see opto/runtime.cpp). These
stubs save only the stack pointer in the thread's frame anchor, i.e.
_anchor.last_Java_sp==0xfff..... but _anchor.last_Java_fp==0x0 and
_anchor.last_Java_pc==0x0.

Now check_compiled_frame relies on thread->last_frame() to identify the
start of the Java frame chain. It uses frame::frame(long sp, long fp) to
create the frame.

If fp is 0 then the frame code picks up the pc by evaluating sp[-1],
i.e. it assumes that the VM (C) function called from the stub will
create a frame just above (negatively) the saved stack pointer with the
return PC in its first (64-bit) word. This allows the code buffer to be
retrieved and thence the frame size. This always works on x86 since the
hardware call instruction pushes the return address.
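
To make the sp[-1] lookup concrete, here is a minimal C++ sketch. It is
not the HotSpot frame code: the function name pc_below_sp and the
array-backed stack are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a HotSpot function): given the stack pointer
// saved in the frame anchor, read the 64-bit word immediately below it.
// On x86 the hardware call pushes the return address there, so this word
// is the pc inside the calling stub.
std::uint64_t pc_below_sp(const std::uint64_t* saved_sp) {
    assert(saved_sp != nullptr);
    return saved_sp[-1];
}
```

The lookup is only as good as the assumption that the callee's first
pushed word is the return address.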

On our sim this is not true by default -- recall our blrt call is
actually handled as a callout from the sim. When transitioning from
simulator execution to x86 code we push some intermediate data on the
Java thread stack before entering the called routine via a linking
function which also switches stacks. So, I fiddled things by pushing LR
and a magic word (0xdeadbede) onto the Java thread stack before pushing
the transition data and then entering the x86-compiled C function.

Ok, so here's the terrible confession! I just assumed that on the real
hardware when calling the C code directly from the stub the compiled
routine would build a frame with lr and sp as its first two elements.
That's what we do when building a Java frame. Alas, this is not a valid
assumption.

The first callout I encounter when running on the real hardware is to
OptoRuntime::new_array_nozero_C which is called from the C2-generated
new_array_nozero_Java stub. Disassembly reveals the following:

(gdb) x/i OptoRuntime::new_array_nozero_C
   0x7fb79f1f10 <OptoRuntime::new_array_nozero_C(Klass*, int, JavaThread*)>:  stp     x29, x30, [sp,#-128]!

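i.e. the callee stores x29 and x30 at the bottom of a 128-byte frame, so
the return address ends up 120 bytes below the incoming SP rather than
at sp[-1]. This is rather unfortunate because, looking at some of the
other routines, we have e.g.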

(gdb) x/i OptoRuntime::multianewarray2_C
   0x7fb79f2948 <OptoRuntime::multianewarray2_C(Klass*, int, int, JavaThread*)>:  stp     x29, x30, [sp,#-80]!
(gdb) x/i OptoRuntime::multianewarray3_C
   0x7fb79f2d70 <OptoRuntime::multianewarray3_C(Klass*, int, int, int, JavaThread*)>:  stp     x29, x30, [sp,#-112]!
(gdb)

. . .

In other words the return address is not at a fixed offset from SP. So,
we cannot use this trick to identify the frame's code buffer and hence
its size.
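
The arithmetic behind this can be sketched in a few lines of C++. The
helper name lr_word_index is hypothetical. The only fact it relies on is
that a pre-indexed "stp x29, x30, [sp,#-N]!" first lowers sp by N bytes
and then stores x29 at [sp] and x30 at [sp+8].

```cpp
#include <cassert>

// Hypothetical helper: for a prologue `stp x29, x30, [sp,#-N]!`, return
// the 64-bit word index, relative to the caller's sp, of the slot that
// receives x30 (the return address). x30 lands at caller_sp - N + 8.
int lr_word_index(int frame_bytes) {
    // Frame sizes seen in the disassembly are positive multiples of 16.
    assert(frame_bytes > 0 && frame_bytes % 16 == 0);
    return (-frame_bytes + 8) / 8;
}
```

For the three prologues above this gives indices -15 (N=128), -9 (N=80)
and -13 (N=112). Only a 16-byte frame would put the return address at
sp[-1].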

I think it may be possible to fix this by making the OptoRuntime stubs
write the current PC to the thread anchor. I am not certain why the C2
stubs only write SP. The stubs are generated by building the
required ideal graph and it looks to me like ideal code is not able to
express what is needed, i.e. loading the current code address into a
register as a constant. So, this may be why the stub leaves it to the
call routine and the frame code to obtain the callee code address.
Alternatively, it may just have been omitted because loading via sp[-1]
is cheaper and quicker. I'll post when I find out more.
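
As a rough illustration of the proposed fix, the frame lookup could
prefer a pc recorded in the anchor and only fall back to sp[-1] when
none was written. This is a sketch under assumptions, not the actual
HotSpot code: SketchAnchor and anchor_pc are made-up names, and only the
field names follow the anchor fields mentioned earlier.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the thread's frame anchor.
struct SketchAnchor {
    const std::uint64_t* last_Java_sp;  // stack pointer saved by the stub
    std::uint64_t last_Java_pc;         // 0 when the stub did not record a pc
};

// Use the recorded pc when the stub wrote one; otherwise fall back to
// the word below sp, which is only valid when the call pushed the
// return address there (as on x86).
std::uint64_t anchor_pc(const SketchAnchor& a) {
    assert(a.last_Java_sp != nullptr);
    if (a.last_Java_pc != 0) {
        return a.last_Java_pc;
    }
    return a.last_Java_sp[-1];
}
```

With the pc stored explicitly, the result no longer depends on how the
C compiler lays out the callee's frame.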

regards,

Andrew Dinn
-----------