Mac OS X i486 ABI -- 16-byte stack alignment

Fri Nov 9 11:46:08 PST 2007

On Nov 9, 2007, at 9:18 AM, steve goldman wrote:

> Paul Hohensee wrote:
>> I didn't say it'd be easy. :)
>> btw, Steve of course knows this :) , but sparc doesn't cut back  
>> the stack on calls
>> because of the register save area at the top of the frame, which  
>> latter doesn't
>> exist on x86.
>>
> The sparc version of the c++ interpreter does cutback and deals  
> with the window area, template could too. Although the truth be  
> told back in the last millennium when I did the initial  
> implementation of the c++ interpreter on sparc I had it cut back  
> the stack because I mistakenly thought that the template  
> interpreter did it that way and I wanted it to use similar amounts  
> of stack space. :-)

On sparc the register window save area is completely volatile
from the user's point of view, so it's easy to move it:  Just bump
the native %sp.  The place where it *was* may contain old spills
and/or garbage, and the place it *now* points to can collect spills
on any interrupt.

To avoid copying incoming arguments, interpreters usually
create their locals arrays (on method entry) by cutting back
the unused part of the caller's JVM stack and also adding
space for non-argument locals.  After that comes any
CPU-required stuff (like caller's register window save area),
and then more JVM state of the callee, including JVM stack,
and finally any additional CPU-required stuff (like callee's
RWSA on sparc).
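
As a rough illustration of the overlap (not the HotSpot code), here is a toy
model in C++. ToyStack, ToyFrame, enter_method and local are made-up names,
slots are plain ints, and the stack grows toward higher indices for
simplicity. The toy tracks only the used depth of the caller's stack, so the
"cutback" of the unused part is implicit, and the CPU-required areas are
left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical): one contiguous slot array shared by all frames.
struct ToyStack {
    std::vector<int> slots;
    std::size_t top = 0;  // index of the next free slot
    explicit ToyStack(std::size_t n) : slots(n) {}
    void push(int v) {
        assert(top < slots.size());
        slots[top++] = v;
    }
};

struct ToyFrame {
    std::size_t locals_base;  // index of local 0
    std::size_t max_locals;   // arguments plus non-argument locals
};

// Method entry: the arguments the caller just pushed become locals
// 0..num_args-1 in place; only the non-argument locals need new slots.
ToyFrame enter_method(ToyStack& s, std::size_t num_args, std::size_t max_locals) {
    assert(num_args <= max_locals && num_args <= s.top);
    ToyFrame f{s.top - num_args, max_locals};
    std::size_t extra = max_locals - num_args;
    assert(s.top + extra <= s.slots.size());
    for (std::size_t i = 0; i < extra; ++i) {
        s.slots[s.top + i] = 0;  // zeroed here only to keep the toy deterministic
    }
    s.top += extra;
    return f;
}

// Every local is reached with one base-plus-index computation.
int& local(ToyStack& s, const ToyFrame& f, std::size_t i) {
    assert(i < f.max_locals);
    return s.slots[f.locals_base + i];
}
```

If the caller pushes 10 and 20 and then enters a method with two arguments
and four locals, local 0 and local 1 are the very slots the caller wrote;
nothing is copied.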

See the ASCII-grams before generate_method_entry
in interpreter_sparc.cpp.

On sparc, the callee changes the caller's native stack pointer.
There's an odd handshake where the caller saves his
%sp in 'I5_savedSP', and the callee reasserts it on exit.
So the callee can optionally move the caller's RWSA.
Later on, exit paths back to the interpreter move
'savedSP' back to %sp.

The alternatives to this clever reuse of the caller's JVM stack
would be copying incoming arguments, or segmenting
the callee's locals array (requiring an extra range check
for local reference bytecodes).  Yes, interpreter speed
is not our primary goal, but this is probably one of those
high-leverage decisions that makes a significant difference
in start-up performance.
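
To make the cost of the segmented alternative concrete, here is a small
sketch, again with made-up names (SegmentedLocals, segmented_local,
contiguous_local) and int slots. It only shows the shape of the extra
comparison on each local access; it is not how any interpreter actually
lays out its locals.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical segmented layout: arguments stay where the caller put them,
// and the non-argument locals live in a separate block.
struct SegmentedLocals {
    int* args;
    std::size_t num_args;
    int* extras;
    std::size_t num_extras;
};

// Each access must first decide which segment the index falls in.
inline int& segmented_local(SegmentedLocals& l, std::size_t i) {
    assert(i < l.num_args + l.num_extras);
    return (i < l.num_args) ? l.args[i] : l.extras[i - l.num_args];
}

// With a contiguous locals array the access is a single indexed load.
inline int& contiguous_local(int* base, std::size_t i) {
    return base[i];
}
```

The branch in segmented_local is the "extra range check" paid on every
local reference bytecode.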

Because sparc has lots of registers, it can afford to have
an unaligned Lesp just for the JVM, while keeping various
strong invariants true on the native %sp.

For 32-bit Intel, because there are not enough registers
to have a separate JVM sp and native esp, I think the
interpreter needs to have an unaligned calling convention,
with alignment fixups (esp invariant reassertion) on calls
to C code. I suppose 64-bit Intel could do it either way.
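
The fixup arithmetic could look roughly like this, modeled in C++ on plain
integers rather than real registers. The names kStackAlignment,
align_sp_down, CallFixup and prepare_c_call are invented for the sketch, and
it assumes a downward-growing stack with the 16-byte requirement applying at
the call instruction.

```cpp
#include <cassert>
#include <cstdint>

// Assumed alignment: the 16 bytes from the subject line.
constexpr std::uintptr_t kStackAlignment = 16;

// Round a downward-growing stack pointer down to an aligned address.
constexpr std::uintptr_t align_sp_down(std::uintptr_t sp) {
    return sp & ~(kStackAlignment - 1);
}

struct CallFixup {
    std::uintptr_t saved_sp;  // unaligned interpreter sp, restored after the call
    std::uintptr_t call_sp;   // aligned sp in effect at the call instruction
};

// Reserve arg_bytes for outgoing C arguments, then pad so that the
// stack pointer is aligned at the point of the call.
inline CallFixup prepare_c_call(std::uintptr_t interp_sp, std::uintptr_t arg_bytes) {
    assert(interp_sp >= arg_bytes);
    CallFixup f;
    f.saved_sp = interp_sp;
    f.call_sp = align_sp_down(interp_sp - arg_bytes);
    assert(f.call_sp % kStackAlignment == 0);
    assert(f.call_sp + arg_bytes <= interp_sp);
    return f;
}
```

After the C call returns, the interpreter would put saved_sp back into esp,
so the unaligned convention between interpreted frames is undisturbed.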

-- John