Hello, and other things

Fri Mar 14 16:47:37 PDT 2008

On Feb 29, 2008, at 4:53 PM, Jason Fordham wrote:

> I started thinking about targeting GCC for the JVM last week.

That's a neat project!

I have heard of JVMs being used to simulate very small assembly-level
systems, on the order of 16-bit computers.  The challenges with this come
from building in a second level of virtualization.  The execution of the
simulated unsafe CPU is hard to integrate with the JVM's libraries.

> It quickly became clear that the JVM instruction set is designed to
> make the C programming model difficult: the separation of bytecodes,
> stacks, frames, and object space, and the generally unconvertible
> addressType quickly led me to a model where the JVM stacks are ignored
> except for primitive operations, while memory - for data, bss and heap -
> is modeled in a large array. In order to model C's function calls by
> pointer, I figured a handle pair, class and method, hashing the strings,
> with a linking stage after compilation to perform fixup - much as I
> imagine slide 17 in the LangNet presentation implies.

I agree that method handles will help with this sort of thing.
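
As a rough sketch of how that could look, a C function pointer can be
modeled as an int index into a table of method handles that is filled in
at link time.  This assumes the java.lang.invoke API; the class and method
names (FunctionTable, link, call) are hypothetical, and every function is
assumed to have the C type int(int, int).

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a C function pointer is an int index into a
// table of method handles resolved at link time.
final class FunctionTable {
    private static final List<MethodHandle> TABLE = new ArrayList<>();

    // Stand-ins for two compiled C functions of type int(int, int).
    static int add(int a, int b) { return a + b; }
    static int mul(int a, int b) { return a * b; }

    // "Link" a static method by class and name; the returned index is
    // the function-pointer value the compiled code would store.
    static int link(Class<?> owner, String name) {
        try {
            MethodType type =
                MethodType.methodType(int.class, int.class, int.class);
            TABLE.add(MethodHandles.lookup().findStatic(owner, name, type));
            return TABLE.size() - 1;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("unresolved symbol: " + name, e);
        }
    }

    // Call through a function pointer.
    static int call(int fp, int a, int b) {
        try {
            return (int) TABLE.get(fp).invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Since the pointer value is a plain int, it can be stored in the same flat
memory as any other C datum.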

The hard part, though, is the essentially untyped nature of C memory.
I've seen C implementations that run over typed heaps, but they
are artful compromises, rather than simple ports to a new backend.
Centerline C and Zeta-C come to mind.  (Both are old projects that
may predate the Google cache; I don't have references handy.)

The latter was a C compiler for the Symbolics Lisp Machine which
used ordered pairs (cons cells) for all C pointers, to represent the
combination of a base address and an arbitrary offset.
A similar product was Bounds-Check C, which widened
pointers into little 3-tuples (min, max, cur).  The idea is
that a tuple-based pointer will never be allowed to "reach
beyond" the heap object it was created for; such operations
are always indeterminate, since there is no guaranteed
distance (or ordering) of heap objects, from one instruction
to the next, in a system like the Symbolics with a powerful GC.
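
To make the pair idea concrete, here is a minimal sketch of a
base-plus-offset pointer over a typed Java array.  The class IntPtr and
its methods are hypothetical names for illustration; they are not taken
from Zeta-C or any bounds-checking compiler.

```java
// Hypothetical sketch of a (base, offset) "fat pointer" to int.
final class IntPtr {
    final int[] base;   // the heap object this pointer was created for
    final int offset;   // element index within that object

    IntPtr(int[] base, int offset) {
        this.base = base;
        this.offset = offset;
    }

    // p + n: arithmetic yields a new pointer into the same object.
    IntPtr plus(int n) { return new IntPtr(base, offset + n); }

    // *p and *p = v: the JVM's own array bounds check rejects any
    // access outside the original object.
    int load() { return base[offset]; }
    void store(int v) { base[offset] = v; }

    // p - q is only defined when both point into the same object.
    int minus(IntPtr q) {
        if (base != q.base) {
            throw new IllegalArgumentException("different objects");
        }
        return offset - q.offset;
    }
}
```

In this sketch the array's own length plays the role of the explicit
(min, max) bounds in the 3-tuple scheme.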

That would work very nicely on the JVM also.  You could use
the sun.misc.Unsafe API (with great care!) to handle punning
among memory-resident primitive types.  You must avoid
using Unsafe to pun between primitives and references, because
there is absolutely no way to control when the GC might want
to move things around underneath your code.
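
A sketch of punning among primitives follows.  Note that it swaps in
java.nio.ByteBuffer in place of sun.misc.Unsafe, since ByteBuffer is a
supported API and shows the same idea; the class name PrimitiveMemory and
the choice of little-endian byte order are assumptions made for the
example.  The buffer holds only primitive data, so no reference is ever
reinterpreted.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: a flat C-style memory holding only primitives.
final class PrimitiveMemory {
    private final ByteBuffer mem;

    PrimitiveMemory(int size) {
        // Little-endian is an assumption; choose the target's order.
        mem = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
    }

    void storeFloat(int addr, float v) { mem.putFloat(addr, v); }
    void storeInt(int addr, int v)     { mem.putInt(addr, v); }

    // Reading an address as a different primitive type is the pun.
    int loadInt(int addr)  { return mem.getInt(addr); }
    int loadByte(int addr) { return mem.get(addr) & 0xFF; }  // unsigned char
}
```

Storing 1.0f at address 0 and reading it back with loadInt(0) returns the
IEEE 754 bit pattern 0x3F800000.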

> The key obstacles I see are that the instruction set makes implementing
> a C-like stack expensive: there are no neat push and pop operations for
> this memory model, it feels like microcoding. Though I understand the
> motivation, which is to protect the bytecodes from malicious or lazy use
> of buffer overflows, and other mechanisms for executing data.

The stack is really just a shorthand for operand renaming.
Feel free to generate code to a register-to-register machine,
and map your virtual registers to JVM locals.
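
For illustration, take a hypothetical three-address sequence
(r2 = r0 * r0; r3 = r1 * r1; r4 = r2 + r3; ret r4).  Written as Java
source, each virtual register becomes one local variable, which javac
assigns to a JVM local slot; the operand stack only carries the
intermediate values of each individual operation.

```java
// Hypothetical sketch: virtual registers r0..r4 mapped to JVM locals.
final class RegisterDemo {
    static int sumOfSquares(int a, int b) {
        int r0 = a;          // incoming argument registers
        int r1 = b;
        int r2 = r0 * r0;    // r2 = r0 * r0
        int r3 = r1 * r1;    // r3 = r1 * r1
        int r4 = r2 + r3;    // r4 = r2 + r3
        return r4;           // ret r4
    }
}
```

A backend emitting bytecode directly would produce the matching
iload/imul/istore sequences itself instead of going through javac.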

> I like the method handle mechanism, for a variety of reasons, and I
> would like to see some easing up on where the stack is located so that
> operations which index into the stack are more flexible, and fast. Is
> this possible?

If you need a memory-resident stack, you can just build an array
to hold it, can't you?  I'm not sure where the pain point is here, yet.
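
Something along these lines is what I have in mind: a minimal sketch of an
array-backed stack with push, pop, and indexed access relative to the top.
The class name CStack, the int-sized words, and the upward growth
direction are all assumptions for the example.

```java
// Hypothetical sketch: a memory-resident stack held in an int array.
final class CStack {
    private final int[] words;
    private int sp;  // index of the next free slot; grows upward

    CStack(int capacity) { words = new int[capacity]; }

    void push(int v) { words[sp++] = v; }
    int pop()        { return words[--sp]; }

    // Indexed access: depth 0 is the top of the stack.
    int peek(int depth)         { return words[sp - 1 - depth]; }
    void poke(int depth, int v) { words[sp - 1 - depth] = v; }

    int size() { return sp; }
}
```

Overflow and underflow surface as ArrayIndexOutOfBoundsException rather
than corrupting neighboring data.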

Best wishes,
-- John