Hello, and other things

Sun Mar 16 08:54:19 PDT 2008

Hi John,

On 3/14/2008 4:47 PM, John Rose wrote:
> The hard part, though, is the essentially untyped nature of C memory.
> I've seen C implementations that run over typed heaps, but they
> are artful compromises, rather than simple ports to a new backend.
> Centerline C and Zeta-C come to mind.  (Both are old projects, that
> may pre-date the Google cache.  I don't have references handy.)
>
>   
It seems to me that the ability of (machineRadix *) pointers to overrun, 
above and below, the arrays they were based on is a feature of C. The 
memory model I'm proposing makes it possible to leverage the existing 
code generation models and the libraries.

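To make the memory model concrete, here is a minimal sketch. It assumes 
machineRadix is a Java int, that address 0 stands in for NULL, and that 
a bump allocator plays the role of malloc; the class and method names 
are made up for illustration and are not from any existing compiler.

```java
// Hypothetical sketch: all C-visible memory is one untyped int[] array,
// and a C pointer is simply an index into that array.
final class FlatMemory {
    // Assumption: machineRadix is taken to be int.
    final int[] mem;
    private int brk = 1; // next free cell; address 0 is reserved as NULL

    FlatMemory(int cells) {
        mem = new int[cells];
    }

    // Bump allocator standing in for malloc. Nothing is freed or collected.
    int malloc(int cells) {
        if (brk + cells > mem.length) {
            return 0; // out of memory: return NULL
        }
        int p = brk;
        brk += cells;
        return p;
    }

    int load(int p) {
        return mem[p];
    }

    void store(int p, int v) {
        mem[p] = v;
    }

    public static void main(String[] args) {
        FlatMemory m = new FlatMemory(64);
        int a = m.malloc(4);           // int a[4]
        int b = m.malloc(4);           // int b[4], placed directly after a
        m.store(a + 4, 42);            // overrun: one cell past the end of a
        System.out.println(m.load(b)); // the write landed in b[0]
    }
}
```

An overrun stays inside the C program's own array, so C behaves as usual, 
while an index outside the array as a whole still raises the normal Java 
ArrayIndexOutOfBoundsException.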
> The latter was a C compiler for the Symbolics Lisp Machine which
> used ordered pairs (cons cells) for all C pointers, to represent the
> combination of a base address and an arbitrary offset.
> A similar product was Bounds-Check C, which widened
> pointers into little 3-tuples (min, max, cur).  The idea is
> that a tuple-based pointer will never be allowed to "reach
> beyond" the heap object it was created for; such operations
> are always indeterminate, since there is no guaranteed
> distance (or ordering) of heap objects, from one instruction
> to the next, in a system like the Symbolics with a powerful GC.
>
>   
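For comparison, the tuple-based pointer described above might look roughly 
like the sketch below. It is only an illustration of the (min, max, cur) 
idea: the class name, the extra reference to the backing array, and the 
choice to check bounds at dereference time are my assumptions, not details 
of Bounds-Check C or Zeta-C.

```java
// Hypothetical sketch of a bounds-checked "fat" pointer: (min, max, cur).
final class FatPointer {
    final int[] base; // the heap object this pointer was created for (assumed)
    final int min;    // lowest valid index, inclusive
    final int max;    // highest valid index, exclusive
    final int cur;    // current position

    FatPointer(int[] base, int min, int max, int cur) {
        this.base = base;
        this.min = min;
        this.max = max;
        this.cur = cur;
    }

    // Pointer to the first element of a freshly allocated object.
    static FatPointer of(int[] array) {
        return new FatPointer(array, 0, array.length, 0);
    }

    // Pointer arithmetic yields a new tuple carrying the same bounds.
    FatPointer add(int delta) {
        return new FatPointer(base, min, max, cur + delta);
    }

    int load() {
        check();
        return base[cur];
    }

    void store(int v) {
        check();
        base[cur] = v;
    }

    // The pointer may never reach beyond the object it was created for.
    private void check() {
        if (cur < min || cur >= max) {
            throw new IndexOutOfBoundsException("pointer outside its object: " + cur);
        }
    }

    public static void main(String[] args) {
        int[] a = new int[4];
        FatPointer p = FatPointer.of(a).add(3);
        p.store(7); // in bounds
        try {
            p.add(1).store(8); // one past the end: rejected
        } catch (IndexOutOfBoundsException e) {
            System.out.println("trapped");
        }
    }
}
```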

While I understand that many C programmers secretly wish the GC in GCC 
stood for Garbage Collection, it doesn't. I think it's OK to avoid the 
Java GC: philosophically, I regard the ability to leave malloc'ed objects 
on the heap with no references to them as a C "feature", just like 
buffer overruns and underruns.

> That would work very nicely on the JVM also.  You could use
> the sun.misc.Unsafe API (with great care!) to handle punning
> among memory-resident primitive types.  You must avoid
> using Unsafe to pun between primitives and references, because
> there is absolutely no way to control when the GC might want
> to move things around underneath your code.
>
>   

I hadn't come across this before, and it doesn't seem to have any 
documentation! Given your limited description of the features, it sounds 
as though it would be very easy to leave a gap where the compiler could 
be used to break Java protection, which I would not want to do.

>> The key obstacles I see are that the instruction set makes implementing
>> a C-like stack expensive: there are no neat push and pop operations for
>> this memory model, it feels like microcoding. Though I understand the
>> motivation, which is to protect the bytecodes from malicious or lazy use
>> of buffer overflows, and other mechanisms for executing data.
>>     
>
> The stack is really just a shorthand for operand renaming.
> Feel free to generate code to a register-to-register machine,
> and map your virtual registers to JVM locals.
>
>   

Again, I'm inclined to retain the classic stack-based calling convention 
in the memory model, because it makes it trivial to construct and 
manipulate pointers to C objects allocated in the local frame: they're 
the same as pointers to objects on the heap, because both live in the 
same untyped array, the machineRadix[] memory.

>> I like the method handle mechanism, for a variety of reasons, and I
>> would like to see some easing up on where the stack is located so that
>> operations which index into the stack are more flexible, and fast. Is
>> this possible?
>>     
>
> If you need a memory-resident stack, you can just build an array
> to hold it, can't you?  I'm not sure where the pain point is here, yet.
>
>   

Stack operations - manipulating and indexing the BP and SP - will be 
frequent multi-bytecode operations. I don't know how well the JIT 
compiler will work out what's going on.
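
To show the kind of code I mean, here is a rough sketch of a C-style stack 
kept in the same int[] array as the heap. The names (FrameStack, enter, 
leave, addressOfLocal) and the downward-growing layout are assumptions 
made for illustration. Each push or pop is an array access plus a field 
update, so it compiles to several bytecodes rather than one.

```java
// Hypothetical sketch: a memory-resident C stack sharing one array with the heap.
final class FrameStack {
    final int[] mem; // the single untyped memory array
    int sp;          // stack pointer: index of the most recently pushed cell
    int bp;          // base pointer of the current frame

    FrameStack(int cells) {
        mem = new int[cells];
        sp = cells; // the stack grows downward from the top of memory
        bp = cells;
    }

    void push(int v) {
        mem[--sp] = v;
    }

    int pop() {
        return mem[sp++];
    }

    // Function prologue: save the caller's BP, then reserve the locals.
    void enter(int locals) {
        push(bp);
        bp = sp;
        sp -= locals;
    }

    // Function epilogue: discard the locals and restore the caller's BP.
    void leave() {
        sp = bp;
        bp = pop();
    }

    // Address of local slot i (0-based) in the current frame.
    // It is an ordinary index into mem, exactly like a heap pointer.
    int addressOfLocal(int i) {
        return bp - 1 - i;
    }

    // A callee taking an int* cannot tell heap pointers from frame pointers.
    static void increment(int[] mem, int p) {
        mem[p]++;
    }

    public static void main(String[] args) {
        FrameStack s = new FrameStack(32);
        int heap = 1; // pretend this cell came from malloc
        s.mem[heap] = 10;
        s.enter(1);
        int local = s.addressOfLocal(0);
        s.mem[local] = 20;
        increment(s.mem, heap);
        increment(s.mem, local);
        System.out.println(s.mem[heap] + " " + s.mem[local]); // prints "11 21"
        s.leave();
    }
}
```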

Jason