Some words about Cibyl (MIPS to Java bytecode binary translation)
Simon Kagstrom
simon.kagstrom at gmail.com
Sun Mar 16 11:56:51 PDT 2008
Hello!
I'm the author of Cibyl, which translates MIPS binaries into Java
bytecode. Patrick Wright pointed me to this list and the discussion
about compiling C into Java bytecode (thanks!), so I thought I'd share
some comments about how this is done in Cibyl. Most of it is also
applicable to NestedVM, which does essentially the same thing with a
set of implementation differences. NestedVM also predates Cibyl, so the
origin of the idea should be attributed to them.
Cibyl targets portability of C and C++ applications to J2ME devices, so
it also provides an interface to the MIDP API. The translation is
fairly straight-forward. Cibyl depends on GCC to generate an ELF binary
(with symbol and relocation information intact), and the translation is
done with a 1-1 mapping between C functions (call destinations in the
ELF binary) and static Java methods in a class. Most MIPS instructions
can be translated pretty much 1-1 to Java bytecode.
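To make the mapping concrete, here is a minimal sketch of what one
translated function could look like, written as Java source for
readability (a translator like this emits bytecode directly, with
gotos rather than structured loops). The class name, the size of the
memory array, the example function and the MIPS listing in the
comments are all made up for illustration and are not actual Cibyl
output; branch delay slots are ignored.

```java
// Hypothetical illustration only, not actual Cibyl output.
// C source assumed for the example:
//   int sum(int *p, int n) { int s = 0; while (n-- > 0) s += *p++; return s; }
final class TranslatedSketch {
    // Assumed flat "memory": one int per 32-bit word, indexed by address / 4.
    static final int[] memory = new int[1024];

    // One static Java method per C function; MIPS argument registers a0/a1
    // become parameters, other registers become int local variables.
    static int sum(int a0, int a1) {
        int v0 = 0;                    // move  v0, zero
        int t0;
        while (a1 > 0) {               // blez  a1, done
            t0 = memory[a0 >>> 2];     // lw    t0, 0(a0)
            a0 = a0 + 4;               // addiu a0, a0, 4
            a1 = a1 - 1;               // addiu a1, a1, -1
            v0 = v0 + t0;              // addu  v0, v0, t0 (wraps like Java int)
        }                              // b     loop
        return v0;                     // jr    ra (result in v0)
    }

    public static void main(String[] args) {
        // Place three words at byte addresses 16, 20 and 24.
        memory[4] = 10;
        memory[5] = 20;
        memory[6] = 12;
        System.out.println(sum(16, 3)); // prints 42
    }
}
```

Each MIPS instruction lines up with roughly one Java statement, and
since the registers are plain int locals the JVM is free to keep them
in machine registers.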
NestedVM does this a bit differently and does not have the 1-1 mapping.
Both methods have benefits and disadvantages. With the NestedVM
approach, it's easier to support e.g. longjmp, while the Cibyl
approach makes the class look more like a "real" Java class, for
example in crash dumps or profilers. From benchmarks I've made, the
Cibyl approach also seems easier to achieve good performance with,
mostly because it always uses Java local variables for the MIPS
register representation throughout.
So to the interesting part :-). While implementing the translation has
mostly been pretty straight-forward, there are two cases where Java
bytecode poses some problems:
* The 64KB method size limit, which is perhaps the largest issue.
Without this limit, the whole program could be translated into a
single method, which would improve performance and simplify the
implementation quite a bit. Within the limit, Cibyl still allows
co-locating multiple C functions in a single Java method, which can
improve performance quite a bit.
This is of course also a problem with very big C functions. In
practice, it has only been a problem in one application so far (the
fetch-and-decode loop of an emulator). Cibyl currently does not
handle this situation automatically, and I guess this would also be
an issue for a JBC compiler backend.
* Untyped memory, which I saw you also brought up. In Cibyl, I've used
a big int-array as the "memory" representation. This fits MIPS quite
well, since unaligned memory access is limited to special
instructions, and most accesses tend to be 32-bit accesses. However,
8- and 16-bit loads and stores take a significant performance hit,
since they have to be emulated with shifts and masks on the
containing 32-bit word.
Since Cibyl targets embedded (J2ME) devices, it will just allocate a
fixed amount of memory for the C program at startup (for stack/heap).
NestedVM targets other systems and uses a two-level structure that
allows a sparse memory layout.
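As a rough illustration of why sub-word accesses cost more with an
int-array memory, here is a small sketch of byte loads and stores on
top of a fixed-size int[]. The class and method names are invented for
the example, and big-endian byte order is an assumption; this is not
taken from the Cibyl sources.

```java
// Hypothetical sketch of a fixed-size, int-array backed memory.
// Names and big-endian byte order are assumptions for illustration.
final class WordMemorySketch {
    private final int[] words;

    WordMemorySketch(int sizeInBytes) {
        words = new int[sizeInBytes / 4]; // allocated once, at startup
    }

    // 32-bit aligned access (lw/sw): a single array operation.
    int loadWord(int addr) { return words[addr >>> 2]; }
    void storeWord(int addr, int value) { words[addr >>> 2] = value; }

    // Bit offset of the addressed byte within its word (byte 0 is the MSB).
    private static int shiftFor(int addr) { return (3 - (addr & 3)) << 3; }

    // lbu: load the word, then shift and mask.
    int loadByteUnsigned(int addr) {
        return (words[addr >>> 2] >>> shiftFor(addr)) & 0xFF;
    }

    // lb: same as lbu, plus sign extension.
    int loadByteSigned(int addr) {
        return (byte) loadByteUnsigned(addr);
    }

    // sb: read-modify-write of the containing word.
    void storeByte(int addr, int value) {
        int shift = shiftFor(addr);
        int idx = addr >>> 2;
        words[idx] = (words[idx] & ~(0xFF << shift)) | ((value & 0xFF) << shift);
    }

    public static void main(String[] args) {
        WordMemorySketch m = new WordMemorySketch(64);
        m.storeWord(0, 0x11223344);
        m.storeByte(1, 0xAB);
        System.out.println(Integer.toHexString(m.loadWord(0))); // prints 11ab3344
        System.out.println(m.loadByteSigned(1));                // prints -85
    }
}
```

A word access is one array read or write, while a byte store turns
into an array read, a couple of shifts and masks, and an array write.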
Obviously there are also some MIPS instructions which are a bit tricky
to translate, but that's not really the fault of JBC. So if I could
have one wish for Java bytecode, it would be to lift the 64KB method
size limit (I'm pretty sure the NestedVM developers agree with this).
I understand that type safety will not be relaxed, so I guess that
untyped memory will be a problem for any C backend.
Sorry for the long mail :-). I'll follow Jason's work on a Java GCC
backend; that would be quite nice to have. I guess you are also
familiar with LLVM, which perhaps could be an easier starting point
than plain GCC?
--
// Simon
More information about the mlvm-dev mailing list