greetings here...
BGB
cr88192 at hotmail.com
Sun Oct 12 17:12:24 PDT 2008
well, ok, I will say I am new to this list, but am hoping for interesting
conversations.
sorry if in being new here, I am being an ignorant troll.
main reason:
well, mostly I am doing "my own thing", but if possible would like to be
operating within the confines of 'the community'.
I will admit that at this point in time I am not really that much of a Java
developer, so my familiarity with the community or with specific
technologies in this area is limited.
basically, my purpose and efforts (general overview):
I have over the past few years implemented several languages and VMs of
different varieties (typically dynamic).
more recently (ok, over the last 1.5 years or so), I went and wrote a
dynamic C compiler framework (basically, it allows dynamically taking C
source and compiling it to machine code at runtime, making use of a number
of hacks for allowing dynamic relinking and relatively seamless integration
with the host app). C is good, yes, but by itself it does not do
"everything". the sad thing is, although I can dynamically compile, C is not
so ideal of a language for this, and more so there is "the problem of the
headers", namely in that I know of no "good" way to escape header
processing, meaning that typically there is some time overhead WRT
dynamically compiling modules.
another sad point is that C is, as is, largely incapable of proper "eval"
(of the sort typically done in JavaScript and friends). now, yes, one can
compile some functions and then invoke them, but this sucks (and anything
that can eval in some useful way, is technically no longer C...).
it is also the case that at present dynamically-compiled C code is not
garbage collected (ok, my framework has many "unorthodox" features and
compiler extensions, among them optional dynamic typing and a conservative
concurrent garbage-collector, ...). but,
otherwise, I have a decent chunk of C99 implemented and 'most' C code should
work ok, so it is probably good enough.
also, generating native machine code is not always the optimal solution, as
for many things, bytecode is preferable (I will make the analogy of using a
tank as one's main vehicle... it is hard-core and powerful, but not so good
for short trips and really tears up one's lawn...).
recently, I had been looking to "absorb" Java support into my project as
well, and more so, to make use of Java's bytecode format as probably the
primary bytecode format (namely, there are cases when using something
"standard" makes sense, and I don't feel there is need for "yet another
non-standard bytecode format"...). this will possibly allow leveraging some
amount of existing stuff, and make it so that people can more readily make
use of my stuff.
basically, when needed my project could "emulate" a more traditional JVM,
and as much as is reasonable remain compatible with other JVMs.
partly, I am now of the opinion that Java is better for some of my tasks
than C is, but I still like mostly keeping C around (actually C and C++ are
still my primary languages...).
so, unlike C, Java is a lot better suited to dynamic loading and modular
systems, and is also a lot easier to verify. unlike JavaScript, performance
and garbage generation issues are likely to be far better (after all, for
most things, the language still is statically typed).
I also intend to tightly integrate it with my existing framework, and
actually use it for many other tasks as well (basically, offloading tasks
not as well suited to native code generation). I may also use it as a target
for JavaScript as well (I have an existing partial JS implementation, but
targeting JBC would probably be a better and more general solution).
a JIT compiler may also be added at some point...
however, I will probably not implement JNI unless I have some good reason
(me considering alternative and more desirable options...).
progress thus far (JBC support):
well, most of the "more general" stuff exists within my framework already.
I have a class/instance system in place, which mostly overlaps with the JVM
in terms of functionality (it is being implemented for this purpose, as
formerly I had been using a prototype-based object system, like in
JavaScript, but this would not be very good for implementing a JVM
performance-wise).
a lot of the core interpreter functionality has been written recently, but
otherwise I have not had much free time for coding as of late...
actually, I came up with the idea of writing a JVM like several weeks ago,
but haven't had much time to do so (actually, thus far it has been less
painful than expected, but other stuff uses up most of my time...).
I have yet to implement a class-loader or similar functionality. I had
decided actually to implement the interpreter first, and then make the
loader target it, rather than implementing the loader first and building an
interpreter around it (this is what the JVM spec makes me think...).
at this point, I have not actually tested any of the interpreter machinery
(things are still very preliminary).
a minor complaint:
the JVM spec (or at least the one I found) makes some things rather annoying
to figure out, such as which opcodes have arguments and what they are, what
exactly each opcode does, ...
an instruction listing similar to that found in the Intel docs (or many
other processor-specific references) would make this less annoying. so,
maybe a slightly more formal structure could help (dunno if anyone here is
in a position to effect this though).
namely: parts of my interpreter I generate with tools, and it is preferable
to go and fill out some tables, rather than have to dig around and figure
things out.
for example, when I wrote an x86/x86-64 assembler before, I just sort of
scanned through the instruction reference and transcribed all the stuff into
my own table formats (and similar would also work with the PPC spec), but
the JVM spec, as-is, requires a bit more digging (a lot of this is maybe more
useful for people targeting the VM, but not as much for writing one).
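as an illustration of the kind of table meant here, below is a minimal sketch in C. the struct layout and the names OpInfo, op_table, op_lookup and op_length are hypothetical (they are not taken from the project described above); the opcode numbers and operand widths for the handful of instructions listed are the standard JVM ones. variable-length instructions (tableswitch, lookupswitch, wide) do not fit a fixed "operand bytes" column and would need special handling.

```c
#include <stddef.h>
#include <stdint.h>

/* One row per instruction: opcode number, mnemonic, and the number of
   operand bytes that follow the opcode in the code stream.
   Hypothetical layout, for illustration only. */
typedef struct {
    uint8_t     opcode;
    const char *name;
    uint8_t     operand_bytes;
} OpInfo;

/* A small excerpt; a full table would cover every assigned opcode. */
static const OpInfo op_table[] = {
    { 0x00, "nop",           0 },
    { 0x10, "bipush",        1 },
    { 0x11, "sipush",        2 },
    { 0x12, "ldc",           1 },
    { 0x15, "iload",         1 },
    { 0x60, "iadd",          0 },
    { 0xa7, "goto",          2 },
    { 0xb6, "invokevirtual", 2 },
};

/* Linear search is enough for a sketch; a generated interpreter would
   more likely index a 256-entry array directly. */
const OpInfo *op_lookup(uint8_t opcode)
{
    size_t i;
    for (i = 0; i < sizeof(op_table) / sizeof(op_table[0]); i++) {
        if (op_table[i].opcode == opcode)
            return &op_table[i];
    }
    return NULL;
}

/* Total instruction length in bytes (opcode plus operands),
   or -1 if the opcode is not in the table. */
int op_length(uint8_t opcode)
{
    const OpInfo *info = op_lookup(opcode);
    return info ? 1 + info->operand_bytes : -1;
}
```

with a table like this, a tool can emit the decode loop, a disassembler, or a skeleton dispatch switch from the same rows.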
specific thoughts:
the idea of "invokedynamic" seems interesting.
however, this does not seem to be a complete solution IMO (everything would
be done via method invocation, which in many cases would not be ideal
performance-wise, and would limit utilization of things we can know about
dynamic operations).
for my efforts, I had considered the possibility of further extensions, in
particular:
dynamically typed arithmetic, comparison, conversion, ... operations;
potentially, a dynamic type system like that in many dynamic languages
(fixnums, flonums, lists, ...);
...
granted, this may not be the reasonable solution 'in general', since it
would imply adding a good deal of functionality to existing VMs.
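to make "dynamically typed arithmetic" concrete, here is a minimal sketch in C of a tagged value and a dynamic add. everything in it is an assumption made for illustration: the names DynValue, dyn_fixnum, dyn_flonum and dyn_add, the choice of a 32-bit fixnum, and the rule that an overflowing fixnum add falls back to a flonum. a real implementation would more likely pack the tag into the value itself rather than use a struct.

```c
#include <stdint.h>

/* Hypothetical tagged value: either a fixnum or a flonum. */
typedef enum { DYN_FIXNUM, DYN_FLONUM } DynTag;

typedef struct {
    DynTag tag;
    union { int32_t i; double f; } u;
} DynValue;

DynValue dyn_fixnum(int32_t i)
{
    DynValue v;
    v.tag = DYN_FIXNUM;
    v.u.i = i;
    return v;
}

DynValue dyn_flonum(double f)
{
    DynValue v;
    v.tag = DYN_FLONUM;
    v.u.f = f;
    return v;
}

/* Numeric value of either kind as a double. */
double dyn_to_double(DynValue v)
{
    return v.tag == DYN_FIXNUM ? (double)v.u.i : v.u.f;
}

/* What a dynamically typed "add" opcode might do: check the tags at
   runtime, keep the fast integer path when both sides are fixnums,
   and otherwise promote to floating point. */
DynValue dyn_add(DynValue a, DynValue b)
{
    if (a.tag == DYN_FIXNUM && b.tag == DYN_FIXNUM) {
        /* add in 64 bits so that 32-bit overflow can be detected */
        int64_t r = (int64_t)a.u.i + (int64_t)b.u.i;
        if (r >= INT32_MIN && r <= INT32_MAX)
            return dyn_fixnum((int32_t)r);
        return dyn_flonum((double)r); /* assumed overflow policy */
    }
    return dyn_flonum(dyn_to_double(a) + dyn_to_double(b));
}
```

the point of a dedicated opcode over a method invocation is that the tag check and the fast path sit directly in the interpreter (or in JIT output), with no call overhead.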
related to this, I had also considered adding features to better
facilitate languages like C, such as the ability to make use of "unsafe"
memory access and pointer operations (the purpose would be to allow
"acceptably" targeting a C compiler to it while still retaining many of the
capabilities of C, along with its ability to play along "acceptably" with
its natively-compiled counterpart).
actually, this much (even within my project) would likely require special
permission to use. I have actually for a fairly long time mentally been
considering the idea of a multi-layered security model, where certain access
to certain things is granted to certain code, but thus far don't have any
specific plans (this is especially the case in natively-compiled C, as any
such policy is made technically almost impossible to enforce...).
my idea here would be to find a hopefully unused opcode (or opcodes), and
then use it/them as an extended opcode space.
254 and 255 are possible, but if there is any real possibility of
"cooperation" a different opcode number may make sense (given 254 and 255
are reserved for implementation dependent features), allowing other
implementations to potentially implement some of these. these would then be
features the VM is not required to support (or allow), and so a piece of
code could be safely rejected for using any of this functionality.
for example, the opcodes would be "specified but optional".
maybe a kind of "extended opcode blocks", where opcodes can be added to
specific blocks without risk of clashing with other blocks (basically, some
people can "own" certain areas, and so if they have not yet specified an
opcode there, they can know it is safe to do so).
hypothetically:
240-243 <byte>: a group of 64 blocks of 16 opcodes
244-247 <short>: 1024 blocks of 256 opcodes
certain blocks could be assigned for "development by various implementors",
and others could be regarded as "experimental".
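the arithmetic behind the hypothetical layout above: 4 prefix opcodes times 256 byte values gives 1024 opcodes, which is 64 blocks of 16; 4 prefix opcodes times 65536 short values gives 262144 opcodes, which is 1024 blocks of 256. a minimal decoding sketch in C follows. the names ExtOpcode and ext_decode are hypothetical, and the exact mapping (prefix number as the high bits, block number taken from the upper bits of the combined value, big-endian short as with other JVM operands) is an assumption, since the proposal does not pin it down.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of an extended opcode. */
typedef struct {
    uint32_t block;   /* which block the opcode belongs to */
    uint32_t index;   /* opcode number within that block */
    size_t   length;  /* bytes consumed, including the prefix byte */
} ExtOpcode;

/* Decode the extended opcode starting at code[0], with len bytes available.
   Returns 0 on success, -1 if code[0] is not a prefix in 240..247
   or the instruction is truncated. */
int ext_decode(const uint8_t *code, size_t len, ExtOpcode *out)
{
    uint32_t n;

    if (len < 1)
        return -1;

    if (code[0] >= 240 && code[0] <= 243) {
        if (len < 2)
            return -1;
        /* 10-bit number: 2 bits from the prefix, 8 from the byte */
        n = ((uint32_t)(code[0] - 240) << 8) | code[1];
        out->block  = n >> 4;   /* 0..63 */
        out->index  = n & 15;   /* 0..15 */
        out->length = 2;
        return 0;
    }

    if (code[0] >= 244 && code[0] <= 247) {
        if (len < 3)
            return -1;
        /* 18-bit number: 2 bits from the prefix, 16 from the short */
        n = ((uint32_t)(code[0] - 244) << 16)
          | ((uint32_t)code[1] << 8) | code[2];
        out->block  = n >> 8;   /* 0..1023 */
        out->index  = n & 255;  /* 0..255 */
        out->length = 3;
        return 0;
    }

    return -1;
}
```

a VM that does not support a given block can reject the code as soon as it sees the block number, without needing to know anything about the opcodes inside it.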
these would differ from the opcodes 254 and 255, in that they could
potentially be shared between implementations, and used as a form of
"de-facto" extension mechanism (one implementation or group can specify
opcodes, and another can safely implement opcodes that another has
specified, and if they want/need to add new opcodes, they can have their own
block).
as such, it would be implicitly assumed that once an opcode is specified, it
can not be readily changed (maybe deprecated or replaced). however,
experimental blocks could be free from this restriction (so, implementation
A could develop features within their implementation, and group B could
later decide to move them to a more permanent home).
or, maybe it is just the case that there is not enough activity to justify
this?...
however, at this point I am mostly just asking for people's opinions on all
this...