greetings here...

Sun Oct 12 17:12:24 PDT 2008

well, ok, I will say I am new to this list, but am hoping for interesting 
conversations.
sorry if in being new here, I am being an ignorant troll.

main reason:
well, mostly I am doing "my own thing", but if possible would like to be 
operating within the confines of 'the community'.

I will admit that at this point in time I am not really that much of a Java 
developer, so my familiarity with the community or with specific 
technologies in this area is limited.

basically, my purpose and efforts (general overview):

I have over the past few years implemented several languages and VMs of 
different varieties (typically dynamic).

more recently (ok, over the last 1.5 years or so), I went and wrote a 
dynamic C compiler framework (basically, it allows dynamically taking C 
source and compiling it to machine code at runtime, making use of a number 
of hacks for allowing dynamic relinking and relatively seamless integration 
with the host app). C is good, yes, but by itself it does not do 
"everything". the sad thing is, although I can dynamically compile, C is not 
such an ideal language for this, and more so there is "the problem of the 
headers": I know of no "good" way to avoid header processing, so there is 
typically some time overhead when dynamically compiling modules.

another sad point is that C is, as is, largely incapable of proper "eval" 
(of the sort typically done in JavaScript and friends). now, yes, one can 
compile some functions and then invoke them, but this sucks (and anything 
that can eval in some useful way, is technically no longer C...).

it is also the case that at present dynamically-compiled C code is not 
garbage collected (ok, my framework has many "unorthodox" features and 
compiler extensions: among them, I make use of optional dynamic typing, 
and have a conservative concurrent garbage-collector, ...). but, 
otherwise, I have a decent chunk of C99 implemented and 'most' C code should 
work ok, so it is probably good enough.

also, generating native machine code is not always the optimal solution, as 
for many things, bytecode is preferable (I will make the analogy of using a 
tank as one's main vehicle... it is hard-core and powerful, but not so good 
for short trips and really tears up one's lawn...).

recently, I had been looking to "absorb" Java support into my project as 
well, and more so, to make use of Java's bytecode format as probably the 
primary bytecode format (namely, there are cases when using something 
"standard" makes sense, and I don't feel there is need for "yet another 
non-standard bytecode format"...). this will possibly allow leveraging some 
amount of existing stuff, and make it so that people can more readily make 
use of my stuff.

basically, when needed my project could "emulate" a more traditional JVM, 
and as far as is reasonable remain compatible with other JVMs.

partly, I am now of the opinion that Java is better for some of my tasks 
than C is, but I still like mostly keeping C around (actually C and C++ are 
still my primary languages...).

unlike C, Java is a lot better suited to dynamic loading and modular 
systems, and is also a lot easier to verify. unlike JavaScript, performance 
and garbage generation are likely to be far better (after all, for 
most things, the language is still statically typed).

I also intend to tightly integrate it with my existing framework, and 
actually use it for many other tasks as well (basically, offloading tasks 
not as well suited to native code generation). I may also use it as a target 
for JavaScript (I have an existing partial JS implementation, but 
targeting JBC would probably be a better and more general solution).

a JIT compiler may also be added at some point...

however, I will probably not implement JNI unless I have some good reason 
(I am considering alternative and more desirable options...).

progress thus far (JBC support):

well, most of the "more general" stuff exists within my framework already.

I have a class/instance system in place, which mostly overlaps with the JVM 
in terms of functionality (it is being implemented for this purpose, as 
formerly I had been using a prototype-based object system, like in 
JavaScript, but this would not be very good for implementing a JVM 
performance-wise).

a lot of the core interpreter functionality has been written recently, but 
otherwise I have not had much free time for coding as of late...

actually, I came up with the idea of writing a JVM several weeks ago, 
but haven't had much time to do so (actually, thus far it has been less 
painful than expected, but other stuff uses up most of my time...).

I have yet to implement a class-loader or similar functionality. I had 
decided actually to implement the interpreter first, and then make the 
loader target it, rather than implementing the loader first and building an 
interpreter around it (this is what the JVM spec makes me think...).

at this point, I have not actually tested any of the interpreter machinery 
(things are still very preliminary).

a minor complaint:

the JVM spec (or at least the one I found) makes some things rather annoying 
to figure out, such as which opcodes have arguments and what they are, what 
exactly each opcode does, ...

an instruction listing similar to that found in the Intel docs (or many 
other processor-specific references) would make this less annoying. so, 
maybe a slightly more formal structure could help (dunno if anyone here is 
in a position to effect this though).

namely: parts of my interpreter I generate with tools, and it is preferable 
to go and fill out some tables, rather than have to dig around and figure 
things out.
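
as a rough illustration of what "filling out tables" could look like (a 
minimal sketch in Python, not my actual tooling): the table layout and the 
`disassemble` helper are made up for this example, but the opcode numbers and 
operand sizes listed are the ones given in the JVM spec. only a handful of 
fixed-length opcodes are shown.

```python
# opcode number -> (mnemonic, number of operand bytes following the opcode)
# hypothetical table layout; a real table would list every opcode.
OPCODES = {
    0x00: ("nop", 0),
    0x10: ("bipush", 1),
    0x11: ("sipush", 2),
    0x15: ("iload", 1),
    0x60: ("iadd", 0),
    0xA7: ("goto", 2),
    0xAC: ("ireturn", 0),
    0xB1: ("return", 0),
    0xB6: ("invokevirtual", 2),
}


def disassemble(code):
    """Walk a bytes object and return a list of (offset, mnemonic, operand_bytes)."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op not in OPCODES:
            raise ValueError("unknown opcode 0x%02x at offset %d" % (op, pc))
        name, nargs = OPCODES[op]
        operands = bytes(code[pc + 1 : pc + 1 + nargs])
        if len(operands) != nargs:
            raise ValueError("truncated operands for %s at offset %d" % (name, pc))
        out.append((pc, name, operands))
        pc += 1 + nargs  # skip the opcode byte plus its operands
    return out


# bipush 2; bipush 3; iadd; ireturn
listing = disassemble(bytes([0x10, 2, 0x10, 3, 0x60, 0xAC]))
```

a flat table like this does not cover the variable-length instructions 
(tableswitch, lookupswitch, and the wide prefix), which would still need 
special-case handling.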

for example, when I wrote an x86/x86-64 assembler before, I just sort of 
scanned through the instruction reference and transcribed all the stuff into 
my own table formats (and similar would also work with the PPC spec), but 
the JVM spec as-is requires a bit more digging (a lot of it is maybe more 
useful for people targeting the VM, but not as much for writing one).

specific thoughts:

the idea of "invokedynamic" seems interesting.

however, this does not seem to be a complete solution IMO (everything would 
be done via method invocation, which in many cases would not be ideal 
performance-wise, and would limit use of the things we can know about 
dynamic operations).

for my efforts, I had considered the possibility of further extensions, in 
particular:
dynamically typed arithmetic, comparison, conversion, ... operations;
potentially, a dynamic type system like that in many dynamic languages 
(fixnums, flonums, lists, ...);
..
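
to make the first item a bit more concrete, here is a minimal sketch (in 
Python, purely illustrative) of what the handler for a dynamically typed add 
could look like inside an interpreter loop. everything here is an assumption 
made for the example: the `dyn_add` name, the tag names, the 30-bit fixnum 
range, and the promote-to-flonum overflow policy are not part of any spec or 
existing implementation.

```python
# hypothetical type tags; values on the operand stack are (tag, payload) pairs
FIXNUM, FLONUM = "fixnum", "flonum"

# assumed fixnum range (30-bit signed), chosen only for illustration
FIXNUM_MIN, FIXNUM_MAX = -(1 << 29), (1 << 29) - 1


def dyn_add(stack):
    """Handler for a hypothetical 'dyn_add' opcode: pop two tagged values, push their sum."""
    tb, b = stack.pop()
    ta, a = stack.pop()
    if ta == FIXNUM and tb == FIXNUM:
        r = a + b
        if FIXNUM_MIN <= r <= FIXNUM_MAX:
            stack.append((FIXNUM, r))         # fast path: result is still a fixnum
        else:
            stack.append((FLONUM, float(r)))  # overflow: promote (one possible policy)
    elif ta in (FIXNUM, FLONUM) and tb in (FIXNUM, FLONUM):
        stack.append((FLONUM, float(a) + float(b)))  # mixed or flonum operands
    else:
        raise TypeError("dyn_add: unsupported operand types %s, %s" % (ta, tb))
```

the point of the sketch is only that the type check happens inline in the 
opcode handler, so the common fixnum case never goes through a method 
invocation.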

granted, this may not be a reasonable solution 'in general', since it 
would imply adding a good deal of functionality to existing VMs.

related to this, I had also considered adding features to better 
facilitate languages like C, such as the ability to make use of "unsafe" 
memory access and pointer operations (the purpose would be to allow 
"acceptably" targeting a C compiler to it while still retaining many of the 
capabilities of C, along with its ability to play along "acceptably" with 
its natively-compiled counterpart).

actually, this much (even within my project) would likely require special 
permission to use. for a fairly long time I have been mentally 
considering the idea of a multi-layered security model, where access 
to certain things is granted only to certain code, but thus far I don't have any 
specific plans (this is especially the case for natively-compiled C, where any 
such policy is technically almost impossible to enforce...).

my idea here would be to find a hopefully unused opcode (or opcodes), and 
then use it/them as an extended opcode space.

254 and 255 are possible, but given that they are reserved for 
implementation-dependent features, a different opcode number may make sense 
if there is any real possibility of "cooperation", since this would allow 
other implementations to potentially implement some of these. these would 
then be features the VM is not required to support (or allow), and so a 
piece of code could be safely rejected for using any of this functionality.

that is, the opcodes would be "specified but optional".

maybe a kind of "extended opcode blocks", where opcodes can be added to 
specific blocks without risk of clashing with other blocks (basically, some 
people can "own" certain areas, and so if they have not yet specified an 
opcode there, they can know it is safe to do so).

hypothetically:
240-243 <byte>: 64 blocks of 16 opcodes (4 x 256 = 1024 opcodes)
244-247 <short>: 1024 blocks of 256 opcodes (4 x 65536 = 262144 opcodes)
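
one possible way to read this layout, as a minimal Python sketch: the prefix 
opcode and its operand are combined into a single number, which is then split 
into a block number and an index within the block. the `decode_extended` name 
and the exact bit assignment (prefix as the high part, operand as the low 
part) are assumptions for illustration; the layout above does not pin them 
down.

```python
def decode_extended(prefix, operand):
    """Map (prefix opcode, operand) to (space, block, index).

    Assumed encoding: the prefix selects the high part and the operand the
    low part of one combined opcode number, which is then divided into blocks.
    """
    if 240 <= prefix <= 243:
        if not 0 <= operand <= 0xFF:
            raise ValueError("byte operand out of range")
        combined = (prefix - 240) * 256 + operand    # 0..1023
        return ("byte", combined // 16, combined % 16)      # 64 blocks of 16
    if 244 <= prefix <= 247:
        if not 0 <= operand <= 0xFFFF:
            raise ValueError("short operand out of range")
        combined = (prefix - 244) * 65536 + operand  # 0..262143
        return ("short", combined // 256, combined % 256)   # 1024 blocks of 256
    raise ValueError("not an extended-opcode prefix: %d" % prefix)
```

with this reading, a VM would only need to know which block numbers it 
supports in order to accept or reject a given extended opcode.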

certain blocks could be assigned for "development by various implementors", 
and others could be regarded as "experimental".

these would differ from the opcodes 254 and 255, in that they could 
potentially be shared between implementations, and used as a form of 
"de-facto" extension mechanism (one implementation or group can specify 
opcodes, and another can safely implement the opcodes the first has 
specified, and if they want/need to add new opcodes, they can have their own 
block).

as such, it would be implicitly assumed that once an opcode is specified, it 
cannot be readily changed (though it may be deprecated or replaced). however, 
experimental blocks could be free from this restriction (so, implementation 
A could develop features within their implementation, and group B could 
later decide to move them to a more permanent home).

or, maybe it is just the case that there is not enough activity to justify 
this?...

however, at this point I am mostly just asking for people's opinions on all 
this...