Da Vinci MLVM: Interest for an very old language running on an extended JVM
Francis ANDRE
francis.andre at easynet.fr
Thu Apr 17 10:12:05 PDT 2008
John
> Interesting; I'm curious what are the "pain points", the features of
> Cobol that cause the most difficulty when rendering them as JVM code.
>
The main "pain points" for Cobol are the following:
1/ Record definition, allocation and member reference
2/ Packed decimal format and zoned decimal format
3/ Unsigned binary integers
4/ Exception handling on underflow/overflow for all computational types
5/ Native support of collating sequences other than ASCII (EBCDIC, for
example)
6/ Specific move and compare semantics with implicit space padding.
I will present point 1 first and the others in following mails.
1/ Problem of accessing unaligned data type members in a Cobol record
The Cobol language provides the capability to define a record, much like
the struct construct of the C language, and also something called 'redefines'
that is similar to the union of the C language.
So in Cobol, one could define the user record as
1 user.
2 name pic x(5). // his name in 5 bytes
2 age pic s9(9) binary. // his age as a 4-byte integer
A similar struct in C would be
typedef struct {
char name[5];
int age;
} user;
but there is a major difference between the Cobol user and the C user:
in Cobol the age member is not aligned on its natural
boundary. So in Cobol, one has
CobolOffset(user, name) = 0;
CobolOffset(user, age) = 5;
CobolSizeof(user) = 9;
while most C compilers insert 3 padding bytes after name so that age
is aligned on a 4-byte boundary (unless the user specifies other
packing rules, as the MSVC compiler allows), and would produce
COffset(user, name) = 0;
COffset(user, age) = 8;
CSizeof(user) = 12;
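The difference between the two layouts can be reproduced with a small
calculation. The following is only a sketch: the class and method names
(RecordLayout, layout, alignUp) are made up for illustration, and the
"natural alignment" of 4 for a 4-byte int is an assumption that holds on
most C compilers but is not guaranteed by the C language.

```java
import java.util.Arrays;

class RecordLayout {
    /** Rounds offset up to the next multiple of alignment. */
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    /**
     * Each field is a {size, naturalAlignment} pair.
     * packed == true  : every field starts right after the previous one (Cobol-like).
     * packed == false : every field is aligned, and the total size is padded
     *                   to the largest alignment (typical C behaviour, assumed here).
     * Returns the member offsets, followed by the total size as last element.
     */
    static int[] layout(int[][] fields, boolean packed) {
        int[] result = new int[fields.length + 1];
        int offset = 0;
        int maxAlign = 1;
        for (int i = 0; i < fields.length; i++) {
            int size = fields[i][0];
            int align = packed ? 1 : fields[i][1];
            offset = alignUp(offset, align);
            result[i] = offset;
            offset += size;
            maxAlign = Math.max(maxAlign, align);
        }
        result[fields.length] = alignUp(offset, maxAlign);
        return result;
    }

    public static void main(String[] args) {
        // name: 5 bytes, alignment 1; age: 4 bytes, alignment 4
        int[][] user = { {5, 1}, {4, 4} };
        System.out.println(Arrays.toString(layout(user, true)));  // [0, 5, 9]
        System.out.println(Arrays.toString(layout(user, false))); // [0, 8, 12]
    }
}
```

The packed result matches the Cobol offsets, and the aligned result
shows the 3 padding bytes a typical C compiler adds before age.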
An ideal JVM-like program for setting the value of user.age in Cobol
(getting it would be symmetric) would be
0: bipush 9
2: newarray byte
4: astore_1
5: aload_1
6: iconst_5
7: bipush 32
9: iastore
LocalVariableTable:
Start Length Slot Name Signature
0 11 0 args [Ljava/lang/String;
5 6 1 user [B
But this code is invalid because the iastore instruction does not
comply with the constraint on the target array type, which according
to the spec must be an array of int.
So currently, to store a simple int value in the Cobol user.age,
the generated code is equivalent to (assuming big-endian byte order):
byte[] user;
int value = 32;
user = new byte[9];
user[5] = (byte)((value & 0xFF000000) >> 24);
user[6] = (byte)((value & 0x00FF0000) >> 16);
user[7] = (byte)((value & 0x0000FF00) >> 8);
user[8] = (byte)((value & 0x000000FF) >> 0);
which compiles to the bytecode
0: bipush 32
2: istore_2
3: bipush 9
5: newarray byte
7: astore_1
8: aload_1
9: iconst_5
10: iload_2
11: ldc #16; //int -16777216
13: iand
14: bipush 24
16: ishr
17: i2b
18: bastore
19: aload_1
20: bipush 6
22: iload_2
23: ldc #17; //int 16711680
25: iand
26: bipush 16
28: ishr
29: i2b
30: bastore
31: aload_1
32: bipush 7
34: iload_2
35: ldc #18; //int 65280
37: iand
38: bipush 8
40: ishr
41: i2b
42: bastore
43: aload_1
44: bipush 8
46: iload_2
47: sipush 255
50: iand
51: iconst_0
52: ishr
53: i2b
54: bastore
55: return
LocalVariableTable:
Start Length Slot Name Signature
0 56 0 args [Ljava/lang/String;
8 48 1 user [B
3 53 2 value I
So, as you can see, this performs poorly and takes a lot of bytecode
space for just a simple assignment. As loading an int takes an
equivalent amount of code, the Cobol equivalent of int = int + 1
is highly inefficient.
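To make the cost of int = int + 1 concrete, here is a sketch of the
load side next to the store side, followed by the same update written
with the standard java.nio.ByteBuffer absolute getInt/putInt calls.
The class and method names (UnalignedIntAccess, loadInt, storeInt) are
made up for illustration; this is not the code my compiler generates,
and no claim is made here about the relative speed of the two forms.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class UnalignedIntAccess {
    static final int AGE_OFFSET = 5; // offset of user.age in the 9-byte record

    /** Hand-written big-endian load, the mirror image of the four byte stores. */
    static int loadInt(byte[] record, int offset) {
        return ((record[offset] & 0xFF) << 24)
             | ((record[offset + 1] & 0xFF) << 16)
             | ((record[offset + 2] & 0xFF) << 8)
             |  (record[offset + 3] & 0xFF);
    }

    /** Hand-written big-endian store. */
    static void storeInt(byte[] record, int offset, int value) {
        record[offset]     = (byte) (value >>> 24);
        record[offset + 1] = (byte) (value >>> 16);
        record[offset + 2] = (byte) (value >>> 8);
        record[offset + 3] = (byte) value;
    }

    public static void main(String[] args) {
        byte[] user = new byte[9];
        storeInt(user, AGE_OFFSET, 32);

        // age = age + 1 costs one full load plus one full store
        storeInt(user, AGE_OFFSET, loadInt(user, AGE_OFFSET) + 1);
        System.out.println(loadInt(user, AGE_OFFSET)); // 33

        // Same update through a ByteBuffer view that shares the array
        ByteBuffer view = ByteBuffer.wrap(user).order(ByteOrder.BIG_ENDIAN);
        view.putInt(AGE_OFFSET, view.getInt(AGE_OFFSET) + 1);
        System.out.println(loadInt(user, AGE_OFFSET)); // 34
    }
}
```

The ByteBuffer form is shorter at the source level, but it is a
library call on a wrapper object, not a single array instruction.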
2/ Proposal
My proposal is to relax the array type constraint on all xALOAD and
all xASTORE JVM instructions so that the first bytecode listing above,
0: bipush 9
2: newarray byte
4: astore_1
5: aload_1
6: iconst_5
7: bipush 32
9: iastore
would be valid. Upon executing an xALOAD or xASTORE instruction, the JVM
should verify that the accessed bytes do not fall outside the target
byte array and throw an OutOfArrayMemory exception otherwise.
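The intended behaviour of a relaxed iastore/iaload on a byte array can
be sketched in plain Java as follows. Several things here are
assumptions for the sake of illustration: the index operand is taken
to be a byte offset (as in the listing above, where 5 is the offset of
age), the byte order is big-endian, and OutOfArrayMemoryException is a
hypothetical class standing in for the exception named above. The
names RelaxedArrayAccess, relaxedIastore and relaxedIaload are
made up as well.

```java
class RelaxedArrayAccess {
    /** Hypothetical stand-in for the exception named in the proposal. */
    static class OutOfArrayMemoryException extends RuntimeException {
        OutOfArrayMemoryException(String message) {
            super(message);
        }
    }

    /** Verifies that bytes [byteOffset, byteOffset + width) lie inside the array. */
    static void checkRange(byte[] array, int byteOffset, int width) {
        if (byteOffset < 0 || byteOffset > array.length - width) {
            throw new OutOfArrayMemoryException(
                "access of " + width + " bytes at offset " + byteOffset
                + " in array of length " + array.length);
        }
    }

    /** What a relaxed iastore on a byte[] might do (big-endian assumed). */
    static void relaxedIastore(byte[] array, int byteOffset, int value) {
        checkRange(array, byteOffset, 4);
        array[byteOffset]     = (byte) (value >>> 24);
        array[byteOffset + 1] = (byte) (value >>> 16);
        array[byteOffset + 2] = (byte) (value >>> 8);
        array[byteOffset + 3] = (byte) value;
    }

    /** What a relaxed iaload on a byte[] might do (big-endian assumed). */
    static int relaxedIaload(byte[] array, int byteOffset) {
        checkRange(array, byteOffset, 4);
        return ((array[byteOffset] & 0xFF) << 24)
             | ((array[byteOffset + 1] & 0xFF) << 16)
             | ((array[byteOffset + 2] & 0xFF) << 8)
             |  (array[byteOffset + 3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] user = new byte[9];
        relaxedIastore(user, 5, 32);                // bytes 5..8, in range
        System.out.println(relaxedIaload(user, 5)); // 32
        try {
            relaxedIastore(user, 6, 32);            // bytes 6..9, one past the end
        } catch (OutOfArrayMemoryException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In the proposal this check and the 4-byte access would be done by the
JVM inside a single instruction, not by Java code like the above.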
3/ CobolVirtualExtension and CobolVirtualMachine
One could think of this extension in terms of extending the current JVM
and allocating a specific range of major/minor class file versions for
VMs supporting this extension. A Cobol class would be allowed to execute
the relaxed xALOAD/xASTORE while a Java class would not.
This ensures that the Java security currently enforced by the JVM would
not be threatened, while it would be relaxed for Cobol classes only.
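As a rough illustration of how a VM could tell the two kinds of class
apart, the sketch below reads the version from the class file header
(4-byte magic 0xCAFEBABE, then the 2-byte minor_version and 2-byte
major_version) and tests it against a reserved range. The range
constants are purely hypothetical, no such range is allocated in any
JVM specification, and the names ClassVersionGate and
allowsRelaxedArrayAccess are made up for this example.

```java
import java.nio.ByteBuffer;

class ClassVersionGate {
    // Hypothetical reserved range of major versions for "Cobol" classes.
    static final int HYPOTHETICAL_COBOL_MAJOR_MIN = 0x4000;
    static final int HYPOTHETICAL_COBOL_MAJOR_MAX = 0x40FF;

    /** Returns true if the class file header carries a major version in the reserved range. */
    static boolean allowsRelaxedArrayAccess(byte[] classFile) {
        if (classFile.length < 8) {
            throw new IllegalArgumentException("truncated class file header");
        }
        ByteBuffer header = ByteBuffer.wrap(classFile); // big-endian by default
        if (header.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        header.getShort();                        // minor_version, unused here
        int major = header.getShort() & 0xFFFF;   // major_version as unsigned 16-bit
        return major >= HYPOTHETICAL_COBOL_MAJOR_MIN
            && major <= HYPOTHETICAL_COBOL_MAJOR_MAX;
    }

    public static void main(String[] args) {
        byte[] javaHeader  = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 50 };
        byte[] cobolHeader = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0x40, 0 };
        System.out.println(allowsRelaxedArrayAccess(javaHeader));  // false
        System.out.println(allowsRelaxedArrayAccess(cobolHeader)); // true
    }
}
```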
Francis
PS: I am preparing a more formal proposal based on the original JVM
specs, second edition, but that's the idea!
John Rose <John.Rose at Sun.COM> wrote:
> On Mar 31, 2008, at 3:55 AM, Francis ANDRE wrote:
>
>> My primary business area is the modernization of legacy applications
>> running mainly on mainframes and mostly written in Cobol. I
>> already developed a working prototype of a native Cobol
>> compiler & runtime that generates standard JVM classes. But due to
>> the nature of the Cobol language itself on one side, and due to the
>> specification of the JVM, which is bound to the Java language, on
>> the other side, there are a lot of inefficiencies/penalties both in
>> terms of runtime design and execution performance of the
>> resulting compiled Cobol application.
>
> Interesting; I'm curious what are the "pain points", the features of
> Cobol that cause the most difficulty when rendering them as JVM code.
>
>> That is why I am wondering if the Da Vinci Machine could be the
>> place to extend the JVM to something like a CobolVirtualMachine.
>> Would you be interested in such extensions? What would be your
>> position regarding this project?
>
> I am most interested in experimenting with JVM extensions that will
> help a variety of languages. It seems likely to me that Cobol will not
> have unique difficulties with the JVM, but instead will shed light on
> how to make the JVM into a multi-language substrate.
>
>> I presume you already know the figures about Cobol applications but
>> just as a reminder:
>>
>> Arranga (2000) estimates between *18 billion* and *200 billion*
>> lines of COBOL code are running production applications worldwide
>>
>> IMO, those figures could justify an interest by the JVM community
>> (or Sun itself?) in getting an industrial Cobol environment running
>> like the Java one. Moreover, it is a real trend that most people
>> responsible for large Cobol applications would like to go to "Java",
>> and providing a single VM that could run both a Java class, with the
>> same security as the standard JVM, and a Cobol class would IMHO have
>> real business appeal.
>
> That seems likely, for some Cobol users. Although the same legacy
> constraints that keep some people on Cobol may also prevent them from
> considering JVM technology. They may have a very low tolerance for
> change and risk. The question about the giant Cobol installed base is,
> how often do those users change their Cobol implementations (while
> keeping their old sources)? If their Cobol is not portable to start
> with, it seems a lost cause to convert it onto a new platform.
>
>> Maybe you are aware of tools that translate Cobol directly to
>> Java: yes, they work... technically, but in reality this does not
>> succeed because the produced Java code is quite far from the
>> original code and thus unmaintainable (we are speaking here of
>> applications between 500 KLOC and 10 MLOC or more).
>
> You could compile Cobol directly to bytecodes, and not compromise with
> a Java rendering. Would the bytecode architecture force distortions on
> the Cobol program structure? Could you compile a Cobol program into a
> package full of interlinked classes?
>
> For examples of distortions caused by misfit between source language
> and bytecode architecture, which I call "pain points" for language
> implementors, see
> http://openjdk.java.net/projects/mlvm/pdf/LangNet20080128.pdf .
>
> I'm looking forward to hearing more on this subject!
>
> Best wishes,
> -- John