Da Vinci MLVM: Interest for a very old language running on an extended JVM

Thu Apr 17 10:12:05 PDT 2008

John

> Interesting; I'm curious what are the "pain points", the features of
> Cobol that cause the most difficulty when rendering them as JVM code.
>
The main "pain points" for Cobol are the following:

1/ Record definition, allocation and member reference
2/ Packed decimal and zoned decimal formats
3/ Unsigned binary integers
4/ Exception handling on underflow/overflow for all computational types
5/ Native support for collating sequences other than ASCII (EBCDIC,
for example)
6/ Specific move and compare semantics, with implicit space padding.

I will present point 1 first and the others in following mails.

1/ Problem of accessing unaligned data type members in a Cobol record
The Cobol language lets you define records much like the struct
construct of the C language, and also offers something called
'redefines' that is similar to the C union.

So in Cobol, one could define the user record as
1 user.
   2 name pic x(5).               // his name, on 5 bytes
   2 age  pic s9(9) binary.       // his age, as a 4-byte integer

A similar struct in C would be
typedef struct {
     char      name[5];
     int       age;
} user;

but there is a major difference between the Cobol user and the C user:
in Cobol the age member is not aligned on its natural boundary. So in
Cobol, one has

CobolOffset(user, name) = 0;
CobolOffset(user, age)  = 5;
CobolSizeof(user)       = 9;

while most C compilers would produce (unless the user requests specific
packing rules, as the MSVC compiler allows)

COffset(user, name) = 0;
COffset(user, age)  = 8;
CSizeof(user)       = 12;
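
For illustration, here is a minimal Java sketch of the two layout rules
compared above. The class and method names (RecordLayout, packedOffsets,
alignedOffsets) are made up for this example, and the alignment values
(1 for char, 4 for int) are an assumption about a typical C compiler,
not something fixed by the C language. Each method returns the member
offsets followed by the total record size.

```java
import java.util.Arrays;

class RecordLayout {
    /** Offsets when members are packed back to back (the Cobol rule). */
    static int[] packedOffsets(int[] sizes) {
        int[] offsets = new int[sizes.length + 1]; // last entry = total size
        for (int i = 0; i < sizes.length; i++) {
            offsets[i + 1] = offsets[i] + sizes[i];
        }
        return offsets;
    }

    /** Offsets when each member starts on a multiple of its alignment. */
    static int[] alignedOffsets(int[] sizes, int[] aligns) {
        int[] offsets = new int[sizes.length + 1];
        int pos = 0;
        int maxAlign = 1;
        for (int i = 0; i < sizes.length; i++) {
            pos = roundUp(pos, aligns[i]);   // insert padding before the member
            offsets[i] = pos;
            pos += sizes[i];
            maxAlign = Math.max(maxAlign, aligns[i]);
        }
        offsets[sizes.length] = roundUp(pos, maxAlign); // trailing padding
        return offsets;
    }

    static int roundUp(int value, int align) {
        return (value + align - 1) / align * align;
    }

    public static void main(String[] args) {
        int[] sizes = {5, 4};    // name: 5 bytes, age: 4 bytes
        int[] aligns = {1, 4};   // assumed alignments: char on 1, int on 4
        System.out.println("packed:  " + Arrays.toString(packedOffsets(sizes)));
        System.out.println("aligned: " + Arrays.toString(alignedOffsets(sizes, aligns)));
    }
}
```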

An ideal JVM program for setting the value of user.age in Cobol
(getting it would be symmetrical) would be

    0:   bipush  9
    2:   newarray byte
    4:   astore_1
    5:   aload_1
    6:   iconst_5
    7:   bipush  32
    9:   iastore
   LocalVariableTable:
    Start  Length  Slot  Name   Signature
    0      11      0    args       [Ljava/lang/String;
    5      6      1    user       [B

But this code is invalid because the iastore instruction does not
comply with the constraint on the target array type, which according
to the spec must be an array of int.

So currently, to store a simple int value into the Cobol user.age,
the generated code is equivalent to (assuming big-endian byte order):

  	byte[]	user;
	int value = 32;

	user = new byte[9];
	user[5] = (byte)((value & 0xFF000000) >> 24);
	user[6] = (byte)((value & 0x00FF0000) >> 16);
	user[7] = (byte)((value & 0x0000FF00) >> 8);
	user[8] = (byte)((value & 0x000000FF) >> 0);

which compiles to the bytecode

	0:   bipush  32
	2:   istore_2
	3:   bipush  9
	5:   newarray byte
	7:   astore_1
	8:   aload_1
	9:   iconst_5
	10:  iload_2
	11:  ldc     #16; //int -16777216
	13:  iand
	14:  bipush  24
	16:  ishr
	17:  i2b
	18:  bastore
	19:  aload_1
	20:  bipush  6
	22:  iload_2
	23:  ldc     #17; //int 16711680
	25:  iand
	26:  bipush  16
	28:  ishr
	29:  i2b
	30:  bastore
	31:  aload_1
	32:  bipush  7
	34:  iload_2
	35:  ldc     #18; //int 65280
	37:  iand
	38:  bipush  8
	40:  ishr
	41:  i2b
	42:  bastore
	43:  aload_1
	44:  bipush  8
	46:  iload_2
	47:  sipush  255
	50:  iand
	51:  iconst_0
	52:  ishr
	53:  i2b
	54:  bastore
	55:  return
   LocalVariableTable:
    Start  Length  Slot  Name   Signature
    0      56      0    args       [Ljava/lang/String;
    8      48      1    user       [B
    3      53      2    value       I

So, as you can see, it performs poorly and takes a lot of bytecode
space for just a simple assignment. As loading an int takes an
equivalent amount of code, the Cobol equivalent of int = int + 1
is highly inefficient.
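
To make the load side concrete, here is a hedged Java sketch of a
store/load pair at an arbitrary byte offset, using the same big-endian
convention as the store code above. The helper names (UnalignedInt,
putInt, getInt) are invented for this example. For comparison, main
reads the same bytes back through java.nio.ByteBuffer, whose absolute
getInt(int) accepts any byte offset and uses big-endian order by
default; whether that is cheaper than the hand-written shifts on a
given JVM is not something this sketch measures.

```java
import java.nio.ByteBuffer;

class UnalignedInt {
    /** Big-endian store of a 4-byte int at any byte offset. */
    static void putInt(byte[] rec, int off, int value) {
        rec[off]     = (byte) (value >>> 24);
        rec[off + 1] = (byte) (value >>> 16);
        rec[off + 2] = (byte) (value >>> 8);
        rec[off + 3] = (byte) value;
    }

    /** Matching big-endian load; each byte is masked to undo sign extension. */
    static int getInt(byte[] rec, int off) {
        return ((rec[off] & 0xFF) << 24)
             | ((rec[off + 1] & 0xFF) << 16)
             | ((rec[off + 2] & 0xFF) << 8)
             |  (rec[off + 3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] user = new byte[9];             // name: bytes 0..4, age: bytes 5..8
        putInt(user, 5, 32);                   // age = 32
        putInt(user, 5, getInt(user, 5) + 1);  // age = age + 1: one load, one store

        // Same bytes read through the standard library view.
        ByteBuffer view = ByteBuffer.wrap(user);
        System.out.println(getInt(user, 5) + " " + view.getInt(5));
    }
}
```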

2/ Proposal
My proposal is to relax the array type constraint on all xALOAD and
all xASTORE JVM instructions so that the first bytecode listing, namely
    0:   bipush  9
    2:   newarray byte
    4:   astore_1
    5:   aload_1
    6:   iconst_5
    7:   bipush  32
    9:   iastore

becomes valid. Upon executing the xALOAD or xASTORE instruction, the
JVM should verify that the accessed bytes are not outside the target
byte array, and throw an OutOfArrayMemory exception otherwise.
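
The intended behaviour can be emulated in plain Java, as a sketch
only. It assumes that the index operand is a byte offset and that the
byte order is big-endian; neither is settled by the proposal text.
OutOfArrayMemoryException is a hypothetical class standing in for the
proposed exception, and RelaxedArrayOps, iastore and iaload are
illustrative method names, not real JVM entry points.

```java
class RelaxedArrayOps {
    /** Hypothetical stand-in for the proposed OutOfArrayMemory exception. */
    static class OutOfArrayMemoryException extends RuntimeException {
        OutOfArrayMemoryException(String message) { super(message); }
    }

    /** Rejects any access whose bytes would fall outside the array. */
    static void checkRange(byte[] array, int index, int width) {
        if (index < 0 || index > array.length - width) {
            throw new OutOfArrayMemoryException(
                width + " bytes at offset " + index
                + " exceed array length " + array.length);
        }
    }

    /** Emulates a relaxed iastore on a byte array (big-endian assumed). */
    static void iastore(byte[] array, int index, int value) {
        checkRange(array, index, 4);
        for (int i = 0; i < 4; i++) {
            array[index + i] = (byte) (value >>> (24 - 8 * i));
        }
    }

    /** Emulates a relaxed iaload on a byte array (big-endian assumed). */
    static int iaload(byte[] array, int index) {
        checkRange(array, index, 4);
        int value = 0;
        for (int i = 0; i < 4; i++) {
            value = (value << 8) | (array[index + i] & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] user = new byte[9];
        iastore(user, 5, 32);           // offset 5 is the last valid 4-byte slot
        System.out.println(iaload(user, 5));
        try {
            iastore(user, 6, 32);       // bytes 6..9 would overrun the array
        } catch (OutOfArrayMemoryException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```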

3/ CobolVirtualExtension and CobolVirtualMachine
One could think of this extension in terms of extending the current
JVM and allocating a specific range of major/minor class versions for
VMs supporting this extension. A Cobol class would be allowed to
execute the relaxed xALOAD/xASTORE while a Java class would not.

This ensures that the Java security currently enforced by the JVM
would not be threatened, while it would be relaxed for Cobol classes
only.
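
As a rough sketch of such gating: a class file starts with the magic
number 0xCAFEBABE followed by a 16-bit minor and a 16-bit major
version, so a VM could decide from those eight bytes whether the
relaxed instructions are permitted. The version range below
(0x7000 to 0x70FF) is purely hypothetical, as are the names
ClassVersionGate and allowsRelaxedArrayAccess; no such range has been
allocated.

```java
import java.nio.ByteBuffer;

class ClassVersionGate {
    // Hypothetical range of major versions reserved for "Cobol classes".
    static final int COBOL_MAJOR_MIN = 0x7000;
    static final int COBOL_MAJOR_MAX = 0x70FF;

    /** Returns true if the class file header falls in the reserved range. */
    static boolean allowsRelaxedArrayAccess(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes); // big-endian by default
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = in.getShort() & 0xFFFF; // read, but unused in this sketch
        int major = in.getShort() & 0xFFFF;
        return major >= COBOL_MAJOR_MIN && major <= COBOL_MAJOR_MAX;
    }

    /** Builds just the first eight bytes of a class file, for the demo. */
    static byte[] header(int minor, int major) {
        return ByteBuffer.allocate(8)
                .putInt(0xCAFEBABE)
                .putShort((short) minor)
                .putShort((short) major)
                .array();
    }

    public static void main(String[] args) {
        byte[] javaClass = header(0, 50);       // major 50 = Java SE 6
        byte[] cobolClass = header(0, 0x7001);  // inside the hypothetical range
        System.out.println(allowsRelaxedArrayAccess(javaClass));
        System.out.println(allowsRelaxedArrayAccess(cobolClass));
    }
}
```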

Francis

PS: I am preparing a more formal proposal based on the original JVM
specs, second edition, but that's the idea!

John Rose <John.Rose at Sun.COM> wrote:

> On Mar 31, 2008, at 3:55 AM, Francis ANDRE wrote:
>
>> My primary business area is the modernization of legacy applications
>> running mainly on mainframes and mostly written in Cobol. I already
>> developed a working prototype of a native Cobol compiler & runtime
>> that generates standard JVM classes. But due to the nature of the
>> Cobol language itself on one side, and due to the specification of
>> the JVM, which is bundled with the Java language, on the other side,
>> there are a lot of inefficiencies/penalties both in terms of runtime
>> design and of execution performance of the resulting compiled Cobol
>> application.
>
> Interesting; I'm curious what are the "pain points", the features of
> Cobol that cause the most difficulty when rendering them as JVM code.
>
>> That is why I am wondering if the Da Vinci Machine could be the
>> place to extend the JVM to something like a CobolVirtualMachine.
>> Would you be interested in such extensions? What would be your
>> position regarding this project?
>
> I am most interested in experimenting with JVM extensions that will
> help a variety of languages.  It seems likely to me that Cobol will not
> have unique difficulties with the JVM, but instead will shed light on
> how to make the JVM into a multi-language substrate.
>
>> I presume you already know the figures about Cobol applications, but
>> just as a reminder:
>>
>> Arranga (2000) estimates between *18 billion* and *200 billion*
>> lines of COBOL code are running production applications worldwide
>>
>> IMO, those figures could justify an interest by the JVM community
>> (or Sun itself?) in getting an industrial Cobol environment running
>> like the Java one. Moreover, it is a real trend that most people
>> responsible for large Cobol applications would like to go to "Java",
>> and providing a single VM that could run both a Java class, with the
>> same security as the standard JVM, and a Cobol class would IMHO have
>> real appeal in business terms.
>
> That seems likely, for some Cobol users.  Although the same legacy
> constraints that keep some people on Cobol may also prevent them from
> considering JVM technology.  They may have a very low tolerance for
> change and risk.  The question about the giant Cobol installed base is,
> how often do those users change their Cobol implementations (while
> keeping their old sources)?  If their Cobol is not portable to start
> with, it seems a lost cause to convert it onto a new platform.
>
>> Maybe you are aware of tools that translate Cobol directly to
>> Java: yes, they work... technically, but in reality this does not
>> succeed because the produced Java code is quite far from the
>> original code and thus unmaintainable (we are speaking here of
>> applications between 500 thousand LOC and 10 million LOC or more).
>
> You could compile Cobol directly to bytecodes, and not compromise with
> a Java rendering.  Would the bytecode architecture force distortions on
> the Cobol program structure?  Could you compile a Cobol program into a
> package full of interlinked classes?
>
> For examples of distortions caused by misfit between source language
> and bytecode architecture, which I call "pain points" for language
> implementors, see
> http://openjdk.java.net/projects/mlvm/pdf/LangNet20080128.pdf .
>
> I'm looking forward to hearing more on this subject!
>
> Best wishes,
> -- John