Da Vinci MLVM: Interest for a very old language running on an extended JVM

Thu Apr 24 08:13:47 PDT 2008

 >> I am not requesting new opcodes but just relaxing the array type constraint
 >> of the already existing xALOAD & xASTORE when the VM runs a Cobol or C class.
 > It really amounts to the same complexity as a new opcode, since every
 > tool or JVM phase that looks at opcodes may need to know the new usages.

Yes, I understand the point about complexity, but throughout the history of
the JVM, a great deal has been done to improve performance (JIT compilation
and other techniques), even at a high development cost. So why not go
in this direction for the most heavily used data types in
Cobol (integer, packed and zoned decimal)?
 >
 >
 > It seems to me just as easy to align source code to a BCI which
 > contains an invokestatic as a BCI which contains an iaload.
Yes, generating an iaload or an invokestatic demands the same effort,
but IMHO the iaload will always outperform the invokestatic
(especially in long-running loops where integers/indexes are
involved).

 >> The problem for packed/zoned data is identical as for integer: just
 >> inefficiency and performance.
 >
 > With very few exceptions, the fastest way to get performance on some
 > operation is first to name it (as a method), then have the compiler
 > treat it as an intrinsic.  Here is a list of all the intrinsics in
 > HotSpot (start 50% down around line 416):

I did not know about those intrinsics in HotSpot. That is great news
for getting performance on unaligned integers and packed/zoned decimals.
But this is specific to HotSpot, if I understand correctly, and thus this
kind of optimization may not be available on other VMs like JRockit or
Harmony or, more importantly, on the IBM VM running on z/OS. What makes me
favor the relaxing option is that, once agreed on as a specification, the
Cobol class can run on all VMs without caring about the presence or absence
of a specific optimization. This separation of concerns is a real advantage
from the perspective of the compiler builder, and for getting identical
Cobol class behavior across different VMs.

 >
 > And so also do compiled intrinsics provide a way to do these things
 > efficiently.  Think of bytecodes as low-level IR, which is very
 > expensive to extend, and intrinsic method calls as a higher-level
 > extensible part of the IR, which is easy to extend compatibly.

Ok, I will go in this direction...seems promising for getting
performance on packed/zoned decimal reference and arithmetic
computation too!
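
As a rough illustration of the named-method route discussed above, here is a minimal Java sketch of the kind of static helper a Cobol compiler might target with invokestatic instead of a relaxed iaload. The class name CobolMem and the methods getInt/putInt are hypothetical, and big-endian byte order is an assumption (it matches z/OS data layout); whether a given VM actually treats such a method as an intrinsic is not something this sketch can guarantee.

```java
// Hypothetical helper class: names and layout are assumptions for illustration.
final class CobolMem {
    private CobolMem() {}

    /** Reads a 4-byte big-endian binary integer starting at any byte offset. */
    static int getInt(byte[] buf, int off) {
        return ((buf[off]     & 0xFF) << 24)
             | ((buf[off + 1] & 0xFF) << 16)
             | ((buf[off + 2] & 0xFF) << 8)
             |  (buf[off + 3] & 0xFF);
    }

    /** Writes a 4-byte big-endian binary integer starting at any byte offset. */
    static void putInt(byte[] buf, int off, int value) {
        buf[off]     = (byte) (value >>> 24);
        buf[off + 1] = (byte) (value >>> 16);
        buf[off + 2] = (byte) (value >>> 8);
        buf[off + 3] = (byte) value;
    }
}
```

A compiled Cobol class would then emit something like invokestatic CobolMem.getInt([BI)I wherever it needs an integer field at an arbitrary (possibly unaligned) offset in the record buffer.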

 >> More on packed/zoned cobol data in a following mail...
Here is an extract from the IBM Principles of Operation manual, which
applies to any IBM hardware running z/OS:

© copyright IBM, reprinted with permission from the IBM Principles of
Operations Manual

Decimal integers consist of one or more decimal digits and a sign.
Each digit and the sign are represented by a 4-bit code. The decimal
digits are in binary-coded decimal (BCD) form, with the values 0-9
encoded as 0000-1001. The sign is usually represented as 1100 (C hex)
for plus and 1101 (D hex) for minus. These are the preferred sign
codes, which are generated by the machine for the results of
decimal-arithmetic operations. There are also several alternate sign
codes (1010, 1110, and 1111 for plus; 1011 for minus). The alternate
sign codes are accepted by the machine as valid in source operands but
are not generated for results.

Decimal integers may have different lengths, from one to 16 bytes.
There are two decimal formats: packed and zoned. In the packed format,
each byte contains two decimal digits, except for the rightmost byte,
which contains the sign code in the right half. For decimal
arithmetic, the number of decimal digits in the packed format can vary
from one to 31. Because decimal integers must consist of whole bytes
and there must be a sign code on the right, the number of decimal
digits is always odd. If an even number of significant digits is
desired, a leading zero must be inserted on the left.

In the zoned format, each byte consists of a decimal digit on the
right and the zone code 1111 (F hex) on the left, except for the
rightmost byte where the sign code replaces the zone code. Thus, a
decimal integer in the zoned format can have from one to 16 digits.
The zoned format may be used directly for input and output in the
extended binary-coded-decimal interchange code (EBCDIC), except that
the sign must be separated from the rightmost digit and handled as a
separate character. For positive (unsigned) numbers, however, the sign
can simply be represented by the zone code of the rightmost digit
because the zone code is one of the acceptable alternate codes for plus.

In either format, negative decimal integers are represented in true
notation with a separate sign. As for binary integers, the radix point
(decimal point) of decimal integers is considered to be fixed at the
right, and any scaling is done by the programmer.

The following are some examples of decimal integers shown in
hexadecimal notation:

Decimal
Value        Packed Format     Zoned Format

+123         12 3C             F1 F2 C3
               or                or
               12 3F             F1 F2 F3

-4321        04 32 1D          F4 F3 F2 D1

+000050      00 00 05 0C       F0 F0 F0 F0 F5 C0
               or                or
               00 00 05 0F       F0 F0 F0 F0 F5 F0

-7           7D                D7

   00000       00 00 0C          F0 F0 F0 F0 C0
               or                or
               00 00 0F          F0 F0 F0 F0 F0
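
To make the packed format above concrete, here is a minimal Java sketch that decodes a packed decimal field into a long. The class and method names (PackedDecimal, toLong, isMinus) are hypothetical. Note the assumption that the value fits in a long: that holds for up to 18 digits, while the hardware format allows up to 31, so a real implementation would need a wider representation for long fields.

```java
// Hypothetical decoder for the packed format described in the extract above.
final class PackedDecimal {
    private PackedDecimal() {}

    /** Decodes len bytes of packed decimal starting at buf[off] into a long. */
    static long toLong(byte[] buf, int off, int len) {
        if (len <= 0) throw new IllegalArgumentException("empty field");
        long value = 0;
        for (int i = 0; i < len; i++) {
            int b = buf[off + i] & 0xFF;
            int hi = b >>> 4;          // left half: always a digit
            int lo = b & 0x0F;         // right half: digit, or sign in the last byte
            if (hi > 9) throw new IllegalArgumentException("invalid digit");
            value = value * 10 + hi;
            if (i < len - 1) {
                if (lo > 9) throw new IllegalArgumentException("invalid digit");
                value = value * 10 + lo;
            } else if (isMinus(lo)) {
                value = -value;
            }
        }
        return value;
    }

    /** Accepts preferred (C, D) and alternate (A, E, F plus; B minus) sign codes. */
    static boolean isMinus(int sign) {
        switch (sign) {
            case 0xB: case 0xD:
                return true;
            case 0xA: case 0xC: case 0xE: case 0xF:
                return false;
            default:
                throw new IllegalArgumentException("invalid sign code");
        }
    }
}
```

With the rows of the table above as input, 12 3C and 12 3F both decode to 123, 04 32 1D decodes to -4321, and 7D decodes to -7.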

Here is a good summary of why zoned decimal is useful for displaying
a decimal value, and of how to convert it to/from a packed one:

http://www.simotime.com/datapk01.htm
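
The zoned-to-packed direction can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the names ZonedDecimal and toPacked are hypothetical, the input is assumed to be a well-formed zoned field (digits and sign code are not validated), and the result always has an odd number of digits, with a leading zero inserted when the zoned field has an even number.

```java
// Hypothetical converter from the zoned format to the packed format.
final class ZonedDecimal {
    private ZonedDecimal() {}

    /** Converts a zoned decimal field to packed decimal (no validation). */
    static byte[] toPacked(byte[] zoned) {
        int n = zoned.length;
        if (n == 0) throw new IllegalArgumentException("empty field");
        // n digits plus one sign nibble, rounded up to whole bytes.
        byte[] packed = new byte[n / 2 + 1];
        int p = packed.length - 1;

        // Rightmost zoned byte is sign|digit; packed wants digit|sign.
        int last = zoned[n - 1] & 0xFF;
        packed[p] = (byte) (((last & 0x0F) << 4) | (last >>> 4));

        // Remaining digits, right to left, two per packed byte.
        boolean low = true;
        for (int z = n - 2; z >= 0; z--) {
            int digit = zoned[z] & 0x0F;   // drop the zone code
            if (low) {
                p--;
                packed[p] = (byte) digit;  // start a new byte with the low nibble
            } else {
                packed[p] |= digit << 4;   // complete it with the high nibble
            }
            low = !low;
        }
        return packed;
    }
}
```

For example, the zoned bytes F1 F2 C3 become 12 3C, and F4 F3 F2 D1 become 04 32 1D, matching the table in the extract.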

More on this, and on computing overflow/underflow, in a following mail.

Francis