PROPOSAL: Unsigned Integer Widening Operator

Bruce Chapman brucechapman at paradise.net.nz
Wed Mar 25 02:02:06 PDT 2009


Title
Unsigned Integer Widening Operator

latest html version at http://docs.google.com/Doc?id=dcvp3mkv_2k39wt5gf&hl

AUTHOR(S): Bruce Chapman

OVERVIEW



FEATURE SUMMARY:
Add an unsigned widening operator to convert bytes (in particular), 
shorts, and chars (for completeness) to int while avoiding sign extension.


MAJOR ADVANTAGE:

Byte manipulation code becomes littered with (b & 0xFF) expressions 
whose only purpose is to undo sign extension. The sign extension occurs 
whenever a byte field, variable or array element appears as an operand 
and is therefore subject to widening conversion to int. The mask 
detracts from the clarity of the code by obscuring the actual algorithm. 
It is the Java language's rules, not the algorithm itself, that demand 
the mask, and to the uninitiated it can look like a redundant operation.
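
As a small illustration of the problem in today's Java, the sketch below 
combines two bytes with and without the mask. The class and method names 
are made up for this example and are not part of the proposal.

```java
// Illustration only: shows why (b & 0xFF) appears in byte-handling code.
class SignExtensionDemo {

    // Combines two bytes big-endian without masking; lo is sign extended
    // to int before the OR, so a negative lo overwrites the high bits.
    static int combineUnmasked(byte hi, byte lo) {
        return (hi << 8) | lo;
    }

    // Same combination with the conventional masks applied.
    static int combineMasked(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }

    public static void main(String[] args) {
        byte hi = 0x12;
        byte lo = (byte) 0x80; // -128 as a signed byte
        System.out.println(Integer.toHexString(combineUnmasked(hi, lo))); // ffffff80
        System.out.println(Integer.toHexString(combineMasked(hi, lo)));   // 1280
    }
}
```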


It is quite deliberate that the new operator, (+), can be read as a cast 
to a positive value.



MAJOR DISADVANTAGE:
A new operator.


ALTERNATIVES:

Explicit masking. If java.nio.ByteBuffer were extensible (it isn't), 
unsigned get and set methods could be added to hide the masking in an 
API. Extension methods could be employed to that end if they were 
implemented.
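
For comparison, the API-based alternative can be approximated today with 
static helper methods. The following is only a sketch: UnsignedBuffers 
and its method names are hypothetical, and only the ByteBuffer calls 
(wrap, get, getShort, put) are real library methods.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers that hide the masking behind an API.
class UnsignedBuffers {

    // Reads the byte at index as a value in 0..255.
    static int getUnsigned(ByteBuffer bb, int index) {
        return bb.get(index) & 0xFF;
    }

    // Reads the short at index as a value in 0..65535.
    static int getUnsignedShort(ByteBuffer bb, int index) {
        return bb.getShort(index) & 0xFFFF;
    }

    // Stores the low-order 8 bits of value at index.
    static void putUnsigned(ByteBuffer bb, int index, int value) {
        bb.put(index, (byte) value);
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.wrap(new byte[] { (byte) 0xFF, (byte) 0xFE });
        System.out.println(getUnsigned(bb, 0));      // 255
        System.out.println(getUnsignedShort(bb, 0)); // 65534 (default big-endian order)
        putUnsigned(bb, 1, 200);
        System.out.println(bb.get(1));               // -56
        System.out.println(getUnsigned(bb, 1));      // 200
    }
}
```

The call sites are free of masks, but every access becomes a method 
call rather than an operator applied to the expression.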

EXAMPLES
SIMPLE EXAMPLE:

            byte[] buffer = ...; int idx=...; int length=...;

            int value=0;

            for(int i = idx; i < idx + length; i++) {
                value = (value << 8) | (buffer[i] & 0xff);
            }

can be recoded as


            for(int i = idx; i < idx + length; i++) {
                value = (value << 8) | (+)buffer[i];
            }
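
Since the (+) operator does not exist in any released compiler, a 
runnable version of this loop has to use the mask. The sketch below 
wraps the first form in a method; the class name, method name and test 
data are assumptions made for the example.

```java
// Runnable form of the simple example, using the mask in place of (+).
class BigEndianReader {

    // Assembles 'length' bytes starting at idx into an int, most
    // significant byte first. Meaningful for length <= 4.
    static int readBigEndian(byte[] buffer, int idx, int length) {
        int value = 0;
        for (int i = idx; i < idx + length; i++) {
            value = (value << 8) | (buffer[i] & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] buffer = { 0x00, 0x01, (byte) 0xFF, (byte) 0x80 };
        System.out.println(Integer.toHexString(readBigEndian(buffer, 1, 3))); // 1ff80
    }
}
```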


ADVANCED EXAMPLE:

    private int getBerValueLength(byte[] contents, int idx) {
        if((contents[idx] & 0x80) == 0) return contents[idx];
        int lenlen = (+)contents[idx] ^ 0x80;  // clear the high bit, which is known to be 1
        int result=0;
        for(int i = idx+1; i < idx + 1 + lenlen; i++ ) {
            result = (result << 8) | (+)contents[i];
        }
        return result;
    }
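
For readers who want to run the method, here is the same logic written 
with masks standing in for (+). The wrapper class and the sample 
encodings are assumptions for the example; no validation of malformed 
input is attempted.

```java
// BER length decoding with (x & 0xFF) standing in for the proposed (+)x.
class BerLength {

    static int getBerValueLength(byte[] contents, int idx) {
        // Short form: high bit clear, the byte itself is the length.
        if ((contents[idx] & 0x80) == 0) return contents[idx];
        // Long form: low 7 bits give the number of length bytes that follow.
        int lenlen = (contents[idx] & 0xFF) ^ 0x80;
        int result = 0;
        for (int i = idx + 1; i < idx + 1 + lenlen; i++) {
            result = (result << 8) | (contents[i] & 0xFF);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(getBerValueLength(new byte[] { 0x05 }, 0));                     // 5
        System.out.println(getBerValueLength(new byte[] { (byte) 0x81, (byte) 0xC8 }, 0)); // 200
        System.out.println(getBerValueLength(new byte[] { (byte) 0x82, 0x01, 0x00 }, 0));  // 256
    }
}
```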

DETAILS
SPECIFICATION:

amend  15.15

The unary operators include +, -, ++, --, ~, !, unsigned integer 
widening operator and cast operators.



add the following to the grammars in 15.15



The following productions from §new section are repeated here for 
convenience:


UnsignedWideningExpression:

    UnsignedIntegerWideningOperator UnaryExpression


UnsignedIntegerWideningOperator:

        ( + )



Add a new section to 15 - between "15.15 Unary Expressions" and "15.16 
Cast Expressions" would seem ideal in terms of context and precedence level.


The unsigned integer widening operator is a unary operator which may be 
applied to expressions of type byte, short and char. It is a compile 
time error to apply this operator to other types.


UnsignedWideningExpression:

    UnsignedIntegerWideningOperator UnaryExpression


UnsignedIntegerWideningOperator:

        ( + )


The unsigned integer widening operator converts its operand to type int. 
Unary numeric promotion (§) is NOT performed on the operand. For a byte 
operand, the low-order 8 bits of the result have the same values as 
in the operand. For short and char operands, the low-order 16 bits of 
the result have the same values as in the operand. The remaining 
high-order bits are set to zero. This is effectively a zero-extending 
widening conversion and is equivalent to the following expression for a 
byte operand x,

    x & 0xFF

and equivalent to the following for a short or char operand y

    y & 0xFFFF
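
The two masking expressions can be checked exhaustively in current 
Java. The sketch below compares them with Byte.toUnsignedInt and 
Short.toUnsignedInt, library methods that were added later (Java 8) and 
are not part of this proposal. The widen methods are hypothetical 
stand-ins for the operator.

```java
// Checks the masking expressions against the Java 8+ library methods.
class ZeroExtendEquivalence {

    // Hypothetical stand-ins for (+)x and (+)y.
    static int widen(byte x)  { return x & 0xFF; }
    static int widen(short y) { return y & 0xFFFF; }

    // Returns true if both masks agree with the library for every value.
    static boolean agreesWithLibrary() {
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte b = (byte) i;
            if (widen(b) != Byte.toUnsignedInt(b)) return false;
        }
        for (int i = Short.MIN_VALUE; i <= Short.MAX_VALUE; i++) {
            short s = (short) i;
            if (widen(s) != Short.toUnsignedInt(s)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(widen((byte) -1));    // 255
        System.out.println(widen((short) -1));   // 65535
        System.out.println(agreesWithLibrary()); // true
    }
}
```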


Other sections have lists of operators for which various things apply. 
Add to these as appropriate - yet to be determined.

Note the specification above could also be amended to allow the 
operator to zero extend an int to a long, however the utility value of 
this is uncertain.
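
For reference, the int to long case would correspond to the mask shown 
in the sketch below. The class and method names are made up for the 
example; Integer.toUnsignedLong is a later library addition (Java 8) 
used here only as a cross-check.

```java
// Zero extension of an int to a long, written with a mask.
class IntToLongZeroExtend {

    // Hypothetical stand-in for (+) applied to an int operand.
    static long widen(int i) {
        return i & 0xFFFFFFFFL; // the L suffix makes the mask a long
    }

    public static void main(String[] args) {
        System.out.println(widen(-1));                                // 4294967295
        System.out.println(widen(-1) == Integer.toUnsignedLong(-1)); // true
    }
}
```
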
COMPILATION:

Compilation may be equivalent to the masking operation above.  HotSpot 
could detect a sign extension that is immediately followed by a mask 
discarding the sign-extended bits, and remove both. If that were the 
case, the operator could be applied to every access of a byte field, 
variable or array element to indicate treatment as an unsigned byte, at 
no cost.


For a char, the operator is equivalent to a widening conversion to int. 
The new operator is permitted on a char expression because there is no 
reason to disallow it. However it would be equally effective if it did 
not apply to char.
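
The claim about char can be confirmed in current Java: char is already 
unsigned, so its ordinary widening to int never sets the high-order 
bits. A minimal sketch, with names chosen for the example:

```java
// Shows that the mask is a no-op for char but not for short.
class CharWidening {

    // Returns true if (int) c equals (c & 0xFFFF) for every char value.
    static boolean maskIsRedundantForChar() {
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if ((int) c != (c & 0xFFFF)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        char c = (char) 0xFFFF;
        short s = (short) 0xFFFF; // same 16 bits, signed type
        System.out.println((int) c);                  // 65535
        System.out.println((int) s);                  // -1
        System.out.println(maskIsRedundantForChar()); // true
    }
}
```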


TESTING:

There are no gnarly use cases, so testing is straightforward. It could 
be as simple as compiling and executing a main class with a handful of 
asserts.
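
Such a test might look roughly like the sketch below. Because no 
compiler implements (+), the uw methods are hypothetical stand-ins that 
a real test would replace with the operator itself. An explicit check 
method is used instead of the assert keyword so the checks run without 
the -ea flag.

```java
// Sketch of a boundary-value test; uw(...) stands in for the (+) operator.
class UnsignedWideningTest {

    static int uw(byte b)  { return b & 0xFF; }
    static int uw(short s) { return s & 0xFFFF; }
    static int uw(char c)  { return c; }

    static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError(what);
    }

    static void runAll() {
        check(uw((byte) 0) == 0, "byte zero");
        check(uw(Byte.MAX_VALUE) == 127, "byte max");
        check(uw(Byte.MIN_VALUE) == 128, "byte min");
        check(uw((byte) -1) == 255, "byte -1");
        check(uw(Short.MAX_VALUE) == 32767, "short max");
        check(uw(Short.MIN_VALUE) == 32768, "short min");
        check(uw((short) -1) == 65535, "short -1");
        check(uw(Character.MAX_VALUE) == 65535, "char max");
    }

    public static void main(String[] args) {
        runAll();
        System.out.println("all checks passed");
    }
}
```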

LIBRARY SUPPORT:

No library support required.

REFLECTIVE APIS:

None

OTHER CHANGES:

none foreseen

MIGRATION:

COMPATIBILITY
BREAKING CHANGES:

None

EXISTING PROGRAMS:

Tools could detect the specific masking operations used to zero extend a 
previously sign extended byte or short, and replace that with the new 
operator.

REFERENCES
EXISTING BUGS:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4186775

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4879804

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4504839


URL FOR PROTOTYPE (optional):
None



