PROPOSAL: Unsigned Integer Widening Operator
Bruce Chapman
brucechapman at paradise.net.nz
Wed Mar 25 02:02:06 PDT 2009
Title
Unsigned Integer Widening Operator
latest html version at http://docs.google.com/Doc?id=dcvp3mkv_2k39wt5gf&hl
AUTHOR(S): Bruce Chapman
OVERVIEW
FEATURE SUMMARY:
Add an unsigned widening operator to convert bytes (in particular),
shorts, and chars (for completeness) to int while avoiding sign extension.
MAJOR ADVANTAGE:
Byte manipulation code becomes littered with (b & 0xFF) expressions.
They reverse the sign extension that occurs when a byte field, variable
or array access appears as an operand of an operator and is thus subject
to a widening conversion with its implicit sign extension. This masking
with 0xFF can detract from the clarity of the code by obscuring the
actual algorithm. It is the Java language's rules, not the algorithm
itself, that demand the masking operation, which can look redundant to
the uninitiated.
It is quite deliberate that the new operator (+) can be read as a cast
to a positive.
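To make the sign extension described above concrete, here is a minimal sketch in today's Java, without the proposed operator. The class and method names are illustrative assumptions and are not part of the proposal.

```java
// Illustrative only: the class and method names are assumptions, not part of the proposal.
final class SignExtensionDemo {
    // Plain widening: the byte's sign bit is copied into the upper 24 bits.
    static int widenSigned(byte b) {
        return b;
    }

    // Masking clears the sign-extended bits, leaving a value in 0..255.
    static int widenUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;
        System.out.println(widenSigned(b));   // -16
        System.out.println(widenUnsigned(b)); // 240
    }
}
```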
MAJOR DISADVANTAGE:
A new operator.
ALTERNATIVES:
Explicit masking. If java.nio.ByteBuffer were extensible (it isn't),
unsigned get and set methods could be added to hide the masking in an
API. Extension methods could be employed to that end if they were
implemented.
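As a rough illustration of the API alternative, the masking can be hidden behind static helper methods that take a ByteBuffer. The UnsignedBuffers class and its method names are hypothetical and do not exist in java.nio; only ByteBuffer.get(int), getShort(int) and put(int, byte) are real calls.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: neither this class nor its methods exist in java.nio.
final class UnsignedBuffers {
    private UnsignedBuffers() {}

    // Absolute get of one byte, returned as a value in 0..255.
    static int getUnsignedByte(ByteBuffer buf, int index) {
        return buf.get(index) & 0xFF;
    }

    // Absolute get of a short, returned as a value in 0..65535.
    static int getUnsignedShort(ByteBuffer buf, int index) {
        return buf.getShort(index) & 0xFFFF;
    }

    // Absolute put that stores the low 8 bits of value.
    static void putUnsignedByte(ByteBuffer buf, int index, int value) {
        buf.put(index, (byte) value);
    }
}
```

The call sites stay free of masks, at the cost of a method call where the proposal would use an operator.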
EXAMPLES
SIMPLE EXAMPLE:
byte[] buffer = ...; int idx=...; int length=...;
int value=0;
for(int i = idx; i < idx + length; i++) {
    value = (value << 8) | (buffer[i] & 0xff);
}
can be recoded as
for(int i = idx; i < idx + length; i++) {
    value = (value << 8) | (+)buffer[i];
}
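For comparison, the loop above can be run in today's Java as follows. This is a sketch only: the class and method names are assumptions, and Byte.toUnsignedInt is a library method added in Java 8, after this proposal was written. The third method omits the mask to show the bug it prevents.

```java
// Illustrative only: the class and method names are assumptions.
final class BigEndianReader {
    // Assembles bytes starting at idx into an int, most significant byte first.
    static int readWithMask(byte[] buffer, int idx, int length) {
        int value = 0;
        for (int i = idx; i < idx + length; i++) {
            value = (value << 8) | (buffer[i] & 0xFF);
        }
        return value;
    }

    // Same loop using Byte.toUnsignedInt (Java 8 and later) instead of the mask.
    static int readWithLibrary(byte[] buffer, int idx, int length) {
        int value = 0;
        for (int i = idx; i < idx + length; i++) {
            value = (value << 8) | Byte.toUnsignedInt(buffer[i]);
        }
        return value;
    }

    // Without the mask, a byte >= 0x80 sign-extends and overwrites the higher bits.
    static int readWithoutMask(byte[] buffer, int idx, int length) {
        int value = 0;
        for (int i = idx; i < idx + length; i++) {
            value = (value << 8) | buffer[i];
        }
        return value;
    }
}
```

For the two bytes 0x01 0x80, the masked versions yield 384 (0x0180), while the unmasked version yields -128.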
ADVANCED EXAMPLE:
private int getBerValueLength(byte[] contents, int idx) {
    if((contents[idx] & 0x80) == 0) return contents[idx];
    int lenlen = (+)contents[idx] ^ 0x80; // invert high bit which = 1
    int result=0;
    for(int i = idx+1; i < idx + 1 + lenlen; i++ ) {
        result = (result << 8) | (+)contents[i];
    }
    return result;
}
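The same method can be written in today's Java by substituting the mask for each use of the proposed operator. The sketch below assumes that substitution is the only change needed; the enclosing class name is an assumption.

```java
// Illustrative only: the class name is an assumption; the method mirrors the example above.
final class BerLength {
    static int getBerValueLength(byte[] contents, int idx) {
        // Short form: high bit clear, so the byte itself is the length (0..127).
        if ((contents[idx] & 0x80) == 0) {
            return contents[idx];
        }
        // Long form: the low 7 bits give the number of length bytes that follow.
        int lenlen = (contents[idx] & 0xFF) ^ 0x80;
        int result = 0;
        for (int i = idx + 1; i < idx + 1 + lenlen; i++) {
            result = (result << 8) | (contents[i] & 0xFF);
        }
        return result;
    }
}
```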
DETAILS
SPECIFICATION:
Amend 15.15:
The unary operators include +, -, ++, --, ~, !, the unsigned integer
widening operator and cast operators.
Add the following to the grammars in 15.15:
The following productions from §new section are repeated here for
convenience:
UnsignedWideningExpression:
    UnsignedIntegerWideningOperator UnaryExpression
UnsignedIntegerWideningOperator:
    ( + )
Add a new section to 15 - between "15.15 Unary Expressions" and "15.16
Cast Expressions" would seem ideal in terms of context and precedence level.
The unsigned integer widening operator is a unary operator which may be
applied to expressions of type byte, short and char. It is a compile
time error to apply this operator to other types.
UnsignedWideningExpression:
    UnsignedIntegerWideningOperator UnaryExpression
UnsignedIntegerWideningOperator:
    ( + )
The unsigned integer widening operator converts its operand to type int.
Unary numeric promotion (§) is NOT performed on the operand. For a byte
operand, the low-order 8 bits of the result have the same values as in
the operand. For short and char operands, the low-order 16 bits of the
result have the same values as in the operand. The remaining high-order
bits are set to zero. This is effectively a zero-extending widening
conversion and is equivalent to the following expression for a byte
operand x,
    x & 0xFF
and equivalent to the following for a short or char operand y
    y & 0xFFFF
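The equivalences above can be sketched as overloaded methods standing in for the proposed operator. The class name and the zeroExtend method name are assumptions made for illustration; the bodies are exactly the masking expressions given in the text.

```java
// Illustrative stand-in for the proposed operator; all names are assumptions.
final class ZeroExtend {
    // byte operand: keep the low-order 8 bits, clear the rest.
    static int zeroExtend(byte x) {
        return x & 0xFF;
    }

    // short operand: keep the low-order 16 bits, clear the rest.
    static int zeroExtend(short y) {
        return y & 0xFFFF;
    }

    // char operand: char is already unsigned, so the mask changes nothing.
    static int zeroExtend(char y) {
        return y & 0xFFFF;
    }
}
```

For example, zeroExtend((byte) -1) is 255 and zeroExtend((short) -1) is 65535, while zeroExtend('\uFFFF') is 65535, the same value that ordinary widening of that char produces.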
Other sections have lists of operators for which various things apply.
Add to these as appropriate - yet to be determined.
Note the specification above could also be amended to allow the
operator to zero extend an int to a long; however, the utility value of
this is uncertain.
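For reference, the int-to-long case mentioned above would correspond to the following masking in today's Java. This is a sketch with assumed class and method names; Integer.toUnsignedLong, added in Java 8 after this proposal, computes the same value.

```java
// Illustrative only: the class and method names are assumptions.
final class IntToLong {
    // Ordinary widening sign-extends into the upper 32 bits.
    static long widenSigned(int i) {
        return i;
    }

    // The long mask clears the upper 32 bits after widening.
    static long zeroExtend(int i) {
        return i & 0xFFFFFFFFL;
    }
}
```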
COMPILATION:
Compilation may be equivalent to the masking operation above. HotSpot
could detect the redundant sign extension followed by masking out of the
sign-extended bits and remove both. If that were the case the operator
could be applied to every access of a byte field, variable or array to
indicate treatment as unsigned byte, with no cost.
For a char, the operator is equivalent to a widening conversion to int.
The new operator is permitted on a char expression because there is no
reason to disallow it. However it would be equally effective if it did
not apply to char.
TESTING:
There are no gnarly use cases, so testing is straightforward. It could
be as simple as compiling and executing a main class with a handful of
asserts.
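A test of that kind might look like the sketch below. Because the proposed operator does not compile today, the masking expression stands in for it and is checked against an arithmetic definition of the unsigned value; a real test would put the (+) expression in place of the mask. The class and method names are assumptions.

```java
// Illustrative test sketch; names are assumptions and the mask stands in for the operator.
final class ZeroExtendCheck {
    // Counts byte values for which the mask disagrees with the arithmetic definition.
    static int byteMismatches() {
        int mismatches = 0;
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            int expected = v < 0 ? v + 256 : v;
            if ((b & 0xFF) != expected) {
                mismatches++;
            }
        }
        return mismatches;
    }

    // Same check over every short value.
    static int shortMismatches() {
        int mismatches = 0;
        for (int v = Short.MIN_VALUE; v <= Short.MAX_VALUE; v++) {
            short s = (short) v;
            int expected = v < 0 ? v + 65536 : v;
            if ((s & 0xFFFF) != expected) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        if (byteMismatches() != 0 || shortMismatches() != 0) {
            throw new AssertionError("zero extension mismatch");
        }
        System.out.println("all checks passed");
    }
}
```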
LIBRARY SUPPORT:
No library support required.
REFLECTIVE APIS:
None
OTHER CHANGES:
None foreseen
MIGRATION:
COMPATIBILITY
BREAKING CHANGES:
None
EXISTING PROGRAMS:
Tools could detect the specific masking operations used to zero extend a
previously sign extended byte or short, and replace that with the new
operator.
REFERENCES
EXISTING BUGS:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4186775
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4879804
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4504839
URL FOR PROTOTYPE (optional):
None