Thoughts on unified integer literal improvements

Wed Jul 15 18:46:32 PDT 2009

Catching up on email...

Bruce Chapman wrote:
> Joe Darcy wrote:
>> Hello.
>>
>> On the set of improved integer literal features, I think combining 
>> the underscores as separators and binary literals is straightforward 
>> given separately correct grammars for each change.
>>
>> As an alternative to the "y" and "s" suffixes, I suggest considering a 
>> "u" suffix to mean unsigned.  Literals with a trailing "u" would have 
>> type int; widening conversions of such literals would zero-extend and 
>> narrowing conversions would range check on the width of set bits.  
>> For example,
>>
> All,
>
> I have spent some time considering Joe's suggestions.
>
> While I really like the aesthetics of "u" means "unsigned" compared 
> with "y" suffix for "byte", I am also aware of the considerable extra 
> complexity of defining a new primitive type in the JLS. From a first 
> scan I have identified the most obvious change points which are 
> recorded in the google document 
> http://docs.google.com/View?id=dcvp3mkv_112567xb4n

Hi Bruce.

Thanks for starting a more detailed analysis and comparison of the "u" 
and "y" proposals.

Let's see, from the top in 3.10.1 "u" and "U" can be added to the list 
of Integer Type Suffixes with some explanatory text like:
"An integer literal is of type int if it is suffixed with "u" or "U"; 
the trailing "u" or "U" indicates an unsigned conversion process 
occurs.  Unsigned literals are converted as if int were an unsigned 
32-bit 2's complement format and different widening and narrowing 
primitive conversion rules are applied to unsigned literals.  For 
example, 2147483648u (2^31) is equal to -2147483648 and 2147483649u 
(2^31 + 1) is equal to -2147483647."
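
To make the two example values concrete, here is a minimal sketch in 
today's Java.  The "u" suffix does not exist, so the helper 
unsignedLiteral is purely hypothetical: it takes the mathematical value 
of the literal as a long and returns the int that the literal would 
denote under my reading of the text above.

```java
public class UnsignedLiteralValue {
    // Hypothetical helper (not part of any proposal text): the int that a
    // literal such as 2147483648u would denote, given its mathematical
    // value in the range 0 .. 2^32 - 1.
    public static int unsignedLiteral(long value) {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException(
                "not representable in 32 unsigned bits: " + value);
        }
        return (int) value; // keeps the low-order 32 bits unchanged
    }

    public static void main(String[] args) {
        System.out.println(unsignedLiteral(2147483648L)); // 2^31     -> -2147483648
        System.out.println(unsignedLiteral(2147483649L)); // 2^31 + 1 -> -2147483647
    }
}
```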

In terms of conversions, there are 11 categories of conversions and 5 
conversion contexts.

In 5.1.2, there could be three new widening primitive conversions:

    Unsigned int literal to long, float, or double

with appropriate rules to preserve the sign of the result: "A widening 
conversion of an unsigned int literal to long zero-extends the converted 
int value; meaning the low-order 32 bits of the long are equal to the 
int value and the high-order 32 bits of the long are zero.  Widening 
conversion of an unsigned int literal to float or double acts as if the 
value first went through a widening conversion of unsigned int to long 
and then a widening conversion of long to float or double, respectively."
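
As a rough illustration of the widening rules just quoted, the 
following sketch emulates them with a mask in today's Java.  The method 
names are my own and only stand in for what the compiler would do 
implicitly for an unsigned literal.

```java
public class UnsignedWidening {
    // Zero-extend: the low-order 32 bits equal the int, the high-order 32 bits are zero.
    public static long widenToLong(int bits) {
        return bits & 0xFFFFFFFFL;
    }

    // float and double go through long first, as in the rule text above.
    public static float widenToFloat(int bits) {
        return (float) widenToLong(bits);
    }

    public static double widenToDouble(int bits) {
        return (double) widenToLong(bits);
    }

    public static void main(String[] args) {
        int bits = 0xFFFFFFFF;
        System.out.println((long) bits);          // ordinary sign-extending widening: -1
        System.out.println(widenToLong(bits));    // zero-extending widening: 4294967295
        System.out.println(widenToDouble(bits));
    }
}
```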

And in 5.1.3, there could be three new narrowing primitive conversions:

    Unsigned int to byte, short, or char

These would actually be the same as the current narrowing conversions on 
int; just grab the low-order n bits.  The real help would come in 
section 5.2, Assignment Conversion, to redefine what "a constant 
expression is representable in the type of the variable" means for 
unsigned literals to allow things like "byte b = 0xFFu;"
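
One possible reading of "representable" for unsigned literals is a 
check on the width of the set bits, followed by the usual low-order-bits 
narrowing.  The sketch below does that check at run time only for the 
sake of illustration; in the proposal it would be a compile-time rule, 
and the names fitsInBits and assignToByte are hypothetical.

```java
public class UnsignedAssignment {
    // Hypothetical check: true if no bit at or above position `width` is set.
    public static boolean fitsInBits(int bits, int width) {
        if (width < 1 || width > 31) {
            throw new IllegalArgumentException("width must be in 1..31");
        }
        return (bits >>> width) == 0;
    }

    // Emulates "byte b = <literal>u;": range check, then keep the low-order 8 bits.
    public static byte assignToByte(int bits) {
        if (!fitsInBits(bits, 8)) {
            throw new ArithmeticException("set bits do not fit in a byte");
        }
        return (byte) bits;
    }

    public static void main(String[] args) {
        System.out.println(assignToByte(0xFF));    // all 8 bits set: -1 as a byte
        System.out.println(fitsInBits(0x100, 8));  // needs 9 bits: false
    }
}
```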

The text of the method invocation conversion would remain unchanged; 
with the defined widening conversion, given the method declaration

    public void foo(long ell) { System.out.println(ell);}

the call

    foo(0xFFFFFFFFu)

would do the right thing and print 4294967295.  Narrowing conversions 
do not take place in a method invocation context.  So given

    public void bar(byte b) {...}

the call

    bar(0xFFu) // no bar(int)

would not work, but

    bar((byte)0xFFu)

would.
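
For comparison, here is how the same calls behave in today's Java, 
without any "u" suffix.  The methods return their argument rather than 
printing it, which is my own change to make the results easy to inspect.

```java
public class InvocationToday {
    public static long foo(long ell) { return ell; }

    public static byte bar(byte b) { return b; }

    public static void main(String[] args) {
        // The int literal 0xFFFFFFFF is -1, and method invocation
        // conversion sign-extends it to long.
        System.out.println(foo(0xFFFFFFFF));    // -1

        // A long literal is needed today to get the unsigned value.
        System.out.println(foo(0xFFFFFFFFL));   // 4294967295

        // bar(0xFF);  // does not compile: no narrowing in method invocation
        System.out.println(bar((byte) 0xFF));   // explicit cast works: -1
    }
}
```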

Unary numeric promotion (5.6.1) is not affected by unsigned literals 
since only values narrower than int are promoted.  Given the earlier 
definitions of widening primitive conversion, I don't think any explicit 
changes are needed in binary numeric promotion (5.6.2).

There are some potential puzzlers here.  Unlike add, subtract, and 
multiply, a 2's complement signed divide doesn't give the same bit-wise 
result as an unsigned divide, so "unsignedLiteral1 / unsignedLiteral2" 
may give a surprising answer and should be a compiler warning.  Another 
limitation of this proposal is that only unsigned literals are regarded 
as unsigned.  There is no analogue of an "unsigned expression" to 
parallel a "constant expression" (15.28).  That means

long value1 = 0xFFFFFFFFu;
long value2 = 0xFFFFFFFEu + 1u;

will give different results.  It would be possible to define a subset of 
the constant expressions to be unsigned constant expressions (starting 
with unsigned literals and combining them with unary operations, binary 
+, -, and * (but not / ), shifts, bitwise logical operations, etc.).
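
The difference between the two assignments can be emulated in today's 
Java as follows.  This is a sketch of my reading of the rules: the 
helper widenUnsigned stands in for the zero-extending conversion that 
would apply only to a bare unsigned literal, and the sum is treated as 
an ordinary int expression.

```java
public class UnsignedExpressionGotcha {
    // Hypothetical stand-in for the widening applied to an unsigned literal.
    public static long widenUnsigned(int bits) {
        return bits & 0xFFFFFFFFL;
    }

    // long value1 = 0xFFFFFFFFu;  a bare literal, so it is zero-extended.
    public static long value1() {
        return widenUnsigned(0xFFFFFFFF);
    }

    // long value2 = 0xFFFFFFFEu + 1u;  the sum is a plain int expression,
    // so the ordinary sign-extending widening applies.
    public static long value2() {
        int sum = 0xFFFFFFFE + 1;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(value1()); // 4294967295
        System.out.println(value2()); // -1
        // With an "unsigned constant expression" rule the sum would be zero-extended too.
        System.out.println(widenUnsigned(0xFFFFFFFE + 1)); // 4294967295
    }
}
```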

Perhaps unsigned constant expressions would need to be defined to reduce 
the "gotcha" factor of some of the current results.
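
The divide puzzler mentioned earlier can be seen with a small example.  
The divideUnsigned helper here is my own sketch that does the division 
in long arithmetic; it is only meant to show where signed and unsigned 
results part ways.

```java
public class UnsignedDivide {
    private static final long MASK = 0xFFFFFFFFL;

    // Sketch of an unsigned 32-bit divide done in 64-bit arithmetic.
    public static int divideUnsigned(int a, int b) {
        return (int) ((a & MASK) / (b & MASK));
    }

    public static void main(String[] args) {
        int a = 0xFFFFFFFE; // 4294967294 if read as unsigned, -2 as an int
        int b = 2;

        // Add and multiply give the same 32 bits under either reading.
        System.out.println((a + b) == (int) ((a & MASK) + (b & MASK))); // true
        System.out.println((a * b) == (int) ((a & MASK) * (b & MASK))); // true

        // Divide does not.
        System.out.println(a / b);                // signed: -1
        System.out.println(divideUnsigned(a, b)); // unsigned: 2147483647
    }
}
```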

-Joe