PROPOSAL: Binary literals (Version 2)

Joseph D. Darcy Joe.Darcy at Sun.COM
Thu May 7 23:39:42 PDT 2009


A few comments:

There are certainly niche applications where binary literals would be 
helpful!

The literals can be negated as usual.

Constraints for the "tiny" versions of Java are not a strong driver for 
JDK 7 language changes.  For example, while JavaCard 2.2.2 did not have 
String or Integer, the JavaCard 3.0 connected profile has both of those 
types, amongst other types used to provide JDK 5 language level support.

-Joe

Derek Foster wrote:
> This is a minor revision of my original proposal. It has been revised to add more examples and discussion of the various alternative attempts to solve the problems that this proposal is intended to address.
>
>
> Add binary literals to Java. (Version 2)
>
> AUTHOR(S): Derek Foster
>
> OVERVIEW
>
> In some programming domains, use of binary numbers (typically as bitmasks, bit-shifts, etc.) is very common. In particular, this type of usage is often needed in low-level programming, such as is typical for severely resource-constrained devices like J2ME, JavaCard, or JavaRing. It also comes up in a variety of larger-scale contexts, such as implementing network packet protocols, compression algorithms, and encryption algorithms.
>
> However, Java code has traditionally forced programmers to represent numbers in only decimal, octal, or hexadecimal. (In practice, octal is rarely used and is present mostly for backward compatibility with C.)
>
> When the data being dealt with is fundamentally bit-oriented, however, using hexadecimal to represent ranges of bits requires an extra degree of mental translation for the programmer, and this can often become a source of errors. For instance, suppose a technical specification lists specific values in binary (for example, a compression encoding algorithm, a network protocol, or the interface to a bitmapped hardware device). A programmer coding to that specification must then translate each value from its binary representation into hexadecimal. The translation is usually checked by translating the numbers back. In most cases, programmers do these translations in their heads and HOPEFULLY get them right; however, errors can easily creep in.
>
> Furthermore, in many cases the binary representation of a number makes it much clearer what is actually intended. For instance, this:
>
> private static final int BITMASK = 0x1E;
>
> does not immediately make it clear that the bitmask being declared comprises a single contiguous range of four bits.
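The sketch below makes the point concrete. It uses only existing library methods (Integer.toBinaryString, Integer.bitCount, Integer.numberOfTrailingZeros) to confirm what the hexadecimal spelling hides: 0x1E is a single run of four set bits. The class and method names are illustrative only.

```java
// Illustrative sketch: checks what "0x1E" hides, namely that the mask is
// one contiguous run of four set bits.
class BitmaskCheck {
    static final int BITMASK = 0x1E;

    // True if mask is nonzero and its set bits form a single contiguous run.
    static boolean isContiguous(int mask) {
        if (mask == 0) return false;
        // Shift the run down to bit 0; a contiguous run becomes 2^n - 1.
        int shifted = mask >>> Integer.numberOfTrailingZeros(mask);
        // (2^n - 1) & 2^n == 0 only when no gaps were present.
        return (shifted & (shifted + 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(BITMASK)); // prints 11110
        System.out.println(Integer.bitCount(BITMASK));       // prints 4
        System.out.println(isContiguous(BITMASK));           // prints true
    }
}
```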
>
> In many cases, it would be more natural for the programmer to be able to write the numbers in binary in the source code, eliminating the need for manual translation entirely.
>
>
> FEATURE SUMMARY:
>
> In addition to the existing "1" (decimal), "01" (octal) and "0x1" (hexadecimal) form of specifying numeric literals, a new form "0b1" (binary) would be added.
>
> Note that this is the same syntax that the GCC C/C++ compilers support as an extension (as of version 4.3), and that is also used in the Ruby and Python languages.
>
>
> MAJOR ADVANTAGE:
>
> It is no longer necessary for programmers to translate binary numbers to and from hexadecimal in order to use them in Java programs.
>
>
> MAJOR BENEFIT:
>
> Code using bitwise operations is more readable and easier to verify against technical specifications that use binary numbers to specify constants.
>
>
> MAJOR DISADVANTAGE:
>
> Someone might incorrectly think that "0b1" represents the same value as the hexadecimal number "0xB1". However, a similar problem has existed for octal and decimal for many years (confusion between "050" and "50") and does not seem to be a major issue. Nor does it seem to be a major concern in the programming languages in which this construct already exists (see below).
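The following small illustration shows the prefixes involved. It assumes a compiler that accepts the proposed 0b syntax, as Java SE 7 and later do. The digits alone do not determine the value, because the prefix selects the radix. The class name is hypothetical.

```java
// Illustrative sketch (needs a compiler that accepts 0b literals, e.g. Java SE 7+):
// the prefix, not the digits, selects the radix.
class PrefixConfusion {
    static final int BINARY_ONE = 0b1;   // binary 1
    static final int HEX_B1     = 0xB1;  // 11 * 16 + 1 = 177
    static final int OCTAL_50   = 050;   // 5 * 8 + 0 = 40
    static final int DECIMAL_50 = 50;    // decimal 50

    public static void main(String[] args) {
        // prints "1 177 40 50"
        System.out.println(BINARY_ONE + " " + HEX_B1 + " " + OCTAL_50 + " " + DECIMAL_50);
    }
}
```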
>
>
> ALTERNATIVES:
>
> Users could continue to write the numbers as decimal, octal, or hexadecimal, and would continue to have the problems observed in this document.
>
> Alternately, code could be written to translate at runtime from binary strings, such as:
>
>    int value = Integer.parseInt("00001110", 2) & (Integer.parseInt("00101110", 2) << amount);
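For comparison, the sketch below wraps the runtime-parsing line above in a method and places its hand-translated hexadecimal equivalent next to it (00001110 is 0x0E, and 00101110 is 0x2E). It uses only Integer.parseInt(String, int); the class and method names are hypothetical.

```java
// Illustrative sketch: the runtime-parsing workaround versus the hex constants
// a programmer must currently produce by hand translation.
class ParseIntWorkaround {
    // Parses both strings every time it is called.
    static int parsed(int amount) {
        return Integer.parseInt("00001110", 2) & (Integer.parseInt("00101110", 2) << amount);
    }

    // Same computation with hand-translated constants: 00001110 == 0x0E, 00101110 == 0x2E.
    static int hex(int amount) {
        return 0x0E & (0x2E << amount);
    }

    public static void main(String[] args) {
        for (int amount = 0; amount < 4; amount++) {
            System.out.println(amount + ": " + parsed(amount) + " " + hex(amount));
        }
    }
}
```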
>
> There are several problems with this style of coding:
>
> * First of all, the code is significantly more verbose than it would be if it were not written to use library calls. The basic purpose of the operation (number & (number << amount)) is obscured by the clutter associated with calling a library routine. This makes the code difficult to read at a glance.
>
> * There is a significant performance penalty. The call to Integer.parseInt takes significantly more execution time than simply using an integer constant would. This time can be a significant barrier in a resource-constrained device which may have a slow clock speed. Also, if this line of code is occurring in a tight loop (as is common in low-level programming), this extra time may have a dramatic effect on how long it takes to iterate through the loop, since this performance penalty has to be paid each time the line of code is executed, rather than just once, at compile time.
>
> * Calling a method such as Integer.parseInt at runtime will typically make it impossible for the compiler to inline the value of this constant, since its value has been taken from a runtime method call. Inlining is important, because code that does bitwise parsing is often very low-level code in tight loops that must execute quickly. (This is particularly the case for mobile applications and other applications that run on severely resource-constrained environments, where a JIT compiler like HotSpot is not an option.)
>
> * Values that are returned from evaluating methods at runtime cannot be used as 'case' labels in 'switch' statements, which require compile-time constants. A Java compiler won't allow code like this:
>
> switch (foo) {
>     case Binary.parseInt("000"): doThis(); break;
>     case Binary.parseInt("001"): doThat(); break;
>     ...
> }
>
> * Any errors in the string to be parsed (for instance, an extra space, or confusion of the letter 'O' with the digit '0') will result in runtime exceptions rather than the compile-time errors that a malformed literal would produce. If such a value initializes a 'static' field, the failure surfaces as an ExceptionInInitializerError when the class is initialized. The application must then be written to handle these errors at runtime, when in principle they could all have been detected at compile time.
>
> * Classes java.lang.Integer and java.lang.String simply do not exist on a number of low-level Java platforms, such as JavaCard and JavaRing, since these platforms are used for low-level number crunching and do not normally need such high-level constructs. [Note: It might be possible for a compiler to synthesize these classes (basically, to pretend they existed at compile time but were optimized away at runtime due to method inlining). However, this would require substantial effort from the compiler team and essentially the introduction of a new programming paradigm (the "exists only at compile time" class). The issues of code verbosity, and of how to handle exceptions that might be thrown by these methods at runtime, would remain.]
>
> * Similarly, throwing of exceptions is often not supported in resource-constrained environments.
>
> * It is possible to address the code verbosity issue in the above code (by, for instance, creating a library method with a shorter name and taking only one argument), or the performance issue (by creating static named constants with the necessary values), but it is not possible to simultaneously address both issues in Java as it is currently defined.
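The following sketch shows the two partial workarounds from the last point, along with the runtime failure mode described earlier. The helper name bin() and the constant names are hypothetical. Neither LOW_BITS nor PATTERN is a compile-time constant, because each is initialized by a method call, so neither could be used as a case label.

```java
// Illustrative sketch of the two partial workarounds (hypothetical names).
class BinaryWorkarounds {
    // Workaround 1: a short, one-argument helper. Less verbose, but the
    // string is still parsed every time the expression is evaluated.
    static int bin(String digits) {
        return Integer.parseInt(digits, 2);
    }

    // Workaround 2: parse once, at class initialization, into named constants.
    // Cheap at the point of use, but the binary spelling moves away from the
    // expression, and these are still not compile-time constants.
    static final int LOW_BITS = bin("00001110");
    static final int PATTERN = bin("00101110");

    static int masked(int amount) {
        return LOW_BITS & (PATTERN << amount);
    }

    // A typo (letter 'O' instead of digit '0') is only caught when the code
    // runs, as a NumberFormatException.
    static boolean typoIsRejectedAtRuntime() {
        try {
            bin("0000111O");
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }
}
```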
>
>
> EXAMPLES:
>
> // An 8-bit 'byte' literal.
> byte aByte = (byte)0b00100001;
>
> // A 16-bit 'short' literal.
> short aShort = (short)0b1010000101000101;
>
> // Some 32-bit 'int' literals.
> int anInt1 = 0b10100001010001011010000101000101;
> int anInt2 = 0b101;
> int anInt3 = 0B101; // The B can be upper or lower case as per the x in "0x45".
>
> // A 64-bit 'long' literal. Note the "L" suffix.
> long aLong = 0b1010000101000101101000010100010110100001010001011010000101000101L;
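As a sanity check on these examples, the sketch below compares each literal with its hexadecimal equivalent; every four binary digits correspond to one hex digit. It needs a compiler that accepts binary literals, such as Java SE 7 or later. The class name is hypothetical.

```java
// Illustrative sketch (Java SE 7+): each example literal equals a hex literal.
class LiteralEquivalence {
    public static void main(String[] args) {
        check((byte) 0b00100001 == (byte) 0x21);
        check((short) 0b1010000101000101 == (short) 0xA145);
        check(0b10100001010001011010000101000101 == 0xA145A145);
        check(0b101 == 5 && 0B101 == 5);
        check(0b1010000101000101101000010100010110100001010001011010000101000101L
              == 0xA145A145A145A145L);
        // As with hex, a 32-digit int literal whose top bit is 1 is negative.
        check(0b10100001010001011010000101000101 < 0);
        System.out.println("all equivalences hold");
    }

    static void check(boolean condition) {
        if (!condition) throw new AssertionError();
    }
}
```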
>
> SIMPLE EXAMPLE:
>
> class Foo {
>   public static void main(String[] args) {
>     System.out.println("The value 10100001 in decimal is " + 0b10100001);
>   }
> }
>
>
> ADVANCED EXAMPLE:
>
> // Binary literals could be used in code that needs to be
> // easily checkable against a specifications document, such
> // as this simulator for a hypothetical 8-bit microprocessor.
> // In this hypothetical encoding, bit 7 selects the instruction
> // group, bits 6-4 hold the opcode, and bits 3-0 hold a register
> // number or a memory address.
>
> public State decodeInstruction(int instruction, State state) {
>   if ((instruction & 0b10000000) == 0b00000000) {
>     // Register operations: 0ooo rrrr
>     final int register = instruction & 0b00001111;
>     switch (instruction & 0b11110000) {
>       case 0b00000000: return state.nop();
>       case 0b00010000: return state.copyAccumTo(register);
>       case 0b00100000: return state.addToAccum(register);
>       case 0b00110000: return state.subFromAccum(register);
>       case 0b01000000: return state.multiplyAccumBy(register);
>       case 0b01010000: return state.divideAccumBy(register);
>       case 0b01100000: return state.setAccumFrom(register);
>       case 0b01110000: return state.returnFromCall();
>       default: throw new IllegalArgumentException();
>     }
>   } else {
>     // Memory and control operations: 1ooo aaaa
>     final int address = instruction & 0b00001111;
>     switch (instruction & 0b11110000) {
>       case 0b10000000: return state.jumpTo(address);
>       case 0b10010000: return state.jumpIfAccumZeroTo(address);
>       case 0b10100000: return state.jumpIfAccumNonzeroTo(address);
>       case 0b10110000: return state.setAccumFromMemory(address);
>       case 0b11000000: return state.writeAccumToMemory(address);
>       case 0b11010000: return state.callTo(address);
>       default: throw new IllegalArgumentException();
>     }
>   }
> }
>
> // Binary literals can be used to make a bitmap more readable:
>
> public static final short[] HAPPY_FACE = {
>    (short)0b0000011111100000,
>    (short)0b0000100000010000,
>    (short)0b0001000000001000,
>    (short)0b0010000000000100,
>    (short)0b0100000000000010,
>    (short)0b1000011001100001,
>    (short)0b1000011001100001,
>    (short)0b1000000000000001,
>    (short)0b1000000000000001,
>    (short)0b1001000000001001,
>    (short)0b1000100000010001,
>    (short)0b0100011111100010,
>    (short)0b0010000000000100,
>    (short)0b0001000000001000,
>    (short)0b0000100000010000,
>    (short)0b0000011111100000
> };
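The following short sketch shows how such a bitmap might be consumed. Each short is one row, and bit 15 is taken as the leftmost pixel; that bit ordering is an assumption made for illustration. The rows are copied from HAPPY_FACE, and the class and method names are hypothetical. The sketch needs a compiler that accepts binary literals.

```java
// Illustrative sketch: render a 16x16 bitmap of shorts as text.
// Assumption: bit 15 of each row is the leftmost pixel.
class BitmapRender {
    static final short[] HAPPY_FACE = {
        (short)0b0000011111100000, (short)0b0000100000010000,
        (short)0b0001000000001000, (short)0b0010000000000100,
        (short)0b0100000000000010, (short)0b1000011001100001,
        (short)0b1000011001100001, (short)0b1000000000000001,
        (short)0b1000000000000001, (short)0b1001000000001001,
        (short)0b1000100000010001, (short)0b0100011111100010,
        (short)0b0010000000000100, (short)0b0001000000001000,
        (short)0b0000100000010000, (short)0b0000011111100000
    };

    static String renderRow(short row) {
        StringBuilder sb = new StringBuilder(16);
        for (int col = 0; col < 16; col++) {
            // Promotion to int sign-extends, but bits 0-15 are unchanged,
            // so shifting by at most 15 and masking reads the right pixel.
            sb.append(((row >> (15 - col)) & 1) == 1 ? '#' : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (short row : HAPPY_FACE) {
            System.out.println(renderRow(row));
        }
    }
}
```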
>
> // Binary literals can make relationships
> // among data more apparent than they would
> // be in hex or octal.
> //
> // For instance, what does the following
> // array contain? In hexadecimal, it's hard to tell:
> public static final int[] phases = {
>     0x31, 0x62, 0xC4, 0x89, 0x13, 0x26, 0x4C, 0x98
> };
>
> // In binary, it's obvious that a number is being
> // rotated left one bit at a time.
> public static final int[] phases = {
>     0b00110001,
>     0b01100010,
>     0b11000100,
>     0b10001001,
>     0b00010011,
>     0b00100110,
>     0b01001100,
>     0b10011000,
> };
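The rotation relationship can also be checked mechanically. This sketch, which uses hypothetical names, rotates an 8-bit value left by one bit at a time, starting from 0x31. It then compares the result with the hexadecimal table above.

```java
import java.util.Arrays;

// Illustrative sketch: regenerate the phases table by 8-bit left rotation.
class PhaseTable {
    // Rotate the low 8 bits of value left by one position.
    static int rotateLeft8(int value) {
        return ((value << 1) | (value >>> 7)) & 0xFF;
    }

    static int[] generate(int seed, int count) {
        int[] table = new int[count];
        int value = seed & 0xFF;
        for (int i = 0; i < count; i++) {
            table[i] = value;
            value = rotateLeft8(value);
        }
        return table;
    }

    public static void main(String[] args) {
        int[] expected = {0x31, 0x62, 0xC4, 0x89, 0x13, 0x26, 0x4C, 0x98};
        System.out.println(Arrays.equals(generate(0x31, 8), expected)); // prints true
    }
}
```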
>
>
> DETAILS
>
> SPECIFICATION:
>
> Section 3.10.1 ("Integer Literals") of the JLS3 should be changed to add the following:
>
> IntegerLiteral:
>         DecimalIntegerLiteral
>         HexIntegerLiteral       
>         OctalIntegerLiteral
>         BinaryIntegerLiteral         // Added
>
> BinaryIntegerLiteral:
>         BinaryNumeral IntegerTypeSuffix_opt
>
> BinaryNumeral:
>         0 b BinaryDigits
>         0 B BinaryDigits
>
> BinaryDigits:
>         BinaryDigit
>         BinaryDigit BinaryDigits
>
> BinaryDigit: one of
>         0 1
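To show what the grammar accepts, here is a minimal sketch of a scanner routine for BinaryIntegerLiteral; it is not javac's implementation. The range limits are an assumption made by analogy with hexadecimal literals: at most 32 significant bits for an int literal and 64 for a long literal. The grammar above does not state them. All names are hypothetical.

```java
// Illustrative sketch of scanning one BinaryIntegerLiteral token.
// Assumption: int literals allow at most 32 significant bits and long
// literals (L/l suffix) at most 64, by analogy with hex literals.
class BinaryLiteralScanner {
    static long parseBinaryLiteral(String text) {
        // BinaryNumeral: 0 b BinaryDigits | 0 B BinaryDigits
        if (text.length() < 3 || text.charAt(0) != '0'
                || (text.charAt(1) != 'b' && text.charAt(1) != 'B')) {
            throw new NumberFormatException("not a binary literal: " + text);
        }
        int end = text.length();
        char last = text.charAt(end - 1);
        boolean isLong = last == 'l' || last == 'L';  // IntegerTypeSuffix_opt
        if (isLong) end--;
        if (end == 2) throw new NumberFormatException("no binary digits: " + text);

        long value = 0;
        int significantBits = 0;  // digits after the first '1'
        for (int i = 2; i < end; i++) {
            char c = text.charAt(i);
            if (c != '0' && c != '1') {
                throw new NumberFormatException("bad binary digit '" + c + "' in " + text);
            }
            if (significantBits > 0 || c == '1') significantBits++;
            value = (value << 1) | (c - '0');
        }
        int maxBits = isLong ? 64 : 32;
        if (significantBits > maxBits) {
            throw new NumberFormatException("too large for " + (isLong ? "long" : "int") + ": " + text);
        }
        // An int literal's 32 bits are read as two's complement, as for 0xFFFFFFFF.
        return isLong ? value : (int) value;
    }
}
```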
>
> COMPILATION:
>
> Binary literals would be compiled to class files in the same fashion as existing decimal, hexadecimal, and octal literals are. No special support or changes to the class file format are needed.
>
> TESTING:
>
> The feature can be tested in the same way as existing decimal, hexadecimal, and octal literals are: create a set of constants in source code, including the largest positive and most negative values of the int and long types, and verify at runtime that they have the correct values.
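The following sketch shows such boundary tests. They are written as plain checks rather than against any particular test framework. The code needs a compiler that accepts binary literals (Java SE 7 or later), and the class name is hypothetical.

```java
// Illustrative sketch of boundary tests for binary literals (Java SE 7+).
class BinaryLiteralBoundaryTest {
    static void check(boolean condition, String what) {
        if (!condition) throw new AssertionError(what);
    }

    public static void main(String[] args) {
        check(0b01111111111111111111111111111111 == Integer.MAX_VALUE, "int max");
        check(0b10000000000000000000000000000000 == Integer.MIN_VALUE, "int min");
        check(0b11111111111111111111111111111111 == -1, "int all ones");
        check(0b0 == 0, "int zero");
        check(0b0111111111111111111111111111111111111111111111111111111111111111L == Long.MAX_VALUE, "long max");
        check(0b1000000000000000000000000000000000000000000000000000000000000000L == Long.MIN_VALUE, "long min");
        // Negation applies as usual: -0b1 is the int -1.
        check(-0b1 == -1, "negation");
        System.out.println("all boundary checks passed");
    }
}
```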
>
>
> LIBRARY SUPPORT:
>
> The methods Integer.decode(String) and Long.decode(String) should be modified to parse binary numbers (as specified above) in addition to their existing support for decimal, hexadecimal, and octal numbers.
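The following sketch models the intended Integer.decode behavior as a hypothetical wrapper, decodeWithBinary, rather than as a change to java.lang.Integer. Non-binary input passes through to the existing Integer.decode. The sign handling follows Integer.decode: an optional leading sign is accepted, and a sign after the prefix is rejected.

```java
// Illustrative sketch: a hypothetical decode wrapper that also accepts 0b/0B.
class BinaryDecode {
    static int decodeWithBinary(String nm) {
        if (nm.isEmpty()) throw new NumberFormatException("Zero length string");
        int index = 0;
        boolean negative = false;
        char first = nm.charAt(0);
        if (first == '-') {
            negative = true;
            index = 1;
        } else if (first == '+') {
            index = 1;
        }
        if (nm.startsWith("0b", index) || nm.startsWith("0B", index)) {
            String digits = nm.substring(index + 2);
            if (digits.startsWith("-") || digits.startsWith("+")) {
                throw new NumberFormatException("Sign character in wrong position");
            }
            // Parsing with the sign attached lets "-0b1000...0" (2^31) yield MIN_VALUE.
            return Integer.parseInt(negative ? "-" + digits : digits, 2);
        }
        // Decimal, hexadecimal, and octal: defer to the existing method.
        return Integer.decode(nm);
    }
}
```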
>
>
> REFLECTIVE APIS:
>
> No updates to the reflection APIs are needed.
>
>
> OTHER CHANGES:
>
> No other changes are needed.
>
>
> MIGRATION:
>
> Individual constants in decimal, hexadecimal, or octal can be updated to binary as a programmer desires.
>
>
> COMPATIBILITY:
>
>
> BREAKING CHANGES:
>
> This feature would not break any existing programs, since the suggested syntax is currently a compile-time error.
>
>
> EXISTING PROGRAMS:
>
> The class file format does not change, so existing programs can use class files compiled with the new feature without problems.
>
>
> REFERENCES:
>
> The GCC/G++ compiler, which already supports this syntax (as of version 4.3) as an extension to standard C/C++.
> http://gcc.gnu.org/gcc-4.3/changes.html
>
> The Ruby language, which supports binary literals:
> http://wordaligned.org/articles/binary-literals
>
> The Python language added binary literals in version 2.6:
> http://docs.python.org/dev/whatsnew/2.6.html#pep-3127-integer-literal-support-and-syntax
>
> EXISTING BUGS:
>
> "Language support for literal numbers in binary and other bases"
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5025288
>
> URL FOR PROTOTYPE (optional):
>
> None.
>
>
>   



